[Neutron][TripleO] Avoid dangling sidecar containers - add hook support
Dear List,

While doing some validations on an "in-use" node, we detected a container in the Exited state with status 137 (128 + 9, i.e. terminated by SIGKILL). After some research and discussion, it appears this container is launched from a Neutron container as a sidecar, and the service is then killed:

https://github.com/openstack/neutron/blob/master/neutron/agent/linux/externa...

While the "kill" is fine outside of a container, it leads to some issues in a containerized world:
- a dangling container is left behind, in a failed state
- validating container state is therefore complicated
- it might lead to false assumptions if we don't know about that "kill" process
- it might lead to disk space issues

In order to sort that bad situation out, and prevent disk space issues and the like, I'd like to discuss adding some "hook" to that "external_process.py" (or wherever is more suited for that usage).

There is apparently something like that for the service launch, since there are wrapper scripts in /var/lib/neutron. In my idea, that wrapper script could generate a new wrapper used to actually delete the container (instead of kill -9 <pid>).

The Neutron code could be modified in a simple way, something like:

###############
if os.path.exists(<wrapper-file>):
    utils.execute(['bash', <wrapper-file>])
else:
    # current way to handle things (kill -9 <pid>)
    ...
###############

The <wrapper-file> might have "something" in its name to ensure we're actually killing the right process/container: self.pid, or maybe some specific string that Neutron is aware of... you get the idea.

Even outside of a container, this might be used to clean up temporary configurations/files, and to ensure we're actually facing a clean environment. It's really more like "hooks" than a "container-centric thingy".

Would you consider that kind of new feature?

Thank you for your time and consideration!

Cheers,
C.

--
Cédric Jeanneret
Software Engineer
DFG:DF
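A minimal, self-contained Python sketch of the proposed stop hook, under stated assumptions. It is not Neutron code: `hook_path`, `write_stop_hook`, `stop_process`, the hook directory and the `<uuid>-stop.sh` naming scheme are all hypothetical, passing the pid to the hook as `$1` is an assumption, and `subprocess.run` stands in for Neutron's `utils.execute`. The launch side writes a per-process hook; the stop side runs it if present and otherwise falls back to the current `kill -9 <pid>` behaviour.

```python
import os
import signal
import subprocess


def hook_path(hook_dir, uuid):
    # Hypothetical naming scheme: the process uuid in the file name ties
    # the hook to exactly one process/container.
    return os.path.join(hook_dir, f"{uuid}-stop.sh")


def write_stop_hook(hook_dir, uuid, script_body):
    """Launch side: what a start wrapper could do when it spawns a sidecar."""
    path = hook_path(hook_dir, uuid)
    with open(path, "w") as f:
        f.write(script_body)
    return path


def stop_process(pid, uuid, hook_dir, interpreter="bash"):
    """Stop side: run the per-process hook if present, else SIGKILL.

    Returns "hook" or "kill" so callers (and tests) can tell which path ran.
    """
    path = hook_path(hook_dir, uuid)
    if os.path.exists(path):
        # The hook receives the pid as $1 (an assumption); it could, for
        # example, remove the sidecar container instead of leaving it Exited.
        subprocess.run([interpreter, path, str(pid)], check=True)
        return "hook"
    # Fallback: the current behaviour, equivalent to `kill -9 <pid>`.
    os.kill(pid, signal.SIGKILL)
    return "kill"
```

Keying the file name on a stable identifier such as the uuid is one way to make sure the hook belongs to the right process/container; the pid would also work, but pids get reused, so a stable identifier seems safer.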
Hi,

IMO this is a good idea, and you should propose it as an RFE on Launchpad.

It could also help address our current issue with including the python process in the rootwrap kill filters, as that might no longer be necessary.

Also, from a quick look, I think external_process.ProcessMonitor is ready for something like that, as it allows passing a "get_stop_command" callback as an argument; see [1]. So maybe it would not be very hard to implement :)

[1] https://github.com/openstack/neutron/blob/master/neutron/agent/linux/externa...
On 17.04.2019, at 13:09, Cédric Jeanneret <cjeanner@redhat.com> wrote:
--
Slawek Kaplonski
Senior software engineer
Red Hat
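Slawek's pointer to a `get_stop_command` callback suggests the hook could be plugged in without touching the default path. The sketch below illustrates that callback pattern with a toy class; `SketchProcessManager` is hypothetical and is NOT Neutron's ProcessMonitor or ProcessManager, and the callback contract assumed here (it receives the pid file path and returns the command to run) should be checked against [1]. The command runner is injected so the pattern can be exercised without killing anything.

```python
class SketchProcessManager:
    """Toy stand-in for a process manager with a stop-command callback.

    Illustrative only; not Neutron's API. `execute` is any callable that
    runs a command list (e.g. subprocess.run or Neutron's utils.execute).
    """

    def __init__(self, pid_file, execute):
        self.pid_file = pid_file
        self.execute = execute

    @property
    def pid(self):
        # Read the pid from the pid file; None if missing or unreadable.
        try:
            with open(self.pid_file) as f:
                return int(f.read().strip())
        except (OSError, ValueError):
            return None

    def disable(self, get_stop_command=None):
        pid = self.pid
        if pid is None:
            return None  # nothing to stop
        if get_stop_command:
            # The caller decides how to stop, e.g. a per-process hook that
            # removes the sidecar container instead of SIGKILLing it.
            cmd = get_stop_command(self.pid_file)
        else:
            # Default path: the current `kill -9 <pid>` behaviour.
            cmd = ["kill", "-9", str(pid)]
        self.execute(cmd)
        return cmd
```

With no callback the behaviour is unchanged; passing a callback that returns something like `["bash", <wrapper-file>]` would route the stop through the hook instead.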
Hey :)

On 4/20/19 9:21 AM, Slawomir Kaplonski wrote:
Hi,
IMO this is a good idea, and you should propose it as an RFE on Launchpad.
I've created it in the Neutron namespace: https://bugs.launchpad.net/neutron/+bug/1825943
It could also help address our current issue with including the python process in the rootwrap kill filters, as that might no longer be necessary. Also, from a quick look, I think external_process.ProcessMonitor is ready for something like that, as it allows passing a "get_stop_command" callback as an argument; see [1]. So maybe it would not be very hard to implement :)
That's good to know then :) Hopefully we'll see that land quickly; dangling containers aren't good ^^'.

Cheers,
C.
[1] https://github.com/openstack/neutron/blob/master/neutron/agent/linux/externa...
--
Cédric Jeanneret
Software Engineer
DFG:DF