[openstack-dev] [Neutron][L3] Orphaned process cleanup
ihrachys at redhat.com
Thu Jan 28 13:15:32 UTC 2016
Sean M. Collins <sean at coreitpro.com> wrote:
> I started poking a bit at https://bugs.launchpad.net/devstack/+bug/1535661
> We have radvd processes that the l3 agent launches, and if the l3 agent
> is terminated these radvd processes continue to run. I think we should
> probably terminate them when the l3 agent is terminated, like if we are
> in DevStack and doing an unstack.sh. There's a fix on the DevStack
> side but I'm waffling a bit on if it's the right thing to do or not.
> The only concern I have is that there may be situations where the l3 agent
> terminates but we don't want data plane disruption. For example, if
> something goes wrong and the L3 agent dies, the OS may send a
> SIGABRT (which my WIP patch doesn't catch, so radvd would continue to
> run), a SIGTERM may be issued, or, worse, an OOM event may occur (I think
> that's a SIGTERM too?) and you get an outage.
> https://review.openstack.org/269560
> https://review.openstack.org/273228
> Sean M. Collins
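The signal question in the quoted message can be checked with a small POSIX-only sketch. It is not Neutron code: `AGENT`, `run` and `pid_alive` are hypothetical names, and a sleeping Python process stands in for radvd. A handler can stop child processes on SIGTERM, but SIGKILL cannot be caught, so a process killed that way leaves its children running as orphans. On Linux the OOM killer sends SIGKILL, not SIGTERM, so handler-based cleanup would not run in that case.

```python
import os
import signal
import subprocess
import sys

# Toy "agent" (hypothetical, not the l3 agent): spawns a long-lived child
# standing in for radvd and installs a SIGTERM handler that stops it first.
AGENT = r"""
import signal, subprocess, sys, time
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"],
                         stdout=subprocess.DEVNULL)
def on_sigterm(signum, frame):
    child.terminate()
    child.wait()
    sys.exit(0)
signal.signal(signal.SIGTERM, on_sigterm)
print(child.pid, flush=True)  # report the PID only once the handler is installed
while True:
    time.sleep(1)
"""


def pid_alive(pid):
    """Return True if a process with this PID still exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    return True


def run(sig):
    """Start the toy agent, deliver `sig` to it, and report whether its child survived."""
    agent = subprocess.Popen([sys.executable, "-c", AGENT],
                             stdout=subprocess.PIPE, text=True)
    child_pid = int(agent.stdout.readline())
    agent.stdout.close()
    os.kill(agent.pid, sig)
    agent.wait()
    return child_pid, pid_alive(child_pid)


if __name__ == "__main__":
    for sig in (signal.SIGTERM, signal.SIGKILL):
        child_pid, alive = run(sig)
        print(f"{sig.name}: child still running = {alive}")
        if alive:
            os.kill(child_pid, signal.SIGKILL)  # kill the orphan so the demo leaves nothing behind
```

On Linux the SIGTERM run should report that the child is gone and the SIGKILL run that it is still running, which is the orphaned-radvd situation from the bug report.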
As Assaf pointed out, we don’t want to clean up processes when the agent dies.
In RDO, we ship OCF resources to manage our services with Pacemaker, and
there we trigger scripts that clean up on service fencing:
we kill radvd, netns-proxy, keepalived, and friends.
I think the ideal solution here would be a separate executable, similar
to neutron-netns-cleanup and neutron-ovs-cleanup
(neutron-l3-agent-cleanup?), that external tools could run when they
want to clean up after an agent.
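A standalone cleanup tool along these lines could be fairly small. The Linux-only sketch below assumes, hypothetically, that the agent records each helper's PID in a `*.pid` file under a single directory; `cleanup_pidfiles` and that directory layout are illustrations, not an existing neutron-l3-agent-cleanup implementation. Each live process gets SIGTERM, with escalation to SIGKILL after a timeout.

```python
import os
import signal
import time
from pathlib import Path


def _is_running(pid):
    """Return True if `pid` exists and is not a zombie (reads Linux /proc)."""
    try:
        stat = Path(f"/proc/{pid}/stat").read_text()
    except (FileNotFoundError, ProcessLookupError):
        return False
    # The state is the first field after the parenthesised command name,
    # which may itself contain spaces or parentheses.
    state = stat.rpartition(")")[2].split()[0]
    return state != "Z"


def _signal(pid, sig):
    """Send `sig` to `pid`, ignoring a process that has already exited."""
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        pass


def cleanup_pidfiles(pid_dir, timeout=5.0):
    """Stop every process named in a *.pid file under `pid_dir` (hypothetical layout).

    Each live process gets SIGTERM, then SIGKILL if it is still running after
    `timeout` seconds. Pid files are removed afterwards. Returns the PIDs that
    were signalled.
    """
    signalled = []
    for pidfile in sorted(Path(pid_dir).glob("*.pid")):
        try:
            pid = int(pidfile.read_text().strip())
        except (ValueError, FileNotFoundError):
            pid = None  # unreadable or unparsable pid file: just remove it below
        # pid > 0 guards against os.kill(0, ...) / os.kill(-1, ...) semantics.
        if pid is not None and pid > 0 and _is_running(pid):
            _signal(pid, signal.SIGTERM)
            signalled.append(pid)
            deadline = time.monotonic() + timeout
            while _is_running(pid) and time.monotonic() < deadline:
                time.sleep(0.05)
            if _is_running(pid):
                _signal(pid, signal.SIGKILL)
        pidfile.unlink(missing_ok=True)
    return signalled
```

Because the tool runs only when an external manager (such as a Pacemaker fencing script) invokes it, an agent crash on its own leaves radvd and keepalived untouched. A real tool should also confirm that `/proc/<pid>/cmdline` matches the expected binary before signalling, since a stale pid file may point at a reused PID.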