[openstack-dev] [neutron] Mechanism drivers and Neutron server forking?

Terry Wilson twilson at redhat.com
Wed Jun 10 15:17:39 UTC 2015


There are two classes of behavior that need to be handled:

1) There are things that can only be done after forking, like setting up connections or spawning threads.
2) Some things should only be done once, regardless of the number of forks, like syncing.

Even when you just want something to happen once, there is a good chance you need it to happen post-fork. For example, syncing between the OVSDB and Neutron databases requires a socket connection, and we don't want that going on 16 times.

Case 1 is a little complex due to how we launch the api/rpc worker threads. The obvious place to signal that a fork is complete is in the RpcWorker/WorkerService start() methods, since they are the only code outside of openstack.common.service that is actually called post-fork. The problem is the case where api_workers == rpc_workers == 0: the parent process calls start() on both, so you end up with two calls to your post-fork initialization and only one process. It is easy enough to pass in whether start() should trigger the initialization, or to hold off and let the main process do it before calling waitall()--it's just a bit ugly (see my patch: https://review.openstack.org/#/c/189391/).
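
Roughly, the flag-passing approach looks like this (a minimal sketch with made-up hook names, not the actual code from the review):

    class RpcWorker(object):
        def __init__(self, plugin):
            self._plugin = plugin

        def start(self, notify_post_fork=True):
            if notify_post_fork:
                # It is now safe to open connections / spawn threads; with
                # real worker processes this runs once per forked child.
                self._plugin.post_fork_initialize()
            self._servers = self._plugin.start_rpc_listeners()

    # With api_workers == rpc_workers == 0 the parent calls
    # start(notify_post_fork=False) on both workers and invokes
    # post_fork_initialize() itself, once, before waitall().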

Another option for case 1 would be to kill the case where a single process handles both workers: always have the parent do nothing, and fork a process for each api/rpc worker, treating workers=0 as workers=1. Then start() can safely be used without hacking around the special case.
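
Concretely, something along these lines (sketch only; the launcher usage is from memory of the openstack.common.service API, and the service object names are made up):

    # Always fork: treat workers=0 as workers=1 so the parent never runs a
    # worker itself and start() is therefore always called post-fork.
    api_workers = max(1, cfg.CONF.api_workers or 0)
    rpc_workers = max(1, cfg.CONF.rpc_workers or 0)

    launcher = common_service.ProcessLauncher()
    launcher.launch_service(api_worker, workers=api_workers)
    launcher.launch_service(rpc_worker, workers=rpc_workers)
    launcher.wait()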

For case 2, the problem is "which process is *the one*?" The fork() call happens in the weird bastardized eventlet-threading hybrid openstack.common ThreadGroup stuff, so who knows in what order things really happen. The easiest thing to detect as unique is the parent process, via some pre-fork plugin call that stores the parent's pid. The problem with using the parent process for the 'do it once' case is that we have to be able to guarantee that all the forking is really done, and it all happens asynchronously under eventlet. Maybe an accumulator that fires off an event once api_workers + rpc_workers fork() events have been received? Anyway, it's messy.
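
If we did go that route, the accumulator would be something like this (purely hypothetical sketch):

    class ForkAccumulator(object):
        """Counts fork() events seen by the parent and fires a callback
        once all expected workers have been forked."""

        def __init__(self, expected, on_all_forked):
            self._expected = expected
            self._count = 0
            self._on_all_forked = on_all_forked

        def fork_happened(self):
            self._count += 1
            if self._count == self._expected:
                self._on_all_forked()

    # expected = api_workers + rpc_workers; on_all_forked = the one-time
    # sync, run wherever we decide "the one" process is.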

Another option for case 2 would be to let the plugin specify that it needs its own worker process. If so, we spawn one and call PluginWorker.start(), which does its initialization after the fork. Seems like it could be cleaner.
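
A PluginWorker could be as simple as this (again just a sketch; the hook names are made up):

    class PluginWorker(object):
        """A dedicated worker process requested by the plugin."""

        def __init__(self, plugin):
            self._plugin = plugin

        def start(self):
            # Runs exactly once, in its own forked child: one OVSDB
            # connection, one sync, regardless of how many api/rpc
            # workers there are.
            self._plugin.post_fork_initialize()
            self._plugin.do_one_time_sync()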

Right now I'm leaning toward "parent always does nothing" + PluginWorker. Everything is forked, no special case for workers==0, and explicit designation of the "only one" case. Of course, it's still early in the day and I haven't had any coffee.

Terry

----- Original Message -----
> This depends on what initialize is supposed to be doing. If it's just a
> one-time sync with a back-end, then I think calling it once in each child
> process might not be what we want.
> 
> I left a comment on Terry's patch. I think we should just use the callback
> manager to have a pre-fork and post-fork event to let drivers/plugins do
> whatever is appropriate for them.
> 
> On Mon, Jun 8, 2015 at 1:00 PM, Robert Kukura < kukura at noironetworks.com >
> wrote:
> 
> 
> 
> From a driver's perspective, it would be simpler, and I think sufficient, to
> change ML2 to call initialize() on drivers after the forking, rather than
> requiring drivers to know about forking.
> 
> -Bob
> 
> 
> On 6/8/15 2:59 PM, Armando M. wrote:
> 
> 
> 
> Interestingly, [1] was filed a few moments ago:
> 
> [1] https://bugs.launchpad.net/neutron/+bug/1463129
> 
> On 2 June 2015 at 22:48, Salvatore Orlando < sorlando at nicira.com > wrote:
> 
> 
> 
> I'm not sure if you can test this behaviour on your own because it requires
> the VMware plugin and the eventlet handling of backend response.
> 
> But the issue was manifesting and had to be fixed with this mega-hack [1].
> The issue was not about several workers executing the same code - the
> loopingcall was always started on a single thread. The issue I witnessed was
> that the other API workers just hung.
> 
> There's probably something we need to understand about how eventlet can work
> safely with os.fork (I just think they're not really made to work
> together!).
> Regardless, I did not spend too much time on it, because I thought that the
> multiple-workers code might be rewritten anyway by the pecan switch
> activities you're doing.
> 
> Salvatore
> 
> 
> [1] https://review.openstack.org/#/c/180145/
> 
> On 3 June 2015 at 02:20, Kevin Benton < blak111 at gmail.com > wrote:
> 
> 
> 
> Sorry about the long delay.
> 
> > Even the LOG.error("KEVIN PID=%s network response: %s" % (os.getpid(),
> > r.text)) line? Surely the server would have forked before that line was
> > executed - so what could prevent it from executing once in each forked
> > process, and hence generating multiple logs?
> 
> Yes, just once. I wasn't able to reproduce the behavior you ran into. Maybe
> eventlet has some protection for this? Can you provide a small code sample
> for the logging driver that does reproduce the issue?
> 
> On Wed, May 13, 2015 at 5:19 AM, Neil Jerram < Neil.Jerram at metaswitch.com >
> wrote:
> 
> 
> Hi Kevin,
> 
> Thanks for your response...
> 
> On 08/05/15 08:43, Kevin Benton wrote:
> 
> 
> I'm not sure I understand the behavior you are seeing. When your
> mechanism driver gets initialized and kicks off processing, all of that
> should be happening in the parent PID. I don't know why your child
> processes start executing code that wasn't invoked. Can you provide a
> pointer to the code or give a sample that reproduces the issue?
> 
> https://github.com/Metaswitch/calico/tree/master/calico/openstack
> 
> Basically, our driver's initialize method immediately kicks off a green
> thread to audit what is now in the Neutron DB, and to ensure that the other
> Calico components are consistent with that.
> 
> 
> 
> I modified the linuxbridge mech driver to try to reproduce it:
> http://paste.openstack.org/show/216859/
> 
> In the output, I never received any of the init code output I added more
> than once, including the function spawned using eventlet.
> 
> Interesting. Even the LOG.error("KEVIN PID=%s network response: %s" %
> (os.getpid(), r.text)) line? Surely the server would have forked before that
> line was executed - so what could prevent it from executing once in each
> forked process, and hence generating multiple logs?
> 
> Thanks,
> Neil
> 
> 
> 
> The only time I ever saw anything executed by a child process was actual
> API requests (e.g. the create_port method).
> 
> 
> 
> 
> 
> On Thu, May 7, 2015 at 6:08 AM, Neil Jerram < Neil.Jerram at metaswitch.com >
> wrote:
> 
> Is there a design for how ML2 mechanism drivers are supposed to cope
> with the Neutron server forking?
> 
> What I'm currently seeing, with api_workers = 2, is:
> 
> - my mechanism driver gets instantiated and initialized, and
> immediately kicks off some processing that involves communicating
> over the network
> 
> - the Neutron server process then forks into multiple copies
> 
> - multiple copies of my driver's network processing then continue,
> and interfere badly with each other :-)
> 
> I think what I should do is:
> 
> - wait until any forking has happened
> 
> - then decide (somehow) which mechanism driver is going to kick off
> that processing, and do that.
> 
> But how can a mechanism driver know when the Neutron server forking
> has happened?
> 
> Thanks,
> Neil
> 
> 
> 
> 
> 
> --
> Kevin Benton
> 
> 
> 
> 
> 
> 
> 
> --
> Kevin Benton
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> --
> Kevin Benton
> 
> 


