[openstack-dev] [Neutron] DHCP Agent Reliability

Maru Newby marun at redhat.com
Wed Dec 4 15:00:30 UTC 2013


On Dec 4, 2013, at 8:55 AM, Carl Baldwin <carl at ecbaldwin.net> wrote:

> Stephen, all,
> 
> I agree that there may be some opportunity to split things out a bit.
> However, I'm not sure what the best way will be.  I recall that Mark
> mentioned breaking out the processes that handle API requests and RPC
> from each other at the summit.  Anyway, it is something that has been
> discussed.
> 
> I actually wanted to point out that the neutron server now has the
> ability to run a configurable number of sub-processes to handle a
> heavier load.  Introduced with this commit:
> 
> https://review.openstack.org/#/c/37131/
> 
> Set api_workers to something > 1 and restart the server.
> 
> The server can also be run on more than one physical host in
> combination with multiple child processes.

I completely misunderstood the import of the commit in question.  Being able to run the wsgi server(s) out of process is a nice improvement; thank you for making it happen.  Has there been any discussion around making the default for api_workers > 0 (at least 1) to ensure that the default configuration separates wsgi and rpc load?  This also seems like a great candidate for backporting to havana and maybe even grizzly, although api_workers should probably default to 0 in those cases.

FYI, I re-ran the test that attempted to boot 75 micro VMs simultaneously with api_workers = 2, with mixed results.  The increased wsgi throughput resulted in almost half of the boot requests failing with 500 errors due to QueuePool errors (https://bugs.launchpad.net/neutron/+bug/1160442) in Neutron.  It also appears that maximizing the number of wsgi requests has the side effect of increasing the RPC load on the main process, which means that the problem of dhcp notifications being dropped is little improved.  In any case, I intend to submit a fix that ensures that notifications are sent regardless of agent status.
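
To make the stop-gap concrete, here is a rough sketch of the difference between dropping notifications for agents that look down and sending them regardless.  This is an illustration only, not Neutron's actual code: the function names, the 5-second threshold, and the agent host names are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical threshold; stands in for a configurable "agent down" interval.
AGENT_DOWN_TIME = timedelta(seconds=5)


def is_agent_down(last_heartbeat, now, down_time=AGENT_DOWN_TIME):
    """Treat an agent as down when its last status update is too old."""
    return now - last_heartbeat > down_time


def notify_port_created(agents, port_id, now, send, skip_down_agents):
    """Call send(host, port_id) for each agent; return the hosts notified.

    agents maps a host name to the time of its last status update.
    """
    notified = []
    for host, last_heartbeat in sorted(agents.items()):
        if skip_down_agents and is_agent_down(last_heartbeat, now):
            # Behaviour described in the bug: the notification is dropped.
            continue
        send(host, port_id)
        notified.append(host)
    return notified


now = datetime(2013, 12, 4, 15, 0, 30)
agents = {
    "dhcp-a": now - timedelta(seconds=2),   # status update arrived on time
    "dhcp-b": now - timedelta(seconds=30),  # status update delayed by load
}
sent = []


def record(host, port_id):
    """Stand-in for the real RPC cast; just records what would be sent."""
    sent.append((host, port_id))


dropped_mode = notify_port_created(agents, "port-1", now, record,
                                   skip_down_agents=True)
always_mode = notify_port_created(agents, "port-1", now, record,
                                  skip_down_agents=False)
```

With skip_down_agents=True, the agent whose status update was merely delayed never hears about the new port; with skip_down_agents=False, it does, at the cost of also sending to agents that really are down.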


m.

> 
> Carl
> 
> On Tue, Dec 3, 2013 at 9:47 AM, Stephen Gran
> <stephen.gran at theguardian.com> wrote:
>> On 03/12/13 16:08, Maru Newby wrote:
>>> 
>>> I've been investigating a bug that is preventing VMs from receiving IP
>>> addresses when a Neutron service is under high load:
>>> 
>>> https://bugs.launchpad.net/neutron/+bug/1192381
>>> 
>>> High load causes the DHCP agent's status updates to be delayed, causing
>>> the Neutron service to assume that the agent is down.  This results in the
>>> Neutron service not sending notifications of port addition to the DHCP
>>> agent.  At present, the notifications are simply dropped.  A simple fix is
>>> to send notifications regardless of agent status.  Does anybody have any
>>> objections to this stop-gap approach?  I'm not clear on the implications of
>>> sending notifications to agents that are down, but I'm hoping for a simple
>>> fix that can be backported to both havana and grizzly (yes, this bug has
>>> been with us that long).
>>> 
>>> Fixing this problem for real, though, will likely be more involved.  The
>>> proposal to replace the current wsgi framework with Pecan may increase the
>>> Neutron service's scalability, but should we continue to use a 'fire and
>>> forget' approach to notification?  Being able to track the success or
>>> failure of a given action outside of the logs would seem pretty important,
>>> and allow for more effective coordination with Nova than is currently
>>> possible.
>> 
>> 
>> It strikes me that we ask an awful lot of a single neutron-server instance -
>> it has to take state updates from all the agents, it has to do scheduling,
>> it has to respond to API requests, and it has to communicate about actual
>> changes with the agents.
>> 
>> Maybe breaking some of these out the way nova has a scheduler and a
>> conductor and so on might be a good model (I know there are things people
>> are unhappy about with nova-scheduler, but imagine how much worse it would
>> be if it was built into the API).
>> 
>> Doing all of those tasks, and doing it largely single threaded, is just
>> asking for overload.
>> 
>> Cheers,
>> --
>> Stephen Gran
>> Senior Systems Integrator - theguardian.com
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 