[openstack-dev] [Neutron] DHCP Agent Reliability

Carl Baldwin carl at ecbaldwin.net
Wed Dec 4 21:43:07 UTC 2013


I have offered up https://review.openstack.org/#/c/60082/ as a
backport to Havana.  Interest in doing this was expressed in the
blueprint even before this thread started.  If there is consensus on
this as the stop-gap, it is ready to merge.  However, I do not want to
discourage discussion of other stop-gap solutions like the one Maru
proposed in the original post.

Carl

On Wed, Dec 4, 2013 at 9:12 AM, Ashok Kumaran <ashokkumaran.b at gmail.com> wrote:
>
> On Wed, Dec 4, 2013 at 8:30 PM, Maru Newby <marun at redhat.com> wrote:
>>
>>
>> On Dec 4, 2013, at 8:55 AM, Carl Baldwin <carl at ecbaldwin.net> wrote:
>>
>> > Stephen, all,
>> >
>> > I agree that there may be some opportunity to split things out a bit,
>> > though I'm not sure what the best way will be.  I recall that Mark
>> > mentioned at the summit breaking the handling of API requests and RPC
>> > out into separate processes.  Anyway, it is something that has been
>> > discussed.
>> >
>> > I actually wanted to point out that the neutron server now has the
>> > ability to run a configurable number of sub-processes to handle a
>> > heavier load.  Introduced with this commit:
>> >
>> > https://review.openstack.org/#/c/37131/
>> >
>> > Set api_workers to something > 1 and restart the server.
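>> >
>> > For example, a minimal sketch (the worker count is illustrative; size
>> > it to the host's cores):
>> >
>> >     # /etc/neutron/neutron.conf
>> >     [DEFAULT]
>> >     api_workers = 4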
>> >
>> > The server can also be run on more than one physical host in
>> > combination with multiple child processes.
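>> >
>> > (If run on several hosts, the API endpoint needs to be load-balanced
>> > across them; a minimal haproxy sketch, with names and addresses
>> > purely illustrative:
>> >
>> >     listen neutron-api
>> >         bind *:9696
>> >         balance roundrobin
>> >         server neutron1 10.0.0.11:9696 check
>> >         server neutron2 10.0.0.12:9696 check
>> >
>> > Any equivalent balancer works just as well.)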
>>
>> I completely misunderstood the import of the commit in question.  Being
>> able to run the wsgi server(s) out of process is a nice improvement, thank
>> you for making it happen.  Has there been any discussion around making the
>> default for api_workers > 0 (at least 1) to ensure that the default
>> configuration separates wsgi and rpc load?  This also seems like a great
>> candidate for backporting to havana and maybe even grizzly, although
>> api_workers should probably be defaulted to 0 in those cases.
>
>
> +1 for backporting the api_workers feature to havana as well as Grizzly :)
>>
>>
>> FYI, I re-ran the test that attempted to boot 75 micro VMs simultaneously
>> with api_workers = 2, with mixed results.  The increased wsgi throughput
>> resulted in almost half of the boot requests failing with 500 errors due
>> to QueuePool errors (https://bugs.launchpad.net/neutron/+bug/1160442) in
>> Neutron.  It also appears that maximizing wsgi throughput has the
>> side-effect of increasing the RPC load on the main process, which means
>> the problem of dhcp notifications being dropped is little improved.  In
>> any case, I intend to submit a fix that ensures notifications are sent
>> regardless of agent status.
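>>
>> (As a possible mitigation for the QueuePool errors -- an assumption on
>> my part, not a tested fix -- the SQLAlchemy pool limits can be raised
>> in neutron.conf:
>>
>>     [database]
>>     max_pool_size = 40
>>     max_overflow = 60
>>
>> though that only raises the ceiling rather than fixing the root cause.)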
>>
>>
>> m.
>>
>> >
>> > Carl
>> >
>> > On Tue, Dec 3, 2013 at 9:47 AM, Stephen Gran
>> > <stephen.gran at theguardian.com> wrote:
>> >> On 03/12/13 16:08, Maru Newby wrote:
>> >>>
>> >>> I've been investigating a bug that is preventing VMs from receiving
>> >>> IP addresses when a Neutron service is under high load:
>> >>>
>> >>> https://bugs.launchpad.net/neutron/+bug/1192381
>> >>>
>> >>> High load causes the DHCP agent's status updates to be delayed,
>> >>> causing the Neutron service to assume that the agent is down.  As a
>> >>> result, the Neutron service does not send notifications of port
>> >>> additions to the DHCP agent; at present, those notifications are
>> >>> simply dropped.  A simple fix is to send notifications regardless of
>> >>> agent status.  Does anybody have any objections to this stop-gap
>> >>> approach?  I'm not clear on the implications of sending
>> >>> notifications to agents that are down, but I'm hoping for a simple
>> >>> fix that can be backported to both havana and grizzly (yes, this
>> >>> bug has been with us that long).
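>> >>>
>> >>> Concretely, the stop-gap amounts to dropping the liveness check in
>> >>> the DHCP agent notifier -- roughly the following (a sketch only;
>> >>> the names approximate the real notifier code):
>> >>>
>> >>>     # sketch, after the fashion of dhcp_rpc_agent_api.py;
>> >>>     # helper names here are hypothetical
>> >>>     def _notify_agents(self, context, method, payload, network_id):
>> >>>         for agent in self._get_dhcp_agents(context, network_id):
>> >>>             # Today a stale heartbeat causes the agent to be skipped
>> >>>             # and the notification silently dropped.  Instead, send
>> >>>             # regardless and let a slow-but-alive agent catch up.
>> >>>             self._cast_message(context, method, payload, agent.host)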
>> >>>
>> >>> Fixing this problem for real, though, will likely be more involved.
>> >>> The proposal to replace the current wsgi framework with Pecan may
>> >>> increase the Neutron service's scalability, but should we continue
>> >>> to use a 'fire and forget' approach to notification?  Being able to
>> >>> track the success or failure of a given action outside of the logs
>> >>> would seem pretty important, and would allow for more effective
>> >>> coordination with Nova than is currently possible.
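>> >>>
>> >>> (In RPC terms this is the difference between cast(), which returns
>> >>> immediately with no delivery feedback, and call(), which waits for
>> >>> a reply and raises on timeout, making failures visible and
>> >>> retryable.  A sketch, with the method name illustrative:
>> >>>
>> >>>     # fire and forget: no way to tell whether the agent acted on it
>> >>>     self.cast(context, self.make_msg('port_create_end', payload=p),
>> >>>               topic=topic)
>> >>>
>> >>>     # tracked: blocks for a reply, raises on timeout or remote error
>> >>>     self.call(context, self.make_msg('port_create_end', payload=p),
>> >>>               topic=topic)
>> >>>
>> >>> The cost is that call() ties up the caller while it waits.)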
>> >>
>> >>
>> >> It strikes me that we ask an awful lot of a single neutron-server
>> >> instance - it has to take state updates from all the agents, it has
>> >> to do scheduling, it has to respond to API requests, and it has to
>> >> communicate actual changes to the agents.
>> >>
>> >> Maybe breaking some of these out the way nova has a scheduler and a
>> >> conductor and so on might be a good model (I know there are things
>> >> people are unhappy about with nova-scheduler, but imagine how much
>> >> worse it would be if it were built into the API).
>> >>
>> >> Doing all of those tasks, and doing them largely single-threaded, is
>> >> just asking for overload.
>> >>
>> >> Cheers,
>> >> --
>> >> Stephen Gran
>> >> Senior Systems Integrator - theguardian.com


