[openstack-dev] [Neutron] DHCP Agent Reliability

Maru Newby marun at redhat.com
Wed Dec 4 03:28:10 UTC 2013


On Dec 4, 2013, at 11:02 AM, Yongsheng Gong <gongysh at unitedstack.com> wrote:

> Another way is to set a larger agent_down_time; by default it is 9 seconds.

I don't believe that increasing the timeout by itself is a good solution. Relying on the agent state to decide whether to send a notification has proven unreliable with the current architecture, in which a poorly performing single-process server handles both RPC and WSGI.
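
To make the failure mode concrete, here is a minimal sketch in Python. It is not the Neutron code: the names Agent, is_agent_down, notify_if_up, and notify_always are hypothetical, the timestamps are made up, and the 9-second value is simply the default quoted above. It only illustrates how a healthy agent whose status update is processed late gets skipped, and what the "send regardless" stop-gap changes.

```python
from dataclasses import dataclass

AGENT_DOWN_TIME = 9  # seconds; the default quoted in this thread


@dataclass
class Agent:
    host: str
    # Timestamp of the last status update the server actually processed.
    last_heartbeat: float


def is_agent_down(agent, now, agent_down_time=AGENT_DOWN_TIME):
    """Treat an agent as down once its last processed heartbeat is too old."""
    return now - agent.last_heartbeat > agent_down_time


def notify_if_up(agents, now, send):
    """Behaviour described in the bug: skip agents that look down."""
    for agent in agents:
        if not is_agent_down(agent, now):
            send(agent.host)


def notify_always(agents, now, send):
    """Stop-gap: send regardless of the recorded agent state."""
    for agent in agents:
        send(agent.host)


# A healthy agent whose heartbeat is stuck behind a backlog on the server.
agents = [Agent(host="dhcp-1", last_heartbeat=100.0)]
now = 112.0  # 12 seconds since the last heartbeat was processed

sent_if_up, sent_always = [], []
notify_if_up(agents, now, sent_if_up.append)
notify_always(agents, now, sent_always.append)
print(sent_if_up)   # []
print(sent_always)  # ['dhcp-1']
```

Passing a larger agent_down_time to is_agent_down (say 20) would let this particular notification through, but it only moves the threshold: a longer backlog on the server would reproduce the same drop.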


m.

> 
> 
> On Wed, Dec 4, 2013 at 7:55 AM, Carl Baldwin <carl at ecbaldwin.net> wrote:
> Stephen, all,
> 
> I agree that there may be some opportunity to split things out a bit.
> However, I'm not sure what the best way will be.  I recall that Mark
> mentioned breaking out the processes that handle API requests and RPC
> from each other at the summit.  Anyway, it is something that has been
> discussed.
> 
> I actually wanted to point out that the neutron server now has the
> ability to run a configurable number of sub-processes to handle a
> heavier load.  Introduced with this commit:
> 
> https://review.openstack.org/#/c/37131/
> 
> Set api_workers to something > 1 and restart the server.
> 
> The server can also be run on more than one physical host in
> combination with multiple child processes.
> 
> Carl
> 
> On Tue, Dec 3, 2013 at 9:47 AM, Stephen Gran
> <stephen.gran at theguardian.com> wrote:
> > On 03/12/13 16:08, Maru Newby wrote:
> >>
> >> I've been investigating a bug that is preventing VMs from receiving IP
> >> addresses when a Neutron service is under high load:
> >>
> >> https://bugs.launchpad.net/neutron/+bug/1192381
> >>
> >> High load causes the DHCP agent's status updates to be delayed, causing
> >> the Neutron service to assume that the agent is down.  This results in the
> >> Neutron service not sending notifications of port addition to the DHCP
> >> agent.  At present, the notifications are simply dropped.  A simple fix is
> >> to send notifications regardless of agent status.  Does anybody have any
> >> objections to this stop-gap approach?  I'm not clear on the implications of
> >> sending notifications to agents that are down, but I'm hoping for a simple
> >> fix that can be backported to both havana and grizzly (yes, this bug has
> >> been with us that long).
> >>
> >> Fixing this problem for real, though, will likely be more involved.  The
> >> proposal to replace the current WSGI framework with Pecan may increase the
> >> Neutron service's scalability, but should we continue to use a 'fire and
> >> forget' approach to notification?  Being able to track the success or
> >> failure of a given action outside of the logs would seem pretty important,
> >> and allow for more effective coordination with Nova than is currently
> >> possible.
> >
> >
> > It strikes me that we ask an awful lot of a single neutron-server instance -
> > it has to take state updates from all the agents, it has to do scheduling,
> > it has to respond to API requests, and it has to communicate about actual
> > changes with the agents.
> >
> > Maybe breaking some of these out the way nova has a scheduler and a
> > conductor and so on might be a good model (I know there are things people
> > are unhappy about with nova-scheduler, but imagine how much worse it would
> > be if it was built into the API).
> >
> > Doing all of those tasks, and doing them largely single-threaded, is just
> > asking for overload.
> >
> > Cheers,
> > --
> > Stephen Gran
> > Senior Systems Integrator - theguardian.com
> > _______________________________________________
> > OpenStack-dev mailing list
> > OpenStack-dev at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 



