[openstack-dev] [Neutron] DHCP Agent Reliability

Maru Newby marun at redhat.com
Tue Dec 3 16:08:09 UTC 2013


I've been investigating a bug that prevents VMs from receiving IP addresses when a Neutron service is under high load:

https://bugs.launchpad.net/neutron/+bug/1192381

High load delays the DHCP agent's status updates, which leads the Neutron service to assume that the agent is down.  As a result, the Neutron service does not send port-addition notifications to the DHCP agent; at present, those notifications are simply dropped.  A simple fix is to send notifications regardless of agent status.  Does anybody have any objections to this stop-gap approach?  I'm not clear on the implications of sending notifications to agents that are down, but I'm hoping for a simple fix that can be backported to both Havana and Grizzly (yes, this bug has been with us that long).
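
To make the failure mode concrete, here is a minimal sketch of the two behaviours being compared.  It is not Neutron's actual code: the Agent class, AGENT_DOWN_TIME, is_agent_down, notify_port_created and the skip_down_agents flag are all hypothetical names chosen for illustration, and the 75-second threshold is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical threshold (seconds) after which a silent agent is presumed down.
AGENT_DOWN_TIME = 75.0


@dataclass
class Agent:
    """Illustrative stand-in for a DHCP agent record kept by the service."""
    host: str
    last_heartbeat: float  # timestamp of the last status update received


def is_agent_down(agent: Agent, now: float,
                  agent_down_time: float = AGENT_DOWN_TIME) -> bool:
    # A delayed status update is indistinguishable from a dead agent here.
    return now - agent.last_heartbeat > agent_down_time


def notify_port_created(agents: List[Agent], port_id: str, now: float,
                        send: Callable[[str, str], None],
                        skip_down_agents: bool) -> List[str]:
    """Send a port-creation notification and return the hosts notified.

    skip_down_agents=True models the behaviour described above, where the
    notification is silently dropped for agents presumed down.
    skip_down_agents=False models the proposed stop-gap: always send.
    """
    notified = []
    for agent in agents:
        if skip_down_agents and is_agent_down(agent, now):
            continue  # notification dropped; nothing records that it was lost
        send(agent.host, port_id)
        notified.append(agent.host)
    return notified
```

With skip_down_agents=True, an agent whose heartbeat is merely late never hears about the new port, which matches the symptom in the bug.  With skip_down_agents=False, the message is sent either way, and what happens to it when the agent really is down depends on the messaging layer.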

Fixing this problem for real, though, will likely be more involved.  The proposal to replace the current WSGI framework with Pecan may increase the Neutron service's scalability, but should we continue to use a 'fire and forget' approach to notification?  Being able to track the success or failure of a given action outside of the logs would seem pretty important, and would allow for more effective coordination with Nova than is currently possible.
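
As a rough illustration of what tracking a notification's outcome could look like, as opposed to 'fire and forget', here is a small sketch.  Everything in it is an assumption made for the example: NotificationTracker, Status, the transport callable and the timeout handling are hypothetical and do not correspond to any existing Neutron or messaging API.

```python
import uuid
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    PENDING = "pending"   # sent, no acknowledgement yet
    ACKED = "acked"       # agent confirmed it handled the notification
    FAILED = "failed"     # no acknowledgement within the timeout


class NotificationTracker:
    """Hypothetical tracker: records each notification until it is resolved."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self._records: Dict[str, dict] = {}

    def send(self, host: str, payload: str, now: float,
             transport: Callable[[str, str, str], None]) -> str:
        # Record the notification before sending so its fate can be queried.
        notification_id = str(uuid.uuid4())
        self._records[notification_id] = {
            "host": host, "payload": payload,
            "sent_at": now, "status": Status.PENDING,
        }
        transport(host, notification_id, payload)
        return notification_id

    def ack(self, notification_id: str) -> None:
        # Called when the agent reports success; late acks are ignored.
        record = self._records[notification_id]
        if record["status"] is Status.PENDING:
            record["status"] = Status.ACKED

    def expire(self, now: float) -> List[str]:
        # Mark unacknowledged notifications older than the timeout as failed.
        expired = []
        for notification_id, record in self._records.items():
            if (record["status"] is Status.PENDING
                    and now - record["sent_at"] > self.timeout):
                record["status"] = Status.FAILED
                expired.append(notification_id)
        return expired

    def status(self, notification_id: str) -> Status:
        return self._records[notification_id]["status"]
```

The point of the sketch is only that a caller (or another service such as Nova) could ask for the status of a specific notification instead of searching the logs.  A real design would also have to decide where this state lives and how failures are retried or reported.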


Maru

