[openstack-dev] [Neutron] DHCP Agent Reliability
clint at fewbar.com
Wed Dec 4 02:57:58 UTC 2013
Excerpts from Maru Newby's message of 2013-12-03 08:08:09 -0800:
> I've been investigating a bug that is preventing VMs from receiving IP addresses when a Neutron service is under high load:
> High load causes the DHCP agent's status updates to be delayed, causing the Neutron service to assume that the agent is down. This results in the Neutron service not sending notifications of port addition to the DHCP agent. At present, the notifications are simply dropped. A simple fix is to send notifications regardless of agent status. Does anybody have any objections to this stop-gap approach? I'm not clear on the implications of sending notifications to agents that are down, but I'm hoping for a simple fix that can be backported to both havana and grizzly (yes, this bug has been with us that long).
> Fixing this problem for real, though, will likely be more involved. The proposal to replace the current wsgi framework with Pecan may increase the Neutron service's scalability, but should we continue to use a 'fire and forget' approach to notification? Being able to track the success or failure of a given action outside of the logs would seem pretty important, and allow for more effective coordination with Nova than is currently possible.
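The failure mode described in the quoted message can be sketched roughly as follows. This is a simplified model, not Neutron's actual code: the `Agent` class, `is_agent_down`, `notify_port_create`, and the `down_time` threshold are hypothetical names. It shows how a server that judges liveness only by the age of the last status report it has processed will drop a notification for a healthy agent whose reports are merely delayed.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    # Hypothetical model of an agent record kept by the server.
    host: str
    last_heartbeat: float  # time (seconds) of the last status report the server processed


def is_agent_down(agent: Agent, now: float, down_time: float) -> bool:
    # The agent is treated as down if no report was processed within down_time seconds,
    # regardless of whether the agent is actually alive.
    return now - agent.last_heartbeat > down_time


def notify_port_create(agent: Agent, port_id: str, now: float,
                       down_time: float, sent: list) -> bool:
    # "Fire and forget": if the agent looks down, the notification is silently dropped.
    if is_agent_down(agent, now, down_time):
        return False
    sent.append((agent.host, port_id))
    return True
```

For example, with `last_heartbeat=100.0`, `now=115.0` and `down_time=10.0`, the call returns `False` and nothing is sent, even though the agent may simply have had its report delayed by a busy server.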
Dropping requests without triggering a user-visible error is a pretty
serious problem. You didn't mention whether you have filed a bug about that.
If not, please do, or let us know here so we can investigate and file one.
It seems to me that they should be put into a queue to be retried.
Sending the notifications blindly is almost as bad as dropping them,
as you have no idea if the agent is alive or not.
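A retry queue along the lines suggested above might look roughly like this. It is a minimal sketch under stated assumptions, not Neutron's API: `Notification`, `RetryQueue`, `max_attempts`, and the caller-supplied `is_alive` and `send` callbacks are all hypothetical. Delivery is attempted only when the target agent appears alive and is counted as done only when `send` confirms it; notifications that exhaust their retries are set aside in `failed` so they can be surfaced as errors instead of disappearing.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Notification:
    # Hypothetical pending notification for one agent host.
    host: str
    payload: dict
    attempts: int = 0


class RetryQueue:
    """Hypothetical queue that holds notifications until the target agent is reachable."""

    def __init__(self, max_attempts: int):
        self.max_attempts = max_attempts
        self.pending = deque()
        self.failed = []  # exhausted retries; should be reported, not silently lost

    def enqueue(self, notification: Notification) -> None:
        self.pending.append(notification)

    def process(self, is_alive, send) -> None:
        """Make one pass over the queue.

        is_alive(host) -> bool and send(notification) -> bool are supplied by the
        caller; send returns True only on confirmed delivery.
        """
        for _ in range(len(self.pending)):
            n = self.pending.popleft()
            n.attempts += 1
            # send() is only called if the agent currently looks alive.
            if is_alive(n.host) and send(n):
                continue  # delivered; drop from the queue
            if n.attempts >= self.max_attempts:
                self.failed.append(n)
            else:
                self.pending.append(n)  # keep it for the next pass
```

Compared with sending blindly, this keeps track of which notifications were actually delivered; compared with dropping, an agent that is only briefly reported as down still receives the notification on a later pass.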