[openstack-dev] [neutron] high dhcp lease times in neutron deployments considered harmful (or not???)
Kevin Benton
blak111 at gmail.com
Wed Jan 28 08:50:04 UTC 2015
Hi,
Approximately a year and a half ago, the default DHCP lease time in Neutron
was increased from 120 seconds to 86400 seconds.[1] This was done with the
goal of reducing DHCP traffic, and with very little discussion (based on what I
can see in the review and bug report). While it does indeed reduce DHCP
traffic, I don't think any bug reports were filed showing that a 120 second
lease time resulted in too much traffic or that a jump all the way to
86400 seconds was required instead of a value in the same order of
magnitude.
Why does this matter?
Neutron ports can be updated with a new IP address from the same subnet or
another subnet on the same network. The port update will result in
anti-spoofing iptables rule changes that immediately stop the old IP
address from working on the host. This means the host is unreachable for
0-12 hours based on the current default lease time without manual
intervention[2] (assuming half-lease length DHCP renewal attempts).
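To make the arithmetic behind the 0-12 hour figure concrete, here is a
minimal Python sketch. It assumes clients first attempt renewal at half the
lease length (as noted above) and that the renewal succeeds on the first
try; the function name is made up for illustration.

```python
def worst_case_outage_seconds(lease_seconds, renewal_fraction=0.5):
    """Longest wait before a client's next renewal attempt.

    Assumes the client renews once renewal_fraction of the lease has
    elapsed, and that the port's IP change lands just after a renewal.
    """
    return lease_seconds * renewal_fraction


# Compare the old and the current Neutron default lease times.
for label, lease in (("old default", 120), ("current default", 86400)):
    outage = worst_case_outage_seconds(lease)
    print(f"{label}: lease {lease}s, up to {outage / 60:.0f} min unreachable")
```

With the current default this works out to 720 minutes, i.e. the 12 hours
quoted above; the best case is close to zero if the change happens to land
just before a renewal.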
Why is this on the mailing list?
In an attempt to make the VMs usable in a much shorter timeframe following
a Neutron port address change, I submitted a patch to reduce the default
DHCP lease time to 8 minutes.[3] However, this was upsetting to several
people,[4] so it was suggested I bring this discussion to the mailing list.
The following are the high-level concerns followed by my responses:
   - 8 minutes is arbitrary
      - Yes, but it's no more arbitrary than 1440 minutes. I picked it as
      an interval because it is still 4 times larger than the last short
      value, but it still allows VMs to regain connectivity in <5 minutes
      in the event their IP is changed. If someone has a good suggestion
      for another interval based on known dnsmasq QPS limits or some other
      quantitative reason, please chime in here.
   - other datacenters use long lease times
      - This is true, but it's not really a valid comparison. In most
      regular datacenters, updating a static DHCP lease has no effect on
      the data plane so it doesn't matter that the client doesn't react for
      hours/days (even with DHCP snooping enabled). However, in Neutron's
      case, the security groups are immediately updated so all traffic
      using the old address is blocked.
   - dhcp traffic is scary because it's broadcast
      - ARP traffic is also broadcast and many clients will expire entries
      every 5-10 minutes and re-ARP. L2population may be used to prevent
      ARP propagation, so the comparison between DHCP and ARP isn't always
      relevant here.
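To put rough numbers on the traffic side of the trade-off, the sketch below
estimates the steady-state renewal rate a DHCP server would see for a few
lease times. The population of 1000 VMs is a hypothetical figure, not a
measurement, and the model assumes each client sends one renewal request
every half lease, spread evenly over time. It says nothing about what
dnsmasq can actually sustain.

```python
def renewals_per_second(num_clients, lease_seconds, renewal_fraction=0.5):
    """Average renewal requests per second across all clients.

    Assumes every client renews once per (lease * renewal_fraction)
    seconds and that renewals are evenly spread out.
    """
    return num_clients / (lease_seconds * renewal_fraction)


NUM_VMS = 1000  # hypothetical number of VMs served by one DHCP agent

# Old default, proposed value, and current default lease times.
for lease in (120, 480, 86400):
    rate = renewals_per_second(NUM_VMS, lease)
    window_min = lease * 0.5 / 60  # worst-case wait for the next renewal
    print(f"lease {lease:>5}s: ~{rate:.2f} req/s, "
          f"up to {window_min:.0f} min to pick up a new IP")
```

Under these assumptions an 8 minute lease costs a quarter of the renewal
traffic of the old 120 second default while keeping the recovery window at
4 minutes.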
Please reply back with your opinions/anecdotes/data related to short DHCP
lease times.
Cheers
1. https://github.com/openstack/neutron/commit/d9832282cf656b162c51afdefb830dacab72defe
2. Manual intervention could be an instance reboot, a dhcp client
invocation via the console, or a delayed invocation scheduled right before
the update (all significantly more difficult to script than a simple update
of a port's IP via the API).
3. https://review.openstack.org/#/c/150595/
4. http://i.imgur.com/xtvatkP.jpg
--
Kevin Benton