[openstack-dev] [neutron] high dhcp lease times in neutron deployments considered harmful (or not???)

Kevin Benton blak111 at gmail.com
Wed Jan 28 08:50:04 UTC 2015


Hi,

Approximately a year and a half ago, the default DHCP lease time in Neutron
was increased from 120 seconds to 86400 seconds.[1] This was done with the
goal of reducing DHCP traffic, with very little discussion (based on what I
can see in the review and bug report). While it does indeed reduce DHCP
traffic, I don't think any bug reports were filed showing that a 120 second
lease time resulted in too much traffic, or that a jump all the way to
86400 seconds was required instead of a value in the same order of
magnitude.

Why does this matter?

Neutron ports can be updated with a new IP address from the same subnet or
another subnet on the same network. The port update will result in
anti-spoofing iptables rule changes that immediately stop the old IP
address from working on the host. The guest keeps using the old address
until its DHCP client next renews, so the host is unreachable for anywhere
from 0 to 12 hours under the current default lease time unless someone
intervenes manually[2] (assuming the client attempts renewal at half the
lease length).
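
As a rough sketch of that arithmetic (not anything taken from Neutron
itself), the worst-case outage is just the time until the client's next
renewal attempt. The function name and the 0.5 renewal fraction below are
illustrative assumptions that match the half-lease renewal assumed above:

```python
def worst_case_outage_seconds(lease_seconds, renewal_fraction=0.5):
    """Longest wait before a client next tries to renew its lease.

    Assumes the client renews at a fixed fraction of the lease time
    (half the lease by default) and that the port's IP was changed
    immediately after the previous renewal.
    """
    return lease_seconds * renewal_fraction


# Lease times mentioned in this thread, in seconds.
leases = [("old default", 120), ("current default", 86400), ("proposed", 480)]

for label, lease in leases:
    worst = worst_case_outage_seconds(lease)
    print(f"{label}: lease={lease}s, worst-case outage={worst / 60:.0f} min")

# old default: lease=120s, worst-case outage=1 min
# current default: lease=86400s, worst-case outage=720 min
# proposed: lease=480s, worst-case outage=4 min
```

The actual outage for a given VM falls anywhere between zero and these
values, depending on where it is in its renewal cycle when the port changes.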

Why is this on the mailing list?

In an attempt to make the VMs usable in a much shorter timeframe following
a Neutron port address change, I submitted a patch to reduce the default
DHCP lease time to 8 minutes.[3] However, this was upsetting to several
people,[4] so it was suggested I bring this discussion to the mailing list.
The following are the high-level concerns followed by my responses:

   - 8 minutes is arbitrary
      - Yes, but it's no more arbitrary than 1440 minutes. I picked it as
      an interval because it is still 4 times larger than the last short value,
      but it still allows VMs to regain connectivity in <5 minutes in the event
      their IP is changed. If someone has a good suggestion for another
      interval based on known dnsmasq QPS limits or some other quantitative
      reason, please chime in here.
   - other datacenters use long lease times
      - This is true, but it's not really a valid comparison. In most
      regular datacenters, updating a static DHCP lease has no effect on the
      data plane so it doesn't matter that the client doesn't react for
      hours/days (even with DHCP snooping enabled). However, in Neutron's
      case, the security groups are immediately updated so all traffic using
      the old address is blocked.
   - dhcp traffic is scary because it's broadcast
      - ARP traffic is also broadcast and many clients will expire entries
      every 5-10 minutes and re-ARP. L2population may be used to prevent ARP
      propagation, so the comparison between DHCP and ARP isn't always relevant
      here.
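
To put rough numbers on the traffic side of the trade-off, here is a
back-of-the-envelope sketch. It assumes every client renews once per
half-lease, that renewals are spread evenly over time, and that there are
no retries. The port count of 1000 is a made-up figure for illustration,
not a measurement, and nothing here reflects a known dnsmasq limit:

```python
def renewals_per_second(num_ports, lease_seconds, renewal_fraction=0.5):
    """Average steady-state DHCP renewal rate on one network.

    Assumes each port renews once every lease_seconds * renewal_fraction
    and that renewals are uniformly spread out (no bursts, no retries).
    """
    return num_ports / (lease_seconds * renewal_fraction)


NUM_PORTS = 1000  # hypothetical network size, for illustration only

for lease in (120, 480, 86400):
    rate = renewals_per_second(NUM_PORTS, lease)
    print(f"lease={lease}s -> about {rate:.3f} renewals/s")
```

Under these assumptions the rate scales inversely with the lease time, so
an 8 minute lease produces a quarter of the renewal traffic of the old 120
second default.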


Please reply back with your opinions/anecdotes/data related to short DHCP
lease times.

Cheers

1. https://github.com/openstack/neutron/commit/d9832282cf656b162c51afdefb830dacab72defe
2. Manual intervention could be an instance reboot, a dhcp client
invocation via the console, or a delayed invocation right before the
update (all significantly more difficult to script than a simple update of
a port's IP via the API).
3. https://review.openstack.org/#/c/150595/
4. http://i.imgur.com/xtvatkP.jpg

-- 
Kevin Benton