[openstack-dev] [neutron] high dhcp lease times in neutron deployments considered harmful (or not???)
Kevin Benton
blak111 at gmail.com
Thu Jan 29 08:55:31 UTC 2015
>Why would users want to change an active port's IP address anyway?
Re-addressing. It's not common, but the entire reason I brought this up is
because a user was moving an instance to another subnet on the same network
and stranded one of their VMs.
> I worry about setting a default config value to handle a very unusual use
case.
Changing a static lease is something that works on normal networks so I
don't think we should break it in Neutron without a really good reason.
Right now, the big reason to keep a high lease time that I agree with is
that it buys operators lots of dnsmasq downtime without affecting running
clients. To get the best of both worlds we can set DHCP option 58 (a.k.a.
dhcp-renewal-time or T1) to 240 seconds. Then the lease time can be left to
be something large like 10 days to allow for tons of DHCP server downtime
without affecting running clients.
There are two issues with this approach. First, some simple DHCP clients
don't honor that DHCP option (e.g. the one with Cirros), but it works with
dhclient so it should work on CentOS, Fedora, etc. (I verified it works on
Ubuntu). This isn't a big deal because the worst case is what we have
already (half of the lease time). The second issue is that dnsmasq
hardcodes that option, so a patch would be required to allow it to be
specified in the options file. I am happy to submit the patch required
there so that isn't a big deal either.
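For a rough sense of the timing, here is a minimal Python sketch of when a
client next contacts the DHCP server. It assumes the RFC 2131 default of T1
at half the lease time when option 58 is absent or ignored. The function
name is made up for illustration and nothing here is Neutron or dnsmasq code.

```python
def worst_case_seconds_until_renewal(lease_seconds, renewal_seconds=None):
    """Longest wait before a client next contacts the DHCP server.

    Assumption: a client that receives no option 58 (or ignores it)
    renews at T1 = half the lease time, the RFC 2131 default.
    """
    if renewal_seconds is None:
        return lease_seconds / 2
    # T1 cannot usefully exceed the lease itself.
    return min(renewal_seconds, lease_seconds)


TEN_DAYS = 10 * 24 * 3600

# Current default lease of 86400 seconds, no option 58.
print(worst_case_seconds_until_renewal(86400))
# 10-day lease with option 58 (T1) set to 240 seconds.
print(worst_case_seconds_until_renewal(TEN_DAYS, 240))
```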
If we implement that fix, the remaining issue is Brian's other comment
about too much DHCP traffic. I've been doing some packet captures and the
standard request/reply for a renewal is 2 unicast packets totaling about
725 bytes. Assuming 10,000 VMs renewing every 240 seconds, there will be an
average of 242 kbps background traffic across the entire network. Even at a
density of 50 VMs, that's only 1.2 kbps per compute node. If that's still
too much, then the deployer can adjust the value upwards, but that's hardly
a reason to have a high default.
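The arithmetic behind those figures can be checked with a short Python
sketch. The 725 bytes per renewal comes from my packet captures; the
function and parameter names are only illustrative.

```python
def renewal_traffic_bps(vm_count, renewal_seconds=240, bytes_per_renewal=725):
    """Average DHCP renewal traffic in bits per second.

    Assumes every VM completes one request/reply exchange per renewal
    interval and that renewals are spread evenly over time.
    """
    return vm_count * bytes_per_renewal * 8 / renewal_seconds


# Whole network: 10,000 VMs, result in kbps.
print(round(renewal_traffic_bps(10_000) / 1000))
# One compute node hosting 50 VMs, result in kbps.
print(round(renewal_traffic_bps(50) / 1000, 1))
```

Raising renewal_seconds scales the traffic down proportionally, which is
the knob a deployer would turn if this is still too much.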
That just leaves the logging problem. Since we require a change to dnsmasq
anyway, perhaps we could also request an option to suppress logs from
renewals? If that's not adequate, I think 2 log entries per VM every 240
seconds is really only a concern for operators with large clouds and they
should have the knowledge required to change a config file anyway. ;-)
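To put a number on the log volume, here is a small Python sketch. It
assumes the 2 entries per renewal mentioned above and reuses the 10,000 VM
figure from the traffic estimate purely as an example; the function name is
illustrative.

```python
SECONDS_PER_DAY = 86400


def renewal_log_entries_per_day(vm_count, renewal_seconds=240, entries_per_renewal=2):
    """Log entries written per day for DHCP renewals alone."""
    renewals_per_vm_per_day = SECONDS_PER_DAY / renewal_seconds
    return vm_count * renewals_per_vm_per_day * entries_per_renewal


# A single VM, then a 10,000 VM cloud.
print(renewal_log_entries_per_day(1))
print(renewal_log_entries_per_day(10_000))
```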
On Wed, Jan 28, 2015 at 3:59 PM, Chuck Carlino <chuckjcarlino at gmail.com>
wrote:
> On 01/28/2015 12:51 PM, Kevin Benton wrote:
>
> If we are going to ignore the IP address changing use-case, can we just
> make the default infinity? Then nobody ever has to worry about control
> plane outages for existing clients. 24 hours is way too long to be useful
> anyway.
>
>
> Why would users want to change an active port's IP address anyway? I can
> see possible use in changing an inactive port's IP address, but that
> wouldn't cause the dhcp issues mentioned here. I worry about setting a
> default config value to handle a very unusual use case.
>
> Chuck
>
>
>
> On Jan 28, 2015 12:44 PM, "Salvatore Orlando" <sorlando at nicira.com>
> wrote:
>
>>
>>
>> On 28 January 2015 at 20:19, Brian Haley <brian.haley at hp.com> wrote:
>>
>>> Hi Kevin,
>>>
>>> On 01/28/2015 03:50 AM, Kevin Benton wrote:
>>> > Hi,
>>> >
>>> > Approximately a year and a half ago, the default DHCP lease time in
>>> > Neutron was increased from 120 seconds to 86400 seconds.[1] This was done
>>> > with the goal of reducing DHCP traffic with very little discussion (based
>>> > on what I can see in the review and bug report). While it does indeed
>>> > reduce DHCP traffic, I don't think any bug reports were filed showing that
>>> > a 120 second lease time resulted in too much traffic or that a jump all of
>>> > the way to 86400 seconds was required instead of a value in the same order
>>> > of magnitude.
>>> >
>>> > Why does this matter?
>>> >
>>> > Neutron ports can be updated with a new IP address from the same subnet or
>>> > another subnet on the same network. The port update will result in
>>> > anti-spoofing iptables rule changes that immediately stop the old IP
>>> > address from working on the host. This means the host is unreachable for
>>> > 0-12 hours based on the current default lease time without manual
>>> > intervention[2] (assuming half-lease length DHCP renewal attempts).
>>>
>>> So I'll first comment on the problem. You're essentially "pulling the rug"
>>> out from under these VMs by changing their IP (and that of their router and
>>> DHCP/DNS server), but you expect they should fail quickly and come right
>>> back online. In a non-Neutron environment wouldn't the IT person that did
>>> this need some pretty good heat-resistant pants for all the flames from
>>> pissed-off users? Sure, the guy on his laptop will just bounce the
>>> connection, but servers (aka VMs) should stay pretty static. VMs are
>>> servers (and cows according to some).
>>>
>>
>> I actually expect this kind of operation to not be one Neutron users will
>> do very often, mostly because regardless of whether you're in the cloud or
>> not, you'd still need to wear those heat resistant pants.
>>
>>
>>>
>>> The correct solution is to be able to renumber the network so there is no
>>> issue with the anti-spoofing rules dropping packets, or the VMs having an
>>> unreachable IP address, but that's a much bigger nut to crack.
>>>
>>
>> Indeed. In my opinion the "update IP" operation sets false expectations
>> in users. I have considered disallowing PUT on fixed_ips in the past but
>> that did not go ahead because there were users leveraging it.
>>
>>
>>>
>>> > Why is this on the mailing list?
>>> >
>>> > In an attempt to make the VMs usable in a much shorter timeframe following
>>> > a Neutron port address change, I submitted a patch to reduce the default
>>> > DHCP lease time to 8 minutes.[3] However, this was upsetting to several
>>> > people,[4] so it was suggested I bring this discussion to the mailing
>>> > list. The following are the high-level concerns followed by my responses:
>>> >
>>> > * 8 minutes is arbitrary
>>> >     o Yes, but it's no more arbitrary than 1440 minutes. I picked it as an
>>> >       interval because it is still 4 times larger than the last short value,
>>> >       but it still allows VMs to regain connectivity in <5 minutes in the
>>> >       event their IP is changed. If someone has a good suggestion for another
>>> >       interval based on known dnsmasq QPS limits or some other quantitative
>>> >       reason, please chime in here.
>>>
>>> We run 48 hours as the default in our public cloud, and I did some digging
>>> to remind myself of the multiple reasons:
>>>
>>> 1. Too much DHCP traffic. Sure, only that initial request is broadcast, but
>>> dnsmasq is very verbose and loves writing to syslog for everything it does -
>>> less is more. Do a scale test with 10K VMs and you'll quickly find out a
>>> large portion of traffic is DHCP RENEWs, and syslog is huge.
>>>
>>
>> This is correct, and something I overlooked in my previous post.
>> Nevertheless I still think that it is really impossible to find an optimal
>> default which is regarded as such by every user. The current default has
>> been chosen mostly for the reason you explain below, and I don't see a
>> strong reason for changing it.
>>
>>
>>>
>>> 2. During a control-plane upgrade or outage, having a short DHCP lease time
>>> will take all your VMs offline. The old value of 2 minutes is not a
>>> realistic value for an upgrade, and I don't think 8 minutes is much better.
>>> Yes, when DHCP is down you can't boot a new VM, but as long as customers can
>>> get to their existing VMs they're pretty happy and won't scream bloody
>>> murder.
>>>
>>
>> In our cloud we were continuously hit by this. We could not take our
>> dhcp agents out, otherwise all VMs would lose their leases, unless the
>> downtime of the agent was very brief.
>>
>>
>>> There's probably more, but those were the top two, with #2 being most
>>> important.
>>>
>>
>> Summarizing, I think that Kevin is exposing a real, albeit well-known
>> problem (sorry about my dhcp release faux pas - I can use jet lag as a
>> justification!), and he's proposing a mitigation to it. On the other hand,
>> this mitigation, as Brian explains, is going to cause real operational
>> issues. Still, we're arguing about a default value for a configuration
>> parameter. I therefore think the best thing that we can do is explicitly
>> state what happens when setting long or short lease times.
>> I expected this to be documented in [1], but it's not. I think that place
>> and neutron.conf might contain this kind of documentation, such as:
>>
>> # DHCP Lease duration (in seconds).
>> # Use -1 to tell dnsmasq to use infinite lease times.
>> # dhcp_lease_duration = 86400
>> # Note that long DHCP leases will result in delays
>> # in instances acquiring updated IP addresses. This
>> # may result in downtime for those instances as anti-
>> # spoofing policy will then block all traffic in and out of
>> # them. In order to minimise this downtime window
>> # the lease time should be shorter, for example
>> # dhcp_lease_duration = 480
>>
>> However, I would not change the current system default, as this might
>> affect operational systems.
>>
>> Apologies again for my stupid dhcp-release note,
>> Salvatore
>>
>> [1] http://developer.openstack.org/api-ref-networking-v2.html
>>
>>
>>>
>>> > * other datacenters use long lease times
>>> >     o This is true, but it's not really a valid comparison. In most regular
>>> >       datacenters, updating a static DHCP lease has no effect on the data
>>> >       plane so it doesn't matter that the client doesn't react for
>>> >       hours/days (even with DHCP snooping enabled). However, in Neutron's
>>> >       case, the security groups are immediately updated so all traffic
>>> >       using the old address is blocked.
>>>
>>> Yes, and choosing the lease time is a deployment decision that needs to take
>>> a lot of things into account. Like I said, we don't even use the default.
>>> The default should just be a good guess for a standard deployment, not a
>>> value that caters towards the edge cases, especially when the value is
>>> tunable in neutron.conf.
>>>
>>> > * dhcp traffic is scary because it's broadcast
>>> >     o ARP traffic is also broadcast and many clients will expire entries
>>> >       every 5-10 minutes and re-ARP. L2population may be used to prevent ARP
>>> >       propagation, so the comparison between DHCP and ARP isn't always
>>> >       relevant here.
>>>
>>> I don't recall anyone being scared of broadcast, and can't find any comments
>>> regarding it in https://review.openstack.org/#/c/150595/
>>>
>>> > Please reply back with your opinions/anecdotes/data related to short DHCP
>>> > lease times.
>>>
>>> I can only speculate on why 24 hours was chosen as the default back in 2013,
>>> possibly because a lot of wireless router firmware defaults are set as such?
>>>
>>> > 1. https://github.com/openstack/neutron/commit/d9832282cf656b162c51afdefb830dacab72defe
>>> > 2. Manual intervention could be an instance reboot, a dhcp client
>>> > invocation via the console, or a delayed invocation right before the
>>> > update. (all significantly more difficult to script than a simple update
>>> > of a port's IP via the API).
>>> > 3. https://review.openstack.org/#/c/150595/
>>> > 4. http://i.imgur.com/xtvatkP.jpg
>>>
>>> I was a much bigger baby than that :)
>>>
>>> -Brian
>>>
>>>
>>> __________________________________________________________________________
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe:
>>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>
>>
>>
>>
>
>
>
>
>
>
--
Kevin Benton