[openstack-dev] [neutron] - dnsmasq 'dhcp-authoritative' option broke multiple DHCP servers

Kevin Benton blak111 at gmail.com
Wed May 27 05:44:46 UTC 2015


I have verified that the following fixes the issue for me locally:
https://review.openstack.org/#/c/185486/
This works for rescheduled DHCP instances, multiple DHCP instances, and
restarted DHCP instances.

I suspect that this is the cleanest thing to back-port because it doesn't
add any translatables, scripts, rootwrap changes, or dependencies.

For more background, Brian brought this up on the dnsmasq discussion email
list, and it seems that the DHCP client used by Cirros (udhcpc) honors the
NAKs while other clients do not.[1] Apparently that client is being 'fixed'
to ignore NAKs from other servers, which should effectively defeat the
entire point of 'authoritative' DHCP servers. :)
However, we still need to fix this on our side since we can't tell people
to just change their DHCP client.
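The client-side difference can be illustrated with a toy model. This is a hypothetical sketch, not the code of udhcpc or any other real client: `MsgType`, `Reply` and `client_restarts` are invented names, and the server addresses are made up. The one real protocol detail it relies on is that a DHCPNAK carries the sending server's identifier (option 54), which a client can compare with the server it selected.

```python
from dataclasses import dataclass
from enum import Enum


class MsgType(Enum):
    ACK = "DHCPACK"
    NAK = "DHCPNAK"


@dataclass(frozen=True)
class Reply:
    msg_type: MsgType
    server_id: str  # value of the server identifier option (option 54)


def client_restarts(reply: Reply, selected_server: str, honor_any_nak: bool) -> bool:
    """Return True if the toy client drops its lease and goes back to INIT.

    honor_any_nak=True models a client that, as reported for udhcpc, acts on
    a NAK no matter which server sent it. honor_any_nak=False models a client
    that ignores NAKs from servers other than the one it selected.
    """
    if reply.msg_type is not MsgType.NAK:
        return False
    return honor_any_nak or reply.server_id == selected_server


if __name__ == "__main__":
    # Hypothetical setup: the lease came from 10.0.0.2, a second agent at 10.0.0.3 NAKs.
    nak_from_other_agent = Reply(MsgType.NAK, server_id="10.0.0.3")
    print(client_restarts(nak_from_other_agent, "10.0.0.2", honor_any_nak=True))   # True
    print(client_restarts(nak_from_other_agent, "10.0.0.2", honor_any_nak=False))  # False
```

With two agents on the same network, only the first policy lets a NAK from the "other" server knock the client off a lease that is still valid on the server it selected.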


[1] http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2015q2/009570.html

On Tue, May 26, 2015 at 3:02 PM, Kevin Benton <blak111 at gmail.com> wrote:

> > As long as we confirm that the severity of this bug is accurately
> > represented in the bug report, then this is the first thing we should
> > do.  However, see below.  We tried this and did not encounter the error
> > in at least one experiment.  Are we sure that this is broken everywhere
> > multiple servers are used?  I'm checking internally to confirm that we
> > have run this successfully.
>
> Outside of the reported bug, another person from Big Switch Networks
> reported this behavior to me as well. Additionally, I was informed just
> today that it was encountered internally here at Mirantis while testing
> the latest stable/juno code.
>
> On Tue, May 26, 2015 at 12:37 PM, Carl Baldwin <carl at ecbaldwin.net> wrote:
>
>> On Tue, May 26, 2015 at 11:05 AM, Brian Haley <brian.haley at hp.com> wrote:
>> > On 05/26/2015 01:12 PM, Salvatore Orlando wrote:
>> >>
>> >> From the bug Kevin reported, it seems multiple DHCP agents per network
>> >> have been completely broken by the fix for bug #1345947, so a revert of
>> >> patch [1] (and the stable backports) should probably be the first thing
>> >> to do, if nothing else because the original bug is not nearly as severe
>> >> as the one it introduced.
>>
>> As long as we confirm that the severity of this bug is accurately
>> represented in the bug report, then this is the first thing we should
>> do.  However, see below.  We tried this and did not encounter the
>> error in at least one experiment.  Are we sure that this is broken
>> everywhere multiple servers are used?  I'm checking internally to
>> confirm that we have run this successfully.
>>
>> >> Before doing this, however, I am wondering why the various instances of
>> >> dnsmasq end up returning NAKs. I expect all instances to have the same
>> >> hosts file, so they should be able to respond to DHCPDISCOVER/DHCPREQUEST
>> >> correctly. Is the dnsmasq log telling us exactly why the authoritative
>> >> setting is preventing us from doing so? (This is more of a curiosity on
>> >> my side.)
>> >>
>> >> [1] https://review.openstack.org/#/c/152080/
>>
>> I also think we should understand more about this problem.  I think
>> that understanding more specifics around the bug will help.  The
>> details are a bit unclear to me.
>>
>> > In the original case, the DHCPREQUEST is for a renew, which is different
>> > from an initial request.  If the server does not have a lease entry
>> > (which it won't after a restart), then it will NAK, which normally just
>> > causes the client to retry at INIT state.
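Brian's explanation of the restart case can be modelled in a few lines. This is a hypothetical sketch rather than dnsmasq's actual logic: `ToyDhcpServer`, its in-memory lease table and the MAC/IP values are invented for illustration, and it only encodes the outcomes described above (a lease entry gives an ACK; no entry gives a NAK when the server is authoritative, and silence otherwise).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyDhcpServer:
    """Toy model of how one DHCP server answers a renewing DHCPREQUEST.

    Hypothetical illustration of the behaviour described in the thread,
    not dnsmasq's implementation.
    """

    authoritative: bool
    # Hypothetical in-memory lease table: client MAC -> leased IP address.
    leases: Dict[str, str] = field(default_factory=dict)

    def restart(self) -> None:
        # Assumption: the lease table is empty after a restart,
        # i.e. nothing re-populates it from a leases or hosts file.
        self.leases.clear()

    def handle_renew(self, mac: str) -> Optional[str]:
        """Return the reply type for a renew from `mac`, or None for silence."""
        if mac in self.leases:
            return "DHCPACK"
        if self.authoritative:
            # No lease entry: the authoritative server rejects the renew.
            return "DHCPNAK"
        # Non-authoritative: stay silent (modelled as None).
        return None


if __name__ == "__main__":
    mac = "fa:16:3e:00:00:01"  # hypothetical instance MAC
    server = ToyDhcpServer(authoritative=True, leases={mac: "10.0.0.5"})
    print(server.handle_renew(mac))  # DHCPACK
    server.restart()
    print(server.handle_renew(mac))  # DHCPNAK
```

In this model a client that receives the NAK goes back to INIT and starts over with a DISCOVER, which is the normally harmless recovery path; the problem discussed in the thread arises when a client acts on a NAK from a server other than the one holding its lease.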
>> >
>> > I had asked on the dnsmasq list about this [1], and the multiple-server
>> > question was the wildcard; my testing didn't see the error described in
>> > the new bug, though.  I guess the first proposed fix of re-populating the
>> > lease information doesn't seem like such a bad idea any more, but I will
>> > reply to my original query with the tcpdump information, since I'm
>> > confused as to why the second DHCP agent stepped in with a NAK at all
>> > after originally offering the same address as the first DHCP agent [2].
>>
>> I remember being concerned about the multiple dnsmasq case.  I also
>> remember having tried it and thought that it was working as expected.
>>
>> > I would agree the best thing to do is revert the stable backports while
>> > we work on fixing this in the master branch.
>>
>> I think we can propose the reverts but until we confirm the severity
>> of this bug, I don't want them to merge.
>>
>> Carl
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
>
> --
> Kevin Benton
>



-- 
Kevin Benton

