[openstack-dev] [Openstack-operators] [openstack-operators][neutron][dhcp][dnsmasq]: duplicate entries in addn_hosts causing no IP allocation

Kevin Benton blak111 at gmail.com
Tue Jun 9 11:36:45 UTC 2015


>Just to be sure, I assume we're focussing here on the issue that Daniel
>reported

Yes.

>To be clear, though, what code are you trying to reproduce on?  Current
>master?

I was trying on 2014.1.3, which is the version I understand to be on Fuel
5.1.1.

>I'm not clear whether that would qualify as 'concurrent', in the sense
>that you have in mind.

It doesn't look like it based on the pseudocode. I was thinking of a
condition where a port is deleted very quickly after it was created.
Is that possible with your test? If not, then my theory about out-of-order
notifications might not hold.
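
If it helps, this is roughly how I would try to trigger that race by hand.
It's only a sketch - it assumes python-neutronclient, the usual OS_*
environment variables, and a placeholder network UUID - not a confirmed
reproduction:

# Hammer port create/delete so the delete lands almost immediately after
# the create, then look for duplicate or stale entries in the dnsmasq files.
import os
import time

from neutronclient.v2_0 import client as neutron_client

NETWORK_ID = 'REPLACE-WITH-NETWORK-UUID'  # placeholder

neutron = neutron_client.Client(
    username=os.environ['OS_USERNAME'],
    password=os.environ['OS_PASSWORD'],
    tenant_name=os.environ['OS_TENANT_NAME'],
    auth_url=os.environ['OS_AUTH_URL'])

for i in range(200):
    # Delete as soon as the create returns, so the port-delete
    # notification has a chance to race the port-create one.
    port = neutron.create_port(
        {'port': {'network_id': NETWORK_ID, 'name': 'repro-%d' % i}})
    neutron.delete_port(port['port']['id'])
    time.sleep(0.1)

# Afterwards, check for duplicates on the DHCP agent node, e.g.:
#   sort /var/lib/neutron/dhcp/<network-id>/addn_hosts | uniq -d

If duplicates show up in addn_hosts after a run like that, it would point
towards the out-of-order notification theory.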

On Tue, Jun 9, 2015 at 3:34 AM, Neil Jerram <Neil.Jerram at metaswitch.com>
wrote:

> On 09/06/15 01:15, Kevin Benton wrote:
>
>> I'm having difficulty reproducing the issue. The bug that Neil
>> referenced (https://bugs.launchpad.net/neutron/+bug/1192381) looks like
>> it was in Icehouse well before the 2014.1.3 release that Fuel 5.1.1
>> appears to be using.
>>
>
> Just to be sure, I assume we're focussing here on the issue that Daniel
> reported (IP appears twice in Dnsmasq config), and for which I described a
> possible corollary (Dnsmasq config size keeps growing), and NOT on the
> "Another DHCP agent problem" that I mentioned below. :-)
>
> BTW, now that I've reviewed the history of when my team saw this, I can
> say that it was actually first reported to us with the 'IP appears twice in
> Dnsmasq config' symptom - i.e. exactly the same as Daniel's case. The fact
> of the Dnsmasq config increasing in size was noticed later.
>
>  I tried setting the agent report interval to something higher than the
>> downtime to make it seem like the agent is failing sporadically to the
>> server, but it's not impacting the notifications.
>>
>
> Makes sense - that's the effect of the fix for 1192381.
>
> To be clear, though, what code are you trying to reproduce on?  Current
> master?
>
>  Neil, does your testing where you saw something similar have a lot of
>> concurrent creation/deletion?
>>
>
> It was a test of continuously deleting and creating VMs, with this
> pseudocode:
>
> thread_pool = new_thread_pool(size=30)
> for x in range(0, 30):
>     thread_pool.submit(create_vm)
> thread_pool.wait_for_all_threads_to_complete()
> while True:
>     time.sleep(5)
>     for x in range(0, int(random.random() * 5)):
>         thread_pool.submit(randomly_delete_a_vm_and_create_a_new_one)
>
> I'm not clear whether that would qualify as 'concurrent', in the sense
> that you have in mind.
>
> Regards,
>         Neil
>
>  On Mon, Jun 8, 2015 at 12:21 PM, Andrew Woodward
>> <awoodward at mirantis.com> wrote:
>>
>>     Daniel,
>>
>>     This sounds familiar; see if it matches [1]. IIRC, there was another
>>     fix for an issue like this that might already address it, in the
>>     updates that went into the Fuel 5.1.2 packages repo [2]. You can
>>     either update the neutron packages from [2] or try one of the
>>     community builds for 5.1.2 [3]. If this doesn't resolve the issue,
>>     open a bug against MOS dev [4].
>>
>>     [1] https://bugs.launchpad.net/bugs/1295715
>>     [2] http://fuel-repository.mirantis.com/fwm/5.1.2/ubuntu/pool/main/
>>     [3] https://ci.fuel-infra.org/
>>     [4] https://bugs.launchpad.net/mos/+filebug
>>
>>     On Mon, Jun 8, 2015 at 10:15 AM Neil Jerram
>>     <Neil.Jerram at metaswitch.com> wrote:
>>
>>         Two further thoughts on this:
>>
>>         1. Another DHCP agent problem that my team noticed is that
>>         call_driver('reload_allocations') takes a bit of time (to
>>         regenerate the Dnsmasq config files, and to spawn a shell that
>>         sends a HUP signal) - enough so that if there is a fast, steady
>>         rate of port-create and port-delete notifications coming from the
>>         Neutron server, these can build up in the DHCP agent's RPC queue,
>>         and then they still only get dispatched one at a time.  So the
>>         queue and the time delay become longer and longer.
>>
>>         I have a fix pending for this, which uses an extra thread to
>>         read those
>>         notifications off the RPC queue onto an internal queue, and then
>>         batches
>>         the call_driver('reload_allocations') processing when there is a
>>         contiguous sequence of such notifications - i.e. only does the
>>         config
>>         regeneration and HUP once, instead of lots of times.
>>
>>         I don't think this is directly related to what you are seeing -
>> but
>>         perhaps there actually is some link that I am missing.
>>
>>         2. There is an interesting and vaguely similar thread currently
>>         being
>>         discussed about the L3 agent (subject "L3 agent rescheduling
>>         issue") -
>>         about possible RPC/threading issues between the agent and the
>>         Neutron
>>         server.  You might like to review that thread and see if it
>>         describes
>>         any problems analogous to your DHCP one.
>>
>>         Regards,
>>                  Neil
>>
>>
>>         On 08/06/15 17:53, Neil Jerram wrote:
>>          > My team has seen a problem that could be related: in a churn
>>         test where
>>          > VMs are created and terminated at a constant rate - but so
>>         that the
>>          > number of active VMs should remain roughly constant - the
>>         size of the
>>          > host and addn_hosts files keeps increasing.
>>          >
>>          > In other words, it appears that the config for VMs that have
>>         actually
>>          > been terminated is not being removed from the config file.
>>         Clearly, if
>>          > you have a limited pool of IP addresses, this can eventually
>>         lead to the
>>          > problem that you have described.
>>          >
>>          > For your case - i.e. with Icehouse - the problem might be
>>          > https://bugs.launchpad.net/neutron/+bug/1192381.  I'm not
>>         sure if the
>>          > fix for that problem - i.e. sending port-create and port-delete
>>          > notifications to DHCP agents even when the server thinks they
>>         are down -
>>          > was merged before the Icehouse release, or not.
>>          >
>>          > But there must be at least one other cause as well, because
>>         my team was
>>          > seeing this with Juno-level code.
>>          >
>>          > Therefore I, too, would be interested in any other insights
>>         about this
>>          > problem.
>>          >
>>          > Regards,
>>          >      Neil
>>          >
>>          >
>>          >
>>          > On 08/06/15 16:26, Daniel Comnea wrote:
>>          >> Any help, ideas please?
>>          >>
>>          >> Thx,
>>          >> Dani
>>          >>
>>          >> On Mon, Jun 8, 2015 at 9:25 AM, Daniel Comnea
>>          >> <comnea.dani at gmail.com> wrote:
>>          >>
>>          >>     + Operators
>>          >>
>>          >>     Much thanks in advance,
>>          >>     Dani
>>          >>
>>          >>
>>          >>
>>          >>
>>          >>     On Sun, Jun 7, 2015 at 6:31 PM, Daniel Comnea
>>          >>     <comnea.dani at gmail.com> wrote:
>>          >>
>>          >>         Hi all,
>>          >>
>>          >>         I'm running Icehouse (built using Fuel 5.1.1) on
>>          >>         Ubuntu, with dnsmasq version 2.59-4.
>>          >>         I have a very basic network layout where I have a
>>          >>         private net which has 2 subnets
>>          >>
>>          >>           2fb7de9d-d6df-481f-acca-2f7860cffa60 | private-net |
>>          >>             e79c3477-d3e5-471c-a728-8d881cf31bee 192.168.110.0/24 |
>>          >>             f48c3223-8507-455c-9c13-8b727ea5f441 192.168.111.0/24 |
>>          >>
>>          >>         and I'm creating VMs via Heat.
>>          >>         What is happening is that sometimes I get duplicate
>>          >>         entries in [1], and because of that the VM which was
>>          >>         spun up doesn't get an IP.
>>          >>         The dnsmasq processes are running okay [2] and I can't
>>          >>         see anything special/wrong in them.
>>          >>
>>          >>         Any idea why this is happening? Or are you aware of
>>          >>         any bugs around this area? Do you see a problem with
>>          >>         having 2 subnets mapped to 1 private-net?
>>          >>
>>          >>
>>          >>
>>          >>         Thanks,
>>          >>         Dani
>>          >>
>>          >>         [1]
>>          >>         /var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/addn_hosts
>>          >>
>>          >>         [2]
>>          >>
>>          >>         nobody    5664     1  0 Jun02 ?        00:00:08 dnsmasq
>>          >>         --no-hosts --no-resolv --strict-order --bind-interfaces
>>          >>         --interface=tapc9164734-0c --except-interface=lo
>>          >>         --pid-file=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/pid
>>          >>         --dhcp-hostsfile=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/host
>>          >>         --addn-hosts=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/addn_hosts
>>          >>         --dhcp-optsfile=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/opts
>>          >>         --leasefile-ro --dhcp-authoritative
>>          >>         --dhcp-range=set:tag0,192.168.110.0,static,86400s
>>          >>         --dhcp-range=set:tag1,192.168.111.0,static,86400s
>>          >>         --dhcp-lease-max=512 --conf-file= --server=10.0.0.31
>>          >>         --server=10.0.0.32 --domain=openstacklocal
>>          >>
>>          >>
>>          >>
>>          >>
>>          >>
>>          >
>>
>>     --
>>     Andrew Woodward
>>     Mirantis
>>     Fuel Community Ambassador
>>     Ceph Community
>>
>>
>>
>>
>>
>> --
>> Kevin Benton
>>
>>
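
For reference, the batching that Neil describes above might look roughly
like the sketch below. This is only an illustration of the "drain a
contiguous run of notifications and reload once" idea - the queue feeding
and call_driver are stand-ins, not the actual pending patch:

# Block for the first notification, then drain everything that has already
# queued up, so a burst of port-create/delete events becomes a single
# reload per network.
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2


def drain_batch(notifications):
    """Return the set of network IDs covered by one contiguous burst."""
    pending = {notifications.get()}      # block until something arrives
    while True:
        try:
            pending.add(notifications.get_nowait())
        except queue.Empty:
            return pending


# Toy usage: three notifications for the same network arrive close
# together, but the config regeneration and HUP would happen only once.
notifications = queue.Queue()
for _ in range(3):
    notifications.put('net-1')

for network_id in drain_batch(notifications):
    # Stand-in for call_driver('reload_allocations', network_id)
    print('reload_allocations for %s' % network_id)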


-- 
Kevin Benton