[Openstack-operators] [openstack-dev] [openstack-operators][neutron][dhcp][dnsmasq]: duplicate entries in addn_hosts causing no IP allocation

Neil Jerram Neil.Jerram at metaswitch.com
Tue Jun 9 10:34:04 UTC 2015


On 09/06/15 01:15, Kevin Benton wrote:
> I'm having difficulty reproducing the issue. The fix for the bug that
> Neil referenced (https://bugs.launchpad.net/neutron/+bug/1192381)
> appears to have been in Icehouse well before the 2014.1.3 release that
> Fuel 5.1.1 appears to be using.

Just to be sure, I assume we're focussing here on the issue that Daniel 
reported (IP appears twice in Dnsmasq config), and for which I described 
a possible corollary (Dnsmasq config size keeps growing), and NOT on the 
"Another DHCP agent problem" that I mentioned below. :-)

BTW, now that I've reviewed the history of when my team saw this, I can 
say that it was actually first reported to us with the 'IP appears twice 
in Dnsmasq config' symptom - i.e. exactly the same as Daniel's case. 
The fact of the Dnsmasq config increasing in size was noticed later.
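
A quick way to check for this symptom is to scan addn_hosts for an IP 
that appears on more than one line.  The following is only a minimal 
sketch, assuming the file uses the /etc/hosts-style layout that dnsmasq 
expects for --addn-hosts (an IP address followed by one or more names); 
the function name and the sample entries are made up for illustration.

```python
from collections import defaultdict


def find_duplicate_ips(addn_hosts_text):
    """Return {ip: [line numbers]} for each IP listed on more than one line."""
    lines_by_ip = defaultdict(list)
    for lineno, line in enumerate(addn_hosts_text.splitlines(), start=1):
        line = line.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        ip = line.split()[0]  # first field is the address, the rest are names
        lines_by_ip[ip].append(lineno)
    return {ip: nums for ip, nums in lines_by_ip.items() if len(nums) > 1}


# Hypothetical file contents, not taken from a real deployment.
sample = (
    "192.168.110.5\thost-192-168-110-5.openstacklocal host-192-168-110-5\n"
    "192.168.110.6\thost-192-168-110-6.openstacklocal host-192-168-110-6\n"
    "192.168.110.5\thost-192-168-110-5.openstacklocal host-192-168-110-5\n"
)
print(find_duplicate_ips(sample))  # {'192.168.110.5': [1, 3]}
```

Running this against the addn_hosts file of the affected network would 
show whether an address is listed twice at the time a VM fails to get 
its IP.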

> I tried setting the agent report interval to something higher than the
> downtime to make it seem like the agent is failing sporadically to the
> server, but it's not impacting the notifications.

Makes sense - that's the effect of the fix for 1192381.

To be clear, though, what code are you trying to reproduce on?  Current 
master?

> Neil, does your testing where you saw something similar have a lot of
> concurrent creation/deletion?

It was a test of continuously deleting and creating VMs, with this 
pseudocode:

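import random, time
from concurrent.futures import ThreadPoolExecutor, wait

# create_vm and randomly_delete_a_vm_and_create_a_new_one are the test's
# own helpers (not shown here).
thread_pool = ThreadPoolExecutor(max_workers=30)
# Start with 30 VMs, and wait until they have all been created.
wait([thread_pool.submit(create_vm) for _ in range(30)])
while True:
    time.sleep(5)
    # Every 5 seconds, replace between 0 and 4 randomly chosen VMs.
    for _ in range(random.randrange(5)):
        thread_pool.submit(randomly_delete_a_vm_and_create_a_new_one)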

I'm not clear whether that would qualify as 'concurrent', in the sense 
that you have in mind.

Regards,
	Neil

> On Mon, Jun 8, 2015 at 12:21 PM, Andrew Woodward
> <awoodward at mirantis.com> wrote:
>
>     Daniel,
>
>     This sounds familiar; see if this matches [1]. IIRC, there was
>     another issue like this that might already be addressed by the
>     updates in the Fuel 5.1.2 packages repo [2]. You can either update the
>     neutron packages from [2] or try one of the community builds for 5.1.2
>     [3]. If this doesn't resolve the issue, open a bug against MOS dev [4].
>
>     [1] https://bugs.launchpad.net/bugs/1295715
>     [2] http://fuel-repository.mirantis.com/fwm/5.1.2/ubuntu/pool/main/
>     [3] https://ci.fuel-infra.org/
>     [4] https://bugs.launchpad.net/mos/+filebug
>
>     On Mon, Jun 8, 2015 at 10:15 AM Neil Jerram
>     <Neil.Jerram at metaswitch.com> wrote:
>
>         Two further thoughts on this:
>
>         1. Another DHCP agent problem that my team noticed is that
>         call_driver('reload_allocations') takes a bit of time (to
>         regenerate the Dnsmasq config files, and to spawn a shell that
>         sends a HUP signal) - enough so that if there is a fast steady
>         rate of port-create and port-delete notifications coming from the
>         Neutron server, these can build up in DHCPAgent's RPC queue, and
>         then they still only get dispatched one at a time.  So the queue
>         and the time delay become longer and longer.
>
>         I have a fix pending for this, which uses an extra thread to read
>         those notifications off the RPC queue onto an internal queue, and
>         then batches the call_driver('reload_allocations') processing
>         when there is a contiguous sequence of such notifications - i.e.
>         only does the config regeneration and HUP once, instead of lots of
>         times.
>
>         I don't think this is directly related to what you are seeing - but
>         perhaps there actually is some link that I am missing.
>
>         2. There is an interesting and vaguely similar thread currently
>         being discussed about the L3 agent (subject "L3 agent
>         rescheduling issue") - about possible RPC/threading issues
>         between the agent and the Neutron server.  You might like to
>         review that thread and see if it describes any problems analogous
>         to your DHCP one.
>
>         Regards,
>                  Neil
>
>
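
The batching idea in point 1 of the quoted message above can be 
sketched roughly as follows.  This is an illustration only and not the 
pending fix itself: the notification dicts with a "network_id" key, the 
function names, and the reload_allocations callable (standing in for 
call_driver('reload_allocations')) are all assumptions.  The RPC 
handler would only put each notification on the internal queue; a 
separate worker thread drains whatever has accumulated and reloads each 
affected network once.

```python
import queue
import threading


def drain_batch(q, timeout=None):
    """Block for one notification, then take any others already queued."""
    batch = [q.get(timeout=timeout)]
    while True:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            return batch


def run_batcher(q, reload_allocations, stop):
    """Worker loop: one reload per network for each batch of notifications."""
    while not stop.is_set():
        try:
            batch = drain_batch(q, timeout=0.1)
        except queue.Empty:
            continue  # nothing arrived; check the stop flag again
        # Collapse the batch so each network is regenerated only once.
        for network_id in sorted({n["network_id"] for n in batch}):
            reload_allocations(network_id)


# Usage sketch: six queued notifications result in only two reloads.
q = queue.Queue()
for port in range(5):
    q.put({"network_id": "net-a", "port": port})
q.put({"network_id": "net-b", "port": 9})

calls = []
stop = threading.Event()


def fake_reload(network_id):
    calls.append(network_id)
    if len(calls) == 2:
        stop.set()  # end the demo once both networks were reloaded


worker = threading.Thread(target=run_batcher, args=(q, fake_reload, stop))
worker.start()
worker.join(timeout=5)
print(calls)  # ['net-a', 'net-b']
```

The trade-off in a design like this is that a notification may wait 
briefly for its batch, in exchange for the config regeneration and HUP 
happening once per burst rather than once per port event.
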
>         On 08/06/15 17:53, Neil Jerram wrote:
>          > My team has seen a problem that could be related: in a churn
>          > test where VMs are created and terminated at a constant rate -
>          > but so that the number of active VMs should remain roughly
>          > constant - the size of the host and addn_hosts files keeps
>          > increasing.
>          >
>          > In other words, it appears that the config for VMs that have
>          > actually been terminated is not being removed from the config
>          > file.  Clearly, if you have a limited pool of IP addresses, this
>          > can eventually lead to the problem that you have described.
>          >
>          > For your case - i.e. with Icehouse - the problem might be
>          > https://bugs.launchpad.net/neutron/+bug/1192381.  I'm not sure
>          > if the fix for that problem - i.e. sending port-create and
>          > port-delete notifications to DHCP agents even when the server
>          > thinks they are down - was merged before the Icehouse release,
>          > or not.
>          >
>          > But there must be at least one other cause as well, because my
>          > team was seeing this with Juno-level code.
>          >
>          > Therefore I, too, would be interested in any other insights
>          > about this problem.
>          >
>          > Regards,
>          >      Neil
>          >
>          >
>          >
>          > On 08/06/15 16:26, Daniel Comnea wrote:
>          >> Any help, ideas please?
>          >>
>          >> Thx,
>          >> Dani
>          >>
>          >> On Mon, Jun 8, 2015 at 9:25 AM, Daniel Comnea
>          >> <comnea.dani at gmail.com> wrote:
>          >>
>          >>     + Operators
>          >>
>          >>     Much thanks in advance,
>          >>     Dani
>          >>
>          >>
>          >>
>          >>
>          >>     On Sun, Jun 7, 2015 at 6:31 PM, Daniel Comnea
>          >>     <comnea.dani at gmail.com> wrote:
>          >>
>          >>         Hi all,
>          >>
>          >>         I'm running Icehouse (built using Fuel 5.1.1) on Ubuntu,
>          >>         where the dnsmasq version is 2.59-4.
>          >>         I have a very basic network layout where I have a private
>          >>         net which has 2 subnets:
>          >>
>          >>         | 2fb7de9d-d6df-481f-acca-2f7860cffa60 | private-net | e79c3477-d3e5-471c-a728-8d881cf31bee 192.168.110.0/24 |
>          >>         |                                      |             | f48c3223-8507-455c-9c13-8b727ea5f441 192.168.111.0/24 |
>          >>
>          >>         and I'm creating VMs via Heat.
>          >>         What is happening is that sometimes I get duplicated
>          >>         entries in [1], and because of that the VM which was spun
>          >>         up doesn't get an IP.
>          >>         The dnsmasq processes are running okay [2] and I can't see
>          >>         anything special/wrong in it.
>          >>
>          >>         Any idea why this is happening? Or are you aware of any
>          >>         bugs around this area? Do you see a problem with having 2
>          >>         subnets mapped to 1 private-net?
>          >>
>          >>
>          >>
>          >>         Thanks,
>          >>         Dani
>          >>
>          >>         [1]
>          >>         /var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/addn_hosts
>          >>
>          >>         [2]
>          >>         nobody    5664     1  0 Jun02 ?        00:00:08 dnsmasq
>          >>         --no-hosts --no-resolv --strict-order --bind-interfaces
>          >>         --interface=tapc9164734-0c --except-interface=lo
>          >>         --pid-file=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/pid
>          >>         --dhcp-hostsfile=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/host
>          >>         --addn-hosts=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/addn_hosts
>          >>         --dhcp-optsfile=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/opts
>          >>         --leasefile-ro --dhcp-authoritative
>          >>         --dhcp-range=set:tag0,192.168.110.0,static,86400s
>          >>         --dhcp-range=set:tag1,192.168.111.0,static,86400s
>          >>         --dhcp-lease-max=512 --conf-file= --server=10.0.0.31
>          >>         --server=10.0.0.32 --domain=openstacklocal
>          >>
>          >>
>          >>
>          >>
>          >>
>          >> _______________________________________________
>          >> OpenStack-operators mailing list
>          >> OpenStack-operators at lists.openstack.org
>         <mailto:OpenStack-operators at lists.openstack.org>
>          >>
>         http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>          >>
>          >
>
>         __________________________________________________________________________
>         OpenStack Development Mailing List (not for usage questions)
>         Unsubscribe:
>         OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>         <http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe>
>         http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>     --
>     --
>     Andrew Woodward
>     Mirantis
>     Fuel Community Ambassador
>     Ceph Community
>
>
>
>
>
> --
> Kevin Benton
>
>
>


