[Openstack-operators] [openstack-dev] [openstack-operators][neutron[dhcp][dnsmask]: duplicate entries in addn_hosts causing no IP allocation
Neil Jerram
Neil.Jerram at metaswitch.com
Tue Jun 9 10:34:04 UTC 2015
On 09/06/15 01:15, Kevin Benton wrote:
> I'm having difficulty reproducing the issue. The bug that Neil
> referenced (https://bugs.launchpad.net/neutron/+bug/1192381) looks like
> it was in Icehouse well before the 2014.1.3 release that looks like Fuel
> 5.1.1 is using.
Just to be sure, I assume we're focussing here on the issue that Daniel
reported (IP appears twice in Dnsmasq config), and for which I described
a possible corollary (Dnsmasq config size keeps growing), and NOT on the
"Another DHCP agent problem" that I mentioned below. :-)
BTW, now that I've reviewed the history of when my team saw this, I can
say that it was actually first reported to us with the 'IP appears twice
in Dnsmasq config' symptom - i.e. exactly the same as Daniel's case.
The fact of the Dnsmasq config increasing in size was noticed later.
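For anyone who wants to check for the 'IP appears twice' symptom, here is a minimal Python sketch. It assumes the addn_hosts file is in ordinary hosts format (an address followed by one or more names on each line), which is what Dnsmasq reads via --addn-hosts. The function name find_duplicate_ips and the sample entries are made up for illustration; they are not taken from Daniel's system.

```python
from collections import defaultdict


def find_duplicate_ips(hosts_text):
    """Return {ip: [names, ...]} for every IP that appears on more than one line."""
    seen = defaultdict(list)
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace, skip empty lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # First field is the address, the rest are host names.
        seen[fields[0]].append(fields[1:])
    return {ip: names for ip, names in seen.items() if len(names) > 1}


# Hypothetical sample content, for illustration only.
SAMPLE = """\
192.168.110.5\thost-192-168-110-5.openstacklocal host-192-168-110-5
192.168.110.6\thost-192-168-110-6.openstacklocal host-192-168-110-6
192.168.110.5\thost-192-168-110-5.openstacklocal host-192-168-110-5
"""

if __name__ == "__main__":
    for ip, entries in find_duplicate_ips(SAMPLE).items():
        print(ip, "appears", len(entries), "times")
```

To run it against a real file, read that file's contents and pass the text to find_duplicate_ips.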
> I tried setting the agent report interval to something higher than the
> downtime to make it seem like the agent is failing sporadically to the
> server, but it's not impacting the notifications.
Makes sense - that's the effect of the fix for 1192381.
To be clear, though, what code are you trying to reproduce on? Current
master?
> Neil, does your testing where you saw something similar have a lot of
> concurrent creation/deletion?
It was a test of continuously deleting and creating VMs, with this
pseudocode:
    thread_pool = new_thread_pool(size=30)
    for x in range(0, 30):
        thread_pool.submit(create_vm)
    thread_pool.wait_for_all_threads_to_complete()
    while True:
        time.sleep(5)
        for x in range(0, int(random.random() * 5)):
            thread_pool.submit(randomly_delete_a_vm_and_create_a_new_one)
I'm not clear whether that would qualify as 'concurrent', in the sense
that you have in mind.
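In case it helps with reproduction, the churn pattern can be sketched as runnable Python along the following lines. This is only an approximation under stated assumptions: the VM operations are stand-ins that add and remove IDs in an in-memory set (a real test would call the compute API instead), the rounds and pause parameters are hypothetical additions so that the loop terminates, and concurrent.futures stands in for whatever thread pool the real test used.

```python
import itertools
import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor, wait


def churn(rounds, pool_size=30, initial_vms=30, pause=5.0):
    """Create initial_vms fake VMs, then churn them; return the final VM count."""
    vms = set()                # stand-in for the set of active VMs
    lock = threading.Lock()    # protects vms and ids across worker threads
    ids = itertools.count(1)

    def create_vm():
        # Placeholder: a real test would boot a VM here.
        with lock:
            vms.add(next(ids))

    def randomly_delete_a_vm_and_create_a_new_one():
        # Placeholder: a real test would delete a VM, then boot another.
        with lock:
            if vms:
                vms.remove(random.choice(sorted(vms)))
        create_vm()

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Initial fill: wait until all creations have completed.
        wait([pool.submit(create_vm) for _ in range(initial_vms)])
        for _ in range(rounds):
            time.sleep(pause)
            # Between 0 and 4 replacements are submitted per round.
            for _ in range(int(random.random() * 5)):
                pool.submit(randomly_delete_a_vm_and_create_a_new_one)
    # Leaving the with block waits for any outstanding tasks.
    return len(vms)


if __name__ == "__main__":
    print("active VMs after churn:", churn(rounds=3, pause=0.1))
```

Each replacement removes one VM and adds one, so the number of active VMs should stay at the initial level, which is the property the churn test relies on.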
Regards,
Neil
> On Mon, Jun 8, 2015 at 12:21 PM, Andrew Woodward <awoodward at mirantis.com> wrote:
>
> Daniel,
>
> This sounds familiar; see if it matches [1]. IIRC, there was
> another issue like this that might already be addressed by the
> updates in the Fuel 5.1.2 packages repo [2]. You can either update the
> neutron packages from [2] or try one of the community builds for 5.1.2
> [3]. If this doesn't resolve the issue, open a bug against MOS dev [4].
>
> [1] https://bugs.launchpad.net/bugs/1295715
> [2] http://fuel-repository.mirantis.com/fwm/5.1.2/ubuntu/pool/main/
> [3] https://ci.fuel-infra.org/
> [4] https://bugs.launchpad.net/mos/+filebug
>
> On Mon, Jun 8, 2015 at 10:15 AM Neil Jerram <Neil.Jerram at metaswitch.com> wrote:
>
> Two further thoughts on this:
>
> 1. Another DHCP agent problem that my team noticed is that
> call_driver('reload_allocations') takes a bit of time (to regenerate
> the Dnsmasq config files, and to spawn a shell that sends a HUP
> signal) - enough so that if there is a fast steady rate of port-create
> and port-delete notifications coming from the Neutron server, these
> can build up in DHCPAgent's RPC queue, and then they still only get
> dispatched one at a time. So the queue and the time delay become
> longer and longer.
>
> I have a fix pending for this, which uses an extra thread to read
> those notifications off the RPC queue onto an internal queue, and then
> batches the call_driver('reload_allocations') processing when there is
> a contiguous sequence of such notifications - i.e. only does the
> config regeneration and HUP once, instead of lots of times.
>
> I don't think this is directly related to what you are seeing - but
> perhaps there actually is some link that I am missing.
>
> 2. There is an interesting and vaguely similar thread currently being
> discussed about the L3 agent (subject "L3 agent rescheduling issue") -
> about possible RPC/threading issues between the agent and the Neutron
> server. You might like to review that thread and see if it describes
> any problems analogous to your DHCP one.
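
The batching idea in point 1 can be sketched roughly as below. This is not the pending fix itself, only an illustration under assumptions: the internal queue is a plain queue.Queue, each notification is reduced to a network ID string, and the names drain_batch and process_batch are hypothetical. Collapsing a batch to one reload per network is also an assumption about how the batching would be keyed.

```python
import queue


def drain_batch(notifications):
    """Block for one notification, then take any others already queued."""
    batch = [notifications.get()]
    while True:
        try:
            batch.append(notifications.get_nowait())
        except queue.Empty:
            return batch


def process_batch(batch, reload_allocations):
    """Call reload_allocations once per network seen in the batch."""
    networks = []
    for network_id in batch:
        # Keep first-seen order, drop repeats for the same network.
        if network_id not in networks:
            networks.append(network_id)
    for network_id in networks:
        # One config regeneration and HUP per network, not per notification.
        reload_allocations(network_id)
    return networks


if __name__ == "__main__":
    q = queue.Queue()
    for net in ["net-a", "net-a", "net-b", "net-a"]:
        q.put(net)
    process_batch(drain_batch(q), lambda net: print("reload", net))
```

In a real agent, a separate reader thread would feed the internal queue from the RPC side, and a worker would loop over drain_batch and process_batch.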
>
> Regards,
> Neil
>
>
> On 08/06/15 17:53, Neil Jerram wrote:
> > My team has seen a problem that could be related: in a churn test
> > where VMs are created and terminated at a constant rate - but so
> > that the number of active VMs should remain roughly constant - the
> > size of the host and addn_hosts files keeps increasing.
> >
> > In other words, it appears that the config for VMs that have
> > actually been terminated is not being removed from the config file.
> > Clearly, if you have a limited pool of IP addresses, this can
> > eventually lead to the problem that you have described.
> >
> > For your case - i.e. with Icehouse - the problem might be
> > https://bugs.launchpad.net/neutron/+bug/1192381. I'm not sure if the
> > fix for that problem - i.e. sending port-create and port-delete
> > notifications to DHCP agents even when the server thinks they are
> > down - was merged before the Icehouse release, or not.
> >
> > But there must be at least one other cause as well, because my team
> > was seeing this with Juno-level code.
> >
> > Therefore I, too, would be interested in any other insights about
> > this problem.
> >
> > Regards,
> > Neil
> >
> >
> >
> > On 08/06/15 16:26, Daniel Comnea wrote:
> >> Any help, ideas please?
> >>
> >> Thx,
> >> Dani
> >>
> >> On Mon, Jun 8, 2015 at 9:25 AM, Daniel Comnea <comnea.dani at gmail.com> wrote:
> >>
> >> + Operators
> >>
> >> Much thanks in advance,
> >> Dani
> >>
> >>
> >>
> >>
> >> On Sun, Jun 7, 2015 at 6:31 PM, Daniel Comnea <comnea.dani at gmail.com> wrote:
> >>
> >> Hi all,
> >>
> >> I'm running Icehouse (built using Fuel 5.1.1) on Ubuntu, where
> >> the dnsmasq version is 2.59-4.
> >> I have a very basic network layout where I have a private net
> >> which has 2 subnets:
> >>
> >> 2fb7de9d-d6df-481f-acca-2f7860cffa60 | private-net | e79c3477-d3e5-471c-a728-8d881cf31bee 192.168.110.0/24 |
> >>                                      |             | f48c3223-8507-455c-9c13-8b727ea5f441 192.168.111.0/24 |
> >>
> >> and I'm creating VMs via Heat.
> >> What is happening is that sometimes I get duplicated entries in
> >> [1], and because of that the VM which was spun up doesn't get
> >> an IP.
> >> The Dnsmasq processes are running okay [2] and I can't see
> >> anything special or wrong in them.
> >>
> >> Any idea why this is happening? Or are you aware of any bugs
> >> around this area? Do you see a problem with having 2 subnets
> >> mapped to 1 private-net?
> >>
> >>
> >>
> >> Thanks,
> >> Dani
> >>
> >> [1] /var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/addn_hosts
> >>
> >> [2]
> >> nobody 5664 1 0 Jun02 ? 00:00:08 dnsmasq
> >> --no-hosts --no-resolv --strict-order --bind-interfaces
> >> --interface=tapc9164734-0c --except-interface=lo
> >> --pid-file=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/pid
> >> --dhcp-hostsfile=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/host
> >> --addn-hosts=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/addn_hosts
> >> --dhcp-optsfile=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/opts
> >> --leasefile-ro --dhcp-authoritative
> >> --dhcp-range=set:tag0,192.168.110.0,static,86400s
> >> --dhcp-range=set:tag1,192.168.111.0,static,86400s
> >> --dhcp-lease-max=512 --conf-file= --server=10.0.0.31
> >> --server=10.0.0.32 --domain=openstacklocal
> >>
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> OpenStack-operators mailing list
> >> OpenStack-operators at lists.openstack.org
> <mailto:OpenStack-operators at lists.openstack.org>
> >>
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> >>
> >
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe:
> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> <http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe>
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> --
> --
> Andrew Woodward
> Mirantis
> Fuel Community Ambassador
> Ceph Community
>
>
>
>
>
> --
> Kevin Benton
>
>
>