[openstack-dev] [openstack-operators][neutron][dhcp][dnsmasq]: duplicate entries in addn_hosts causing no IP allocation

Shraddha Pandhe shraddha.pandhe at yahoo.com
Tue Jun 9 17:18:30 UTC 2015


Hi Daniel,
I see the following in your command:
--dhcp-range=set:tag0,192.168.110.0,static,86400s --dhcp-range=set:tag1,192.168.111.0,static,86400s

Is this expected? Was this command generated by the agent itself, or was Dnsmasq manually started?

     On Tuesday, June 9, 2015 4:41 AM, Kevin Benton <blak111 at gmail.com> wrote:
   

> Just to be sure, I assume we're focussing here on the issue that Daniel reported

Yes.

> To be clear, though, what code are you trying to reproduce on?  Current master?

I was trying on 2014.1.3, which is the version I understand to be on Fuel 5.1.1.

> I'm not clear whether that would qualify as 'concurrent', in the sense that you have in mind.

It doesn't look like it based on the pseudocode. I was thinking of a condition where a port is deleted very soon after it was created. Is that possible with your test? If not, then my theory about out-of-order notifications might not be any good.
On Tue, Jun 9, 2015 at 3:34 AM, Neil Jerram <Neil.Jerram at metaswitch.com> wrote:

On 09/06/15 01:15, Kevin Benton wrote:

I'm having difficulty reproducing the issue. The bug that Neil referenced
(https://bugs.launchpad.net/neutron/+bug/1192381) looks like it was fixed
in Icehouse well before the 2014.1.3 release that Fuel 5.1.1 appears to
be using.


Just to be sure, I assume we're focussing here on the issue that Daniel reported (IP appears twice in Dnsmasq config), and for which I described a possible corollary (Dnsmasq config size keeps growing), and NOT on the "Another DHCP agent problem" that I mentioned below. :-)

BTW, now that I've reviewed the history of when my team saw this, I can say that it was actually first reported to us with the 'IP appears twice in Dnsmasq config' symptom - i.e. exactly the same as Daniel's case. The fact of the Dnsmasq config increasing in size was noticed later.
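
For anyone wanting to check their own deployment for this symptom, counting how often each IP appears in the addn_hosts file is enough. A minimal sketch, assuming dnsmasq's usual addn-hosts layout of one 'IP hostname [hostname ...]' entry per line:

    import sys
    from collections import Counter

    # Usage: python check_dups.py /var/lib/neutron/dhcp/<network-id>/addn_hosts
    with open(sys.argv[1]) as f:
        # The first column of each non-empty line is the IP address.
        ips = [line.split()[0] for line in f if line.strip()]

    for ip, count in sorted(Counter(ips).items()):
        if count > 1:
            print("%s appears %d times" % (ip, count))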


I tried setting the agent report interval to something higher than the
downtime, to make it seem to the server like the agent is failing
sporadically, but it's not impacting the notifications.


Makes sense - that's the effect of the fix for 1192381.
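
(For reference, the knobs involved there are the agent-side report_interval and the server-side agent_down_time in neutron.conf. Example values only - a sketch, since the defaults vary by release:

    # neutron.conf on the server
    [DEFAULT]
    agent_down_time = 75      # seconds of silence before an agent is marked down

    # neutron.conf as read by the DHCP agent
    [AGENT]
    report_interval = 120     # > agent_down_time, so the agent appears to
                              # fail sporadically, as in Kevin's test
)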

To be clear, though, what code are you trying to reproduce on?  Current master?


Neil, did the testing where you saw something similar involve a lot of
concurrent creation/deletion?


It was a test of continuously deleting and creating VMs, along the lines of this (pseudocode tidied into runnable Python; create_vm and randomly_delete_a_vm_and_create_a_new_one are the test's own helpers):

import random
import time
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=30)
# Bring up an initial population of 30 VMs and wait for all of them.
for f in [pool.submit(create_vm) for _ in range(30)]:
    f.result()
# Then churn forever: every 5 seconds, replace up to 4 random VMs
# (int(random.random()*5) in the original, i.e. 0-4).
while True:
    time.sleep(5)
    for _ in range(random.randint(0, 4)):
        pool.submit(randomly_delete_a_vm_and_create_a_new_one)

I'm not clear whether that would qualify as 'concurrent', in the sense that you have in mind.

Regards,
        Neil


On Mon, Jun 8, 2015 at 12:21 PM, Andrew Woodward <awoodward at mirantis.com> wrote:

    Daniel,

    This sounds familiar, see if this matches [1]. IIRC, there was
    another issue like this that might already be addressed by the
    updates in the Fuel 5.1.2 packages repo [2]. You can either update
    the neutron packages from [2] or try one of the community builds for
    5.1.2 [3]. If this doesn't resolve the issue, open a bug against MOS dev [4].

    [1] https://bugs.launchpad.net/bugs/1295715
    [2] http://fuel-repository.mirantis.com/fwm/5.1.2/ubuntu/pool/main/
    [3] https://ci.fuel-infra.org/
    [4] https://bugs.launchpad.net/mos/+filebug

    On Mon, Jun 8, 2015 at 10:15 AM Neil Jerram
    <Neil.Jerram at metaswitch.com> wrote:

        Two further thoughts on this:

        1. Another DHCP agent problem that my team noticed is that its
        call_driver('reload_allocations') takes a bit of time (to
        regenerate the Dnsmasq config files, and to spawn a shell that
        sends a HUP signal) - enough so that if there is a fast steady
        rate of port-create and port-delete notifications coming from the
        Neutron server, these can build up in the DHCP agent's RPC queue,
        and then they still only get dispatched one at a time.  So the
        queue and the time delay become longer and longer.

        I have a fix pending for this, which uses an extra thread to read
        those notifications off the RPC queue onto an internal queue, and
        then batches the call_driver('reload_allocations') processing
        when there is a contiguous sequence of such notifications - i.e.
        it does the config regeneration and HUP once, instead of lots of
        times.
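
        A minimal sketch of that batching idea, using a plain stdlib
        thread and queue rather than the agent's actual eventlet
        machinery (the names here are illustrative, not the real
        agent's):

            import threading
            import queue

            class BatchedReloader(object):
                """Coalesce per-port notifications into one reload per burst."""

                def __init__(self, reload_allocations):
                    self._reload = reload_allocations   # e.g. the driver call
                    self._q = queue.Queue()
                    threading.Thread(target=self._worker, daemon=True).start()

                def notify(self, network_id):
                    # Called from the RPC handler; returns immediately, so
                    # the RPC consumer never backs up behind call_driver().
                    self._q.put(network_id)

                def _worker(self):
                    while True:
                        nets = {self._q.get()}          # block for first event
                        while True:                     # drain the backlog
                            try:
                                nets.add(self._q.get_nowait())
                            except queue.Empty:
                                break
                        for net in nets:                # one reload per network
                            self._reload(net)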

        I don't think this is directly related to what you are seeing - but
        perhaps there actually is some link that I am missing.

        2. There is an interesting and vaguely similar thread currently
        being discussed about the L3 agent (subject "L3 agent
        rescheduling issue") - about possible RPC/threading issues
        between the agent and the Neutron server.  You might like to
        review that thread and see if it describes any problems analogous
        to your DHCP one.

        Regards,
                 Neil


        On 08/06/15 17:53, Neil Jerram wrote:
         > My team has seen a problem that could be related: in a churn
         > test where VMs are created and terminated at a constant rate -
         > but so that the number of active VMs should remain roughly
         > constant - the size of the host and addn_hosts files keeps
         > increasing.
         >
         > In other words, it appears that the config for VMs that have
         > actually been terminated is not being removed from the config
         > file. Clearly, if you have a limited pool of IP addresses,
         > this can eventually lead to the problem that you have
         > described.
         >
         > For your case - i.e. with Icehouse - the problem might be
         > https://bugs.launchpad.net/neutron/+bug/1192381.  I'm not sure
         > if the fix for that problem - i.e. sending port-create and
         > port-delete notifications to DHCP agents even when the server
         > thinks they are down - was merged before the Icehouse release,
         > or not.
         >
         > But there must be at least one other cause as well, because my
         > team was seeing this with Juno-level code.
         >
         > Therefore I, too, would be interested in any other insights
         > about this problem.
         >
         > Regards,
         >      Neil
         >
         >
         >
         > On 08/06/15 16:26, Daniel Comnea wrote:
         >> Any help, ideas please?
         >>
         >> Thx,
         >> Dani
         >>
         >> On Mon, Jun 8, 2015 at 9:25 AM, Daniel Comnea
         >> <comnea.dani at gmail.com> wrote:
         >>
         >>     + Operators
         >>
         >>     Much thanks in advance,
         >>     Dani
         >>
         >>
         >>
         >>
         >>     On Sun, Jun 7, 2015 at 6:31 PM, Daniel Comnea
         >>     <comnea.dani at gmail.com> wrote:
         >>
         >>         Hi all,
         >>
         >>         I'm running IceHouse (built using Fuel 5.1.1) on
         >>         Ubuntu, with dnsmasq version 2.59-4.
         >>         I have a very basic network layout where I have a
         >>         private net which has 2 subnets:
         >>
         >>           2fb7de9d-d6df-481f-acca-2f7860cffa60 | private-net |
         >>               e79c3477-d3e5-471c-a728-8d881cf31bee 192.168.110.0/24 |
         >>               f48c3223-8507-455c-9c13-8b727ea5f441 192.168.111.0/24 |
         >>
         >>         and I'm creating VMs via Heat.
         >>         What is happening is that sometimes I get duplicated
         >>         entries in [1], and because of that the VM which was
         >>         spun up doesn't get an IP.
         >>         The dnsmasq processes are running okay [2] and I
         >>         can't see anything special/wrong in them.
         >>
         >>         Any idea why this is happening? Or are you aware of
         >>         any bugs around this area? Do you see a problem with
         >>         having 2 subnets mapped to 1 private-net?
         >>
         >>
         >>
         >>         Thanks,
         >>         Dani
         >>
         >>         [1] /var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/addn_hosts
         >>
         >>         [2] nobody  5664  1  0 Jun02 ?  00:00:08 dnsmasq
         >>             --no-hosts --no-resolv --strict-order --bind-interfaces
         >>             --interface=tapc9164734-0c --except-interface=lo
         >>             --pid-file=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/pid
         >>             --dhcp-hostsfile=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/host
         >>             --addn-hosts=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/addn_hosts
         >>             --dhcp-optsfile=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/opts
         >>             --leasefile-ro --dhcp-authoritative
         >>             --dhcp-range=set:tag0,192.168.110.0,static,86400s
         >>             --dhcp-range=set:tag1,192.168.111.0,static,86400s
         >>             --dhcp-lease-max=512 --conf-file= --server=10.0.0.31
         >>             --server=10.0.0.32 --domain=openstacklocal

    --
    Andrew Woodward
    Mirantis
    Fuel Community Ambassador
    Ceph Community





--
Kevin Benton










  

