[openstack-dev] [Neutron] Apparently weird timeout issue

Darragh O'Reilly dara2002-openstack at yahoo.com
Mon Jan 20 15:51:34 UTC 2014


On Monday, 20 January 2014, 15:33, Jay Pipes <jaypipes at gmail.com> wrote:

>Sorry for top-posting -- using web mail client.
no worries - it doesn't bother me.
>
>Is it possible to change the retry interval in Cirros (or cloud-init?) so that the backoff is less than 60 seconds?
I think the udhcpc command line parameters are baked into the image. It's part of BusyBox, and I'm not even sure if it's configurable from a script/text file.
>
>Best,
>
>-jay
>
>
>
>
>On Mon, Jan 20, 2014 at 10:23 AM, Darragh O'Reilly <dara2002-openstack at yahoo.com> wrote:
>
>
>>I did a test to see what the dhcp client on cirros does. I killed the dhcp agent and started an instance. The instance sent the first dhcp discover after about 35 sec, then another 60 sec later, and a final one after another 60 sec.
>>
>>
>>So a revised theory for what happened is this:  
>>
>>t=0 tempest starts vm and starts polling for ACTIVE status
>>t=20 instance-->ACTIVE and tempest starts polling the floating ip for 60 sec
>>t=40 instance does a dhcp discover - no response - so sets a timer for 60 sec
>>t=45 ovs-agent sets the port vlan
>>t=80 tempest gives up and kills vm
>>t=100 instance would have sent another dhcp discover now if it had been left alive
>>
>>I think it would be worth trying to change that test to poll for 120 seconds instead of 60.
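The timeline above can be checked with a small sketch. The times are the approximate values from the theory (polling starts at t=20, first discover at t=40, a retry every 60 sec, three discovers in total, port tagged at t=45). The function and parameter names are made up for illustration and are not part of tempest or cirros.

```python
def first_answerable_discover(first_discover, retry_interval, retries, tag_time):
    """Return the time of the first DHCP discover sent after the port is tagged.

    Discovers sent before tag_time get no response because the port is not
    yet on the right vlan. Returns None if every attempt is too early.
    """
    for attempt in range(retries):
        t = first_discover + attempt * retry_interval
        if t >= tag_time:
            return t
    return None


def vm_reachable_in_time(poll_start, poll_timeout, **dhcp):
    """True if a discover can be answered before the poller gives up."""
    t = first_answerable_discover(**dhcp)
    return t is not None and t <= poll_start + poll_timeout


# Approximate values from the timeline above (assumptions, in seconds).
timeline = dict(first_discover=40, retry_interval=60, retries=3, tag_time=45)

print(vm_reachable_in_time(poll_start=20, poll_timeout=60, **timeline))   # 60 sec poll
print(vm_reachable_in_time(poll_start=20, poll_timeout=120, **timeline))  # 120 sec poll
```

With these numbers the 60 sec poll ends at t=80, before the retry at t=100, while a 120 sec poll would still be running at that point. The sketch ignores the time needed to complete the lease and bring up the floating ip after the discover is answered.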
>>
>>
>>
>>On Monday, 20 January 2014, 11:23, Darragh O'Reilly <dara2002-openstack at yahoo.com> wrote:
>> 
>>>Hi Salvatore,
>>>
>>>
>>>I presume it's this one? 
>>>http://logs.openstack.org/38/65838/4/check/check-tempest-dsvm-neutron-isolated/d108e4a/logs/tempest.txt.gz?#_2014-01-19_20_50_14_604
>>>
>>>
>>>Is it true that the cirros image just fires off a few dhcp discovers and then gives up? If so, then maybe it did so before the tagging happened. Do we have the instance console log? It took about 45 seconds from when the port was created to when it was tagged.
>>>
>>>
>>>2014-01-19 20:48:57.412 8142 DEBUG neutron.agent.linux.ovsdb_monitor [-] Output received from ovsdb monitor: {"data":[["3602a7b2-b559-4709-9bf0-53ae2af68d06","insert","tap496b808c-b5"]],"headings":["row","action","name"]}
>>><snip>
>>>2014-01-19 20:49:41.925 8142 DEBUG neutron.agent.linux.utils [-]
>>>Command: ['sudo', '/usr/local/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ovs-vsctl', '--timeout=10', 'set', 'Port', 'tap496b808c-b5', 'tag=64']
>>>Exit code: 0
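The delay between the two log entries above can be worked out from their timestamps. A minimal sketch, with variable names chosen only for illustration:

```python
from datetime import datetime

# Timestamp layout used by the log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

# ovsdb monitor reports the tap device being inserted.
port_seen = datetime.strptime("2014-01-19 20:48:57.412", FMT)
# ovs-vsctl sets the vlan tag on the same tap device.
port_tagged = datetime.strptime("2014-01-19 20:49:41.925", FMT)

delay = (port_tagged - port_seen).total_seconds()
print(f"{delay:.3f} sec between tap insert and vlan tag")
```

This gives roughly 44.5 sec, which matches the "about 45 seconds" figure.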
>>>
>>>
>>>Darragh.
>>>
>>>
>>>
>>>>For the past 2 days I have been seeing timeout failures on gate jobs which I
>>>>am struggling to explain. An example is available in [1].
>>>>These are the usual failures that we associate with bug 1253896, but this
>>>>time I can verify that:
>>>>- The floating IP is correctly wired (IP and NAT rules)
>>>>- The DHCP port is correctly wired, as well as the VM port and the router
>>>>port
>>>>- The DHCP agent is correctly started for the network
>>>>
>>>>However, no DHCP DISCOVER request is sent. Only the DHCP RELEASE message is
>>>>seen.
>>>>Any help in interpreting the logs would be appreciated.
>>>>
>>>>
>>>>Salvatore
>>>>
>>>>[1] http://logs.openstack.org/38/65838
>>>
>>>
>>>


