Open Stack

Mon Jan 20 15:33:14 UTC 2014

Sorry for top-posting -- using web mail client.

Is it possible to change the retry interval in Cirros (or cloud-init?) so
that the backoff is less than 60 seconds?

Best,
-jay

On Mon, Jan 20, 2014 at 10:23 AM, Darragh O'Reilly <
dara2002-openstack at yahoo.com> wrote:

>
> I did a test to see what the DHCP client on Cirros does. I killed the DHCP
> agent and started an instance. The instance sent its first DHCP discover
> after about 35 sec, another one 60 sec later, and a final one 60 sec after
> that.
>
> So a revised theory for what happened is this:
>
> t=0   tempest starts the VM and starts polling for ACTIVE status
> t=20  instance-->ACTIVE and tempest starts polling the floating IP for 60 sec
> t=40  instance does a DHCP discover - no response - so sets a timer for 60 sec
> t=45  ovs-agent sets the port VLAN
> t=80  tempest gives up and kills the VM
> t=100 instance would have sent another DHCP discover if it had been left alive
>
> I think it would be worth trying to change that test to poll for 120
> seconds instead of 60.
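
The timeline above can be checked with a small sketch. It takes the times quoted in the theory (first discover at t=40, a 60 sec retry interval, port tagged at t=45, polling from t=20) and asks whether a discover that can actually be answered falls inside the poll window. The function names are hypothetical, and the model assumes the lease and floating IP become usable the instant an answerable discover is sent, which is optimistic.

```python
def discover_times(first, interval, count):
    """Times (sec) at which the instance sends DHCP discovers."""
    return [first + i * interval for i in range(count)]


def first_answerable_discover(discovers, port_ready):
    """First discover sent at or after the port was tagged, or None."""
    for t in discovers:
        if t >= port_ready:
            return t
    return None


def poll_succeeds(poll_start, poll_timeout, discovers, port_ready):
    """True if an answerable discover happens before polling gives up."""
    t = first_answerable_discover(discovers, port_ready)
    return t is not None and t <= poll_start + poll_timeout


# Values taken from the revised theory; three discovers as seen in the test.
discovers = discover_times(first=40, interval=60, count=3)

print(poll_succeeds(20, 60, discovers, port_ready=45))   # False: gives up at t=80
print(poll_succeeds(20, 120, discovers, port_ready=45))  # True: t=100 is within t=140
```

Under these assumptions a 120 sec poll leaves about 40 sec of slack after the second discover, while the 60 sec poll ends 20 sec before it.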
>
>
>   On Monday, 20 January 2014, 11:23, Darragh O'Reilly <
> dara2002-openstack at yahoo.com> wrote:
>
> Hi Salvatore,
>
> I presume it's this one?
>
> http://logs.openstack.org/38/65838/4/check/check-tempest-dsvm-neutron-isolated/d108e4a/logs/tempest.txt.gz?#_2014-01-19_20_50_14_604
>
> Is it true that the cirros image just fires off a few dhcp discovers and
> then gives up? If so, then maybe it did so before the tagging happened. Do
> we have the instance console log? It took about 45 seconds from when the
> port was created to when it was tagged.
>
> 2014-01-19 20:48:57.412 8142 DEBUG neutron.agent.linux.ovsdb_monitor [-]
> Output received from ovsdb monitor:
> {"data":[["3602a7b2-b559-4709-9bf0-53ae2af68d06","insert","tap496b808c-b5"]],"headings":["row","action","name"]}
> <snip>
> 2014-01-19 20:49:41.925 8142 DEBUG neutron.agent.linux.utils [-]
> Command: ['sudo', '/usr/local/bin/neutron-rootwrap',
> '/etc/neutron/rootwrap.conf', 'ovs-vsctl', '--timeout=10', 'set', 'Port',
> 'tap496b808c-b5', 'tag=64']
> Exit code: 0
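
The "about 45 seconds" figure can be reproduced from the two log timestamps above. This is a minimal sketch; `seconds_between` is a hypothetical helper, and it assumes both timestamps come from the same clock.

```python
from datetime import datetime

# Timestamp layout used by the neutron agent log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def seconds_between(start, end):
    """Elapsed seconds between two log timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


# Tap device inserted (ovsdb monitor) -> port tagged (ovs-vsctl set Port ... tag=64)
delay = seconds_between("2014-01-19 20:48:57.412", "2014-01-19 20:49:41.925")
print(round(delay, 3))  # 44.513
```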
>
> Darragh.
>
> >For the past 2 days I have been seeing timeout failures on gate jobs which
> >I am struggling to explain. An example is available in [1].
> >These are the usual failures that we associate with bug 1253896, but this
> >time I can verify that:
> >- The floating IP is correctly wired (IP and NAT rules)
> >- The DHCP port is correctly wired, as well as the VM port and the router
> >port
> >- The DHCP agent is correctly started for the network
> >
> >However, no DHCP DISCOVER request is sent. Only the DHCP RELEASE message
> >is seen.
> >Any help in interpreting the logs will be appreciated.
> >
> >
> >Salvatore
> >
> >[1] http://logs.openstack.org/38/65838
>

[openstack-dev] [Neutron] Apparently weird timeout issue
