[openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

Jens Harbott j.harbott at x-ion.de
Tue Nov 14 10:09:37 UTC 2017


2017-11-14 8:24 GMT+00:00 Tobias Urdin <tobias.urdin at crystone.com>:
> Trying to trace this, tempest calls the POST /servers/<instance id>/action
> API endpoint of the Nova compute API.
>
> https://github.com/openstack/tempest/blob/master/tempest/lib/services/compute/floating_ips_client.py#L82
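
(Side note: that call is essentially just the addFloatingIp server action
against the compute API; a simplified sketch using plain requests rather
than the actual tempest client, with endpoint and token as placeholders:)

    # Simplified sketch (plain requests, not the actual tempest client) of
    # the addFloatingIp server action that tempest fires; endpoint and
    # token are placeholders.
    import requests

    NOVA_URL = "http://127.0.0.1:8774/v2.1"    # placeholder compute endpoint
    TOKEN = "<keystone-token>"                 # placeholder auth token
    server_id = "d265626a-77c1-4d2f-8260-46abe548293e"

    resp = requests.post(
        f"{NOVA_URL}/servers/{server_id}/action",
        json={"addFloatingIp": {"address": "172.24.5.17"}},
        headers={"X-Auth-Token": TOKEN},
        timeout=30,
    )
    resp.raise_for_status()
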
>
> Nova then takes the request and tries to do the floating IP association
> using the Neutron server API.
>
> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/nova/nova-api.txt.gz
>
> 2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips
> [req-7f810cc7-a498-4bf4-b27e-8fc80d652785 42526a28b1a14c629b83908b2d75c647
> 2493426e6a3c4253a60c0b7eb35cfe19 - default default] Unable to associate
> floating IP 172.24.5.17 to fixed IP 10.100.0.8 for instance
> d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to
> https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
> timed out: ConnectTimeout: Request to
> https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
> timed out
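
That failing call is, in essence, an authenticated PUT on the floating IP
with a client-side timeout; a minimal sketch with plain requests (nova
really goes through keystoneauth/neutronclient; token and port id below
are placeholders):

    # Minimal sketch (plain requests, not actual nova code) of the neutron
    # call that timed out here. Token and port id are placeholders.
    import requests

    NEUTRON_URL = "https://127.0.0.1:9696"
    TOKEN = "<keystone-token>"                           # placeholder
    fip_id = "2e3fa334-d6ac-443c-b5ba-eeb521d6324c"

    try:
        resp = requests.put(
            f"{NEUTRON_URL}/v2.0/floatingips/{fip_id}",
            json={"floatingip": {"port_id": "<port-of-10.100.0.8>"}},  # placeholder
            headers={"X-Auth-Token": TOKEN},
            timeout=30,        # nova gave up after ~30s per the log above
            verify=False,      # sketch only; the endpoint is https on loopback
        )
        resp.raise_for_status()
    except requests.exceptions.ConnectTimeout:
        # This is the exception surfacing in nova-api's log as
        # "ConnectTimeout: Request to ... timed out".
        print("neutron did not answer within the timeout")
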
>
> Checking that timestamp in the neutron-server logs:
> http://paste.openstack.org/show/626240/
>
> We can see that right before that timestamp, at 23:12:30.377, and then
> right after, at 23:12:35.611, everything seems to be doing fine.
> So there is some connectivity issue to the Neutron API from where the Nova
> API is running, causing a timeout.
>
> Now some more questions would be:
>
> * Why is the return code 400? Are we being fooled, or is it actually a
> connection timeout?
> * Is the Neutron API stuck, causing the failed connection? All traffic goes
> over loopback, so the chance of a problem there is very low.
> * Any firewall catching this? Not likely since the agent processes requests
> right before and after.
>
> I can't find anything interesting in the other system logs that could
> explain this.
> Back to the logs!

I'm pretty certain that this is a deadlock between nova and neutron,
though I cannot put my finger on the exact spot yet. But looking at
the neutron log that you extracted, you can see that neutron indeed
tries to give a successful answer to the fip request just after nova
has given up waiting for it (it seems the timeout is 30s here):

2017-10-29 23:12:35.932 18958 INFO neutron.wsgi
[req-e737b7dd-ed9c-46a7-911b-eb77efe11aa8
42526a28b1a14c629b83908b2d75c647 2493426e6a3c4253a60c0b7eb35cfe19 -
default default] 127.0.0.1 "PUT
/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c HTTP/1.1"
status: 200  len: 746 time: 30.4427412
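
A quick back-of-the-envelope check on those two log lines (assuming the
"time:" field is the handling duration of that PUT) supports the 30s
theory:

    # Plain datetime arithmetic on the two log lines above: neutron
    # finished the PUT at 23:12:35.932 after 30.44s, so it started
    # handling it around 23:12:05.49; nova's ConnectTimeout is stamped
    # 23:12:35.521, almost exactly 30s after that.
    from datetime import datetime, timedelta

    fmt = "%Y-%m-%d %H:%M:%S.%f"
    neutron_done = datetime.strptime("2017-10-29 23:12:35.932", fmt)
    neutron_started = neutron_done - timedelta(seconds=30.4427412)
    nova_gave_up = datetime.strptime("2017-10-29 23:12:35.521", fmt)

    print("neutron started handling the PUT at", neutron_started)
    print("nova gave up",
          round((nova_gave_up - neutron_started).total_seconds(), 2),
          "seconds later")

So nova gave up almost exactly 30s after neutron started handling the very
request it was waiting for.
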

Also, looking at
http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/apache_config/10-nova_api_wsgi.conf.txt.gz
it seems that nova-api is started with two processes and one thread. I'm
not sure whether that means two processes with one thread each or only
one thread in total, but either way nova-api might be getting stuck there.
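
To make that concrete, here is a toy, scaled-down sketch (not nova code;
threads standing in for the two single-threaded WSGI processes) of how a
tiny worker pool starves once every worker blocks on an outbound call:

    # Toy illustration of worker starvation: with only two workers, two
    # requests that block on outbound calls (e.g. to neutron) leave any
    # further inbound request queued until a worker frees up.
    from concurrent.futures import ThreadPoolExecutor
    import time

    WORKERS = 2            # mirrors processes=2 / threads=1

    def handle_request(name, outbound_delay):
        # Pretend the handler blocks on a synchronous call to another service.
        time.sleep(outbound_delay)
        return f"{name} done after {outbound_delay}s"

    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        start = time.time()
        # Two requests that tie up both workers (3s standing in for 30s).
        blocked = [pool.submit(handle_request, f"req-{i}", 3) for i in range(2)]
        # A third request arrives immediately but has to wait in the queue.
        late = pool.submit(handle_request, "callback", 0.1)
        print(late.result(),
              f"- served after {time.time() - start:.1f}s, "
              "i.e. only once a worker was free")

If that queued request were itself something one of the blocked workers is
indirectly waiting on (e.g. neutron calling back into nova), the two
services would sit there until a timeout fires, which would fit the 30s
pattern above.
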


