2017-11-14 8:24 GMT+00:00 Tobias Urdin <tobias.urdin at crystone.com>:

> Trying to trace this, tempest calls the POST /servers/<instance id>/action
> API endpoint of the nova compute API.
>
> https://github.com/openstack/tempest/blob/master/tempest/lib/services/compute/floating_ips_client.py#L82
>
> Nova then takes the request and tries to do this floating IP association
> using the neutron server API.
>
> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/nova/nova-api.txt.gz
>
> 2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips
> [req-7f810cc7-a498-4bf4-b27e-8fc80d652785 42526a28b1a14c629b83908b2d75c647
> 2493426e6a3c4253a60c0b7eb35cfe19 - default default] Unable to associate
> floating IP 172.24.5.17 to fixed IP 10.100.0.8 for instance
> d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to
> https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
> timed out: ConnectTimeout: Request to
> https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
> timed out
>
> Checking that timestamp in the neutron-server logs:
> http://paste.openstack.org/show/626240/
>
> We can see that right before that timestamp, at 23:12:30.377, and right
> after it, at 23:12:35.611, everything seems to be doing fine. So there is
> some connectivity issue to the neutron API from where the nova API is
> running, causing a timeout.
>
> Now some more questions would be:
>
> * Why is the return code 400? Are we being fooled, or is it actually a
>   connection timeout?
> * Is the neutron API stuck, causing the failed connection? All talk is done
>   over loopback, so the chance of a problem there is very low.
> * Is any firewall catching this? Not likely, since the agent processes
>   requests right before and after.
>
> I can't find anything in the other system logs that could explain it.
> Back to the logs!
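For context, the request Tempest issues in that client is the "addFloatingIp" server action. A minimal sketch of the body it POSTs to /servers/<instance id>/action (the address here is copied from the error log above):

```python
import json

# The "addFloatingIp" server action body, as sent by Tempest's
# floating_ips_client; the address comes from the error log above.
body = {"addFloatingIp": {"address": "172.24.5.17"}}

print(json.dumps(body))  # {"addFloatingIp": {"address": "172.24.5.17"}}
```

Nova handles this action synchronously by calling the neutron API, which is why a neutron timeout surfaces as an error on this nova endpoint.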
I'm pretty certain that this is a deadlock between nova and neutron, though I cannot put my finger on the exact spot yet.

Looking at the neutron log that you extracted, you can see that neutron does in fact return a successful answer to the FIP request, just after nova has given up waiting for it (the timeout seems to be 30 s here):

2017-10-29 23:12:35.932 18958 INFO neutron.wsgi [req-e737b7dd-ed9c-46a7-911b-eb77efe11aa8 42526a28b1a14c629b83908b2d75c647 2493426e6a3c4253a60c0b7eb35cfe19 - default default] 127.0.0.1 "PUT /v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c HTTP/1.1" status: 200 len: 746 time: 30.4427412

Also, looking at
http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/apache_config/10-nova_api_wsgi.conf.txt.gz
it seems that nova-api is started with two processes and one thread. I'm not sure whether that means two processes with one thread each or only one thread in total; either way, nova-api might be getting stuck there.
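The timing in the two logs lines up with that theory. A quick check (the durations are copied from the logs; the 30 s client-side timeout on the nova side is an assumption inferred from the gap, not taken from the config):

```python
# Values from the logs above; the client timeout is an assumption.
neutron_put_duration = 30.4427412  # "time: 30.4427412" in the neutron.wsgi line
nova_client_timeout = 30.0         # assumed nova -> neutron request timeout

# Neutron took longer to answer than nova was willing to wait, so nova
# raised its timeout error just before neutron's 200 response arrived.
print(neutron_put_duration > nova_client_timeout)  # True
```

So the question becomes why that one PUT took 30+ seconds on the neutron side while requests immediately before and after completed normally.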