Yeah, I've been scavenging the logs for any kind of indicator of what might have gone wrong, but I can't see anything related to a deadlock, even though I'm very certain that's the issue; I just don't know what's causing it. Perhaps we will need to recreate this issue and then troubleshoot it manually.

The apache2 mod_wsgi config should be OK according to the docs [1].
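For context, the nova-api vhost is set up roughly along these lines (paraphrased from memory, so treat the paths and values below as illustrative only; the exact directives are in the apache config you linked below):

    WSGIDaemonProcess nova-api processes=2 threads=1 user=nova group=nova display-name=%{GROUP}
    WSGIProcessGroup nova-api
    WSGIApplicationGroup %{GLOBAL}
    WSGIScriptAlias / /usr/lib/cgi-bin/nova/nova-api

If I read the WSGIDaemonProcess documentation [1] correctly, processes=2 threads=1 means each daemon process runs a single request-handling thread, i.e. nova-api can serve at most two requests concurrently.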
Best regards

[1] http://modwsgi.readthedocs.io/en/develop/configuration-directives/WSGIDaemonProcess.html

On 11/14/2017 11:12 AM, Jens Harbott wrote:
> 2017-11-14 8:24 GMT+00:00 Tobias Urdin <tobias.urdin at crystone.com>:
>> Trying to trace this, tempest calls the POST /servers/<instance id>/action
>> API endpoint of the nova compute API.
>>
>> https://github.com/openstack/tempest/blob/master/tempest/lib/services/compute/floating_ips_client.py#L82
>>
>> Nova then takes the request and tries to do the floating IP association
>> using the neutron server API.
>>
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/nova/nova-api.txt.gz
>>
>> 2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips
>> [req-7f810cc7-a498-4bf4-b27e-8fc80d652785 42526a28b1a14c629b83908b2d75c647
>> 2493426e6a3c4253a60c0b7eb35cfe19 - default default] Unable to associate
>> floating IP 172.24.5.17 to fixed IP 10.100.0.8 for instance
>> d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to
>> https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
>> timed out: ConnectTimeout: Request to
>> https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
>> timed out
>>
>> Checking that timestamp in the neutron-server logs:
>> http://paste.openstack.org/show/626240/
>>
>> We can see that right before that window, at 23:12:30.377, and right after
>> it, at 23:12:35.611, everything seems to be doing fine, so there is some
>> connectivity issue between the Nova API and the neutron API that causes a
>> timeout.
>>
>> Now some more questions would be:
>>
>> * Why is the return code 400? Are we being fooled, or is it actually a
>> connection timeout?
>> * Is the Neutron API stuck, causing the failed connection? All traffic is
>> over loopback, so the chance of a problem there is very low.
>> * Is any firewall catching this? Not likely, since the agent processes
>> requests right before and after.
>>
>> I can't find anything interesting in the other system logs that could
>> explain this.
>> Back to the logs!
> I'm pretty certain that this is a deadlock between nova and neutron,
> though I cannot put my finger on the exact spot yet. But looking at
> the neutron log that you extracted, you can see that neutron does
> indeed give a successful answer to the FIP request, just after nova
> has given up waiting for it (the timeout seems to be 30s here):
>
> 2017-10-29 23:12:35.932 18958 INFO neutron.wsgi
> [req-e737b7dd-ed9c-46a7-911b-eb77efe11aa8
> 42526a28b1a14c629b83908b2d75c647 2493426e6a3c4253a60c0b7eb35cfe19 -
> default default] 127.0.0.1 "PUT
> /v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c HTTP/1.1"
> status: 200 len: 746 time: 30.4427412
>
> Also, looking at
> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/apache_config/10-nova_api_wsgi.conf.txt.gz
> it seems that nova-api is started with two processes and one thread.
> I'm not sure whether that means two processes with one thread each or
> only one thread in total, but either way nova-api might be getting
> stuck there.
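Regarding that last point: if neutron ever needs to call back into the nova API while both nova-api workers are already blocked waiting on neutron, the two services would simply sit waiting on each other until the 30s client timeout fires. That is pure speculation on my part, but it would match what we see in the logs. One way to rule worker starvation in or out when we recreate the issue would be to give the daemon more threads and see whether the timeouts disappear, e.g. something like (values only as an example, not a recommendation):

    WSGIDaemonProcess nova-api processes=2 threads=10 user=nova group=nova display-name=%{GROUP}

If the timeouts go away with more workers, that points at nova-api being saturated rather than at a problem on the neutron side.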