[openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

Mohammed Naser mnaser at vexxhost.com
Tue Nov 14 16:29:58 UTC 2017


Hi everyone,

Thank you so much for the work on this; I'm sure we can make progress
on this together.  I have noticed that this only occurs in master and
never in the stable branches.  Also, it only occurs under Ubuntu (so
maybe it's something related to the mod_wsgi version?).

Given that we don't have any packages built from master for Ubuntu,
we test against the latest release, which is Pike:

https://github.com/openstack/puppet-openstack-integration/blob/master/manifests/repos.pp#L6-L10

I've noticed the issue is not nearly as prevalent in older branches
but is much more visible in master.

Thanks,
Mohammed

On Tue, Nov 14, 2017 at 6:21 AM, Tobias Urdin <tobias.urdin at crystone.com> wrote:
> Yeah, I've been scavenging the logs for any kind of indicator of what
> might have gone wrong, but I can't see anything related to a deadlock,
> even though I'm quite certain that's the issue; I just don't know
> what's causing it.
>
> Perhaps we will need to recreate this issue manually and then
> troubleshoot it by hand.
> The apache2 mod_wsgi config should be OK according to the docs [1].
>
> Best regards
>
> [1]
> http://modwsgi.readthedocs.io/en/develop/configuration-directives/WSGIDaemonProcess.html
>
> On 11/14/2017 11:12 AM, Jens Harbott wrote:
>> 2017-11-14 8:24 GMT+00:00 Tobias Urdin <tobias.urdin at crystone.com>:
>>> Trying to trace this: tempest calls the POST /servers/<instance id>/action
>>> endpoint of the Nova compute API.
>>>
>>> https://github.com/openstack/tempest/blob/master/tempest/lib/services/compute/floating_ips_client.py#L82
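>>>
>>> For reference, the request tempest ends up sending boils down to roughly
>>> the following (a hand-rolled sketch using python-requests, with a
>>> placeholder endpoint and token rather than the real tempest plumbing):
>>>
>>>     # Rough shape of the addFloatingIp server action; NOVA and TOKEN are
>>>     # placeholders, the server id and address are the ones from the logs.
>>>     import requests
>>>
>>>     NOVA = "http://127.0.0.1:8774/v2.1"   # placeholder compute endpoint
>>>     TOKEN = "<keystone token>"            # placeholder
>>>     SERVER = "d265626a-77c1-4d2f-8260-46abe548293e"
>>>
>>>     resp = requests.post(
>>>         "%s/servers/%s/action" % (NOVA, SERVER),
>>>         json={"addFloatingIp": {"address": "172.24.5.17"}},
>>>         headers={"X-Auth-Token": TOKEN},
>>>     )
>>>     print(resp.status_code)   # this is where the 400 discussed below shows up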
>>>
>>> Nova then takes the request and tries to do the floating IP association
>>> using the Neutron server API.
>>>
>>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/nova/nova-api.txt.gz
>>>
>>> 2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips
>>> [req-7f810cc7-a498-4bf4-b27e-8fc80d652785 42526a28b1a14c629b83908b2d75c647
>>> 2493426e6a3c4253a60c0b7eb35cfe19 - default default] Unable to associate
>>> floating IP 172.24.5.17 to fixed IP 10.100.0.8 for instance
>>> d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to
>>> https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
>>> timed out: ConnectTimeout: Request to
>>> https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
>>> timed out
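>>>
>>> The call nova is blocking on can be reproduced roughly like this (a
>>> sketch using python-requests rather than the keystoneauth path nova
>>> actually goes through; token and port id are placeholders):
>>>
>>>     # Sketch of the floating IP update nova waits on; the 30 s timeout
>>>     # matches what nova-api reports above.
>>>     import requests
>>>
>>>     FIP = ("https://127.0.0.1:9696/v2.0/floatingips/"
>>>            "2e3fa334-d6ac-443c-b5ba-eeb521d6324c")
>>>
>>>     try:
>>>         resp = requests.put(
>>>             FIP,
>>>             json={"floatingip": {"port_id": "<port of 10.100.0.8>"}},
>>>             headers={"X-Auth-Token": "<keystone token>"},
>>>             timeout=30,     # connect/read timeout, like nova's 30 s
>>>             verify=False,   # skip cert checks to keep the sketch simple
>>>         )
>>>         print(resp.status_code, resp.elapsed)
>>>     except requests.exceptions.ConnectTimeout:
>>>         # the analogue of the "ConnectTimeout: Request ... timed out" above
>>>         print("no connection to neutron within 30 s")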
>>>
>>> Checking that timestamp in the neutron-server logs:
>>> http://paste.openstack.org/show/626240/
>>>
>>> We can see that right before this timestamp, at 23:12:30.377, and then
>>> again after 23:12:35.611, everything seems to be doing fine.
>>> So there is some connectivity issue to the Neutron API from where the
>>> Nova API is running, causing a timeout.
>>>
>>> Now some more questions would be:
>>>
>>> * Why is the return code 400? Are we being fooled, or is it actually a
>>> connection timeout?
>>> * Is the Neutron API stuck, causing the failed connection? All traffic
>>> goes over loopback, so the chance of a problem there is very low.
>>> * Is any firewall catching this? Not likely, since the agent processes
>>> requests right before and after.
>>>
>>> I can't find anything interesting in the other system logs that could
>>> explain this.
>>> Back to the logs!
>> I'm pretty certain that this is a deadlock between nova and neutron,
>> though I cannot put my finger on the exact spot yet. But looking at
>> the neutron log that you extracted, you can see that neutron does
>> indeed return a successful answer to the fip request just after nova
>> has given up waiting for it (the timeout seems to be 30 s here):
>>
>> 2017-10-29 23:12:35.932 18958 INFO neutron.wsgi
>> [req-e737b7dd-ed9c-46a7-911b-eb77efe11aa8
>> 42526a28b1a14c629b83908b2d75c647 2493426e6a3c4253a60c0b7eb35cfe19 -
>> default default] 127.0.0.1 "PUT
>> /v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c HTTP/1.1"
>> status: 200  len: 746 time: 30.4427412
>>
>> Also, looking at
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/apache_config/10-nova_api_wsgi.conf.txt.gz
>> it seems that nova-api is started with two processes and one thread.
>> I'm not sure whether that means two processes with one thread each or
>> only one thread in total, but either way nova-api might be getting
>> stuck there.
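>>
>> One way a deadlock like that plays out with so few workers is plain
>> pool starvation: the only thread blocks on a request whose completion
>> in turn needs a free thread in the same pool. A minimal standalone
>> sketch of that pattern (illustration only, not nova/neutron code; a
>> 3 s timeout stands in for the 30 s above):
>>
>>     # Illustration: a pool with a single worker (think one mod_wsgi
>>     # thread) deadlocks when a task blocks on work queued to the same pool.
>>     from concurrent.futures import ThreadPoolExecutor, TimeoutError
>>
>>     pool = ThreadPoolExecutor(max_workers=1)
>>
>>     def callback():
>>         return "done"
>>
>>     def handle_request():
>>         # the only worker is busy right here, so callback() never runs
>>         return pool.submit(callback).result()
>>
>>     try:
>>         pool.submit(handle_request).result(timeout=3)
>>     except TimeoutError:
>>         print("caller gave up; both services still look 'busy'")
>>     finally:
>>         # unblock the stuck worker so the demo can exit (Python >= 3.9)
>>         pool.shutdown(wait=False, cancel_futures=True)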