[openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

Tobias Urdin tobias.urdin at crystone.com
Tue Nov 14 08:24:20 UTC 2017


Trying to trace this: Tempest calls the POST /servers/<instance id>/action endpoint of the Nova compute API.

https://github.com/openstack/tempest/blob/master/tempest/lib/services/compute/floating_ips_client.py#L82

Nova then takes the request and tries to perform the floating IP association through the Neutron server API.
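
For reference, the call Nova ends up making is a floating IP update against the Neutron API. A rough sketch of the equivalent request, written with python-requests purely for illustration (Nova really goes through python-neutronclient, and the token/port values below are placeholders):

import requests

NEUTRON = "https://127.0.0.1:9696"
FIP_ID = "2e3fa334-d6ac-443c-b5ba-eeb521d6324c"
TOKEN = "<keystone-token>"        # placeholder
PORT_ID = "<instance-port-uuid>"  # placeholder

# Associating a floating IP is a PUT on the floatingip resource that sets
# the target port and fixed IP.
body = {"floatingip": {"port_id": PORT_ID, "fixed_ip_address": "10.100.0.8"}}

try:
    resp = requests.put(
        "%s/v2.0/floatingips/%s" % (NEUTRON, FIP_ID),
        json=body,
        headers={"X-Auth-Token": TOKEN},
        timeout=(10, 30),  # (connect, read); a failed connect raises ConnectTimeout
        verify=False,      # skip cert verification for the sketch
    )
    resp.raise_for_status()
except requests.exceptions.ConnectTimeout:
    # Same class of failure as in the nova-api log: the TCP connection to
    # 127.0.0.1:9696 never completed within the timeout.
    raise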

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/nova/nova-api.txt.gz

2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips [req-7f810cc7-a498-4bf4-b27e-8fc80d652785 42526a28b1a14c629b83908b2d75c647 2493426e6a3c4253a60c0b7eb35cfe19 - default default] Unable to associate floating IP 172.24.5.17 to fixed IP 10.100.0.8 for instance d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c timed out: ConnectTimeout: Request to https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c timed out

Checking that timestamp in the neutron-server logs:
http://paste.openstack.org/show/626240/

We can see that right before that timestamp, at 23:12:30.377, and right after it, at 23:12:35.611, everything seems to be doing fine.
So there is some connectivity issue to the Neutron API from where the Nova API is running, causing a timeout.
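
For anyone re-checking those logs, a throwaway filter for the window around the timeout (assuming the "YYYY-MM-DD HH:MM:SS.mmm" prefix used in the neutron-server log above; the file name is a placeholder):

from datetime import datetime

LO = datetime(2017, 10, 29, 23, 12, 25)
HI = datetime(2017, 10, 29, 23, 12, 40)

with open("neutron-server.log") as fh:
    for line in fh:
        try:
            ts = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            continue  # continuation lines / tracebacks carry no timestamp
        if LO <= ts <= HI:
            print(line, end="")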

Now some more questions would be:

* Why is the return code 400? Are we being fooled, or is it actually a connection timeout?
* Is the Neutron API stuck, causing the failed connection? All traffic goes over loopback, so the chance of a problem there is very low (see the probe sketch just below).
* Is a firewall catching this? Not likely, since the agent processes requests right before and after.
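
If this shows up again, a tiny probe run next to the nova-api process would tell a failed TCP connect apart from a slow response. A minimal sketch, assuming Neutron listens on 127.0.0.1:9696 as in the logs:

import socket
import time

# Only checks whether the TCP connect to the Neutron API completes, which is
# exactly what a ConnectTimeout says did not happen.
start = time.time()
try:
    with socket.create_connection(("127.0.0.1", 9696), timeout=5):
        print("connect ok in %.3fs" % (time.time() - start))
except socket.timeout:
    print("connect timed out after %.3fs" % (time.time() - start))
except OSError as exc:
    print("connect failed: %s" % exc)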

I can't find anything interesting in the other system logs that could explain this.
Back to the logs!

Best regards
Tobias


On 11/14/2017 08:35 AM, Tobias Urdin wrote:

Hello,

Same here, I will continue looking at this as well.

It would be great if we could get some input from a Neutron dev with good insight into the project.


Can we backtrace the timed-out message to where it's thrown/returned?

Error: Request to https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c timed out', u'code': 400}

I'm interested in why we would get a 400 code back. The floating IP operations should be async, right? So this would be the response from the API layer, built with information from the database, and it should

return more information.
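
One plausible explanation for the 400 is that the Nova API layer catches the Neutron client failure and re-raises it as a Bad Request carrying the original message, which would match the nova-api log line above. A simplified sketch of that wrapping pattern (illustrative only, not the actual Nova code; neutron_associate is a stand-in):

import webob.exc

def neutron_associate(address, fixed_address, instance_id):
    # Stand-in for the real Neutron call; here it just simulates the timeout.
    raise TimeoutError(
        "Request to https://127.0.0.1:9696/v2.0/floatingips/... timed out")

def associate_floating_ip(address, fixed_address, instance_id):
    try:
        neutron_associate(address, fixed_address, instance_id)
    except Exception as exc:
        msg = ("Unable to associate floating IP %s to fixed IP %s for "
               "instance %s. Error: %s"
               % (address, fixed_address, instance_id, exc))
        # The caller sees code 400 even though the root cause is a timeout.
        raise webob.exc.HTTPBadRequest(explanation=msg)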


Best regards

On 11/14/2017 07:41 AM, Arnaud MORIN wrote:
Hey,
Do you know if the bug appears on a specific Ubuntu / OpenStack version?
As far as I remember, it was not related to the puppet branch? I mean, the bug exists on master but also on the Newton puppet branches, right?

We are using Ubuntu in my company so we would love to see that continue ;)
I'll try to take a look again.

Cheers.

On Nov 14, 2017, 00:00, "Mohammed Naser" <mnaser at vexxhost.com> wrote:
Hi,

Hope that everyone had safe travels and enjoyed their time at Sydney
(and those who weren't there enjoyed a bit of quiet time!).  I'm just
sending this email to ask whether anyone has had a chance to look more into this (or
perhaps we can get some help if there are any Canonical folks on the
list?)

I would be really sad if we had to drop Ubuntu/Debian support because
we cannot test it.  I think there are a lot of users out there using
it!  I'd be more than happy to provide any assistance/information in
troubleshooting this.

Thank you,
Mohammed

On Thu, Nov 2, 2017 at 1:10 PM, Mohammed Naser <mnaser at vexxhost.com> wrote:
> On Thu, Nov 2, 2017 at 1:02 PM, Tobias Urdin <tobias.urdin at crystone.com> wrote:
>> I've been staring at this for almost an hour now, going through all the logs,
>> and I can't really pinpoint
>>
>> where that error message is generated. I cannot find any references for the
>> timed out message that the API returns, or for the unable-to-associate part.
>>
>>
>> What I'm currently staring at is why the instance fixed IP 172.24.5.17
>> is referenced as a network:router_gateway port in the OVS agent logs.
>>
>> 2017-10-29 23:19:27.591 11856 INFO
>> neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
>> [req-7274c6f7-18ef-420d-ad5a-9d0fe4eb35c6 - - - - -] Port
>> 053a625c-4227-41fb-9a26-45eda7bd2055 updated. Details: {'profile': {},
>> 'network_qos_policy_id': None, 'qos_policy_id': None,
>> 'allowed_address_pairs': [], 'admin_state_up': True, 'network_id':
>> 'f9647756-41ad-4ec5-af49-daefe410815e', 'segmentation_id': None,
>> 'fixed_ips': [{'subnet_id': 'a31c7115-1f3e-4220-8bdb-981b6df2e18c',
>> 'ip_address': '172.24.5.17'}], 'device_owner': u'network:router_gateway',
>> 'physical_network': u'external', 'mac_address': 'fa:16:3e:3b:ec:c3',
>> 'device': u'053a625c-4227-41fb-9a26-45eda7bd2055', 'port_security_enabled':
>> False, 'port_id': '053a625c-4227-41fb-9a26-45eda7bd2055', 'network_type':
>> u'flat', 'security_groups': []}
>>
>>
>> Anybody else seen anything interesting?
>
> Hi Tobias,
>
> Thanks for looking out.  I've spent a lot of time and I haven't
> successfully identified much, I can add the following
>
> - This issue is intermittent in CI
> - It does *not* happen on any specific providers, I've seen it fail on
> both OVH and Rackspace.
> - Not *all* floating IP assignments fail; if you look at the logs, you
> can see it attach a few successfully before failing
>
> But yeah, I'm still quite at a loss and not having this coverage isn't fun.
>
>>
>>
>> On 10/30/2017 11:08 PM, Brian Haley wrote:
>>
>> On 10/30/2017 05:46 PM, Matthew Treinish wrote:
>>
>>  From a quick glance at the logs my guess is that the issue is related
>> to this stack trace in the l3 agent logs:
>>
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146
>>
>> I'm not sure what's causing it to complain there. But, I'm on a plane
>> right now (which is why this is a top post, sorry) so I can't really dig
>> much more than that. I'll try to take a deeper look at things later when
>> I'm on solid ground. (hopefully someone will beat me to it by then though)
>>
>> I don't think that l3-agent trace is it, as the failure is coming from
>> the API.  It's actually a trace that happens due to the async nature
>> of how the agent runs arping; the fix is
>> https://review.openstack.org/#/c/507914/ but it only removes the log noise.
>>
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-server.txt.gz
>> has some tracebacks that look config-related, possibly a missing DB table?
>> But I haven't looked very closely.
>>
>> -Brian
>>
>>
>> On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser
>> <mnaser at vexxhost.com> wrote:
>>
>>     Hi everyone,
>>
>>     I'm looking for some help regarding an issue that we're having with
>>     the Puppet OpenStack modules; we've had very inconsistent failures in
>>     the Xenial jobs with the following error:
>>
>>
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/
>>
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
>>          Details: {u'message': u'Unable to associate floating IP
>>     172.24.5.17 to fixed IP 10.100.0.8 for instance
>>     d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to
>>
>> https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
>>     timed out', u'code': 400}
>>
>>     At this point, we're at a bit of a loss.  I've tried my best in order
>>     to find the root cause however we have not been able to do this.  It
>>     was persistent enough that we elected to go non-voting for our Xenial
>>     gates, however, with no fix ahead of us, I feel like this is a waste
>>     of resources and we need to either fix this or drop CI for Ubuntu.  We
>>     don't deploy on Ubuntu and most of the developers working on the
>>     project don't either at this point, so we need a bit of resources.
>>
>>     If you're a user of Puppet on Xenial, we need your help!  Without any
>>     resources going to fix this, we'd unfortunately have to drop support
>>     for Ubuntu because of the lack of resources to maintain it (or
>>     assistance).  We (Puppet OpenStack team) would be more than happy to
>>     work together to fix this so pop-in at #puppet-openstack or reply to
>>     this email and let's get this issue fixed.
>>
>>     Thanks,
>>     Mohammed
>>
>>

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



