[openstack-dev] [Neutron][qa] Intermittent failure of tempest test test_network_basic_ops

Jay Pipes jaypipes at gmail.com
Thu Jan 9 18:10:52 UTC 2014


On Thu, 2014-01-09 at 09:09 +0100, Salvatore Orlando wrote:
> I am afraid I need to correct you Jay!

I always welcome corrections to things I've gotten wrong, so no worries
at all!

> This actually appears to be bug 1253896 [1]

Ah, the infamous "SSH bug" :) Yeah, so last night I spent a few hours
digging through log files and running a variety of e-r queries trying to
find some patterns for the bugs that Joe G had sent an ML post about.

I went round in circles, unfortunately :( When I thought I'd found a
pattern, invariably I would doubt my initial findings and wander into
new areas in a wild goose chase.

At various times, I thought something was up with the DHCP agent, as
there were lots of "No DHCP Agent found" errors in the q-dhcp screen
logs. But I could not correlate those errors with the failures in the
4 bugs.

Then I started thinking that there was a timing/race condition where a
security group was being added to the Nova-side servers cache before it
had actually been constructed fully on the Neutron-side. But I was not
able to fully track down the many, many debug messages that are involved
in the full sequence of VM launch :( At around 4am, I gave up and went
to bed...

> Technically, what we call 'bug' here is actually a failure
> manifestation.
> So far, we have removed several bugs causing this failure. The last
> patch was pushed to devstack around Christmas.
> Nevertheless, if you look at recent comments and Joe's email, we still
> have a non-negligible failure rate on the gate.

Understood. I actually suspect that some of the various performance
improvements from Phil Day and others around optimizing certain server
and secgroup list calls have made the underlying race conditions show up
more often -- since the list calls are completing much faster, which
ironically gives Neutron less time to complete setup operations!

So, a performance patch on the Nova side ends up putting more pressure
on the Neutron side, which causes the rate of occurrence for these
sticky bugs (with potentially many root causes) to spike.
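
To make the shape of that race concrete, here is a toy sketch. It is
not Nova or Neutron code: FakeBackend and wait_until are hypothetical
names invented for illustration, and the timings are arbitrary. It only
shows that a caller who checks sooner is more likely to see an
unfinished setup, and that polling with a timeout is one way to
tolerate that.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.01):
    """Poll predicate() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeBackend:
    """Hypothetical stand-in for a service that finishes setup asynchronously."""

    def __init__(self, ready_after):
        # Setup is considered complete `ready_after` seconds from now.
        self._ready_at = time.monotonic() + ready_after

    def is_ready(self):
        return time.monotonic() >= self._ready_at


backend = FakeBackend(ready_after=0.2)

# A fast caller checks right away and finds the setup unfinished.
checked_immediately = backend.is_ready()

# A caller that polls with a timeout gives the setup time to complete.
checked_with_wait = wait_until(backend.is_ready, timeout=2.0)

print(checked_immediately, checked_with_wait)
```

The faster the first check happens, the smaller the window the backend
has to finish, which is the effect I am guessing at above.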

Such is life I guess :)

> It is also worth mentioning that if you are running your tests with
> parallelism enabled (ie: you're running tempest with tox -esmoke
> rather than tox -esmokeserial) you will end up with a higher
> occurrence of this failure due to more bugs causing it. These bugs are
> due to some weakness in the OVS agent that we are addressing with
> patches for blueprint neutron-tempest-parallel [2].

Interesting. If you wouldn't mind, what makes you think this is a
weakness in the OVS agent? I would certainly appreciate your expertise
in this area, since it would help me in my own bug-searching endeavors.

All the best,
-jay

> Regards,
> Salvatore
> 
> 
> 
> 
> [1] https://bugs.launchpad.net/neutron/+bug/1253896
> [2] https://blueprints.launchpad.net/neutron/+spec/neutron-tempest-parallel
> 
> 
> On 9 January 2014 05:38, Jay Pipes <jaypipes at gmail.com> wrote:
>         On Wed, 2014-01-08 at 18:46 -0800, Sukhdev Kapur wrote:
>         > Dear fellow developers,
>         
>         > I am running a few Neutron tempest tests and noticing an
>         > intermittent failure of
>         > tempest.scenario.test_network_basic_ops.
>         
>         > I ran this test 50+ times and am getting intermittent
>         > failures. The pass rate is approx. 70%. The 30% of the time
>         > it fails, it is mostly in _check_public_network_connectivity.
>         
>         > Has anybody seen this?
>         > If there is a fix or workaround for this, please share your
>         > wisdom.
>         
>         
>         Unfortunately, I believe you are running into this bug:
>         
>         https://bugs.launchpad.net/nova/+bug/1254890
>         
>         The bug is Triaged in Nova (meaning, there is a suggested fix
>         in the bug report). It's currently affecting the gate
>         negatively and is certainly on the radar of the various PTLs
>         affected.
>         
>         Best,
>         -jay
>         
>         
>         
>         _______________________________________________
>         OpenStack-dev mailing list
>         OpenStack-dev at lists.openstack.org
>         http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 




