[openstack-dev] [neutron] [infra] Race conditions in fwaas that impact the gate

Sean M. Collins sean at coreitpro.com
Wed Aug 12 09:57:37 UTC 2015


[reformatted and infra tag added]


On Tue, Aug 11, 2015 at 07:32:34PM EDT, Salvatore Orlando wrote:
> On 12 August 2015 at 00:21, Sean M. Collins <sean at coreitpro.com> wrote:
> 
> > Hello,
> >
> > Today has been an exciting day, to say the least. Earlier today I was
> > pinged on IRC about some firewall as a service unit test failures that
> > were blocking patches from being merged, such as
> > https://review.openstack.org/#/c/211537/.
> >
> > Neutron devs started poking around a bit and discussing on the IRC channel.
> >
> >
> > http://eavesdrop.openstack.org/irclogs/%23openstack-neutron/%23openstack-neutron.2015-08-11.log.html#t2015-08-11T16:59:13
> >
> > I've started to dig a little bit and document what I've found on this
> > bug.
> >
> > https://bugs.launchpad.net/neutron/+bug/1483875
> >
> > There was a change recently merged in devstack-gate which changes the
> > MySQL database driver and the number of workers -
> > https://review.openstack.org/#/c/210649/
> > which might be what is triggering the race condition - but I'm honestly
> > not sure.
> >
> > I proposed a revert to a section of the FwaaS code, but frankly I'm not
> > sure if this will fix the problem - https://review.openstack.org/211677
> > - so I bumped it out of the merge queue when my anxiety reached maximum.
> > I'm just not confident enough about my knowledge of the FwaaS codebase
> > to really be making these kinds of changes.
> >
> > Is there anyone that has any insights?
> >
> >
> > --
> > Sean M. Collins
> >
> >
>
> I have been hit by these failures as well.
> I think you did well by bumping that revert out of the queue; I think it
> simply cures the symptom, possibly affecting correct operation of the
> firewall service.
> If we are looking at removing the symptom on the API job, then I'd skip the
> failing tests while somebody figures out what's going on (unless the team
> decides that it is better to revert multiple workers again).
> 
> However, I think the issue might not be limited to firewall. I've seen a
> worrying spike in rally failures [1]. Since the job is non-voting,
> developers probably do not care much about it, but it provides very useful
> insights. I am looking at rally logs now - at the moment I do not yet have
> a clear idea of the root cause of these failures.


Ihar pushed a revert of the DevStack gate job[1]; maybe infra can weigh in
on that. Otherwise, if it makes everyone happier, I can just set the test
to skip for the time being to unblock everyone. I'll then do the research
I've been meaning to do into xfail[2], so that we can keep running tests
and capturing data without failing a job because of a test or race
condition we're already aware of.

[1]: https://review.openstack.org/#/c/211853/

[2]: http://pytest.org/latest/skipping.html
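
To make the skip vs. xfail distinction concrete, here is a minimal sketch
using only the standard library's unittest, on the assumption that it
behaves closely enough to whatever runner the FWaaS tests use. The class
name, test names and status strings are hypothetical stand-ins, not taken
from the FWaaS tree.

```python
import unittest

# Bug tracking the race discussed in this thread.
KNOWN_RACE_BUG = "https://bugs.launchpad.net/neutron/+bug/1483875"


class FirewallRaceSketch(unittest.TestCase):
    """Hypothetical test case; not the real FWaaS tests."""

    @unittest.skip("known race, see " + KNOWN_RACE_BUG)
    def test_skipped_entirely(self):
        # Never executed: unblocks the job, but we stop collecting data.
        self.fail("this body does not run")

    @unittest.expectedFailure
    def test_runs_but_may_fail(self):
        # Still executed: a failure is recorded as "expected" and does
        # not fail the run. The assertion is a stand-in for the racy check.
        self.assertEqual("ACTIVE", "PENDING_CREATE")


def run_sketch():
    """Run both tests and return the unittest.TestResult."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(FirewallRaceSketch)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_sketch()
    print("skipped:", len(outcome.skipped))
    print("expected failures:", len(outcome.expectedFailures))
    print("run considered successful:", outcome.wasSuccessful())
```

One caveat for an intermittent race: unittest counts an unexpected pass of
an expectedFailure test against the run, whereas pytest's xfail[2] is
non-strict by default and only reports it as XPASS. Which of the two
behaviours we want for the gate is exactly the research I still need to do.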

-- 
Sean M. Collins


