[openstack-dev] [oslo.db] [release] opportunistic tests breaking randomly

Sean Dague sean at dague.net
Thu Sep 15 19:08:42 UTC 2016


On 09/14/2016 11:57 PM, Mike Bayer wrote:
> 
> 
> On 09/14/2016 11:05 PM, Mike Bayer wrote:
>>
>> Are *these* errors also new as of version 4.13.3 of oslo.db?   Because
>> I have more suspicion of one particular oslo.db change here.
> 
> The version in question that has the changes to provisioning and
> anything really to do with this area is 4.12.0.   So if you didn't see
> any problem w/ 4.12 then almost definitely oslo.db is not the cause -
> the code changes subsequent to 4.12 have no relationship to any system
> used by the opportunistic test base.    I would hope at least that 4.12
> is the version where we see things changing because there were small
> changes to the provisioning code.
> 
> But at the same time, I'm combing through the quite small adjustments to
> the provisioning code as of 4.12.0 and I'm not seeing what could
> introduce this issue.   That said, we really should never see the kind
> of error we see with the "DROP DATABASE" failing because the database
> remains in use; however, this can be a side effect of the test itself
> having problems with the state of a different connection that is not
> closed, so its locks remain held.
> 
> That is, there are poor failure modes for sure here, I just can't see
> anything in 4.13 or even 4.12 that would suddenly introduce them.
> 
> By all means if these failures disappear when we go to 4.11 vs. 4.12,
> that would be where we need to go and to look for next cycle.     From
> my POV if the failures do disappear then that would be the best evidence
> that the oslo.db version is the factor.

In looking through the change history, I agree that nothing in the
4.13.x stream is plausibly related to this failure. The last change
that seems related is back in the 4.11/4.12 space. Rolling back that
far would be a much riskier change than just rolling back to 4.13.0 at
this point in the release, so I would not recommend it.

So I think Roman's approach of trying to just adjust the timeout seems
like the lowest risk way to move forward. Hopefully that mitigates things.
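
For readers following along, the failure mode Mike describes above (a
cleanup statement failing because a different, unclosed connection still
holds locks) can be sketched with a small stand-in. This is only an
analogy under stated assumptions: it uses Python's stdlib sqlite3 and a
DROP TABLE rather than oslo.db's provisioning code or a real
"DROP DATABASE" on MySQL/PostgreSQL, and the function name
demo_lingering_lock is hypothetical. The timeout parameter plays the
role of the timeout being adjusted: it only controls how long the second
connection waits before giving up.

```python
import os
import sqlite3
import tempfile


def demo_lingering_lock(timeout_seconds=0.1):
    """Return the error text seen when a second connection tries to drop
    a table while a first, unclosed connection still holds a write lock.

    Hypothetical helper for illustration; not part of oslo.db.
    """
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # First connection: manage transactions by hand (isolation_level=None)
    # and deliberately leave a write transaction open.
    holder = sqlite3.connect(path, isolation_level=None)
    holder.execute("CREATE TABLE t (x INTEGER)")
    holder.execute("BEGIN IMMEDIATE")  # takes the write lock
    holder.execute("INSERT INTO t VALUES (1)")

    # Second connection: waits up to timeout_seconds for the lock.
    other = sqlite3.connect(path, timeout=timeout_seconds,
                            isolation_level=None)
    try:
        other.execute("DROP TABLE t")
        return None  # no error: the lock was not in the way
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        other.close()
        holder.execute("ROLLBACK")  # release the lingering lock
        holder.close()


if __name__ == "__main__":
    print(demo_lingering_lock())
```

A longer timeout only helps if the other connection eventually lets go
of its lock; if it never does, the drop still fails, just later.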

	-Sean

-- 
Sean Dague
http://dague.net


