[openstack-dev] Recent oslo.config/Quantum changes [was Re: For those using Quantum with devstack]

Robert Collins robertc at robertcollins.net
Fri May 24 23:05:10 UTC 2013


On 23 May 2013 23:45, Mark McLoughlin <markmc at redhat.com> wrote:
> On Thu, 2013-05-23 at 12:28 +0100, Mark McLoughlin wrote:
>> Hey
>>
>> Let me try and summarize a few things about this:
>
> Since this is definitely TL and I expect there'll be plenty who don't
> TR, let me TL;DR :)
>
>   1) the breakage people saw in Quantum was due to the fact the Quantum
>      merged changes requiring latest oslo.config and it wasn't
>      available. This should never have gotten through the gate, but it
>      did.


The gate currently tests with trunk of all projects: pip-requires that
are satisfied by trunk of any other project are preserved intact. As
it stands, you cannot detect incompatibility between a new commit and
existing released libraries/servers, *but* you can detect 'next
release of X will be incompatible with next release of Y'.

There are really a number of test cases for any X and Y, where X is a
project with a change being made to it:
trunk X trunk Y
trunk X latest release Y
trunk X oldest supported Y
latest release X trunk Y
latest release X latest release Y
latest release X oldest supported Y
oldest supported X trunk Y
oldest supported X latest release Y
oldest supported X oldest supported Y
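
As a rough sketch, the nine combinations above are just the cross product
of three version states with themselves. The names below (STATES,
version_matrix) are made up for illustration and are not part of
devstack-gate:

```python
from itertools import product

# The three version states named in the list above; plain labels only.
STATES = ("trunk", "latest release", "oldest supported")


def version_matrix(states=STATES):
    """Return every (X state, Y state) pairing, with X varying slowest."""
    return list(product(states, repeat=2))


matrix = version_matrix()
for x_state, y_state in matrix:
    # Prints one line per combination, in the same order as the list above.
    print(f"{x_state} X {y_state} Y")
```

With more than two projects the same product grows quickly, which is one
reason to pick out only the combinations that matter.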

Now, most of these are uninteresting because we've previously tested
them, but they become interesting when third party deps are changing -
we proxy 'the dep change works' by changing the dependencies file in
X, but that doesn't actually cover the spectrum.

When X is truly changing - it's not a third party dep, but a code
change in X - we have:
new X trunk Y
new X latest release Y
new X oldest supported Y

We currently test new X vs trunk Y. The gaps we're seeing are changes
where new X trunk Y passes, but new X latest release Y or new X
oldest supported Y is broken.
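
To make the gap concrete, here is a small sketch that lists which 'new X'
configurations go unexercised when only trunk Y is gated. GATED and
untested_configs are hypothetical names for illustration, not anything
the gate actually defines:

```python
# Version states of Y that a change to X could be tested against.
Y_STATES = ("trunk", "latest release", "oldest supported")

# Assumption for this sketch: only trunk Y is exercised today,
# as described in the paragraph above.
GATED = {"trunk"}


def untested_configs(y_states=Y_STATES, gated=GATED):
    """Return the 'new X <state> Y' configurations with no gate coverage."""
    return [f"new X {state} Y" for state in y_states if state not in gated]


gaps = untested_configs()
for config in gaps:
    print(config)
```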

I think we should add configurations to devstack-gate to test these
other combinations, which should be fairly straightforward.

The biggest issue we'll face (based on quick chats with -infra folk)
is that something like 1% of test runs fail spuriously, and we are all
in the habit of going 'recheck no bug' rather than 'damn, that's a
critical issue we need to fix'. When a spurious failure occurs, the
pipeline of predicted merges stalls - which might be an hour or more
deep - and we do hundreds of runs a day, so this is a
multiple-times-a-day event... and that makes everything start to
crawl. If we make that 3 times as likely to happen, it's possible
we'll cross the threshold such that we never recover.
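
A back-of-the-envelope check of the '3 times as likely' figure, assuming
each configuration fails spuriously with the same ~1% probability and
that the failures are independent (both are simplifying assumptions; the
1% itself is only a rough number):

```python
def p_any_spurious(p_single, n_configs):
    """Chance that at least one of n independent runs fails spuriously."""
    return 1 - (1 - p_single) ** n_configs


# "Something like 1%" per run, taken from the paragraph above.
P_SINGLE = 0.01

one = p_any_spurious(P_SINGLE, 1)
three = p_any_spurious(P_SINGLE, 3)
print(f"1 config: {one:.4f}, 3 configs: {three:.4f}")
```

Under those assumptions three configurations per change give roughly a
2.97% chance of at least one spurious failure, so just under three times
the current rate.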

The solution for this is obvious: we need to step up and stop feature
work to fix these reliability problems that are hurting all of us.

tl;dr: let's:
 - commit to fixing spurious failures as critical reliability issues.
 - add two test configurations - change vs latest release of all other
   projects + change vs oldest-supported release of all other projects.

This will likely uncover a number of issues beyond the three we've
found this week in TripleO - as we're only scratching the surface of
coverage so far - but once we're beyond the initial teething pain,
we'll be solid.

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Cloud Services


