[openstack-dev] [Nova] Hypervisor CI requirement and deprecation plan

Matt Riedemann mriedem at linux.vnet.ibm.com
Tue Nov 26 16:49:43 UTC 2013

On Tuesday, November 26, 2013 10:07:02 AM, Sean Dague wrote:
> On 11/26/2013 09:56 AM, Russell Bryant wrote:
>> On 11/26/2013 09:38 AM, Bob Ball wrote:
>>>> -----Original Message-----
>>>> From: Russell Bryant [mailto:rbryant at redhat.com]
>>>> Sent: 26 November 2013 13:56
>>>> To: openstack-dev at lists.openstack.org
>>>> Cc: Sean Dague
>>>> Subject: Re: [openstack-dev] [Nova] Hypervisor CI requirement and
>>>> deprecation plan
>>>> On 11/26/2013 04:48 AM, Bob Ball wrote:
>>>>> I hope we can safely say that we should run against all "gating" tests which
>>>>> require Nova?  Currently we run quite a number of tests in the gate that
>>>>> succeed even when Nova is not running as the gate isn't just for Nova but
>>>>> for all projects.
>>>> Would you like to come up with a more detailed proposal?  What tests
>>>> would you cut, and how much time does it save?
>>> I don't have a detailed proposal yet - but it's very possible that we'll want one in the coming weeks.
>>> In terms of the time saved, I noticed that a tempest smoke run with Nova absent took 400 seconds on one of my machines (a particularly slow one) - so I imagine that would translate to maybe a 300 second / 5 minute reduction in overall time.  Total smoke took approximately 800 seconds on the same machine.
>> I don't think the smoke tests are really relevant here.  That's not
>> related to Nova vs non-Nova tests, right?
>>> If the approach could be acceptable then yes, I'm happy to come up with a detailed set of tests that I would propose cutting.
>>> My primary hesitation with the approach is it would need Tempest reviewers to be aware of this extra type of test, and flag up if a test is added to the full tempest suite which should also be in the nova tempest suite.
>> Right now I don't think it's acceptable.  I was suggesting a more
>> detailed proposal to help convince me.  :-)
> So we already have the beginnings of service tags in Tempest, that would
> let you slice exactly like this. I don't think the infrastructure is
> fully complete yet, but the idea being that you could run the subset of
> tests that interact with "compute" or "networking" in any real way.
> Realize... that's not going to drop that many tests for something like
> compute, it's touched a lot.
> 	-Sean

Good to know about the service tags.  I think I remember being broken at
some point after those tempest.conf.sample changes. :)
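
For anyone who hasn't looked at the service tags idea Sean describes, here is a rough sketch of what slicing a test run by service could look like.  This is only an illustration: the `services` decorator, `select_tests` helper and the test names below are made up for this example and are not Tempest's actual API.

```python
# Hypothetical sketch of service-tag slicing; these names are
# assumptions for illustration, not Tempest's real interfaces.

def services(*names):
    """Tag a test function with the services it exercises."""
    def decorator(func):
        func.services = frozenset(names)
        return func
    return decorator


def select_tests(tests, wanted):
    """Return the tests tagged with at least one service in `wanted`."""
    wanted = frozenset(wanted)
    return [t for t in tests
            if getattr(t, "services", frozenset()) & wanted]


@services("compute", "network")
def test_boot_server_with_port():
    pass


@services("object_storage")
def test_upload_object():
    pass


@services("compute", "volume")
def test_attach_volume():
    pass


ALL_TESTS = [test_boot_server_with_port,
             test_upload_object,
             test_attach_volume]

# A virt driver CI would keep only the tests that touch compute.
compute_tests = select_tests(ALL_TESTS, {"compute"})
```

Note that in this toy example two of the three tests still touch compute, which matches Sean's point that slicing on compute won't drop many tests.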

My overall concern, and I think the other guys doing this for virt 
drivers will agree, is trying to scope down the exposure to unrelated 
failures.  For example, if there is a bug in swift breaking the gate, 
it could start breaking the nova virt driver CI as well.  When things 
get bad in the gate, it takes some monstrous effort to rally people 
across the projects to come together to unblock it (like what Joe 
Gordon was doing last week).

I'm running Tempest internally about once per day when we rebase code 
with the community and that's to cover running with the PowerVM driver 
for nova, Storwize driver for cinder, OVS for neutron, with qpid and 
DB2.  We're running almost a full run except for the third party boto 
tests and swift API tests.  The thing is, when something fails, I have 
to figure out if it's environmental (infra), a problem with tempest 
(think instability with neutron in the gate), a configuration issue, or 
a code bug.  That's a lot for one person to cover, or even a small
team.  That's why at some point we just have to ignore/exclude tests
that keep failing for reasons we can't figure out (think intermittent gate
breaker bugs that are open for months).  Now multiply this out across
all the nova virt drivers, the neutron plugins and I'm assuming at some 
point the various glance backends and cinder drivers (haven't heard if 
they are planning on the same types of CI requirements yet).  I think 
either we're going to have a lot of flaky/unstable driver CI going on
so the scores can't be trusted, or we're going to develop a lot of 
people that get really good at infra/QA (which would be a plus in the 
long-run, but maybe not what those teams set out to be).
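
To make the triage problem a bit more concrete, here is a minimal sketch of the kind of first-pass sorting I mean: take each failure message and put it in one of the buckets above (environmental, tempest, configuration), and leave everything else for a human to decide whether it's a real code bug.  The patterns and bucket names are hypothetical placeholders, not something we actually run; any real setup would have to tune them against its own logs.

```python
from collections import Counter

# Hypothetical patterns for illustration only; a real CI system would
# need rules tuned to its own environment and log formats.
RULES = [
    ("environmental", ("connection refused", "no space left on device")),
    ("configuration", ("tempest.conf", "not configured")),
    ("tempest", ("timed out waiting for",)),
]


def classify(message):
    """Return the first bucket whose pattern appears in the message."""
    text = message.lower()
    for bucket, needles in RULES:
        if any(needle in text for needle in needles):
            return bucket
    # Nothing matched: someone has to look at it by hand, since it
    # may be a genuine code bug.
    return "needs-triage"


def summarize(messages):
    """Count failures per bucket for one test run."""
    return Counter(classify(m) for m in messages)


failures = [
    "Connection refused by 10.0.0.5",
    "Option missing from tempest.conf",
    "AssertionError: 500 != 200",
]
summary = summarize(failures)
```

Even with something like this, the "needs-triage" bucket is where the time goes, and that is the part that doesn't scale for a one-person or small-team CI effort.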

I don't have any good answers; I'm just trying to raise the issue since
this is complicated.  I think it's also hard for people that aren't 
forced to invest in infra/QA on a daily basis to understand and 
appreciate the amount of effort it takes just to keep the wheels 
spinning, so I want to keep expectations at a reasonable level.

Don't get me wrong, I absolutely agree with requiring third-party CI
for the various vendor-specific drivers and plugins; that's a
no-brainer for OpenStack to scale.  I think it will just be very
interesting to see the kinds of results coming out of all of these 
disconnected teams come icehouse-3.



Matt Riedemann
