[openstack-dev] [nova] fair standards for all hypervisor drivers

Daniel P. Berrange berrange at redhat.com
Wed Jul 16 14:50:20 UTC 2014


On Wed, Jul 16, 2014 at 04:15:40PM +0200, Sean Dague wrote:
> Recently the main gate updated from Ubuntu 12.04 to 14.04, and in doing
> so we started executing the livesnapshot code in the nova libvirt
> driver. Which fails about 20% of the time in the gate, as we're bringing
> computes up and down while doing a snapshot. Dan Berange did a bunch of
> debug on that and thinks it might be a qemu bug. We disabled these code
> paths, so live snapshot has now been ripped out.
> 
> In January we also triggered a libvirt bug, and had to carry a private
> build of libvirt for 6 weeks in order to let people merge code in OpenStack.
> 
> We never were able to switch to libvirt 1.1.1 in the gate using the
> Ubuntu Cloud Archive during Icehouse development, because it has a
> different set of failures that would have prevented people from merging
> code.
> 
> Based on these experiences, libvirt version differences seem to be as
> substantial as major hypervisor differences.

I think that is a pretty dubious conclusion to draw from just a
couple of bugs. The reason they really caused pain is that
the CI test system was based on an old version for too long. If it
were tracking the current upstream version of libvirt/KVM we'd have
seen the problem much sooner & been able to resolve it during
review of the change introducing the feature, as we do with any
other bugs we encounter in software such as the breakage we see
with my stuff off pypi.

>                                             There is a proposal here -
> https://review.openstack.org/#/c/103923/ to hold newer versions of
> libvirt to the same standard we hold xen, vmware, hyperv, docker,
> ironic, etc.

That is a rather misleading statement you're making there. Libvirt is
in fact held to *higher* standards than xen/vmware/hyperv because it
is actually gating all commits. The 3rd party CI systems can be
broken for days or weeks and we still happily accept code for those
virt drivers.

AFAIK there has never been any statement that every feature added
to xen/vmware/hyperv must be tested by the 3rd party CI system.
All of the CI systems, for whatever driver, are currently testing
some arbitrary subset of the overall features of that driver, and
by no means every new feature being approved in review has coverage.

> I'm somewhat concerned that the -2 pile on in this review is a double
> standard of libvirt features, and features exploiting really new
> upstream features. I feel like a lot of the language being used here
> about the burden of doing this testing is exactly the same as was
> presented by the docker team before their driver was removed, which was
> ignored by the Nova team at the time. It was the concern by the freebsd
> team, which was also ignored and they were told to go land libvirt
> patches instead.

As above, the only double standard is that libvirt tests are all gating
and 3rd party tests are non-gating.

> If we want to reduce the standards for libvirt we should reconsider
> what's being asked of 3rd party CI teams, and things like the docker
> driver, as well as the A, B, C driver classification. Because clearly
> libvirt 1.2.5+ isn't actually class A supported.

AFAIK the requirement for 3rd party CI is merely that it has to exist,
running some arbitrary version of the hypervisor in question. We've
not said that 3rd party CI has to be covering every version or every
feature, as is being pushed on libvirt here.

The "Class A", "Class B", "Class C" classifications were always only
ever going to be a crude approximation. Unless you define them to be
wrt the explicit version of every single deb/pypi package installed
in the gate system (which I don't believe anyone has ever suggested)
there is always a risk that a different version of some package has a
bug that Nova tickles.

IMHO the classification we do for drivers provides an indication as 
to the quality of the *Nova* code. IOW class A indicates that we've
thoroughly tested the Nova code and believe it to be free of bugs for
the features we've tested. If there is a bug in a 3rd party package
that doesn't imply that the Nova code is any less well tested or
more buggy. Replace libvirt with mysql in your example above. A new
version of mysql with a bug does not imply that Nova is suddenly not
"class A" tested.

IMHO it is up to the downstream vendors to run testing to ensure that
what they give to their customers still achieves the quality level
indicated by the tests upstream has performed on the Nova code.

> Anyway, discussion welcomed. My primary concern right now isn't actually
> where we set the bar, but that we set the same bar for everyone.

As above, aside from the question of gating vs non-gating, the bar is
already set at the same level for everyone. There has to be a CI system
somewhere testing some arbitrary version of the software. Everyone meets
that requirement.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


