[openstack-dev] [nova] fair standards for all hypervisor drivers

Daniel P. Berrange berrange at redhat.com
Tue Aug 12 09:54:26 UTC 2014


On Mon, Aug 11, 2014 at 03:25:39PM -0700, Joe Gordon wrote:
> I am not sure if I follow.  The gate issue with live snapshots has been
> worked around by turning it off [0], so presumably this patch is forward
> facing.  I fail to see how this patch is needed to help the gate in the
> future. Wouldn't it just delay the issues until we change the version_cap?

Consider that we have a feature already in tree that is not currently
tested by the gate. Now we update libvirt in the gate and so tempest
suddenly starts exercising the feature. Now if there is a bug, every
single review submitted to the gate is potentially going to crash and
burn causing major pain for everyone trying to get tests to pass. With
this version cap, you can update libvirt in the gate in knowledge that
we won't turn on new previously untested feature patches, so you have
lower risk of causing gate instability. Once the gate is updated to new
libvirt, we submit a patch to update version cap. If there is a bug in
the new features enabled it only affects that one patch under review
instead of killing the entire CI system for anyone. Only once we have
passing tests for the new version cap value and that is merged would
the gate as a whole be impact. Of course sometimes the bugs are very
non-deterministic and rare so things might still sneak through, but
at least some portion of bugs will be detected this way and help the
gate reliability during updates of libvirt.

> The issue I see with the libvirt version_cap [1] is best captured in its
> commit message: "The end user can override the limit if they wish to opt-in
> to use of untested features via the 'version_cap' setting in the 'libvirt'
> group." This goes against the very direction nova has been moving in for
> some time now. We have been moving away from merging untested (re: no
> integration testing) features.  This patch changes the very direction the
> project is going in over testing without so much as a discussion. While I
> think it may be time that we revisited this discussion, the discussion
> needs to happen before any patches are merged.

Like it or not we have a number of features in Nova that we don't have
test coverage for, due to a variety of reasons, some short term, some
long term, some permanently unavoidable. One of the reasons is due to
the gate having too old libvirt for a feature. As mentioned elsewhere
people are looking at addressing that, by trying to figure out how to
do a gate job with newer libvirt. Blocking feature development during
Juno until the gate issues are addressed is not going to help the work
to get new gate jobs, but will discourage our contributors and further
the (somewhat valid) impression that we're not a very welcoming project
to work with.

The version cap setting is *not* encouraging us to add more features that
lack testing. It is about recognising that we're *already* accepting such
features and so taking steps to ensure our end users don't exercise the
untested code paths unless they explicitly choose to. This ensures that
what the user tests out of the box actually meets our "Tier 1" status.

> I am less concerned about the contents of this patch, and more concerned
> with how such a big de facto change in nova policy (we accept untested code
> sometimes) without any discussion or consensus. In your comment on the
> revert [2], you say the 'whether not-CI-tested features should be allowed
> to be merged' debate is 'clearly unresolved.' How did you get to that
> conclusion? This was never brought up in the mid-cycles as a unresolved
> topic to be discussed. In our specs template we say "Is this untestable in
> gate given current limitations (specific hardware / software configurations
> available)? If so, are there mitigation plans (3rd party testing, gate
> enhancements, etc)" [3].  We have been blocking untested features for some
> time now.

That last lines are nonsense. We have never unconditionally blocked untested
features nor do I recommend that we do so. The specs template testing allows
the contributor to *justify* why they think the feature is worth accepting
despite lack of testing. The reviewers make a judgement call on whether the
justification is valid or not. This is a pragmmatic approach to the problem.

> I am further perplexed by what Daniel Berrange, the patch author, meant
> when he commented [2] "Regardless of the outcome of the testing discussion
> we believe this is a useful feature to have." Who is 'we'? Because I don't
> see how that can be nova-core or even nova-specs-core, especially
> considering how many members of those groups are +2 on the revert. So if
> 'we' is neither of those groups then who is 'we'?

By 'we' I'm referring to the people who submitted & approved the patch. As
explained soooooo many times now, this version cap concept is something that
is useful to end users even if this debate of testing was not happening and
libvirt had 100% testing coverage. ie consider we test on libvirt 1.2.0 but
a cloud admin has deployed on libvirt 1.0.0. Now the admin wants to pull in
libvirt 1.1.5 to get some critical bug fix, but they don't want their nova
deployment to start exercising new feature paths which can result in a change
in behaviour of their cloud. The version cap settings allows them to update
from 1.0.0 to 1.1.5, while telling nova to only use features from 1.0.0.

This patch has *no* bearing/impact on what we decide wrt policy for testing
of virt drivers and is demonstratably useful in production deployments. So
the idea that we must revert this due to lack of decision wrt testing policy
is bogus.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|



More information about the OpenStack-dev mailing list