[openstack-dev] [nova] libvirt version_cap, a postmortem

Joe Gordon joe.gordon0 at gmail.com
Wed Sep 3 20:57:54 UTC 2014


On Sat, Aug 30, 2014 at 9:08 AM, Mark McLoughlin <markmc at redhat.com> wrote:

>
> Hey
>
> The libvirt version_cap debacle continues to come up in conversation and
> one perception of the whole thing appears to be:
>
>   A controversial patch was "ninjaed" by three Red Hat nova-cores and
>   then the same individuals piled on with -2s when a revert was proposed
>   to allow further discussion.
>
> I hope it's clear to everyone why that's a pretty painful thing to hear.
> However, I do see that I didn't behave perfectly here. I apologize for
> that.
>
> In order to understand where this perception came from, I've gone back
> over the discussions spread across gerrit and the mailing list in order
> to piece together a precise timeline. I've appended that below.
>
> Some conclusions I draw from that tedious exercise:
>

Thank you for going through and doing this.


>
>  - Some people came at this from the perspective that we already have
>    a firm, unwritten policy that all code must have functional written
>    tests. Others see that "test all the things" is interpreted as a
>    worthy aspiration, but is only one of a number of nuanced factors
>    that needs to be taken into account when considering the addition of
>    a new feature.
>

Confusion over our testing policy sounds like the crux of one of the issues
here. Having so many unwritten policies has led to confusion in the past
which is why I started
http://docs.openstack.org/developer/nova/devref/policies.html, hopefully by
writing these things down in the future this sort of confusion will arise
less often.

Until this whole debacle I didn't even know there was a dissenting opinion
on what our testing policy is. In every conversation I have seen up until
this point, the question was always how to raise the bar on testing.  I
don't expect us to be able to get to the bottom of this issue in a ML
thread, but hopefully we can begin the testing policy conversation here so
that we may be able to make a breakthrough and the summit.



>
>    i.e. the former camp saw Dan Smith's devref addition as attempting
>    to document an existing policy (perhaps even a more forgiving
>    version of an existing policy), whereas other see it as a dramatic
>    shift to a draconian implementation of "test all the things".
>
>  - Dan Berrange, Russell and I didn't feel like we were "ninjaing a
>    controversial patch" - you can see our perspective expressed in
>    multiple places. The patch would have helped the "live snapshot"
>    issue, and has other useful applications. It does not affect the
>    broader testing debate.
>
>    Johannes was a solitary voice expressing concerns with the patch,
>    and you could see that Dan was particularly engaged in trying to
>    address those concerns and repeating his feeling that the patch was
>    orthogonal to the testing debate.
>
>    That all being said - the patch did merge too quickly.
>
>  - What exacerbates the situation - particularly when people attempt to
>    look back at what happened - is how spread out our conversations
>    are. You look at the version_cap review and don't see any of the
>    related discussions on the devref policy review nor the mailing list
>    threads. Our disjoint methods of communicating contribute to
>    misunderstandings.
>
>  - When it came to the revert, a couple of things resulted in
>    misunderstandings, hurt feelings and frayed tempers - (a) that our
>    "retrospective veto revert policy" wasn't well understood and (b)
>    a feeling that there was private, in-person grumbling about us at
>    the mid-cycle while we were absent, with no attempt to talk to us
>    directly.
>

While I cannot speak for anyone else, I did grumble a bit at the mid-cycle
about the behavior on Dan's first devref patch,
https://review.openstack.org/#/c/103923/. This was the first time I saw 3
'-2's on a single patch revision. To me 1 or 2 '-2's gives the perception
of 'hold on there, lets discuss this more first,' but 3 '-2's is just
piling on and is very confrontational in nature. I was taken aback by this
behavior and still don't know what to say or even if my reaction is
justified.


>
> To take an even further step back - successful communities like ours
> require a huge amount of trust between the participants. Trust requires
> communication and empathy. If communication breaks down and the pressure
> we're all under erodes our empathy for each others' positions, then
> situations can easily get horribly out of control.
>
> This isn't a pleasant situation and we should all strive for better.
> However, I tend to measure our "flamewars" against this:
>
>   https://mail.gnome.org/archives/gnome-2-0-list/2001-June/msg00132.html
>
> GNOME in June 2001 was my introduction to full-time open-source
> development, so this episode sticks out in my mind. The two individuals
> in that email were/are immensely capable and reasonable people, yet ...
>
> So far, we're doing pretty okay compared to that and many other
> open-source flamewars. Let's make sure we continue that way by avoiding
> letting situations fester.
>
>
> Thanks, and sorry for being a windbag,
> Mark.
>
> ---
>
> = July 1 =
>
> The starting point is this review:
>
>    https://review.openstack.org/103923
>
> Dan Smith proposes a policy that the libvirt driver may not use libvirt
> features until they have been available in Ubuntu or Fedora for at least
> 30 days.
>
> The commit message mentions:
>
>   "broken us in the past when we add a new feature that requires a newer
>    libvirt than we test with, and we discover that it's totally broken
>    when we upgrade in the gate."
>
> which AIUI is a reference to the libvirt "live snapshot" issue the
> previous week, which is described here:
>
>   https://review.openstack.org/102643
>
> where upgrading to Ubuntu Trusty meant the libvirt version in use in the
> gate went from 0.9.8 to 1.2.2, which caused the "live snapshot" code
> paths in Nova for the first time, which appeared to be related to some
> serious gate instability (although the exact root cause wasn't
> identified).
>
> Some background on the libvirt version upgrade can be seen here:
>
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-March/thread.html#30284
>
> = July 1 - July 8 =
>
> Back and forth debate mostly between Dan Smith and Dan Berrange. Sean
> votes +2, Dan Berrange votes -2.
>
> = July 14 =
>
> Russell adds his support to Dan Berrange's position, votes -2. Some
> debate between Dan and Dan continues. Joe Gordon votes +2. Matt
> Riedemann expresses support-in-principal for Dan Smith's approach.
>
> = July 15 =
>
> Debate continues ...
>
> 16:12 - I -2 the patch and attempt to take a step back and think about
> how we could have prevented (or at least mitigated against) the "live
> snapshot" issue and suggest the idea of adding a new configuration
> option:
>
>   [libvirt]
>   version_cap = 1.2.2
>
> which would mean we would not automatically start using new libvirt
> features in the gate because of a libvirt version upgrade, but instead
> the new features would only begin to be used when we merge a change to
> the default value of version_cap.
>
> 16:31 - I leave a separate comment addressing the broader debate about
> our functional test coverage requirements.
>
> 16:46 - Dan Berrange likes the version_cap idea
>
> 15:37 - Dan Berrange posts an implementation of version_cap:
>
>   https://review.openstack.org/107119
>
> and links to it from in Dan Smith's libvirt testing policy review (#103923)
>
> 21:49 - Matt expresses some support for the config option, but worries
> about the precedent being set.
>
> 23:14 - Dan Berrange explains his point of view that a "test all the
> things" rule must mean "test all the things which can be practically
> tested by our current CI system".
>
> = July 16 =
>
> 08:04 - I +2 the version_cap patch after Dan fixes up some issues I
> pointed out.
>
> 13:44 - 14:28 - Sean and John Garbutt add further thoughts to the
> libvirt testing policy review without making any comment on the
> version_cap idea. Sean takes the debate to the mailing list:
>
>   http://lists.openstack.org/pipermail/openstack-dev/2014-July/040421.html
>
> Debate continues in the thread, largely around the mechanics of how to
> allow a newer version of libvirt be used in the gate.
>
> 15:08 - I mention the version_cap proposal on the thread for the first
> time:
>
>   http://lists.openstack.org/pipermail/openstack-dev/2014-July/040436.html
>
> and the point I make is the configuration option makes it easier for
> operators to run only code paths that are tested by the gate.
>
> 16:44 - Johannes notes that multiple issues with code paths not tested
> in the gate may need to be fixed as part of a future review to increase
> the default value of version_cap.
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-July/040456.html
>
> 18:50 - Russell approves the version_cap patch.
>
> https://review.openstack.org/107119
>
> = July 17 =
>
> 05:38 - The version_cap patch merges.
>
> 13:09 - Somewhat related, Dan Berrange and I explain we won't be at the
> mid-cycle issue for any test policy discussions. Sean makes a point that
> the discussion is best had on email/IRC where there is a permanent
> record.
>
> 14:33 - Johannes expresses concern in gerrit that version_cap got merged
> too quickly.
>
> https://review.openstack.org/107119
>
> 15:17 - Dan Berrange responds to Johannes in gerrit, saying that he
> thinks version_cap is useful irrespective of the broader testing
> discussion.
>
> 15:28 - Johannes disagrees, asks for a response to his concerns on the
> mailing list.
>
> 15:40 - Dan Berrange responds on the mailing list to Johannes
> version_cap concerns.
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-July/040576.html
>
> 18:15 - Russell also responds to Johannes.
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-July/040597.html
>
> 18:31 - Johannes responds to Dan.
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-July/040602.html
>
> 18:39 - Russell responds to Johannes again.
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-July/040604.html
>
> 19:13 - Johannes responds again.
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-July/040608.html
>
> = July 18 =
>
> 07:55 - Dan Berrange responds to Johannes again.
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-July/040647.html
>
> 09:37 - Sean notices version_cap, thinks it's "interesting", wonders do
> we need similar for qemu versions.
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-July/040533.html
>
> = July 28 - 30 =
>
> The Nova mid-cycle meetup in Portland. Neither, Dan, Russell nor I are
> present, for various personal reasons.
>
> AIUI, a brief "hallway track" conversation lead to briefly discussing
> the version_cap patch in the main sessions which went along the lines of
> "does anyone find it concerning how this patch was merged" and the
> general consensus was "yes, it is concerning".
>
> = July 30 =
>
> Sean Dague proposes a revert of the patch because a) "it's a policy
> change" and b) it requires further discussion.
>
> https://review.openstack.org/110754
>
> = Aug 7 =
>
> Russell -2s the revert.
>
> Matt raises the topic of the revert on the mailing list.
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-August/042284.html
>
> Dan Smith, Joe Gordon +2s the revert.
>
> = Aug 8 =
>
> Michael Still, Jay Pipes +2s the revert.
>
> Russell starts experimenting with a gate job to test with latest
> libvirt.
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-August/042470.html
>
> = Aug 11 =
>
> Dan Berrange and I return from our (separate :) vacations and -2 the
> revert.
>
> I describe the revert as a "proxy battle", avoiding the real debate
> around testing requirements.
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-August/042546.html
>
> Joe tries to capture the concerns about how the version_cap patch was
> originally merged.
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-August/042653.html
>
> = Aug 12 =
>
> I explicitly attempt to address any notion there is a "corporate agenda"
> at play here.
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-August/042691.html
>
> Dan again tries to explain why he thinks version_cap is orthogonal to
> the broader debate.
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-August/042707.html
>
> Michael Still - with Thierry cc-ed - privately emails me, Russell and
> Dan Berrange asking us to remove our -2s so that we can have a "more
> complete discussion". All three of us initially refuse and a bunch of
> hurt feelings are frankly expressed on all sides, with one theme being
> the perception that this had gotten out of control because we weren't at
> the mid-cycle.
>
> Thierry calmed things down be patiently expressing Michael's perspective
> that there is a process to be followed where patches that get merged
> quickly are later seen to be controversial. The red mist begins to
> clear. I propose that we document this "retrospective veto revert"
> procedure to head off any future misunderstanding.
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-August/042728.html
>
> All three of us remove our -2s. The patch gets reverted.
>
> = Aug 14 =
>
> Dan Berrange re-proposes the patch, later changing the default
> version_cap value.
>
> https://review.openstack.org/114307
>
> The patch has seen very little discussion since then and is still
> pending review.
>
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20140903/b252f220/attachment-0001.html>


More information about the OpenStack-dev mailing list