[openstack-dev] [nova] libvirt version_cap, a postmortem
Gary Kotton
gkotton at vmware.com
Sun Aug 31 09:36:04 UTC 2014
Hi,
Very nice write up. I take my hat off for taking the time and doing a
postmortem. As a community we really need to work on how we communicate
with one another. At the end of the day we all have the common goal in the
success of the project.
At times I feel like things are done very quickly without any discussion
at all, for example the reverting of patches. I can understand when the
gate is broken that this is a must, but in other cases I think that we
need to discuss things in order to build trust and for people in the
community to learn what they may or may not have done correctly.
Thanks
Gary
On 8/30/14, 7:08 PM, "Mark McLoughlin" <markmc at redhat.com> wrote:
>
>Hey
>
>The libvirt version_cap debacle continues to come up in conversation and
>one perception of the whole thing appears to be:
>
> A controversial patch was "ninjaed" by three Red Hat nova-cores and
> then the same individuals piled on with -2s when a revert was proposed
> to allow further discussion.
>
>I hope it's clear to everyone why that's a pretty painful thing to hear.
>However, I do see that I didn't behave perfectly here. I apologize for
>that.
>
>In order to understand where this perception came from, I've gone back
>over the discussions spread across gerrit and the mailing list in order
>to piece together a precise timeline. I've appended that below.
>
>Some conclusions I draw from that tedious exercise:
>
> - Some people came at this from the perspective that we already have
> a firm, unwritten policy that all code must have functional written
> tests. Others see that "test all the things" is interpreted as a
> worthy aspiration, but is only one of a number of nuanced factors
> that needs to be taken into account when considering the addition of
> a new feature.
>
> i.e. the former camp saw Dan Smith's devref addition as attempting
> to document an existing policy (perhaps even a more forgiving
> version of an existing policy), whereas other see it as a dramatic
> shift to a draconian implementation of "test all the things".
>
> - Dan Berrange, Russell and I didn't feel like we were "ninjaing a
> controversial patch" - you can see our perspective expressed in
> multiple places. The patch would have helped the "live snapshot"
> issue, and has other useful applications. It does not affect the
> broader testing debate.
>
> Johannes was a solitary voice expressing concerns with the patch,
> and you could see that Dan was particularly engaged in trying to
> address those concerns and repeating his feeling that the patch was
> orthogonal to the testing debate.
>
> That all being said - the patch did merge too quickly.
>
> - What exacerbates the situation - particularly when people attempt to
> look back at what happened - is how spread out our conversations
> are. You look at the version_cap review and don't see any of the
> related discussions on the devref policy review nor the mailing list
> threads. Our disjoint methods of communicating contribute to
> misunderstandings.
>
> - When it came to the revert, a couple of things resulted in
> misunderstandings, hurt feelings and frayed tempers - (a) that our
> "retrospective veto revert policy" wasn't well understood and (b)
> a feeling that there was private, in-person grumbling about us at
> the mid-cycle while we were absent, with no attempt to talk to us
> directly.
>
>
>To take an even further step back - successful communities like ours
>require a huge amount of trust between the participants. Trust requires
>communication and empathy. If communication breaks down and the pressure
>we're all under erodes our empathy for each others' positions, then
>situations can easily get horribly out of control.
>
>This isn't a pleasant situation and we should all strive for better.
>However, I tend to measure our "flamewars" against this:
>
>
>https://urldefense.proofpoint.com/v1/url?u=https://mail.gnome.org/archives
>/gnome-2-0-list/2001-June/msg00132.html&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0
>A&r=eH0pxTUZo8NPZyF6hgoMQu%2BfDtysg45MkPhCZFxPEq8%3D%0A&m=WnWRoI3TSZlh7lPB
>Z67S7KqCg6LUo1tMHirwwCXEY0o%3D%0A&s=cb0293d31053f67f603946a43e6ebb99333df6
>d73adeb6d3965380aa94424519
>
>GNOME in June 2001 was my introduction to full-time open-source
>development, so this episode sticks out in my mind. The two individuals
>in that email were/are immensely capable and reasonable people, yet ...
>
>So far, we're doing pretty okay compared to that and many other
>open-source flamewars. Let's make sure we continue that way by avoiding
>letting situations fester.
>
>
>Thanks, and sorry for being a windbag,
>Mark.
>
>---
>
>= July 1 =
>
>The starting point is this review:
>
> https://review.openstack.org/103923
>
>Dan Smith proposes a policy that the libvirt driver may not use libvirt
>features until they have been available in Ubuntu or Fedora for at least
>30 days.
>
>The commit message mentions:
>
> "broken us in the past when we add a new feature that requires a newer
> libvirt than we test with, and we discover that it's totally broken
> when we upgrade in the gate."
>
>which AIUI is a reference to the libvirt "live snapshot" issue the
>previous week, which is described here:
>
> https://review.openstack.org/102643
>
>where upgrading to Ubuntu Trusty meant the libvirt version in use in the
>gate went from 0.9.8 to 1.2.2, which caused the "live snapshot" code
>paths in Nova for the first time, which appeared to be related to some
>serious gate instability (although the exact root cause wasn't
>identified).
>
>Some background on the libvirt version upgrade can be seen here:
>
>
>http://lists.openstack.org/pipermail/openstack-dev/2014-March/thread.html#
>30284
>
>= July 1 - July 8 =
>
>Back and forth debate mostly between Dan Smith and Dan Berrange. Sean
>votes +2, Dan Berrange votes -2.
>
>= July 14 =
>
>Russell adds his support to Dan Berrange's position, votes -2. Some
>debate between Dan and Dan continues. Joe Gordon votes +2. Matt
>Riedemann expresses support-in-principal for Dan Smith's approach.
>
>= July 15 =
>
>Debate continues ...
>
>16:12 - I -2 the patch and attempt to take a step back and think about
>how we could have prevented (or at least mitigated against) the "live
>snapshot" issue and suggest the idea of adding a new configuration
>option:
>
> [libvirt]
> version_cap = 1.2.2
>
>which would mean we would not automatically start using new libvirt
>features in the gate because of a libvirt version upgrade, but instead
>the new features would only begin to be used when we merge a change to
>the default value of version_cap.
>
>16:31 - I leave a separate comment addressing the broader debate about
>our functional test coverage requirements.
>
>16:46 - Dan Berrange likes the version_cap idea
>
>15:37 - Dan Berrange posts an implementation of version_cap:
>
> https://review.openstack.org/107119
>
>and links to it from in Dan Smith's libvirt testing policy review
>(#103923)
>
>21:49 - Matt expresses some support for the config option, but worries
>about the precedent being set.
>
>23:14 - Dan Berrange explains his point of view that a "test all the
>things" rule must mean "test all the things which can be practically
>tested by our current CI system".
>
>= July 16 =
>
>08:04 - I +2 the version_cap patch after Dan fixes up some issues I
>pointed out.
>
>13:44 - 14:28 - Sean and John Garbutt add further thoughts to the
>libvirt testing policy review without making any comment on the
>version_cap idea. Sean takes the debate to the mailing list:
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-July/040421.html
>
>Debate continues in the thread, largely around the mechanics of how to
>allow a newer version of libvirt be used in the gate.
>
>15:08 - I mention the version_cap proposal on the thread for the first
>time:
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-July/040436.html
>
>and the point I make is the configuration option makes it easier for
>operators to run only code paths that are tested by the gate.
>
>16:44 - Johannes notes that multiple issues with code paths not tested
>in the gate may need to be fixed as part of a future review to increase
>the default value of version_cap.
>
>http://lists.openstack.org/pipermail/openstack-dev/2014-July/040456.html
>
>18:50 - Russell approves the version_cap patch.
>
>https://review.openstack.org/107119
>
>= July 17 =
>
>05:38 - The version_cap patch merges.
>
>13:09 - Somewhat related, Dan Berrange and I explain we won't be at the
>mid-cycle issue for any test policy discussions. Sean makes a point that
>the discussion is best had on email/IRC where there is a permanent
>record.
>
>14:33 - Johannes expresses concern in gerrit that version_cap got merged
>too quickly.
>
>https://review.openstack.org/107119
>
>15:17 - Dan Berrange responds to Johannes in gerrit, saying that he
>thinks version_cap is useful irrespective of the broader testing
>discussion.
>
>15:28 - Johannes disagrees, asks for a response to his concerns on the
>mailing list.
>
>15:40 - Dan Berrange responds on the mailing list to Johannes
>version_cap concerns.
>
>http://lists.openstack.org/pipermail/openstack-dev/2014-July/040576.html
>
>18:15 - Russell also responds to Johannes.
>
>http://lists.openstack.org/pipermail/openstack-dev/2014-July/040597.html
>
>18:31 - Johannes responds to Dan.
>
>http://lists.openstack.org/pipermail/openstack-dev/2014-July/040602.html
>
>18:39 - Russell responds to Johannes again.
>
>http://lists.openstack.org/pipermail/openstack-dev/2014-July/040604.html
>
>19:13 - Johannes responds again.
>
>http://lists.openstack.org/pipermail/openstack-dev/2014-July/040608.html
>
>= July 18 =
>
>07:55 - Dan Berrange responds to Johannes again.
>
>http://lists.openstack.org/pipermail/openstack-dev/2014-July/040647.html
>
>09:37 - Sean notices version_cap, thinks it's "interesting", wonders do
>we need similar for qemu versions.
>
>http://lists.openstack.org/pipermail/openstack-dev/2014-July/040533.html
>
>= July 28 - 30 =
>
>The Nova mid-cycle meetup in Portland. Neither, Dan, Russell nor I are
>present, for various personal reasons.
>
>AIUI, a brief "hallway track" conversation lead to briefly discussing
>the version_cap patch in the main sessions which went along the lines of
>"does anyone find it concerning how this patch was merged" and the
>general consensus was "yes, it is concerning".
>
>= July 30 =
>
>Sean Dague proposes a revert of the patch because a) "it's a policy
>change" and b) it requires further discussion.
>
>https://review.openstack.org/110754
>
>= Aug 7 =
>
>Russell -2s the revert.
>
>Matt raises the topic of the revert on the mailing list.
>
>http://lists.openstack.org/pipermail/openstack-dev/2014-August/042284.html
>
>Dan Smith, Joe Gordon +2s the revert.
>
>= Aug 8 =
>
>Michael Still, Jay Pipes +2s the revert.
>
>Russell starts experimenting with a gate job to test with latest
>libvirt.
>
>http://lists.openstack.org/pipermail/openstack-dev/2014-August/042470.html
>
>= Aug 11 =
>
>Dan Berrange and I return from our (separate :) vacations and -2 the
>revert.
>
>I describe the revert as a "proxy battle", avoiding the real debate
>around testing requirements.
>
>http://lists.openstack.org/pipermail/openstack-dev/2014-August/042546.html
>
>Joe tries to capture the concerns about how the version_cap patch was
>originally merged.
>
>http://lists.openstack.org/pipermail/openstack-dev/2014-August/042653.html
>
>= Aug 12 =
>
>I explicitly attempt to address any notion there is a "corporate agenda"
>at play here.
>
>http://lists.openstack.org/pipermail/openstack-dev/2014-August/042691.html
>
>Dan again tries to explain why he thinks version_cap is orthogonal to
>the broader debate.
>
>http://lists.openstack.org/pipermail/openstack-dev/2014-August/042707.html
>
>Michael Still - with Thierry cc-ed - privately emails me, Russell and
>Dan Berrange asking us to remove our -2s so that we can have a "more
>complete discussion". All three of us initially refuse and a bunch of
>hurt feelings are frankly expressed on all sides, with one theme being
>the perception that this had gotten out of control because we weren't at
>the mid-cycle.
>
>Thierry calmed things down be patiently expressing Michael's perspective
>that there is a process to be followed where patches that get merged
>quickly are later seen to be controversial. The red mist begins to
>clear. I propose that we document this "retrospective veto revert"
>procedure to head off any future misunderstanding.
>
>http://lists.openstack.org/pipermail/openstack-dev/2014-August/042728.html
>
>All three of us remove our -2s. The patch gets reverted.
>
>= Aug 14 =
>
>Dan Berrange re-proposes the patch, later changing the default
>version_cap value.
>
>https://review.openstack.org/114307
>
>The patch has seen very little discussion since then and is still
>pending review.
>
>
>
>_______________________________________________
>OpenStack-dev mailing list
>OpenStack-dev at lists.openstack.org
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
More information about the OpenStack-dev
mailing list