[openstack-dev] [nova] libvirt version_cap, a postmortem
Mark McLoughlin
markmc at redhat.com
Sat Aug 30 16:08:16 UTC 2014
Hey
The libvirt version_cap debacle continues to come up in conversation and
one perception of the whole thing appears to be:
A controversial patch was "ninjaed" by three Red Hat nova-cores and
then the same individuals piled on with -2s when a revert was proposed
to allow further discussion.
I hope it's clear to everyone why that's a pretty painful thing to hear.
However, I do see that I didn't behave perfectly here. I apologize for
that.
In order to understand where this perception came from, I've gone back
over the discussions spread across gerrit and the mailing list in order
to piece together a precise timeline. I've appended that below.
Some conclusions I draw from that tedious exercise:
- Some people came at this from the perspective that we already have
a firm, unwritten policy that all code must have functional written
tests. Others see that "test all the things" is interpreted as a
worthy aspiration, but is only one of a number of nuanced factors
that needs to be taken into account when considering the addition of
a new feature.
i.e. the former camp saw Dan Smith's devref addition as attempting
to document an existing policy (perhaps even a more forgiving
version of an existing policy), whereas other see it as a dramatic
shift to a draconian implementation of "test all the things".
- Dan Berrange, Russell and I didn't feel like we were "ninjaing a
controversial patch" - you can see our perspective expressed in
multiple places. The patch would have helped the "live snapshot"
issue, and has other useful applications. It does not affect the
broader testing debate.
Johannes was a solitary voice expressing concerns with the patch,
and you could see that Dan was particularly engaged in trying to
address those concerns and repeating his feeling that the patch was
orthogonal to the testing debate.
That all being said - the patch did merge too quickly.
- What exacerbates the situation - particularly when people attempt to
look back at what happened - is how spread out our conversations
are. You look at the version_cap review and don't see any of the
related discussions on the devref policy review nor the mailing list
threads. Our disjoint methods of communicating contribute to
misunderstandings.
- When it came to the revert, a couple of things resulted in
misunderstandings, hurt feelings and frayed tempers - (a) that our
"retrospective veto revert policy" wasn't well understood and (b)
a feeling that there was private, in-person grumbling about us at
the mid-cycle while we were absent, with no attempt to talk to us
directly.
To take an even further step back - successful communities like ours
require a huge amount of trust between the participants. Trust requires
communication and empathy. If communication breaks down and the pressure
we're all under erodes our empathy for each others' positions, then
situations can easily get horribly out of control.
This isn't a pleasant situation and we should all strive for better.
However, I tend to measure our "flamewars" against this:
https://mail.gnome.org/archives/gnome-2-0-list/2001-June/msg00132.html
GNOME in June 2001 was my introduction to full-time open-source
development, so this episode sticks out in my mind. The two individuals
in that email were/are immensely capable and reasonable people, yet ...
So far, we're doing pretty okay compared to that and many other
open-source flamewars. Let's make sure we continue that way by avoiding
letting situations fester.
Thanks, and sorry for being a windbag,
Mark.
---
= July 1 =
The starting point is this review:
https://review.openstack.org/103923
Dan Smith proposes a policy that the libvirt driver may not use libvirt
features until they have been available in Ubuntu or Fedora for at least
30 days.
The commit message mentions:
"broken us in the past when we add a new feature that requires a newer
libvirt than we test with, and we discover that it's totally broken
when we upgrade in the gate."
which AIUI is a reference to the libvirt "live snapshot" issue the
previous week, which is described here:
https://review.openstack.org/102643
where upgrading to Ubuntu Trusty meant the libvirt version in use in the
gate went from 0.9.8 to 1.2.2, which caused the "live snapshot" code
paths in Nova for the first time, which appeared to be related to some
serious gate instability (although the exact root cause wasn't
identified).
Some background on the libvirt version upgrade can be seen here:
http://lists.openstack.org/pipermail/openstack-dev/2014-March/thread.html#30284
= July 1 - July 8 =
Back and forth debate mostly between Dan Smith and Dan Berrange. Sean
votes +2, Dan Berrange votes -2.
= July 14 =
Russell adds his support to Dan Berrange's position, votes -2. Some
debate between Dan and Dan continues. Joe Gordon votes +2. Matt
Riedemann expresses support-in-principal for Dan Smith's approach.
= July 15 =
Debate continues ...
16:12 - I -2 the patch and attempt to take a step back and think about
how we could have prevented (or at least mitigated against) the "live
snapshot" issue and suggest the idea of adding a new configuration
option:
[libvirt]
version_cap = 1.2.2
which would mean we would not automatically start using new libvirt
features in the gate because of a libvirt version upgrade, but instead
the new features would only begin to be used when we merge a change to
the default value of version_cap.
16:31 - I leave a separate comment addressing the broader debate about
our functional test coverage requirements.
16:46 - Dan Berrange likes the version_cap idea
15:37 - Dan Berrange posts an implementation of version_cap:
https://review.openstack.org/107119
and links to it from in Dan Smith's libvirt testing policy review (#103923)
21:49 - Matt expresses some support for the config option, but worries
about the precedent being set.
23:14 - Dan Berrange explains his point of view that a "test all the
things" rule must mean "test all the things which can be practically
tested by our current CI system".
= July 16 =
08:04 - I +2 the version_cap patch after Dan fixes up some issues I
pointed out.
13:44 - 14:28 - Sean and John Garbutt add further thoughts to the
libvirt testing policy review without making any comment on the
version_cap idea. Sean takes the debate to the mailing list:
http://lists.openstack.org/pipermail/openstack-dev/2014-July/040421.html
Debate continues in the thread, largely around the mechanics of how to
allow a newer version of libvirt be used in the gate.
15:08 - I mention the version_cap proposal on the thread for the first
time:
http://lists.openstack.org/pipermail/openstack-dev/2014-July/040436.html
and the point I make is the configuration option makes it easier for
operators to run only code paths that are tested by the gate.
16:44 - Johannes notes that multiple issues with code paths not tested
in the gate may need to be fixed as part of a future review to increase
the default value of version_cap.
http://lists.openstack.org/pipermail/openstack-dev/2014-July/040456.html
18:50 - Russell approves the version_cap patch.
https://review.openstack.org/107119
= July 17 =
05:38 - The version_cap patch merges.
13:09 - Somewhat related, Dan Berrange and I explain we won't be at the
mid-cycle issue for any test policy discussions. Sean makes a point that
the discussion is best had on email/IRC where there is a permanent
record.
14:33 - Johannes expresses concern in gerrit that version_cap got merged
too quickly.
https://review.openstack.org/107119
15:17 - Dan Berrange responds to Johannes in gerrit, saying that he
thinks version_cap is useful irrespective of the broader testing
discussion.
15:28 - Johannes disagrees, asks for a response to his concerns on the
mailing list.
15:40 - Dan Berrange responds on the mailing list to Johannes
version_cap concerns.
http://lists.openstack.org/pipermail/openstack-dev/2014-July/040576.html
18:15 - Russell also responds to Johannes.
http://lists.openstack.org/pipermail/openstack-dev/2014-July/040597.html
18:31 - Johannes responds to Dan.
http://lists.openstack.org/pipermail/openstack-dev/2014-July/040602.html
18:39 - Russell responds to Johannes again.
http://lists.openstack.org/pipermail/openstack-dev/2014-July/040604.html
19:13 - Johannes responds again.
http://lists.openstack.org/pipermail/openstack-dev/2014-July/040608.html
= July 18 =
07:55 - Dan Berrange responds to Johannes again.
http://lists.openstack.org/pipermail/openstack-dev/2014-July/040647.html
09:37 - Sean notices version_cap, thinks it's "interesting", wonders do
we need similar for qemu versions.
http://lists.openstack.org/pipermail/openstack-dev/2014-July/040533.html
= July 28 - 30 =
The Nova mid-cycle meetup in Portland. Neither, Dan, Russell nor I are
present, for various personal reasons.
AIUI, a brief "hallway track" conversation lead to briefly discussing
the version_cap patch in the main sessions which went along the lines of
"does anyone find it concerning how this patch was merged" and the
general consensus was "yes, it is concerning".
= July 30 =
Sean Dague proposes a revert of the patch because a) "it's a policy
change" and b) it requires further discussion.
https://review.openstack.org/110754
= Aug 7 =
Russell -2s the revert.
Matt raises the topic of the revert on the mailing list.
http://lists.openstack.org/pipermail/openstack-dev/2014-August/042284.html
Dan Smith, Joe Gordon +2s the revert.
= Aug 8 =
Michael Still, Jay Pipes +2s the revert.
Russell starts experimenting with a gate job to test with latest
libvirt.
http://lists.openstack.org/pipermail/openstack-dev/2014-August/042470.html
= Aug 11 =
Dan Berrange and I return from our (separate :) vacations and -2 the
revert.
I describe the revert as a "proxy battle", avoiding the real debate
around testing requirements.
http://lists.openstack.org/pipermail/openstack-dev/2014-August/042546.html
Joe tries to capture the concerns about how the version_cap patch was
originally merged.
http://lists.openstack.org/pipermail/openstack-dev/2014-August/042653.html
= Aug 12 =
I explicitly attempt to address any notion there is a "corporate agenda"
at play here.
http://lists.openstack.org/pipermail/openstack-dev/2014-August/042691.html
Dan again tries to explain why he thinks version_cap is orthogonal to
the broader debate.
http://lists.openstack.org/pipermail/openstack-dev/2014-August/042707.html
Michael Still - with Thierry cc-ed - privately emails me, Russell and
Dan Berrange asking us to remove our -2s so that we can have a "more
complete discussion". All three of us initially refuse and a bunch of
hurt feelings are frankly expressed on all sides, with one theme being
the perception that this had gotten out of control because we weren't at
the mid-cycle.
Thierry calmed things down be patiently expressing Michael's perspective
that there is a process to be followed where patches that get merged
quickly are later seen to be controversial. The red mist begins to
clear. I propose that we document this "retrospective veto revert"
procedure to head off any future misunderstanding.
http://lists.openstack.org/pipermail/openstack-dev/2014-August/042728.html
All three of us remove our -2s. The patch gets reverted.
= Aug 14 =
Dan Berrange re-proposes the patch, later changing the default
version_cap value.
https://review.openstack.org/114307
The patch has seen very little discussion since then and is still
pending review.
More information about the OpenStack-dev
mailing list