[openstack-dev] [stable] juno is fubar in the gate
David Kranz
dkranz at redhat.com
Tue Feb 10 16:50:28 UTC 2015
On 02/10/2015 10:35 AM, Matthew Treinish wrote:
> On Tue, Feb 10, 2015 at 11:19:20AM +0100, Thierry Carrez wrote:
>> Joe, Matt & Matthew:
>>
>> I hear your frustration with broken stable branches. With my
>> vulnerability management team member hat, responsible for landing
>> patches there with a strict deadline, I can certainly relate to the
>> frustration of having to dive in and unbork the branch first, rather
>> than concentrate on the work you initially planned to do.
>>
>> That said, wearing my stable team member hat, I think it's a bit unfair
>> to say that things are worse than they were and call for dramatic
>> action. The stable branch team put a structure in place to try to
>> continuously fix the stable branches rather than reactively fix them
>> when we need them to work. Those champions have been quite active[1]
>> unbreaking them over the past months. I'd argue that the branches are
>> broken much less often than they used to be. That doesn't mean they're
>> never broken, though, or that those people are magicians.
> I don't think that's unfair at all, for two reasons. The first is that in
> every discussion we had at two summits I raised the increased maintenance
> burden of a longer support window and was told that people were going to
> step up so it wouldn't be an issue. I have yet to see that happen. Nothing
> to date has convinced me that we are at all ready to maintain 3 stable
> branches at once.
>
> The second is that, while I've seen that etherpad, I still see a huge
> disconnect here about what actually maintaining the branches requires. The
> issue I'm raising is about the gating infrastructure and how to ensure that
> things stay working. There is a non-linear overhead involved in keeping any
> gating job working (on stable or master). People need to take ownership of
> jobs to make sure they keep working.
>
>> One issue in the current situation is that the two groups (you and the
>> stable maintainers) seem to work in parallel rather than collaborate.
>> It's quite telling that the two groups maintained separate etherpads to
>> keep track of the fixes that needed landing.
> I don't actually view it that way. Just looking at the etherpad, it covers
> only a small subset of the kinds of issues we're raising here.
>
> For example, there was a week in late Nov. when 2 consecutive oslo project
> releases broke the stable gates. After we unwound all of this and landed the
> fixes in the branches, the next step was to make changes to ensure we
> couldn't be broken in the same way again:
>
> http://lists.openstack.org/pipermail/openstack-dev/2014-November/051206.html
>
> This also happened at the same time as a new release of the testtools stack
> which broke every branch (including master). Another example is all of the
> setuptools stack churn from the famed Christmas releases. That was another
> critical infrastructure piece that fell apart, and it was mostly handled by
> the infra team. All of these things are getting fixed because they have to
> be, to make sure development on master can continue, not because those with
> a vested interest in 15 months of working stable branches are doing the work.
>
> The other aspect here is the development effort to make things more stable
> in this space, like the work to pin the requirements on stable branches
> which Joe is spearheading. These efforts are critical to the long-term
> success of the stable branches, yet no one has stepped up to help with them.
>
> I view this as a disconnect between what people think maintaining a stable
> branch means and what it actually entails. Sure, backporting fixes for
> intermittent failures is part of it. But most of the effort is spent on
> making sure the gating machinery stays well oiled and doesn't break down.
>
>> [1] https://etherpad.openstack.org/p/stable-tracker
>>
>> Matthew Treinish wrote:
>>> So I think it's time we called the icehouse branch done and marked it EOL.
>>> We originally conditioned the longer support window on extra people
>>> stepping forward to keep things working, and I believe this is just the
>>> latest indication that this hasn't happened. Issue 1 listed above is being
>>> caused by the icehouse branch during upgrades. The fact that a stable
>>> release was pushed at the same time things were wedged on the juno branch
>>> is further evidence to me that things aren't being maintained as they
>>> should be. Looking at the #openstack-qa irc log from today, or the etherpad
>>> about trying to sort this issue out, should make it clear that no one has
>>> stepped up to help with the maintenance, and the poor state of the branch
>>> shows it.
>> I disagree with the assessment. People have stepped up. I think the
>> stable branches are less often broken than they were, and stable branch
>> champions (as their tracking etherpad shows) have made a difference.
>> There have just been more issues than usual recently, and they probably
>> couldn't keep track of it all. It's not a fun job to babysit stable
>> branches, and belittling the stable branch champions' results is not the
>> best way to encourage them to continue in this position. I agree that they
>> could
>> work more with the QA team when they get overwhelmed, and raise more red
>> flags when they just can't keep up.
> I actually don't see it that way. As one of the few people who has been
> doing this stable debug stuff for some time, to me it's really the same
> story as always; the pain points have just shifted. The difference now is
> that, because we've moved to a branchless model for things like tempest,
> certain people see the pain constantly instead of everyone panicking around
> stable release time because things don't work on the stable branches.
>
> It's not necessarily about sitting around babysitting, but at a minimum it
> means actually watching the jobs that run on the stable branches. The
> periodic jobs don't give anywhere near a complete picture of the state of
> the world, and they don't run frequently enough to catch everything. Part of
> the issue here is that, because I work on tempest, grenade, and devstack, I
> see these failures every time they happen: they inevitably block development
> on one of those projects, since the stable jobs are gating.
>
> I don't mean to belittle anyone's efforts here. I personally know that I
> wouldn't want to, or be able to, do any of the traditional stable-maint
> backport work, and I know it takes time to come up to speed on it. But that
> doesn't change the position we're in right now.
>
>> I also disagree with the proposed solution. We announced a support
>> timeframe for Icehouse, our downstream users made plans around it, so we
>> should stick to it as much as we can. If we dropped stable branch
>> support every time a patch couldn't be landed there, there would simply
>> be no stable branches at all.
> It's not just this latest issue that has caused me to raise this. (We have a
> fix plan in progress, although EOL would make that moot.) It's the same
> story almost every other week at this point. The longer window was always
> just an experiment, and my understanding was that if we deemed it untenable
> from a maintenance POV, we wouldn't do it. I strongly feel that we need to
> just say this isn't working right now and EOL the branch, especially before
> we enter a period where we're maintaining 3 stable branches at once.
>
> -Matt Treinish
Matt, I have hesitated to weigh in here, but while I agree with much of
this, I also think stable branches are more important than you seem to
think they are. Nomex suit on...
We should consider the possibility that branchless tempest may also be
something whose true cost was not appreciated. When branchless tempest
implied we needed to keep the xml tests around in tempest, we threw them
out anyway, which was reasonable. I would rather give up branchless
tempest than the ability for real distributors/deployers/operators to
collaborate on stable branches. If everything were pinned on stable
branches, and there were no branchless tempest, things would be more
tractable both for those interested in keeping stable working and for
those interested only in trunk. I believe this would also be closer to
what many real deployers actually do.
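To make the pinning point concrete, here is roughly what the difference
looks like in a requirements file (the package name and version numbers
are made up for illustration, not taken from the actual juno
requirements):

    # unpinned, as on master: pip picks up any new upstream release
    # immediately, so a bad release can break the stable gate overnight
    oslo.config>=1.4.0

    # pinned: the gate only ever installs the exact version that was
    # tested when the release was cut
    oslo.config==1.4.0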
That would still leave grenade as an issue because (I think) we try to
run management code and multiple releases of OpenStack on the same node.
I presume no real deployer does that and that the discussion about venvs
will address that issue.
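If I understand the venv idea correctly, it amounts to something like the
sketch below (the paths and file names are invented; this shows the kind
of isolation being discussed, not what grenade actually does today):

    # give each release its own interpreter and dependency set
    virtualenv /opt/stack/venv-icehouse
    /opt/stack/venv-icehouse/bin/pip install -r icehouse/requirements.txt

    virtualenv /opt/stack/venv-juno
    /opt/stack/venv-juno/bin/pip install -r juno/requirements.txt

    # each release's services then run out of their own venv, so the
    # two dependency sets never conflict even on a shared node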
Following this thread, I can't help thinking that folks want to help keep
stable working, but it is just very complicated the way things are now,
and the consequences of making a mistake are very high.
-David