[OpenStack-Infra] suggestions for gate optimizations

Monty Taylor mordred at inaugust.com
Sun Jan 19 12:01:58 UTC 2014


On 01/19/2014 05:38 AM, Sean Dague wrote:
> So, we're currently 70 deep in the gate, top of queue went in > 40 hrs
> ago (probably closer to 50 or 60, but we only have enqueue time going
> back to the zuul restart).
>
> I have a couple of ideas about things we should do based on what I've
> seen in the gate during this wedge.
>
> = Remove reverify entirely =

Yes. Screw it. In a queue as deep as the current one, it's generally 
more harmful than good.

> Core reviewers can trigger a requeue with +A state changes. Reverify
> right now is exceptionally dangerous in that it lets *any* user put
> something back in the gate, even if it can't pass. There are a ton of
> users who believe they are being helpful in doing so, and they're making
> things a ton worse. stable/havana changes being a prime instance.
>
> If we were being prolog tricky, I'd actually like to make Jenkins -2
> changes need a positive check run before they could be re-enqueued. For
> instance, I saw a swift core developer run "reverify bug 123456789"
> again on a change that couldn't pass. While -2s are mostly races at this
> point, the people who are choosing to ignore them are not staying on
> top of what's going on in the queue enough to really know whether or
> not trying again is ok.
>
> = Early Fail Detection =
>
> With the tempest run now coming in north of an hour, I think we need to
> bump up the priority of signaling up to Jenkins that we're a failure the
> first time we see one in the subunit stream. If we fail at 30 minutes,
> waiting until 60 for a reset just adds far more delay.
>
> I'm not really sure how we get started on this one, but I think we should.

This one I think will be helpful, but it's also the one that requires 
the most in-depth development. Honestly, the chances of getting it done 
this week are almost nil.

That said - I agree we should accelerate working on it. I have access to 
a team of folks in India with both python and java backgrounds - if it 
would be helpful and if we can break the work out into, you know, 
assignable chunks, let me know.
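
Just to make the shape of it concrete, here's roughly what I have in 
mind as a first cut. The failure markers and the way it's piped are 
assumptions on my part - a real version would consume the subunit 
stream itself rather than grepping console output:

    #!/usr/bin/env python
    """Hypothetical watcher: abort a job as soon as the test stream
    reports a failure, instead of waiting for the whole run to finish.

    A real implementation would parse the subunit stream directly; this
    sketch just scans line-oriented runner output piped in on stdin,
    e.g.:  testr run --subunit | subunit-trace | early-fail-watch.py
    """

    import sys

    # Markers are an assumption about the runner's console output; a
    # production version would consume subunit status packets instead.
    FAILURE_MARKERS = ('FAIL:', 'FAILED (', 'ERROR:')


    def main():
        for line in sys.stdin:
            sys.stdout.write(line)  # stay transparent for the console log
            if any(marker in line for marker in FAILURE_MARKERS):
                sys.stderr.write('First failure seen; signal early abort\n')
                # Exiting non-zero here is where we would tell
                # Jenkins/zuul to reset the queue now rather than an
                # hour from now.
                return 1
        return 0


    if __name__ == '__main__':
        sys.exit(main())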

> = Pep8 kick out of check =
>
> I think on the check queue we should run pep8 first, and not run other
> tests until that passes (this reverses a previous opinion I had). We're
> now starving nodepool. Not taking 5 nodepool nodes on patches that don't
> pass pep8 would be handy. When Dan pushes a 15-patch series that fixes
> nova-network, and patch 4 has a pep8 error, we thrash a bunch.

Agree. I think this might be one of those things that goes back and 
forth on being a good or bad idea over time. I think now is a time when 
it's a good idea.
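
For the sake of discussion, the effect we're after is roughly the 
following - although the actual change would live in the Zuul layout 
rather than a script, and the job names here are made up:

    #!/usr/bin/env python
    """Sketch of the 'pep8 first' idea for the check pipeline: spend one
    node on the style check and only fan out to the expensive jobs if it
    passes.  Job names and the use of tox are assumptions."""

    import subprocess
    import sys

    EXPENSIVE_JOBS = ['tox -e py27', 'tox -e py33', 'tox -e integration']


    def main():
        # Cheap gate first: a pep8 failure costs one node instead of five.
        if subprocess.call(['tox', '-e', 'pep8']) != 0:
            print('pep8 failed; skipping the rest of the check jobs')
            return 1

        for job in EXPENSIVE_JOBS:
            if subprocess.call(job.split()) != 0:
                return 1
        return 0


    if __name__ == '__main__':
        sys.exit(main())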

> = More aggressive kick out by zuul =
>
> We have issues where projects have racing unit tests, which they've not
> prioritized fixing. So those create wrecking balls in the gate.
> Previously we've been opposed to kicking those out based on the theory
> that the patch ahead could be the problem (which I've actually never seen).
>
> However.... this is actually fixable. We could see if there is anything
> ahead of it in zuul that runs the same tests. If not, then it's not
> possible that something ahead of it could fix it. This is based on the
> same logic zuul uses to build the queue in the first place.
>
> This would shed the wrecking balls earlier.

Interesting. How would zuul be able to investigate that? Do we need 
zuul-subunit-consumption for this one too?
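
If it can be driven from the information zuul already has about which 
jobs each queue item runs, the predicate itself seems pretty small. 
Rough sketch - the QueueItem shape is made up and stands in for zuul's 
real data model:

    """A failing change can only be rescued by something ahead of it in
    the queue that runs at least one of the same jobs.  If nothing ahead
    overlaps, eject it immediately instead of letting it reset the queue
    on every pass."""

    import collections

    QueueItem = collections.namedtuple('QueueItem', ['change', 'jobs'])


    def rescuable_by_items_ahead(item, items_ahead):
        """True if any item ahead runs a job this item also runs."""
        return any(item.jobs & ahead.jobs for ahead in items_ahead)


    if __name__ == '__main__':
        queue = [
            QueueItem('nova-change',
                      {'gate-tempest-dsvm-full', 'gate-nova-python27'}),
            QueueItem('infra-change', {'gate-config-layout'}),
        ]
        failing, ahead = queue[1], queue[:1]
        if not rescuable_by_items_ahead(failing, ahead):
            print('%s cannot be fixed by anything ahead; eject it now'
                  % failing.change)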

> = Periodic recheck on old changes =
>
> I think Michael Still said he was working on this one. Certain projects,
> like Glance and Keystone, tend to approve things with really stale test
> results (> 1 month old). These fail, and then tumble. They are a big
> source of the wrecking balls.

I believe he's got it working, actually. I think the real trick with 
this - which I whole-heartedly approve of - is not making node 
starvation worse.
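
Roughly, I'd expect it to look something like this - stale threshold, 
oldest results first, and a hard cap per pass so we don't flood 
nodepool. The change dicts and post_recheck are placeholders for the 
real Gerrit interaction:

    """Sketch of the periodic recheck idea: find open changes whose last
    test results are older than some threshold and re-trigger checks on
    them, throttled so the rechecks don't make node starvation worse."""

    import datetime

    STALE_AFTER = datetime.timedelta(days=7)  # > 1 week is clearly stale
    MAX_RECHECKS_PER_RUN = 10                 # throttle to protect nodepool


    def post_recheck(change):
        # Placeholder: the real version would leave a 'recheck' comment
        # on the change in Gerrit.
        print('recheck %s' % change['id'])


    def recheck_stale_changes(open_changes, now=None):
        now = now or datetime.datetime.utcnow()
        stale = [c for c in open_changes
                 if now - c['last_result'] > STALE_AFTER]
        # Oldest results first, and only a handful per pass.
        stale.sort(key=lambda c: c['last_result'])
        for change in stale[:MAX_RECHECKS_PER_RUN]:
            post_recheck(change)


    if __name__ == '__main__':
        demo = [{'id': 'I%dabc' % i,
                 'last_result': datetime.datetime.utcnow()
                 - datetime.timedelta(days=i * 3)}
                for i in range(5)]
        recheck_stale_changes(demo)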

> Test results > 1 week old are clearly irrelevant. For something like
> nova, > 3 days can be problematic.
>
> I'm sure there are some other ideas, but I wanted to dump this out while
> it was fresh in my brain.
>
> 	-Sean