[OpenStack-Infra] suggestions for gate optimizations

Sean Dague sean at dague.net
Mon Jan 20 11:57:12 UTC 2014


On 01/19/2014 11:38 PM, Joe Gordon wrote:
> 
> 
> 
> On Sun, Jan 19, 2014 at 7:01 AM, Monty Taylor <mordred at inaugust.com
> <mailto:mordred at inaugust.com>> wrote:
> 
>     On 01/19/2014 05:38 AM, Sean Dague wrote:
> 
>         So, we're currently 70 deep in the gate; the top of the queue
>         went in > 40 hrs ago (probably closer to 50 or 60, but we only
>         have enqueue time going back to the zuul restart).
> 
>         I have a couple of ideas about things we should do based on what
>         I've
>         seen in the gate during this wedge.
> 
>         = Remove reverify entirely =
> 
> 
>     Yes. Screw it. In a deep queue like now, it's more generally harmful
>     than good.
> 
> 
> I agree with this one, but we should also try to educate the devs,
> because in the case you brought up below it was a core dev who didn't
> examine why his patch failed; even without "reverify bug" he could
> have just re-+A'd it.

Sure. My experience at this point is that it will be of only mixed
success. There are tons of devs who just say F it and push stuff ahead
because they are busy.

The only way you could really fix something like that would be if +A
were a points system like TCP slow start, which is this totally other
system you'd have to build. Fun as a bar conversation; completely
useless in practice, I think.
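For the bar-conversation record only, a slow-start-style +A budget might
look roughly like the sketch below. Everything here is invented for
illustration; no such mechanism exists in Gerrit or Zuul.

```python
# Hypothetical sketch of a TCP-slow-start-style "+A credit" budget per
# reviewer.  All names and numbers are made up; nothing like this exists
# in Gerrit or Zuul today.

class ApprovalCredit:
    """Additive increase on gate passes, multiplicative decrease on
    gate failures, mirroring TCP slow start / congestion avoidance."""

    def __init__(self, initial=4, cap=16):
        self.credit = initial
        self.cap = cap

    def record_gate_pass(self):
        # Additive increase: earn back one +A per clean gate run.
        self.credit = min(self.credit + 1, self.cap)

    def record_gate_failure(self):
        # Multiplicative decrease: halve the budget when an approved
        # change resets the gate (floor of 1 so recovery is possible).
        self.credit = max(self.credit // 2, 1)

    def can_approve(self):
        return self.credit > 0

    def spend_approval(self):
        if not self.can_approve():
            raise RuntimeError("no +A credit left; wait for a gate pass")
        self.credit -= 1
```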

>         Core reviewers can trigger a requeue with +A state changes.
>         Reverify right now is exceptionally dangerous in that it lets
>         *any* user put something back in the gate, even if it can't
>         pass. There are a ton of users who believe they are being
>         helpful in doing so and are making things a ton worse;
>         stable/havana changes are a prime instance.
> 
>         If we were being prolog tricky, I'd actually like to make
>         Jenkins -2 changes need a positive run before they could be
>         reenqueued. For instance, I saw a swift core developer run
>         "reverify bug 123456789" again on a change that couldn't pass.
>         While -2s are mostly races at this point, the people who are
>         choosing to ignore them are not staying on top of what's going
>         on in the queue enough to really know whether or not trying
>         again is ok.
> 
>         = Early Fail Detection =
> 
>         With the tempest run now coming in north of an hour, I think
>         we need to bump up the priority of signaling up to Jenkins
>         that we're a failure the first time we see one in the subunit
>         stream. If we fail at 30 minutes, waiting until 60 for a reset
>         just adds far more delay.
> 
>         I'm not really sure how we get started on this one, but I think
>         we should.
> 
> 
>     This one I think will be helpful, but it is also the one that
>     involves the deepest development work. Honestly, the chances of
>     getting it done this week are almost none.
> 
>     That said - I agree we should accelerate working on it. I have
>     access to a team of folks in India with both python and java
>     backgrounds - if it would be helpful and if we can break out work
>     into, you know, assignable chunks, let me know.
> 
> 
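As a strawman for breaking out work, the early-fail piece could be
sketched like this. The real subunit protocol is a packed binary
stream, so the line-oriented format and status names below are purely
illustrative stand-ins:

```python
# Strawman sketch of early fail detection: scan a (simplified,
# line-oriented) test result stream and report the first failure
# immediately, instead of waiting out the rest of a 60+ minute run.
# The real subunit protocol is binary; this text format is invented
# for illustration only.

def first_failure(stream):
    """Return the id of the first failed test, or None if all passed.

    `stream` yields lines like "success tempest.api.test_foo" or
    "fail tempest.api.test_bar" (hypothetical format)."""
    for line in stream:
        status, _, test_id = line.strip().partition(" ")
        if status in ("fail", "error"):
            # In a real system this is where we'd signal up to the CI
            # master so it can abort the job and reset the queue early.
            return test_id
    return None

events = [
    "success tempest.api.compute.test_servers",
    "fail tempest.api.network.test_routers",
    "success tempest.api.volume.test_volumes",
]
print(first_failure(events))  # -> tempest.api.network.test_routers
```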
>         = Pep8 kick out of check =
> 
>         I think on the Check Queue we should run pep8 first, and not
>         run other tests until that passes (this reverses a previous
>         opinion I had). We're now starving nodepool. Not taking 5
>         nodepool nodes for patches that don't pass pep8 would be
>         handy. When Dan pushes a 15-patch series that fixes
>         nova-network and patch 4 has a pep8 error, we thrash a bunch.
> 
> 
>     Agree. I think this might be one of those things that goes back and
>     forth on being a good or bad idea over time. I think now is a time
>     when it's a good idea.
> 
> 
> 
> What about adding a pre-gate queue that makes sure pep8 and unit tests
> pass before adding a job to the gate (of course this would mean we would
> have to re-run pep8 and unit tests in the gate). Hopefully this would
> reduce the amount of gate thrashing incurred by a gate patch that fails
> one of these jobs.

So this was a check only statement. This is mostly just about saving
nodes in nodepool. Gate would remain the same.
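As a sketch of the check-side ordering (this is not real Zuul
configuration or code; the job names and the run_job callable are made
up for illustration):

```python
# Illustrative sketch only: fail the cheap pep8 job before any of the
# expensive jobs (and their nodepool nodes) are scheduled.  Job names
# and the run_job callable are hypothetical.

CHEAP_JOBS = ["gate-nova-pep8"]
EXPENSIVE_JOBS = [
    "gate-nova-python26",
    "gate-nova-python27",
    "gate-tempest-dsvm-full",
]

def check_pipeline(change, run_job):
    """Return True iff every job passes for the change.

    Expensive jobs are never dispatched (and no nodepool nodes are
    claimed) when a cheap job fails."""
    for job in CHEAP_JOBS:
        if not run_job(change, job):
            return False  # kick out of check early
    return all(run_job(change, job) for job in EXPENSIVE_JOBS)
```

The point is purely the ordering: a pep8 failure costs one node instead
of five, at the price of slightly later feedback on the other jobs.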

	-Sean

-- 
Sean Dague
http://dague.net
