[openstack-dev] State of the Gate - Dec 12

Matt Riedemann mriedem at linux.vnet.ibm.com
Thu Dec 12 18:19:13 UTC 2013



On 12/12/2013 7:20 AM, Sean Dague wrote:
> Current Gate Length: 12hrs*, 41 deep
>
> (top of gate entered 12hrs ago)
>
> It's been an *exciting* week. For people not paying attention, we had
> 2 external events which made things terrible earlier in the week.
>
> ==========================
> Event 1: sphinx 1.2 complete breakage - MOSTLY RESOLVED
> ==========================
>
> It turns out sphinx 1.2 + distutils (which pbr magic calls through)
> means total sadness. The fix for this was a requirements pin to
> sphinx < 1.2, and until a project has taken that pin it will fail in
> the gate.
>
> It also turns out that tox installs pre-release software by default (a
> terrible default behavior), so you also need a tox.ini change like this
> one - https://github.com/openstack/nova/blob/master/tox.ini#L9 -
> otherwise local users will install things like sphinx 1.2b3. They will
> also break in other ways.
>
> Not all projects have merged this. If you are a project that hasn't,
> please don't send any other jobs to the gate until you do. A lot of
> delay was added to the gate yesterday by Glance patches being pushed to
> the gate before their doc jobs were done.
>
> ==========================
> Event 2: apt.puppetlabs.com outage - RESOLVED
> ==========================
>
> We use that apt repository to set up the devstack nodes in nodepool with
> puppet. We were triggering an issue with grenade where its apt-get
> calls were failing, because it does apt-get update once to make sure
> life is good. This only triggered in grenade (not other devstack runs)
> because we do set -o errexit aggressively.
>
> A fix in grenade to ignore these errors was merged yesterday afternoon
> (see the purple line at http://status.openstack.org/elastic-recheck/
> for where it showed up).
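
The errexit behaviour and the "ignore these errors" fix can be sketched as follows. This is an analogy in Python, not grenade's shell code: check=True stands in for set -o errexit, the run_step helper and its tolerate_failure flag are hypothetical, and FAILING is just a stand-in for an apt-get update that exits non-zero.

```python
import subprocess
import sys


def run_step(argv, tolerate_failure=False):
    """Run a command and return its exit status.

    With tolerate_failure=False a non-zero exit raises CalledProcessError,
    much as a failing command aborts a shell script under `set -o errexit`.
    With tolerate_failure=True the failure is reported but not fatal.
    """
    result = subprocess.run(argv, check=not tolerate_failure)
    return result.returncode


# Assumption: any command exiting 1 stands in for the failing `apt-get update`.
FAILING = [sys.executable, "-c", "import sys; sys.exit(1)"]
```

Calling run_step(FAILING) aborts the caller with an exception, while run_step(FAILING, tolerate_failure=True) returns 1 and lets the rest of the run continue.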
>
> ==========================
> Top Gate Bugs
> ==========================
>
> We normally do this as a list, and you can see the whole list here -
> http://status.openstack.org/elastic-recheck/ (now sorted by number of
> FAILURES in the last 2 weeks)
>
> That being said, our biggest race bug is currently this one -
> https://bugs.launchpad.net/tempest/+bug/1253896 - and if you want to
> merge patches, fixing that one bug will be huge.
>
> Basically, you can't ssh into guests that get created. That's sort of a
> fundamental property of a cloud. It shows up more frequently on neutron
> jobs, possibly due to actually testing the metadata server path. There
> have been many attempts at retry logic for this; we actually retry for
> 196 seconds to get in and only fail once we can't get in, so waiting
> isn't helping. It doesn't seem like the env is under that much load.
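
The retry behaviour described above amounts to polling until a deadline. A minimal sketch follows, assuming a caller-supplied attempt callable that returns True once the guest is reachable. The retry_until helper, its parameters and the injectable clock and sleep are hypothetical and not Tempest's actual SSH code; only the 196 second figure comes from the message.

```python
import time


def retry_until(attempt, timeout=196.0, interval=1.0,
                clock=time.monotonic, sleep=time.sleep):
    """Call attempt() until it returns True or `timeout` seconds have passed.

    Returns True on the first successful attempt, False once the deadline
    is reached without success.
    """
    deadline = clock() + timeout
    while True:
        if attempt():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

If attempt() can never succeed, for instance because the guest never becomes reachable, raising timeout only delays the failure, which matches the observation that waiting isn't helping.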
>
> Until we resolve this, life will not be good in landing patches.
>
> 	-Sean

There have been a few threads [1][2] on gate failures and on the process
for identifying, tracking and fixing them.

I couldn't find anything outside of the mailing list that keeps a record
of this, so I started a page here [3].

Feel free to contribute so we can point people to how they can easily
help work through these faster.

[1] http://lists.openstack.org/pipermail/openstack-dev/2013-November/020280.html
[2] http://lists.openstack.org/pipermail/openstack-dev/2013-November/019931.html
[3] https://wiki.openstack.org/wiki/ElasticRecheck

-- 

Thanks,

Matt Riedemann



