[openstack-dev] State of the Gate - Dec 12
sean at dague.net
Thu Dec 12 13:20:09 UTC 2013
Current Gate Length: 12hrs*, 41 deep
(top of gate entered 12hrs ago)
It's been an *exciting* week this week. For people not paying attention
we had 2 external events which made things terrible earlier in the week.
Event 1: sphinx 1.2 complete breakage - MOSTLY RESOLVED
It turns out sphinx 1.2 + distutils (which the pbr magic calls through
to) means total sadness. The fix for this was a requirements pin to
sphinx < 1.2, and until a project has taken that pin it will fail in the
gate.
It also turns out that tox installs pre-released software by default (a
terrible default behavior), so you also need a tox.ini change like this
- https://github.com/openstack/nova/blob/master/tox.ini#L9 otherwise
local users will install things like sphinx 1.2b3. They will also break
in other ways.
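To make the intent of the pin plus the tox change concrete, here is a
rough sketch of the filter we want in effect: only final releases below
1.2. The names (parse, acceptable) and the regex are made up for
illustration, and this simplified check is not how pip or tox actually
resolve versions.

```python
import re
from typing import Optional, Tuple

# Simplified pattern: dotted release number plus an optional
# pre-release/dev suffix such as "b3", "rc1" or ".dev1".
# This is an illustrative assumption, not a full version grammar.
_VERSION = re.compile(r"^(\d+(?:\.\d+)*)((?:a|b|c|rc|\.?dev)\d*)?$")


def parse(version: str) -> Optional[Tuple[Tuple[int, ...], bool]]:
    """Return (release tuple, is_prerelease), or None if unparseable."""
    match = _VERSION.match(version)
    if match is None:
        return None
    release = tuple(int(part) for part in match.group(1).split("."))
    return release, match.group(2) is not None


def acceptable(version: str, ceiling: Tuple[int, ...] = (1, 2)) -> bool:
    """True only for final releases strictly below the ceiling."""
    parsed = parse(version)
    if parsed is None:
        return False
    release, is_prerelease = parsed
    return not is_prerelease and release < ceiling


if __name__ == "__main__":
    for candidate in ("1.1.3", "1.2b3", "1.2"):
        print(candidate, acceptable(candidate))
```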
Not all projects have merged this. If you are a project that hasn't,
please don't send any other jobs to the gate until you do. A lot of
delay was added to the gate yesterday by Glance patches being pushed to
the gate before their doc jobs were done.
Event 2: apt.puppetlabs.com outage - RESOLVED
We use that apt repository to set up the devstack nodes in nodepool with
puppet. We were triggering an issue with grenade where its apt-get
calls were failing, because it does apt-get update once to make sure
life is good. This only triggered in grenade (not other devstack runs)
because we do set -o errexit aggressively.
A fix in grenade to ignore these errors was merged yesterday afternoon
(it's the purple line at http://status.openstack.org/elastic-recheck/,
where you can see when it showed up).
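For anyone who hasn't been bitten by errexit before, the behavior looks
roughly like the sketch below. This is an analogy in Python, not
grenade's actual shell code: run_steps and its tolerate parameter are
hypothetical names, and the failing command is just a stand-in for an
apt-get update that can't reach one of its repositories.

```python
import subprocess
import sys
from typing import Collection, List, Sequence


def run_steps(steps: Sequence[List[str]],
              tolerate: Collection[int] = ()) -> List[int]:
    """Run commands in order and return their exit codes.

    Like a script under errexit, stop at the first non-zero exit code,
    unless that step's index is listed in `tolerate`.
    """
    codes = []
    for index, argv in enumerate(steps):
        code = subprocess.run(argv).returncode
        codes.append(code)
        if code != 0 and index not in tolerate:
            break  # abort the rest of the run
    return codes


if __name__ == "__main__":
    ok = [sys.executable, "-c", "pass"]
    # Stand-in for a command that fails because a repo is unreachable.
    fail = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_steps([ok, fail, ok]))                # stops after the failure
    print(run_steps([ok, fail, ok], tolerate={1}))  # failure is ignored
```

The trade-off is the usual one: tolerating a failure keeps an unrelated
outage from killing the run, at the cost of possibly hiding a real
problem in that step.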
Top Gate Bugs
We normally do this as a list, and you can see the whole list here -
http://status.openstack.org/elastic-recheck/ (now sorted by number of
FAILURES in the last 2 weeks)
That being said, our biggest race bug is currently this one bug -
https://bugs.launchpad.net/tempest/+bug/1253896 - and if you want to
merge patches, fixing that one bug will be huge.
Basically, you can't ssh into guests that get created. That's sort of a
fundamental property of a cloud. It shows up more frequently on neutron
jobs, possibly due to actually testing the metadata server path. There
have been many attempts at retry logic for this; we actually retry for
196 seconds and only fail if we still can't get in, so waiting longer
isn't helping. It doesn't seem like the env is under that much load.
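For reference, retry-until-deadline logic of the kind described above
looks roughly like this. It is a minimal sketch, not the tempest code:
wait_for_ssh, try_connect and the 5 second interval are assumptions for
illustration; only the 196 second budget comes from the numbers above.

```python
import time
from typing import Callable


def wait_for_ssh(try_connect: Callable[[], bool],
                 timeout: float = 196.0,
                 interval: float = 5.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call try_connect() until it returns True or `timeout` elapses.

    try_connect is a hypothetical callable that makes one connection
    attempt and reports success. clock and sleep are injectable so the
    loop can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if try_connect():
            return True
        if clock() >= deadline:
            return False  # budget exhausted, give up
        # Never sleep past the deadline.
        sleep(min(interval, max(0.0, deadline - clock())))
```

If the guest never becomes reachable, a loop like this just burns the
full timeout before failing, which is why a bigger timeout doesn't help.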
Until we resolve this, life will not be good in landing patches.