[openstack-dev] [neutron][infra][release] neutron-lbaas CI job issues
Robert Collins
robertc at robertcollins.net
Tue May 12 22:12:42 UTC 2015
Hi,
I got pinged this morning about a problem where neutron jobs were
wedged, because a bad commit got into master.
We've figured it out and https://review.openstack.org/182455 will
hopefully land soon to make the neutron-lbaas job nonvoting so we can
land https://review.openstack.org/#/c/182433/ which should fix the
brokenness in Neutron master. However, the neutron-lbaas job can't be
turned back on again until it's actually testing the commit-under-test.
But, we got into this place from a couple of intersecting issues,
which I think bear wider education, so here's a bit of a deep dive.
tl;dr:
- NEVER clone from git.openstack.org during CI. Use the local cache.
- ALWAYS use the ZUUL_REF
(http://ci.openstack.org/zuul/launchers.html#common-parameters) when
using git sources, so that you test what zuul expects to be tested.
- NEVER use setuptools hooks that refer to the local source tree.
They break people's heads and place unreasonable constraints on the
project source.
details:
Firstly, the root cause: the Neutron-LBaaS job was not testing the
commit that was landing in Neutron, so it was voting on what was in
master. This means that neutron can completely break Neutron-LBaaS and
we find out afterwards... when the next commit to Neutron cannot land.
It also means that we can't fix it, because fixed code can't get
through to Neutron-LBaaS to unbreak it. Adding the ZUUL_REF to the
requirement line that drags in neutron should fix this.
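A minimal sketch of what "adding the ZUUL_REF to the requirement line" could look like, assuming the job generates its neutron requirement at run time. The function name, the example URLs, the fallback to master for local runs, and the idea that pip can check out a Zuul ref given after the '@' are all assumptions for illustration; only ZUUL_URL and ZUUL_REF come from the Zuul common parameters. The sketch does not check that the ref exists in the neutron repository for a given job.

```python
import os


def neutron_requirement_line(environ=None):
    """Build a pip requirement line that pins neutron to the ref under test.

    Hypothetical helper: falls back to master when no Zuul parameters
    are present, which is only appropriate for local development.
    """
    env = os.environ if environ is None else environ
    zuul_url = env.get("ZUUL_URL")
    zuul_ref = env.get("ZUUL_REF")
    if zuul_url and zuul_ref:
        # Assumption: ZUUL_URL is the base git URL that Zuul serves refs from.
        repo = "{}/openstack/neutron".format(zuul_url.rstrip("/"))
        ref = zuul_ref
    else:
        # Local runs only; CI should never reach this branch.
        repo = "https://git.openstack.org/openstack/neutron"
        ref = "master"
    return "-e git+{}@{}#egg=neutron".format(repo, ref)


if __name__ == "__main__":
    print(neutron_requirement_line())
```

The point is only that the ref comes from Zuul rather than being hard-coded, so the job votes on the commit that is actually trying to land.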
Now the secondary root cause: Neutron was violating one of our basic
design principles for working with setuptools. That principle has two
parts: "setup.py must be able to be run with all-of-and-only
setup_requires present" and "the only setup_requires we use is 'pbr'".
We have these two parts because easy_install is much less reliable in
our environment than pip, and easy_install lies hidden behind the
surface of setup.py: anytime an operation like 'egg_info' or 'install'
or 'bdist_wheel' is invoked, if any setup_requires is missing,
easy_install will try to get it... poorly and unreliably. Infra has
the scars :). So - in our CI environment we pre-install our one and
only setup_requires: pbr, and then we can be sure that setup.py
egg_info can be run reliably. But - setup-hooks requires importing the
hook object, and a hook object in the neutron namespace used in the
neutron project means that we will try to import 'neutron' *before
installing the neutron requirements*. This naturally fails - and we
get the failures seen in the jobs, where neutron.hooks wasn't
importable. It was this that caused the py34 patch
https://review.openstack.org/#/c/181277/ to fail for Neutron-LBaaS
once it was in master: six was not already installed when 'setup.py
egg_info' was run, and without it 'neutron' was not importable.
*This is hard for developers to remember and to reason about*. So: I
suggest a simple rule of thumb: Other than pbr itself, no project
should use setup hooks that refer to the project namespace itself.
Whatever hooks are needed, put them in pbr (because only pbr is
allowed to be in our setup_requires). Often hooks won't be needed and
instead something declarative can be used. In Neutron's case,
https://review.openstack.org/#/c/182433/ demonstrates how to avoid
using hooks for this particular use case. (This will get squished by
update.py during CI runs, but as it is for win32 compat, that's OK
during CI runs.) Soon (probably shortly after Vancouver) we'll have a
release of pbr that can express this in requirements.txt, and it will
be completely standard.
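The import-order failure described above can be reproduced without setuptools at all. This is a sketch, not the code path pbr or setuptools actually uses: 'demo_project', 'dependency_not_installed_yet' and 'resolve_hook' are hypothetical names standing in for neutron, six and the hook loader. It shows that resolving a hook by its dotted path imports the project package first, so any not-yet-installed dependency of that package makes the hook unimportable.

```python
import importlib
import os
import sys
import tempfile


def resolve_hook(dotted_path):
    """Import 'package.module.attr', as any loader of dotted hook paths must."""
    module_name, attr = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        pkg = os.path.join(tmp, "demo_project")
        os.mkdir(pkg)
        # The package's __init__ needs a dependency that is not installed yet.
        with open(os.path.join(pkg, "__init__.py"), "w") as f:
            f.write("import dependency_not_installed_yet\n")
        # The hook itself is trivial and has no dependencies of its own.
        with open(os.path.join(pkg, "hooks.py"), "w") as f:
            f.write("def setup_hook(config):\n    pass\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            resolve_hook("demo_project.hooks.setup_hook")
        except ImportError as exc:
            return "hook not importable: {}".format(exc)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_project", None)
        return "hook imported"


if __name__ == "__main__":
    print(demo())
```

The hook module is fine on its own; it is the parent package import that fails, which is why keeping hooks in pbr (already installed) sidesteps the problem.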
While examining the failure another issue came to light: the
Neutron-LBaaS job clones neutron from git.openstack.org, which is
fragile: at scale we suffer a non-trivial number of network glitches
during the day, and thus we cache all our git repositories locally on
our test nodes. pip can be told via the '--src' parameter where to
check projects out - if there is an existing checkout it will use it.
So something like pip install --src=/opt/stack/new/ should DTRT there
(inside CI - you won't want that for local development with tox). It's
possible to set that via environment variables or pip.conf - I suspect
in CI pip.conf will be the way to go.
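The three ways of passing the source directory mentioned above can be sketched as follows. The helper names are hypothetical, and the sketch only builds the command line, environment and config text rather than running pip. It assumes pip's usual conventions: the PIP_SRC environment variable mirrors the --src option, and a 'src' key under the [install] section of pip.conf does the same.

```python
import os
import sys

CACHE_DIR = "/opt/stack/new"  # path taken from the text above


def pip_install_command(requirement, src_dir=CACHE_DIR):
    """Return an argv list that checks editable requirements out under src_dir."""
    return [sys.executable, "-m", "pip", "install",
            "--src", src_dir, "-e", requirement]


def pip_environment(src_dir=CACHE_DIR, base=None):
    """Return a copy of the environment with PIP_SRC set instead of --src."""
    env = dict(os.environ if base is None else base)
    env["PIP_SRC"] = src_dir
    return env


def pip_conf_text(src_dir=CACHE_DIR):
    """Return a pip.conf fragment equivalent to passing --src on install."""
    return "[install]\nsrc = {}\n".format(src_dir)


if __name__ == "__main__":
    print(" ".join(pip_install_command("git+file:///example#egg=neutron")))
    print(pip_conf_text())
```

The pip.conf form has the advantage that tox and any nested pip invocations pick it up without each job having to thread an extra argument through.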
Lastly I note that the openstack/neutron-lbaas/requirements.txt file
doesn't list neutron but the project depends on it. This is in part
because we don't publish neutron to PyPI, and I'm going to bring that
up with the release team in Vancouver, but it's going to bite someone
somewhere. I suggest listing neutron in requirements.txt: that *is*
the dependency. Doing so will cause pip to rightly refuse to install
neutron-lbaas without neutron. Folk can install neutron first, or use
a local requirements file to pick a particular revision to install
from git, and pip will be fine with that (that's what CI does right
now:)).
-Rob
--
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud