[openstack-dev] [nova] reason for python-novaclient revert

Sean Dague sean at dague.net
Tue Oct 29 10:36:06 UTC 2013


Andrew Laski correctly called us out for not really providing enough 
information in the python-novaclient revert yesterday - 
https://review.openstack.org/#/c/54108/. Apologies there. At the time 
grenade had been failing every change in the gate for the prior 6 
hours, we were all on our first cup of coffee, and while we got to a 
resolution, we did so with an entirely unhelpful commit message to 
explain it.

Here's what happened. python-novaclient landed a change to its user 
interface. That change caused the devstack exercises to fail when 
validating the details returned for aggregates.

However, upgrade testing is hard, and we had a loophole that led us to 
a wedge in the gate.

For the grenade jobs we prep 2 versions of the OpenStack codebase, 
grizzly and master (yes, still grizzly and master, we're working on 
that). The grizzly tree is grizzly devstack, which means it's grizzly on 
all the core servers, but master on all the clients. However, the 
grizzly tree doesn't get "zuulified", which was the crux of the issue.

By zuulified I mean think about the zuul queue. How do we actually test 
a change 15 deep in the gate? We aren't testing just that change, but 
all the gerrit proposed changes above it. That means that zuul needs to 
go through and update the relevant git trees not just to master, but to 
the proposed change sets for all the jobs in front of it. This is across 
projects, and should be across branches.
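
As a rough sketch of that idea (this is not zuul's actual code; the 
Change tuple, the function name, and the change numbers are all made up 
for illustration), the set of trees a job has to prepare looks 
something like this:

```python
from collections import namedtuple

# Hypothetical stand-in for a gerrit change sitting in the gate queue.
Change = namedtuple("Change", ["project", "branch", "number"])


def trees_to_prepare(queue, position):
    """Map (project, branch) -> change numbers to apply on top of that
    branch tip when testing queue[position], oldest first.

    The change under test is included, along with everything ahead of it.
    """
    pending = {}
    for change in queue[: position + 1]:
        key = (change.project, change.branch)
        pending.setdefault(key, []).append(change.number)
    return pending


# Made-up queue: the third change must be tested on top of the first two.
queue = [
    Change("nova", "stable/grizzly", 101),
    Change("python-novaclient", "master", 102),
    Change("nova", "master", 103),
]

for key, numbers in sorted(trees_to_prepare(queue, 2).items()):
    print(key, numbers)
```

Note that the job for change 103 touches three different trees, one of 
them a stable branch, even though the change itself is only to nova 
master.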

But we'd not gotten the system to do this correctly on the "old" side 
yet. So when the breaking python-novaclient change was in the gate, the 
"old" side built its grizzly cloud with clients from master only, not 
master + the proposed gerrit changes. The change passed verification of 
the "old" cloud, then moved to the new cloud, then ran a different set 
of tests to verify the new cloud, which also passed.
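
A toy model of the loophole, heavily simplified (the function and its 
parameters are hypothetical, and it assumes the only thing that can 
fail is the "old" side exercises seeing the breaking client change):

```python
def grenade_passes(master_has_break, proposed_has_break, old_side_zuulified):
    """Return True if the simplified grenade run passes.

    The "old" cloud fails its exercises whenever the client code it was
    built with contains the breaking change. Without zuul refs on the
    "old" side, it only ever sees what has already merged to master.
    """
    if old_side_zuulified:
        old_side_sees_break = master_has_break or proposed_has_break
    else:
        old_side_sees_break = master_has_break
    return not old_side_sees_break


# The breaking change tests itself: the "old" side uses plain master.
print(grenade_passes(False, True, old_side_zuulified=False))

# After it merges, an unrelated change runs against the new master.
print(grenade_passes(True, False, old_side_zuulified=False))

# With zuul refs applied on the "old" side, the breaking change is caught.
print(grenade_passes(False, True, old_side_zuulified=True))
```

The first call passes, the second fails, and the third fails, which is 
the behavior we want: the breaking change gets stopped at its own gate 
run instead of breaking everyone behind it.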

However, because the change threaded the needle in this way, no one 
else could pass grenade once it merged: every later job built the "old" 
side from a master that now contained the breaking change. The quick 
fix was the python-novaclient revert. The real fix is probably this - 
https://review.openstack.org/#/c/53940/ which we were actually working 
on last week, to both update the set of trees we are using, and update 
the zuul refs on the "old" side of the equation. Once that lands I'll 
attempt to revert the revert, and ensure that it actually gets caught 
in the system. Then we can work on updating tests so it can get 
through. But right now it's a perfect test case to prove that we did 
this right, so leaving it in the reverted state is critical.

This also highlights one of the reasons I've been hard on folks recently 
about some alternative upgrade or mixed version testing models, and 
doing it outside of grenade. Everything is simple when you talk about a 
single change. But when you are 15 or 20 deep in the zuul gate, with 3 
proposed stable nova changes, 5 proposed master nova changes, a 
keystone stable, a keystone master, and a few cinder master changes in 
front of you, all of which have to go into the environments you need to 
test in the gate, this gets complicated fast. Basically you aren't 
allowed to use git inside your upgrade tool for this reason, because 
your tool has no idea what it's supposed to actually test, only zuul 
knows. And, as you can see, we've yet to get this whole thing fully 
mapped out ourselves. :)

	-Sean

-- 
Sean Dague
http://dague.net


