[openstack-dev] [nova] reason for python-novaclient revert
Joshua Harlow
harlowja at yahoo-inc.com
Tue Oct 29 16:19:24 UTC 2013
Zuul just caused my brain to overload ;) thx for the detailed explanation.
Sent from my really tiny device...
> On Oct 29, 2013, at 3:42 AM, "Sean Dague" <sean at dague.net> wrote:
>
> Andrew Laski correctly called us out for not providing enough information in the python-novaclient revert yesterday - https://review.openstack.org/#/c/54108/. Apologies there. At the time we were dealing with a gate where grenade had been failing every change for the prior 6 hours, we were all on our first cup of coffee, and while we got to resolution, we did so with an entirely unhelpful commit message to explain it.
>
> Here's what happened. python-novaclient landed a change to its user interface. Because of that change, the devstack exercises failed when validating the details returned for aggregates.
>
> However, upgrade testing is hard, and we had a loophole that let the change through and then wedged the gate.
>
> For the grenade jobs we prep 2 versions of the OpenStack codebase, grizzly and master (yes, still grizzly and master; we're working on that). The grizzly tree is built with grizzly devstack, which means it runs grizzly for all the core servers but master for all the clients. However, the grizzly tree doesn't get "zuulified", which was the crux of the issue.
>
> By zuulified I mean: think about the zuul queue. How do we actually test a change 15 deep in the gate? We aren't testing just that change, but also all the proposed gerrit changes ahead of it. That means zuul needs to update the relevant git trees not just to master, but to master plus the proposed changesets for every change in front of it. This is across projects, and should be across branches.
>
> But we hadn't yet gotten the system to do this correctly on the "old" side. That meant python-novaclient landed a breaking change while the "old" side built a grizzly cloud with only master, not master + gerrit. The change passed the verification of the "old" cloud, then moved to the new cloud, then ran a different set of tests to verify the new cloud, which also passed.
>
> However, because the change threaded the needle this way, no one else could ever pass grenade again once it merged. The quick fix was the python-novaclient revert. The real fix is probably https://review.openstack.org/#/c/53940/, which we were already working on last week; it both updates the set of trees we use and updates the zuul refs on the "old" side of the equation. Once that lands I'll attempt to revert the revert and make sure the breakage actually gets caught by the system. Then we can work on updating the tests so the change can get through. But right now it's a perfect test case to prove that we did this right, so leaving it in the reverted state is critical.
>
> This also highlights one of the reasons I've been hard on folks recently about alternative upgrade or mixed-version testing models that run outside of grenade. Everything is simple when you talk about a single change. But when you are 15 or 20 deep in the zuul gate, and have to handle 3 proposed stable nova changes, 5 proposed master nova changes, a keystone stable change, a keystone master change, and a few cinder master changes in front of you to build the environments you need to test in the gate.... this gets complicated fast. Basically, your upgrade tool isn't allowed to drive git itself for this reason: it has no idea what it's supposed to actually test, only ZUUL knows. And, as you can see, we've yet to get this whole thing fully mapped out even once. :)
>
> -Sean
>
> --
> Sean Dague
> http://dague.net
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
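The difference between a "zuulified" and a non-zuulified side described above can be sketched in a few lines. This is a minimal, hypothetical model, not Zuul's actual implementation: `Change`, `speculative_state` and `checkout` are illustrative names, and the project, branch and ref strings are made up. It shows that a side which checks out bare branch tips never sees a breaking client change sitting ahead of it in the queue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """A proposed change waiting in the shared gate queue (hypothetical model)."""
    project: str
    branch: str
    ref: str  # illustrative label standing in for a Gerrit change ref


def speculative_state(queue, position):
    """Return {(project, branch): [refs]} for testing queue[position].

    The change under test is only meaningful on top of every change ahead of
    it, across all projects and branches, so collect them in queue order.
    """
    state = {}
    for change in queue[: position + 1]:
        state.setdefault((change.project, change.branch), []).append(change.ref)
    return state


def checkout(state, project, branch, zuulified):
    """Describe the tree a job builds for one project/branch.

    A zuulified side applies the speculative refs on top of the branch tip;
    a non-zuulified side ignores them and tests the bare tip.
    """
    refs = state.get((project, branch), []) if zuulified else []
    return [f"{branch} tip"] + refs


if __name__ == "__main__":
    queue = [
        Change("openstack/nova", "stable/grizzly", "nova-stable-1"),
        Change("openstack/python-novaclient", "master", "novaclient-break"),
        Change("openstack/nova", "master", "nova-master-1"),
    ]
    state = speculative_state(queue, position=2)
    # The "old" side installs master clients; without zuulification the
    # breaking client change ahead in the queue is never exercised there.
    print(checkout(state, "openstack/python-novaclient", "master", zuulified=False))
    # ['master tip']
    print(checkout(state, "openstack/python-novaclient", "master", zuulified=True))
    # ['master tip', 'novaclient-break']
```

Keying the state on (project, branch) is what makes the "should be across branches" point concrete: a stable nova change and a master nova change end up in different trees, and each side of an upgrade test has to pick up the right ones.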