[openstack-dev] [grenade] module upgrade refactor progress
Sean Dague
sean at dague.net
Mon Apr 13 12:23:35 UTC 2015
While we now have devstack external plugins, grenade (our upgrade
testing framework) was really monolithic. It grew out of a last-minute
set of test scripts for Folsom that discovered a number of our database
migrations didn't work with real data in them, and that nova compute had
the annoying habit of killing off VMs when it went down. It has grown in
scope since then, but very organically, and not always clearly.
The first step in making this pluggable externally is separating out
everything that's really global from what's per service.
That was mostly done last week (with one important part missing,
resource survival testing). The top of the current unmerged stack is
here - https://review.openstack.org/#/c/172648/
== New Structure ==
The crux of this is that all the project specific code now lives in:
    grenade/
        projects/
            10_keystone/.....
            20_ceilometer/....
            30_swift/....
            ....
The current (in flux) interface is as follows:
* settings - similar to the devstack plugin, this is a place for initial
setup. So far this has been useful to register things you'd like grenade
to do for you. For instance:

    > more projects/10_keystone/settings
    register_project_for_upgrade keystone
    register_db_to_save keystone

This tells grenade to register this directory for upgrade (it does a
little magic when it does that) and to save off a database.
* upgrade.sh - the service upgrade script, which is expected to upgrade
& restart the service. It is basically what upgrade-$foo was previously.
In the current patch stream ``upgrade.sh`` is also responsible for doing
a service sanity check once done.
The following functions are provided to help with that:
- ensure_services_started
- ensure_logs_exist
Each project also supports local from-juno/, from-kilo/, and
within-juno/ directories, just like we did before. They just live in
the service directory now.
* shutdown.sh - the service down script, which is also responsible for
doing a sanity check that the service is actually down.
The following functions are provided to help with that:
- ensure_services_stopped
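
To make the upgrade.sh / shutdown.sh contract above a bit more concrete,
here is a minimal sketch for a made-up service called foo. None of this is
the real grenade code: the ensure_* helpers are stubbed so the snippet runs
on its own, the idea that they take process names as arguments is an
assumption, and upgrade_foo / shutdown_foo only stand in for the bodies of
the two scripts.

```shell
#!/bin/bash
# Sketch only: "foo" is a made-up service, and these stubs stand in for
# the helpers grenade provides. Their argument conventions are assumed.
ensure_services_started() { echo "checking started: $*"; }
ensure_logs_exist() { echo "checking logs: $*"; }
ensure_services_stopped() { echo "checking stopped: $*"; }

# What a hypothetical shutdown.sh for foo might do: stop, then sanity check.
shutdown_foo() {
    echo "stopping foo-api"          # placeholder for the real stop command
    ensure_services_stopped foo-api
}

# What a hypothetical upgrade.sh for foo might do: upgrade, restart,
# then sanity check.
upgrade_foo() {
    echo "upgrading foo"             # placeholder: install code, migrate db
    echo "starting foo-api"          # placeholder for the real start command
    ensure_services_started foo-api
    ensure_logs_exist foo-api
}
```

The point is only the shape: each script does its work and then proves it
worked, so a failed restart or shutdown is caught in the project that
caused it.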
== Resource Survival ==
This is still in flight, and here is a preview of where this is headed
this week. One of the important things that grenade does is ensure that
resources (like functioning VMs) survive the upgrade unscathed, mostly
because, once upon a time, they did not.
This started as a simple shell script. That broke at some point and no
one noticed (though, we didn't have regressions, so that's at least
something). Last summer we rebuilt that tool in Python using the Tempest
clients (javelin2). As that was ending we realized that basically we'd
just recreated Ansible in the small (our yaml file and theirs are way
too close to assume otherwise). This also meant we created a new
coupling with Tempest, and a new global coupling of 1 tool that needed
to understand all projects. So we created a new bottleneck.
Grenade is going to get out of the business of dictating a tool. Instead
it's going to dictate an interface. It will look something like this
(exact names in flux; we'll see how the code evolves):
    resources.sh [create|verify_noapi|verify|destroy]

    - pre shutdown
      - create - make some stuff that we think might not survive upgrades
        (i.e. more than just db records)
      - verify - make sure that stuff is working
    - post shutdown
      - verify_noapi - make sure that stuff is working, with checks that
        work without any API services up.
    - post upgrade
      - verify - make sure stuff is still running
    - post grenade
      - destroy - delete everything so we don't leave crud everywhere
The verify_noapi phase covers what is currently a hole in our testing,
and something Clark brought up in Darmstadt last summer. It's a good
hole to fill.
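
As a rough illustration of the interface, a per-project resources.sh could
be little more than a dispatcher over the four phases. This is a sketch
under assumptions, not grenade code: the phase names come from the proposal
above (and are in flux), while the resources_* function names and the echo
placeholders are made up for the example.

```shell
#!/bin/bash
# Hypothetical resources.sh skeleton. Phase names follow the proposed
# interface; the function bodies are placeholders only.

resources_create() {
    echo "create: make resources that might not survive an upgrade"
}

resources_verify() {
    echo "verify: check resources through the API"
}

resources_verify_noapi() {
    echo "verify_noapi: check resources without any API services"
}

resources_destroy() {
    echo "destroy: delete everything that create made"
}

# Dispatch on the phase name, rejecting anything outside the interface.
resources_main() {
    case "$1" in
        create|verify|verify_noapi|destroy)
            "resources_$1"
            ;;
        *)
            echo "usage: resources.sh [create|verify_noapi|verify|destroy]" >&2
            return 1
            ;;
    esac
}

# Only dispatch when a phase was passed on the command line.
if [ "$#" -gt 0 ]; then
    resources_main "$@"
fi
```

Grenade would then only need to call each project's script with the right
phase at the right time, without knowing which tool the project uses
inside.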
There will also be some convenience functions provided to store/fetch
persistent data so that grenade can keep track of things like instance
ids / ip addresses and such for resource scripts.
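
A sketch of what such store/fetch helpers might look like follows. The
names resource_save and resource_get, the GRENADE_STATE_DIR variable, and
the one-file-per-key layout are all assumptions made for illustration; the
real helpers may look quite different.

```shell
#!/bin/bash
# Hypothetical helpers: names and on-disk layout are assumptions.
# State lives in plain files so it survives service restarts.
GRENADE_STATE_DIR=${GRENADE_STATE_DIR:-$(mktemp -d)}

# resource_save <project> <key> <value>
resource_save() {
    local project="$1"
    local key="$2"
    local value="$3"
    mkdir -p "$GRENADE_STATE_DIR/$project"
    printf '%s\n' "$value" > "$GRENADE_STATE_DIR/$project/$key"
}

# resource_get <project> <key>; returns non-zero if the key was never saved
resource_get() {
    local file="$GRENADE_STATE_DIR/$1/$2"
    if [ ! -f "$file" ]; then
        return 1
    fi
    cat "$file"
}
```

A create phase could then call something like
`resource_save nova server_id "$id"`, and the later verify phases could
read it back with `resource_get nova server_id`.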
== Upgrade Order ==
This remains one of the last sticking points. Today our upgrade.sh
iterates every project in a specific order and does both the upgrade and
restart at the same time.
The good thing about this is that it is simple, and it more closely
follows a 'rolling-ish' upgrade pattern. The problem is dependency
management, especially when libraries from one project get injected into
others without being in requirements.txt, like ceilometermiddleware or
ironicclient.
I'm starting to think we should upgrade / restart as separate steps,
because it will largely get rid of the dependency ordering issues. But
that's up for grabs.
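
The difference between the two orderings can be sketched as below. This is
purely illustrative: the project list is borrowed from the directory
layout earlier in this mail, and the function names and echo placeholders
are made up.

```shell
#!/bin/bash
# Illustrative only: echo stands in for the real upgrade/restart work.
PROJECTS="10_keystone 20_ceilometer 30_swift"

# Run one phase across every project, in directory order.
run_phase() {
    local phase="$1"
    local project
    for project in $PROJECTS; do
        echo "$phase ${project#*_}"   # strip the NN_ ordering prefix
    done
}

# Today: each project is upgraded and restarted before the next begins.
upgrade_combined() {
    local project
    for project in $PROJECTS; do
        echo "upgrade ${project#*_}"
        echo "restart ${project#*_}"
    done
}

# Proposed: upgrade all the code first, then restart all the services.
upgrade_split() {
    run_phase upgrade
    run_phase restart
}
```

With the split version every service restarts against fully upgraded code,
which is why the ordering between projects matters much less.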
== External API ==
An external plugin definition, similar to the devstack one, is coming,
but not until the rest of this settles out. My hope is it will exist by
Vancouver so we can do an External Plugins in Devstack / Grenade Design
Summit session there, both as a forum to ask questions about the
existing structure and as a discussion of what should move into
external plugins for both projects.
...
This was mostly an FYI for where we stand. Once the rest of the code
lands, and we've crystalized our interfaces, there will be more
information about how projects can plug in here and reuse this framework
for their own upgrade testing.
-Sean
--
Sean Dague
http://dague.net