[openstack-dev] [Cinder] Rolling upgrades

Michał Dulko michal.dulko at intel.com
Tue Dec 15 12:09:10 UTC 2015


At a recent meeting it was mentioned that our rolling upgrades efforts
are pursuing an "elusive unicorn" that makes development a lot more
complicated and restricted. I want to try to clarify this a bit,
explain the strategy in more detail and give an update on the status of
the whole effort.

So first of all: it's definitely achievable, as Nova has supported
rolling upgrades since Kilo. It makes developers' lives harder, but the
feature is useful. For example, in their Juno->Kilo upgrade CERN was able
to upgrade the compute nodes of their enormous environment after the
control plane services [1].

Rolling upgrades are all about interoperability of services running in
different versions. We want to give operators the ability to upgrade
service instances one by one, starting from c-api, through c-sch, to
c-vol and c-bak. Moreover, we want to be sure that old and new versions
of a single service can coexist. This means we need to be backward
compatible with at least one previous release. There are 3 planes on
which incompatibilities may happen:
* API of RPC methods
* Structure of composite data sent over RPC
* DB schemas

API of RPC methods
Here we're strictly following Nova's solution described in [2]. We need
to support RPC version pinning, so each RPC API addition needs to be
versioned, and the rpcapi.py modules need to be able to downgrade a
request to the required version. On the other side, manager.py should be
able to process the request even when it doesn't receive a newly added
parameter. There are already some examples of this approach in tree
([3], [4]). Until the upgrade is completed, the RPC API version stays
pinned, so everything sent remains compatible with the older release.
Once only new services are running, the pin may be released.
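To make the two halves of this concrete, here is a minimal sketch. It
does not use oslo.messaging; FakeRPCClient, ExampleVolumeAPI,
ExampleVolumeManager, do_thing and new_flag are all hypothetical names,
and the version check is a simplified stand-in for what a real RPC
client does with its version cap. The client side drops the new argument
when the pin doesn't allow version 1.1, and the server side gives the
argument a default so old requests still work.

```python
# Hypothetical sketch of RPC version pinning; not Cinder's actual code.


def _parse(version):
    """Turn '1.1' into (1, 1) so versions compare numerically."""
    major, minor = version.split(".")
    return int(major), int(minor)


class FakeRPCClient:
    """Stand-in for an RPC client configured with a version cap (the pin)."""

    def __init__(self, version_cap):
        self.version_cap = version_cap
        self.sent = []  # records (version, method, kwargs) for inspection

    def can_send_version(self, version):
        # A message may use `version` only if it shares the pin's major
        # version and does not exceed the pin.
        cap = _parse(self.version_cap)
        ver = _parse(version)
        return ver[0] == cap[0] and ver <= cap

    def cast(self, version, method, **kwargs):
        self.sent.append((version, method, kwargs))


class ExampleVolumeAPI:
    """rpcapi.py-style client: downgrades requests to the pinned version.

    Version history (hypothetical):
      1.0 - initial version
      1.1 - added new_flag argument to do_thing
    """

    def __init__(self, client):
        self.client = client

    def do_thing(self, volume_id, new_flag=False):
        msg_args = {"volume_id": volume_id, "new_flag": new_flag}
        if self.client.can_send_version("1.1"):
            version = "1.1"
        else:
            # Services pinned to 1.0 don't know new_flag, so drop it.
            msg_args.pop("new_flag")
            version = "1.0"
        self.client.cast(version, "do_thing", **msg_args)


class ExampleVolumeManager:
    """manager.py-style server: tolerates a missing new argument."""

    def do_thing(self, volume_id, new_flag=False):
        # new_flag falls back to its default when an old client omits it.
        return volume_id, new_flag
```

With the pin at 1.0, `ExampleVolumeAPI(FakeRPCClient("1.0")).do_thing("vol-1", new_flag=True)`
sends only `volume_id`; once the pin is raised to 1.1 the same call also
carries `new_flag`.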

Structure of composite data sent over RPC
Again, RPC version pinning is used, this time together with versioned
objects. Before sending an object we translate it to a lower version,
according to the version pin. This makes sure that the object can be
understood by older services. Note that newer services can translate
the object back to the new version when they receive an old one.
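A minimal sketch of that translation step, loosely mimicking the
obj_make_compatible idea from oslo.versionedobjects but without using the
library: ExampleVolume, its version history and the 'encrypted' field are
hypothetical. Serializing to a pinned version strips fields the target
doesn't know, and a newer service accepts the old payload by filling in
defaults.

```python
# Hypothetical sketch of backporting a versioned object; not the real
# oslo.versionedobjects API.


def _parse(version):
    """Turn '1.1' into (1, 1) so versions compare numerically."""
    major, minor = version.split(".")
    return int(major), int(minor)


class ExampleVolume:
    # Version history (hypothetical):
    #   1.0 - initial version: id, size
    #   1.1 - added 'encrypted' field
    VERSION = "1.1"

    def __init__(self, id, size, encrypted=False):
        self.id = id
        self.size = size
        self.encrypted = encrypted

    def obj_make_compatible(self, primitive, target_version):
        """Strip fields that the target version does not know about."""
        if _parse(target_version) < (1, 1):
            primitive.pop("encrypted", None)

    def to_primitive(self, target_version=None):
        # Serialize at the pinned version (or the current one if unpinned).
        target_version = target_version or self.VERSION
        primitive = {"id": self.id, "size": self.size,
                     "encrypted": self.encrypted}
        self.obj_make_compatible(primitive, target_version)
        return {"version": target_version, "data": primitive}

    @classmethod
    def from_primitive(cls, payload):
        # A newer service accepts an old payload; missing fields get
        # their defaults.
        return cls(**payload["data"])
```

For example, `ExampleVolume("vol-1", 10, encrypted=True).to_primitive("1.0")`
yields a payload without `encrypted`, which a 1.0 service can parse, and
`ExampleVolume.from_primitive` on that payload gives back an object with
`encrypted` set to False.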

DB schemas
This is a hard one. We've had to adapt the approach described in [5] to
our needs, as we call the DB from all of our services and not only from
nova-conductor as Nova does. This means that a backward-incompatible
migration needs to be stretched across 3 releases. The good news is that
we haven't needed such a migration since Juno (in M we have a few
candidates… :(). The process for Cinder is described in [6]. In general
we want to ban migrations that are backward incompatible or that lock a
table exclusively for an extended period of time ([7] is a good source
of truth for MySQL), and allow them only if they follow the 3-release
migration period (so that the N+2 release has no notion of the column or
table anymore and we can drop it).
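The ban on contracting migrations could look roughly like the sketch
below. It is only an illustration of the idea, not the test that is in
review: migrations are modelled as hypothetical lists of operation names,
whereas a real test would hook into the migration framework itself. It
conservatively treats every alter_column as potentially incompatible,
lets explicitly exempted migrations (ones that went through the 3-release
process) pass, and does not detect long exclusive table locks at all.

```python
# Hypothetical sketch of a guard against contracting DB migrations.
# Operation and migration names are illustrative only.

# Operations that can break services still running the previous release.
CONTRACTING_OPS = {"drop_table", "drop_column", "alter_column", "rename_table"}


class BannedMigrationError(Exception):
    pass


def check_migration(name, operations, allowed=()):
    """Raise BannedMigrationError if a migration contracts the schema.

    `allowed` lists migrations that followed the 3-release process and
    are explicitly exempted from the ban.
    """
    if name in allowed:
        return
    banned = [op for op in operations if op in CONTRACTING_OPS]
    if banned:
        raise BannedMigrationError(
            f"{name} uses contracting operations: {', '.join(banned)}")
```

An additive migration such as `check_migration("example_add_column", ["add_column"])`
passes, while one containing `drop_column` raises unless its name is in
`allowed`.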

Right now we're finishing the oslo.versionedobjects adoption; the
outstanding patches can be found in [8] (there are still a few to come,
see the table at the bottom of [9]). For DB schema upgrades, we've
merged the spec, and a test banning contracting migrations is in review
[10]. For RPC API compatibility, I'm actively reviewing the patches to
make sure every change there is done properly.

Apart from that, the backlog includes documenting all this in the devref
and implementing partial-upgrade Grenade tests that will gate on version
compatibility.

I hope this clarifies a bit how we're progressing toward being able to
upgrade Cinder with minimal or no downtime.

[2] http://www.danplanet.com/blog/2015/10/05/upgrades-in-nova-rpc-apis/
[9] https://etherpad.openstack.org/p/cinder-rolling-upgrade
