[openstack-dev] [keystone][nova][neutron][all] Rolling upgrades: database triggers and oslo.versionedobjects

Sean Dague sean at dague.net
Thu Aug 25 21:14:13 UTC 2016

On 08/25/2016 01:13 PM, Steve Martinelli wrote:
> The keystone team is pursuing a trigger-based approach to support
> rolling, zero-downtime upgrades. The proposed operator experience is
> documented here:
>   http://docs.openstack.org/developer/keystone/upgrading.html
> This differs from Nova and Neutron's approaches to solve for rolling
> upgrades (which use oslo.versionedobjects), however Keystone is one of
> the few services that doesn't need to manage communication between
> multiple releases of multiple service components talking over the
> message bus (which is the original use case for oslo.versionedobjects,
> and for which it is aptly suited). Keystone simply scales horizontally
> and every node talks directly to the database.
> Database triggers are obviously a new challenge for developers to write,
> honestly challenging to debug (being side effects), and are made even
> more difficult by having to hand write triggers for MySQL, PostgreSQL,
> and SQLite independently (SQLAlchemy offers no assistance in this case),
> as seen in this patch:
>   https://review.openstack.org/#/c/355618/
> However, implementing an application-layer solution with
> oslo.versionedobjects is not an easy task either; refer to Neutron's
> implementation:
> https://review.openstack.org/#/q/topic:bp/adopt-oslo-versioned-objects-for-db
> Our primary concern at this point is how to effectively test the
> triggers we write against our supported database systems, and their
> various deployment variations. We might be able to easily drop SQLite
> support (as it's only supported for our own test suite), but should we
> expect variation in support and/or actual behavior of triggers across
> the MySQLs, MariaDBs, Perconas, etc, of the world that would make it
> necessary to test each of them independently? If you have operational
> experience working with triggers at scale: are there landmines that we
> need to be aware of? What is it going to take for us to say we support
> *zero* downtime upgrades with confidence?

I would really hold off on doing anything trigger-related until there
is sufficient testing for it, especially against potentially dirty data.

Triggers also bring in a whole new DSL that people need to learn and
understand, not just to get across this boundary, but when debugging
issues in the future. It also means that any errors happening here are
now in a place outside of the normal logging / recovery mechanisms.
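
For anyone who has not written one, here is a minimal sketch of the kind
of trigger being discussed, using SQLite through Python's sqlite3 module.
It is not taken from the keystone patch: the `account` table, its
`name` / `display_name` columns, and the helper functions are all
hypothetical, and the MySQL and PostgreSQL versions would need their own
hand-written syntax.

```python
import sqlite3

# Hypothetical schema, for illustration only: old-release code writes
# `name`, new-release code writes `display_name`, and during a rolling
# upgrade both releases must see the same value.
SCHEMA = """
CREATE TABLE account (
    id INTEGER PRIMARY KEY,
    name TEXT,          -- written by old-release code
    display_name TEXT   -- written by new-release code
);

-- Old code inserted the row: copy `name` into the new column.
CREATE TRIGGER account_insert_from_old AFTER INSERT ON account
WHEN NEW.display_name IS NULL
BEGIN
    UPDATE account SET display_name = NEW.name WHERE id = NEW.id;
END;

-- New code inserted the row: copy `display_name` into the old column.
CREATE TRIGGER account_insert_from_new AFTER INSERT ON account
WHEN NEW.name IS NULL
BEGIN
    UPDATE account SET name = NEW.display_name WHERE id = NEW.id;
END;
"""


def make_db():
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def fetch_account(conn, account_id):
    """Return the (name, display_name) pair for one row."""
    return conn.execute(
        "SELECT name, display_name FROM account WHERE id = ?",
        (account_id,),
    ).fetchone()


if __name__ == "__main__":
    conn = make_db()
    # Simulate a write from an old-release node and one from a new-release node.
    conn.execute("INSERT INTO account (id, name) VALUES (1, 'alice')")
    conn.execute("INSERT INTO account (id, display_name) VALUES (2, 'bob')")
    print(fetch_account(conn, 1))
    print(fetch_account(conn, 2))
```

Note that the copying happens entirely inside the database as a side
effect of the INSERT, so nothing in the application's own logs shows
that it ran. A real migration would presumably also need UPDATE
triggers, which this sketch leaves out.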

There is a lot of value in keeping to common patterns between projects
in hard problem spaces like zero-downtime upgrades: there are limited
folks with the domain knowledge, and splitting that even further makes
it harder to make this universal among projects.


Sean Dague
