[openstack-dev] [keystone][nova][neutron][all] Rolling upgrades: database triggers and oslo.versionedobjects
Monty Taylor
mordred at inaugust.com
Wed Aug 31 22:18:03 UTC 2016
On 08/25/2016 04:14 PM, Sean Dague wrote:
> On 08/25/2016 01:13 PM, Steve Martinelli wrote:
>> The keystone team is pursuing a trigger-based approach to support
>> rolling, zero-downtime upgrades. The proposed operator experience is
>> documented here:
>>
>> http://docs.openstack.org/developer/keystone/upgrading.html
>>
>> This differs from Nova and Neutron's approaches to solve for rolling
>> upgrades (which use oslo.versionedobjects), however Keystone is one of
>> the few services that doesn't need to manage communication between
>> multiple releases of multiple service components talking over the
>> message bus (which is the original use case for oslo.versionedobjects,
>> and for which it is aptly suited). Keystone simply scales horizontally
>> and every node talks directly to the database.
>>
>> Database triggers are obviously a new challenge for developers to write,
>> honestly challenging to debug (being side effects), and are made even
>> more difficult by having to hand write triggers for MySQL, PostgreSQL,
>> and SQLite independently (SQLAlchemy offers no assistance in this case),
>> as seen in this patch:
>>
>> https://review.openstack.org/#/c/355618/
>>
>> However, implementing an application-layer solution with
>> oslo.versionedobjects is not an easy task either; refer to Neutron's
>> implementation:
>>
>>
>> https://review.openstack.org/#/q/topic:bp/adopt-oslo-versioned-objects-for-db
>>
>>
>> Our primary concern at this point are how to effectively test the
>> triggers we write against our supported database systems, and their
>> various deployment variations. We might be able to easily drop SQLite
>> support (as it's only supported for our own test suite), but should we
>> expect variation in support and/or actual behavior of triggers across
>> the MySQLs, MariaDBs, Perconas, etc, of the world that would make it
>> necessary to test each of them independently? If you have operational
>> experience working with triggers at scale: are there landmines that we
>> need to be aware of? What is it going to take for us to say we support
>> *zero* dowtime upgrades with confidence?
>
> I would really hold off doing anything triggers related until there was
> sufficient testing for that, especially with potentially dirty data.
>
> Triggers also really bring in a whole new DSL that people need to learn
> and understand, not just across this boundary, but in the future
> debugging issues. And it means that any errors happening here are now in
> a place outside of normal logging / recovery mechanisms.
>
> There is a lot of value that in these hard problem spaces like zero down
> uptime we keep to common patterns between projects because there are
> limited folks with the domain knowledge, and splitting that even further
> makes it hard to make this more universal among projects.
I said this the other day in the IRC channel, and I'm going to say it
again here. I'm going to do it as bluntly as I can - please keeping in
mind that I respect all of the humans involved.
I think this is a monstrously terrible idea.
There are MANY reasons for this -but I'm going to limit myself to two.
OpenStack is One Project
========================
Nova and Neutron have an approach for this. It may or may not be ideal -
but it exists right now. While it can be satisfying to discount the
existing approach and write a new one, I do not believe that is in the
best interests of OpenStack as a whole. To diverge in _keystone_ - which
is one of the few projects that must exist in every OpenStack install -
when there exists an approach in the two other most commonly deployed
projects - is such a terrible example of the problems inherent in
Conway's Law that it makes me want to push up a proposal to dissolve all
of the individual project teams and merge all of the repos into a single
repo.
Make the oslo libraries Nova and Neutron are using better. Work with the
Nova and Neutron teams on a consolidated approach. We need to be driving
more towards an OpenStack that behaves as if it wasn't written by
warring factions of developers who barely communicate.
Even if the idea was one I thought was good technically, the above would
still trump that. Work with Nova and Neutron. Be more similar.
PLEASE
BUT - I also don't think it's a good technical solution. That isn't
because triggers don't work in MySQL (they do) - but because we've spent
the last six years explicitly NOT writing raw SQL. We've chosen an
abstraction layer (SQLAlchemy) which does its job well.
IF this were going to be accompanied by a corresponding shift in
approach to not support any backends by MySQL and to start writing our
database interactions directly in SQL in ALL of our projects - I could
MAYBE be convinced. Even then I think doing it in triggers is the wrong
place to put logic.
"Database triggers are obviously a new challenge for developers to
write, honestly challenging to debug (being side effects), and are made
even more difficult by having to hand write triggers for MySQL,
PostgreSQL, and SQLite independently (SQLAlchemy offers no assistance in
this case)"
If you look at:
https://review.openstack.org/#/c/355618/40/keystone/common/sql/expand_repo/versions/002_add_key_hash_and_encrypted_blob_to_credential.py
You will see the three different SQL dialects this. Not only that, but
some of the more esoteric corners of those backends. We can barely get
_indexes_ right in our database layers ... now we think we're going to
get triggers right? Consistently? And handle things like Galera?
The other option is app level, which is what nova and neutron are doing.
It's a good option, because it puts the logic in python, which is a
thing we have 2500 developers fairly well versed in. It's also scalable,
as the things executing whatever the logic is are themselves a scale-out
set of servers. Finally, it's a known and accepted pattern in large
scale MySQL shops ... Roll out a new version of the app code which
understands both the old and the new schema version, then roll out a
no-downtime additive schema change to the database, then have the app
layer process and handle on the fly transformation if needed.
SO ...
Just do what Nova and Neutron are doing - and if it's not good enough,
fix it. Having some projects use triggers and other projects not use
triggers is one of the more epically crazypants things I've heard around
here ... and I lived through the twisted/eventlet argument.
Monty
More information about the OpenStack-dev
mailing list