[openstack-dev] [nova][heat] sqlalchemy-migrate tool to alembic
Mike Bayer
mbayer at redhat.com
Fri May 22 21:27:27 UTC 2015
OK, I've just gotten off a chat with the folks at summit.
I am glad that I've managed to get my concerns about this approach out
there. For people reading my notes here, I've gotten the answer to my
question about how database access code is written for a system that is
moving from some particular schema structure A to a new one B.
Essentially, supposing we've released "L", and we are now in development
for "M". Over the course of M, we are adding new objects, e.g. tables,
columns; let's call these M1, M2, M3. These objects are meant to
replace older objects in L, say L1, L2, L3.
As M is being developed, at all times the model and database access
layer must consider both of L1, L2, L3, and M1, M2, M3, at the same
time. This means the notion of a schema migration as something you
commit, after which you can cleanly just change your model, is gone.
The model needs to be able to load data from the L1/L2/L3 objects and
ensure that it gets copied to M1, M2, M3, either as the APIs are
accessed under normal use, or via a "background process" that will move
data over from L to M. It is only when the database data is fully
moved to M, and when *all database-connected applications* have been
moved up as well,
that the "contract" phase can be run. The "contract" phase will then
drop every object in the database that is not in the currently running
model, including any additional objects that were added by the operator.
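To make this concrete, here is a minimal sketch of such a bridging
model, assuming a hypothetical Instance model where an old column
"hostname" (an L1 object) is being replaced by a new column "host_name"
(an M1 object); all names here are illustrative only:

    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Instance(Base):
        __tablename__ = 'instances'

        id = Column(Integer, primary_key=True)
        hostname = Column(String(255))    # old column, the "L1" object
        host_name = Column(String(255))   # new column, the "M1" object

        @property
        def effective_host_name(self):
            # reads consult the new column first, falling back to the
            # old one for rows whose data hasn't been moved over yet
            if self.host_name is not None:
                return self.host_name
            return self.hostname

        def set_host_name(self, value):
            # writes land in the new column and clear out the old one,
            # so rows migrate incrementally as the APIs are used
            self.host_name = value
            self.hostname = None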
Now, the approach of having a model that can bridge the gap between two
schemas, to delay the full migration of changes across, is in fact very
common in the real world of database applications, and it is often
necessary.
What is dramatically different in Nova's case is that what is normally
just a particularly tedious tool one can choose to use in specific
situations, that of the model that must bridge two different schema
designs and slowly migrate data, now becomes an absolute hard
requirement in all cases. It is no longer considered tenable for
developers to decide on a case-by-case basis which kinds of migrations
are trivial and can safely be run during an "expand" type of phase, and
which are data- and lock-intensive if done in bulk and therefore should
be carefully rolled out over time at low scale.
Let me be clear that one of the big things I want to work on is cleaning
up the model and database access code I see in Nova and many other
Openstack applications. Right now it's complicated, slow, and is
riddled with evidence that people didn't always have a firm grasp of the
APIs when they wrote it. But what we are talking about is creating a
hard link between the complexity of the model/DB access code and the
ability to make necessary changes and improvements to the schema. It
means that every schema change now inflicts verbosity and complexity
directly into the model and database access logic, not in a
self-contained, write-once-and-forget-it database migration script
elsewhere; data migrations and business-model access code are to be
literally merged together, most likely into the same functions in many
cases. This is definitely going to make my job of cleaning up,
simplifying, and vastly improving the performance of this logic that
much more difficult. This is the squeeze point within the whole
approach and it is also the one which the Nova team could offer the
least specifics on. While simple things like column transitions
shouldn't be too terrible, more significant changes like table moves or
restructurings will be very difficult; and as always, while this might
be fine for Nova, it definitely is not appropriate for less mature
Openstack projects just starting out with new schema designs that will
have a bigger need for periodic refactorings.
The rationale for this hard-edged decision is that all-at-once data
migrations are slow and place an enormous load on the database, and
therefore must be banned in all cases, no matter how trivial. An
anecdotal reference to some obviously serious outage that occurred
during a Nova migration was cited as evidence.
I'm generally not in favor of this approach to a problem. The driving
philosophy of SQLAlchemy and related tools is one of developer
empowerment, not of shuttling away database details behind
one-size-fits-all abstractions that keep developers as far from pointy
and sharp edges as possible; because the edges aren't as sharp as you
remember and a good developer is more deft with tools than you think.
This philosophy is one that I developed over many years working at
companies, watching how various forms of technical anxiety led to all
kinds of obtuse, awkward, and sometimes outright byzantine ways of
operating, all because something a long time ago failed to work as
expected and was therefore banned forever. It was usually my job to
extricate teams from these ways of thinking and re-acquaint them with
more flexible and fluent approaches, while at the same time assuaging
their anxiety: we can in fact use our brains to solve problems
correctly as they come up, rather than relying on iron-bound
constraints that are extremely difficult to modify from a technical
perspective.
My proposal to Nova is not that they shouldn't go with this approach,
but only that they proceed with a version of the idea that has an escape
hatch, and at the same time that we make clear to other projects that
this approach is a very specific road to travel and it should not be
assumed to be appropriate for everyone. If Nova goes full on with
online schema migrations, it means there will no longer be any fixed
schema migration files, and no way that even the most trivial data
migration can be implemented without going through the new system of
building out a model and database access layer that talks to both
logical schemas and has to migrate its own data over time. If OTOH they
implement the exact same workflow, such that the migrations are still
generated into files that represent discrete and fixed states of a
schema, they will be able to maintain that approach to a varying degree,
as they are ultimately exercising the new workflow on top of a
traditional system which can still allow for tuning and version control
of schema changes as well as inline data migrations where
appropriate. As a bonus, the system works in a fixed way and won't
delete the objects planted by the operator; it also allows for a
traditional dependency model that will ensure that certain moves always
happen before others, such as ensuring a "contract" against the previous
version is completed before the next version's "expand" proceeds, thus
allowing the database to remain in a clean and defined state. If I
understood correctly, the current plan is that "contract" is an optional
thing that perhaps some operators might never do at all; they'd just
have a database which has old tables and columns from many versions ago
still lying around.
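To sketch what I mean, assuming hypothetical revision ids and a
hypothetical "instances" table, the ordering guarantee falls out of
ordinary Alembic revision chaining; the "expand" revision for M simply
names the "contract" revision for L as its down_revision:

    # versions/2b8f_m_expand.py -- hypothetical "expand" step for M
    revision = '2b8f'
    down_revision = '91c4'   # hypothetical id of the L "contract" step

    from alembic import op
    import sqlalchemy as sa

    def upgrade():
        # additive change only; safe to run while L-era code is live
        op.add_column('instances',
                      sa.Column('host_name', sa.String(255)))

    def downgrade():
        op.drop_column('instances', 'host_name')

Because Alembic refuses to run '2b8f' until '91c4' has been applied,
the database can never be in a state where M's expand has run but L's
contract has not.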
The one objection raised to my alternative proposal is based on the
notion that a certain kind of database "move" might apply in one way to
a particular target database and in a different way to another. In
the general case, this notion doesn't hold a lot of validity, because
the system is emitting Alembic directives in any case which themselves
are database agnostic; I only propose that we render the directives into
a fixed file first. The specific concept that was raised, however,
regards the notion of a schema operation that in one case wants to be
done in the "expand" phase and in another wants to be done in the
"migrate" phase. When I asked for an example, the issue raised was
that of certain index operations that behave differently across MySQL
versions: an index addition that would be a blocking, performance-heavy
operation on an older MySQL version should be kept out of the "expand"
phase, but might be safe to run within "expand" on later versions of
MySQL.
But again, this is not a very difficult issue to overcome. The current
online schema migration code already has within it a ruleset that can
accommodate such a rule. We simply move that rule to be within the
migration directive itself, so that in the expand phase, instead of
"op.create_index(indexname)", we have some kind of qualifier such as
"op.create_index(indexname, create_rule=requires_high_rowcount)", or
similar. Again, this is not manually coded in a migration; it is
rendered out by the autogenerate facilities which would be utilized by
the online schema engine that has already been built. The original
online schema blueprint referred to the advantage of working with
"declarative" structures vs. "imperative" structures, and I certainly
agree; that's why Alembic's directives are themselves declarative and
why the new rules as embedded will be very high level and declarative
themselves. I doubt very many of these new directives will be needed
and they will be simple to implement in any case.
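Since the create_rule keyword above is proposed syntax rather than an
existing Alembic API, here is a sketch of the same rule written out
imperatively inside a fixed migration file today; the index name, and
the use of MySQL 5.6 (where InnoDB gained online index creation) as the
cutoff, are illustrative assumptions:

    from alembic import op

    def upgrade():
        bind = op.get_bind()
        # the rule lives in the migration file, not the model: on
        # MySQL versions without online DDL, the index build stays
        # out of "expand"; on other backends it is safe to run here
        if (bind.dialect.name != 'mysql'
                or bind.dialect.server_version_info >= (5, 6)):
            op.create_index(
                'ix_instances_host_name', 'instances', ['host_name'])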
Alembic also supports alternative migration flows for tables such as
"copy and move", which can be evaluated as options in some cases. In any
case, a system where we can manually establish particular migrations to
work in certain ways is more flexible than one where we have to ensure
that a ruleset knows ahead of time how to detect and adapt to certain
tables and conditions on the fly with no pre-defined direction. If
there is no place to establish migration behaviors declaratively other
than the model itself, I can imagine that we would ultimately have to
start adding "hints" to our models, like
"use_migration_style_X_on_mysql" on mapped classes, so that the online
schema system has clues as to what we want to happen in certain cases.
That version of the system would also be tasked with making guesses in
some cases; after all, a declaration in the model itself doesn't
actually know what it's being migrated *from*, as online schema changes
start with a schema that is essentially in an undefined state. It would
be better if migration directives still had a dedicated place of their
own to be explicitly laid out, version-controlled, built against a
specific and known previous state, tunable and configurable as needed,
without mixing them up within the object model's declarations.
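Purely to illustrate the outcome I'd like to avoid, a model carrying
such hints might look like the following; nothing resembling this
exists in SQLAlchemy, and the marker name comes from my invented
example above:

    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Instance(Base):
        __tablename__ = 'instances'

        # invented marker: migration policy leaking into the model,
        # even though the model can't know what prior schema state
        # it is being migrated *from*
        __migration_hints__ = {
            'host_name': 'use_migration_style_X_on_mysql',
        }

        id = Column(Integer, primary_key=True)
        host_name = Column(String(255))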
Having the expand/contract workflow available in such a way that is
compatible with traditional migration files means that this becomes a
feature that Alembic can continue to add support for, and could even
become a feature within Alembic itself. Right now, the "autogenerate"
feature has a pretty straightforward job of gathering a list of changes and
spitting out a single migration file. It would be a great idea to open
up this API such that different kinds of workflows can be plugged in,
such that rulesets can interpret the autogenerate change stream into
different revision stream structures. We'd get all the capabilities of
the expand/contract workflow without being rigidly welded to it, and as
an Alembic feature it would mean the production of new kinds of
workflows would be available through an upstream declarative system that
would have the benefit of real-world use by other teams outside of
Openstack.
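As a sketch of what such a plug point might look like: the hook below,
wired into Alembic's autogenerate, keeps only additive operations for
an "expand" revision. The hook name process_revision_directives is the
style of entry point I have in mind, and the classification rule is
deliberately simplistic and illustrative:

    from alembic.operations import ops

    # additive operations, safe to apply while old code is running
    EXPAND_OPS = (ops.CreateTableOp, ops.AddColumnOp, ops.CreateIndexOp)

    def expand_only(context, revision, directives):
        # registered in env.py via:
        #   context.configure(..., process_revision_directives=expand_only)
        script = directives[0]
        script.upgrade_ops.ops[:] = [
            op_ for op_ in script.upgrade_ops.ops
            if isinstance(op_, EXPAND_OPS)
        ]
        # a fuller ruleset would route the remaining, destructive
        # operations into a separate "contract" revision instead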
The impact on other projects, not just other Openstack projects but also
Alembic itself, is really why I'm motivated to comment on this system.
It's not that it's so important to me if Nova has a certain process or
not (though I do want to clean up their database access code). It's
more that Nova is always looked upon as the driver for how all other
Openstack applications do things; what they do is what we will all be
doing soon enough. Just look at the subject of this email thread; it's
not about Nova at all, it's about the Heat project, which is eagerly
looking to copy Nova's approach and wondering if they should just do
that instead of migrating to Alembic traditionally. This is why I
really want to get my reservations out there. While I do want all
projects to be on similar database approaches that ultimately derive
from the oslo.* namespace, I'd hope that this one can be opened up a bit
before it is taken on by everyone else. The Nova team seemed to hear
me on this and they'd like to encourage other projects to wait on moving
to this approach until it can be proven. But they also agreed that
yeah, everyone likes to copy Nova a lot. The proof, they say, will be
if this approach fails completely and they decide it isn't working. I
don't think that will actually happen. Patterns like these just as
often drag development down in a slower and more subtle way, and over
time contribute towards that calcified "don't change it!" culture that takes
months or years to develop. For reference, google "Frog in the
Water". It's a known thing. :)
Allowing certain patterns while always providing for a flexible "escape
hatch" to work at different levels simultaneously, combined with a
strong emphasis on explicitness, has always been the driving philosophy
of SQLAlchemy. That's why there's a Core and an ORM which are separate
but highly interactive together. It's an approach that works and I'd
like to continue to encourage Openstack projects to subscribe to this
philosophy.