[openstack-dev] [nova][heat] sqlalchemy-migrate tool to alembic
Mike Bayer
mbayer at redhat.com
Fri May 22 21:27:27 UTC 2015
OK, I've just gotten off a chat with the folks at summit.
I am glad that I've managed to get my concerns about this approach out
there. For people reading my notes here, I've gotten the answer to my
question about how database access code is written for a system that is
moving from some particular schema structure A to a new one B.
Essentially, supposing we've released "L", and we are now in development
for "M". Over the course of M, we are adding new objects, e.g. tables,
columns; let's call these M1, M2, M3. These objects are meant to
replace older objects in L, say L1, L2, L3.
As M is being developed, at all times the model and database access
layer must consider both of L1, L2, L3, and M1, M2, M3, at the same
time. This means the notion of a schema migration as something you
commit, after which you can cleanly just change your model, is gone.
The model needs to be able to load data from the L1/L2/L3 objects and
ensure that it gets copied to M1, M2, M3, either as the APIs are
accessed under normal use, or via a "background process" that will move
data over from L to M. It is only when the database data is fully
moved to M, and when *all database-connected applications* have been
moved up as well,
that the "contract" phase can be run. The "contract" phase will then
drop every object in the database that is not in the currently running
model, including any additional objects that were added by the operator.
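To make this concrete, here is a minimal sketch of such a bridging
model, assuming a hypothetical Instance model where an old column
"hostname" (an L1 object) is being replaced by a new column "host_name"
(an M1 object); all names here are illustrative only:

    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Instance(Base):
        __tablename__ = 'instances'

        id = Column(Integer, primary_key=True)
        hostname = Column(String(255))    # old column, the "L1" object
        host_name = Column(String(255))   # new column, the "M1" object

        @property
        def effective_host_name(self):
            # reads consult the new column first, falling back to the
            # old one for rows whose data hasn't been moved over yet
            if self.host_name is not None:
                return self.host_name
            return self.hostname

        def set_host_name(self, value):
            # writes land in the new column and clear out the old one,
            # so rows migrate incrementally as the APIs are used
            self.host_name = value
            self.hostname = None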
Now, the approach of having a model that can bridge the gap between two
schemas, to delay the full migration of changes across, is in fact very
common in the real world of database applications, and it is often
necessary.
What is dramatically different in Nova's case is that what is normally
just a particularly tedious tool one can choose to use in specific
situations, that of the model that must bridge two different schema
designs and slowly migrate data, now becomes an absolute hard
requirement in all cases. It is no longer considered tenable for
developers to decide on a case-by-case basis which kinds of migrations
are trivial and can safely be run during an "expand" type of phase, and
which are data- and lock-intensive if done in bulk and therefore should
be carefully rolled out over time at low scale.
Let me be clear that one of the big things I want to work on is cleaning
up the model and database access code I see in Nova and many other
Openstack applications. Right now it's complicated, slow, and is
riddled with evidence that people didn't always have a firm grasp of the
APIs when they wrote it. But what we are talking about is creating a
hard link between the complexity of the model/DB access code and the
ability to make necessary changes and improvements to the schema. It
means that every schema change now inflicts verbosity and complexity
directly into the model and database access logic, not in a
self-contained, write-once-and-forget-it database migration script
elsewhere; data migrations and business-model access code are to be
literally merged together, most likely into the same functions in many
cases. This is definitely going to make my job of cleaning up,
simplifying, and vastly improving the performance of this logic that
much more difficult. This is the squeeze point within the whole
approach and it is also the one which the Nova team could offer the
least specifics on. While simple things like column transitions
shouldn't be too terrible, more significant changes like table moves or
restructurings will be very difficult; and as always, while this might
be fine for Nova, it definitely is not appropriate for less mature
Openstack projects just starting out with new schema designs that will
have a bigger need for periodic refactorings.
The rationale for this hard-edged decision is that all-at-once data
migrations are slow and place an enormous load on the database, and
therefore must be banned in all cases, no matter how trivial. An
anecdotal reference to some obviously serious outage that occurred
during a Nova migration was cited as evidence.
I'm generally not in favor of this approach to a problem. The driving
philosophy of SQLAlchemy and related tools is one of developer
empowerment, not of shuttling away database details behind
one-size-fits-all abstractions that keep developers as far from pointy
and sharp edges as possible; because the edges aren't as sharp as you
remember and a good developer is more deft with tools than you think.
This philosophy is one that I developed over many years working at
companies, watching how various forms of technical anxiety led to all
kinds of obtuse, awkward, and sometimes outright byzantine ways of
operating, all because something a long time ago failed to work as
expected and was therefore banned forever. It was usually my job to
extricate teams from these ways of thinking and re-acquaint them with
more flexible and fluent approaches, while at the same time assuaging
their anxiety: we can in fact use our brains to solve problems
correctly as they come up, rather than relying on iron-bound
constraints that are extremely difficult to modify from a technical
perspective.
My proposal to Nova is not that they shouldn't go with this approach,
but only that they proceed with a version of the idea that has an escape
hatch, and at the same time that we make clear to other projects that
this approach is a very specific road to travel and it should not be
assumed to be appropriate for everyone. If Nova goes full on with
online schema migrations, it means there will no longer be any fixed
schema migration files, and no way that even the most trivial data
migration can be implemented without going through the new system of
building out a model and database access layer that talks to both
logical schemas and has to migrate its own data over time. If OTOH they
implement the exact same workflow, such that the migrations are still
generated into files that represent discrete and fixed states of a
schema, they will be able to maintain that approach to a varying degree,
as they are ultimately exercising the new workflow on top of a
traditional system which can still allow for tuning and version control
of schema changes as well as inline data migrations where
appropriate. As a bonus, the system works in a fixed way and won't
delete the objects planted by the operator; it also allows for a
traditional dependency model that will ensure that certain moves always
happen before others, such as ensuring a "contract" against the previous
version is completed before the next version's "expand" proceeds, thus
allowing the database to remain in a clean and defined state. If I
understood correctly, the current plan is that "contract" is an optional
thing that perhaps some operators might never do at all; they'd just
have a database which has old tables and columns from many versions ago
still lying around.
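To sketch what I mean, assuming hypothetical revision ids and a
hypothetical "instances" table, the ordering guarantee falls out of
ordinary Alembic revision chaining; the "expand" revision for M simply
names the "contract" revision for L as its down_revision:

    # versions/2b8f_m_expand.py -- hypothetical "expand" step for M
    revision = '2b8f'
    down_revision = '91c4'   # hypothetical id of the L "contract" step

    from alembic import op
    import sqlalchemy as sa

    def upgrade():
        # additive change only; safe to run while L-era code is live
        op.add_column('instances',
                      sa.Column('host_name', sa.String(255)))

    def downgrade():
        op.drop_column('instances', 'host_name')

Because Alembic refuses to run '2b8f' until '91c4' has been applied,
the database can never be in a state where M's expand has run but L's
contract has not.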
The one objection raised to my alternative proposal is based on the
notion that a certain kind of database "move" might apply in one way to
a particular target database and in a different way to another. In
the general case, this notion doesn't hold a lot of validity, because
the system is emitting Alembic directives in any case which themselves
are database agnostic; I only propose that we render the directives into
a fixed file first. The specific concept that was raised, however,
regards the notion of a schema operation that in one case wants to be
done in the "expand" phase and in another wants to be done in the
"migrate" phase. When I asked for an example, the issue raised was
that of certain index operations that behave differently across MySQL
versions: an index addition that would be a blocking, performance-heavy
operation on an older MySQL version should be kept out of the "expand"
phase, but might be safe to run within "expand" on later versions of
MySQL.
But again, this is not a very difficult issue to overcome. The current
online schema migration code already has within it a ruleset that can
accommodate such a rule. We simply move that rule to be within the
migration directive itself, so that in the expand phase, instead of
"op.create_index(indexname)", we have some kind of qualifier such as
"op.create_index(indexname, create_rule=requires_high_rowcount)", or
similar. Again, this is not manually coded in a migration; it is
rendered out by the autogenerate facilities which would be utilized by
the online schema engine that has already been built. The original
online schema blueprint referred to the advantage of working with
"declarative" structures vs. "imperative" structures, and I certainly
agree; that's why Alembic's directives are themselves declarative and
why the new rules as embedded will be very high level and declarative
themselves. I doubt very many of these new directives will be needed
and they will be simple to implement in any case.
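Since the create_rule keyword above is proposed syntax rather than an
existing Alembic API, here is a sketch of the same rule written out
imperatively inside a fixed migration file today; the index name, and
the use of MySQL 5.6 (where InnoDB gained online index creation) as the
cutoff, are illustrative assumptions:

    from alembic import op

    def upgrade():
        bind = op.get_bind()
        # the rule lives in the migration file, not the model: on
        # MySQL versions without online DDL, the index build stays
        # out of "expand"; on other backends it is safe to run here
        if (bind.dialect.name != 'mysql'
                or bind.dialect.server_version_info >= (5, 6)):
            op.create_index(
                'ix_instances_host_name', 'instances', ['host_name'])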
Alembic also supports alternative migration flows for tables such as
"copy and move", which can be evaluated as options in some cases. In any
case, a system where we can manually establish particular migrations to
work in certain ways is more flexible than one where we have to ensure
that a ruleset knows ahead of time how to detect and adapt to certain
tables and conditions on the fly with no pre-defined direction. If
there is no place to establish migration behaviors declaratively other
than the model itself, I can imagine that we would ultimately have to
start adding "hints" to our models, like
"use_migration_style_X_on_mysql" on mapped classes, so that the online
schema system has clues as to what we want to happen in certain cases.
That version of the system would also be tasked with making guesses in
some cases; after all, a declaration in the model itself doesn't
actually know what it's being migrated *from*, as online schema changes
start with a schema that is essentially in an undefined state. It would
be better if migration directives still had a dedicated place of their
own to be explicitly laid out, version-controlled, built against a
specific and known previous state, tunable and configurable as needed,
without mixing them up within the object model's declarations.
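Purely to illustrate the outcome I'd like to avoid, a model carrying
such hints might look like the following; nothing resembling this
exists in SQLAlchemy, and the marker name comes from my invented
example above:

    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Instance(Base):
        __tablename__ = 'instances'

        # invented marker: migration policy leaking into the model,
        # even though the model can't know what prior schema state
        # it is being migrated *from*
        __migration_hints__ = {
            'host_name': 'use_migration_style_X_on_mysql',
        }

        id = Column(Integer, primary_key=True)
        host_name = Column(String(255))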
Having the expand/contract workflow available in such a way that is
compatible with traditional migration files means that this becomes a
feature that Alembic can continue to add support for, and could even
become a feature within Alembic itself. Right now, the "autogenerate"
feature has a pretty straightforward job of gathering a list of changes and
spitting out a single migration file. It would be a great idea to open
up this API such that different kinds of workflows can be plugged in,
such that rulesets can interpret the autogenerate change stream into
different revision stream structures. We'd get all the capabilities of
the expand/contract workflow without being rigidly welded to it, and as
an Alembic feature it would mean the production of new kinds of
workflows would be available through an upstream declarative system that
would have the benefit of real-world use by other teams outside of
Openstack.
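As a sketch of what such a plug point might look like: the hook below,
wired into Alembic's autogenerate, keeps only additive operations for
an "expand" revision. The hook name process_revision_directives is the
style of entry point I have in mind, and the classification rule is
deliberately simplistic and illustrative:

    from alembic.operations import ops

    # additive operations, safe to apply while old code is running
    EXPAND_OPS = (ops.CreateTableOp, ops.AddColumnOp, ops.CreateIndexOp)

    def expand_only(context, revision, directives):
        # registered in env.py via:
        #   context.configure(..., process_revision_directives=expand_only)
        script = directives[0]
        script.upgrade_ops.ops[:] = [
            op_ for op_ in script.upgrade_ops.ops
            if isinstance(op_, EXPAND_OPS)
        ]
        # a fuller ruleset would route the remaining, destructive
        # operations into a separate "contract" revision instead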
The impact on other projects, not just other Openstack projects but also
Alembic itself, is really why I'm motivated to comment on this system.
It's not that it's so important to me if Nova has a certain process or
not (though I do want to clean up their database access code). It's
more that Nova is always looked upon as the driver for how all other
Openstack applications do things; what they do is what we will all be
doing soon enough. Just look at the subject of this email thread; it's
not about Nova at all, it's about the Heat project, which is eagerly
looking to copy Nova's approach and wondering if they should just do
that instead of migrating to Alembic traditionally. This is why I
really want to get my reservations out there. While I do want all
projects to be on similar database approaches that ultimately derive
from the oslo.* namespace, I'd hope that this one can be opened up a bit
before it is taken on by everyone else. The Nova team seemed to hear
me on this and they'd like to encourage other projects to wait on moving
to this approach until it can be proven. But they also agreed that
yeah, everyone likes to copy Nova a lot. The proof, they say, will be
if this approach fails completely and they decide it isn't working. I
don't think that will actually happen. Patterns like these just as
often drag development down in a slower and more subtle way, and over
time contribute towards that calcified "don't change it!" culture that takes
months or years to develop. For reference, google "Frog in the
Water". It's a known thing. :)
Allowing certain patterns while always providing for a flexible "escape
hatch" to work at different levels simultaneously, combined with a
strong emphasis on explicitness, has always been the driving philosophy
of SQLAlchemy. That's why there's a Core and an ORM which are separate
but highly interactive together. It's an approach that works and I'd
like to continue to encourage Openstack projects to subscribe to this
philosophy.