[openstack-dev] [nova][heat] sqlalchemy-migrate tool to alembic
Jay Pipes
jaypipes at gmail.com
Thu May 21 16:28:47 UTC 2015
Hi Mike,
I had some of the same initial concerns around the online db schema
migration work that you raise below. However, once I realized one big
thing, those concerns were allayed.
Here's the big thing that you're missing: no data migrations are allowed
any more in DB migration scripts.
Yes, that sounds exceedingly weird, I know, but hear me out in the
comments inline... :) I've trimmed your original email for brevity.
On 05/20/2015 11:57 PM, Mike Bayer wrote:
> But to illustrate, here's the kind of place that goes wrong in such a
> way that is at best pretty annoying and at worst a serious and
> error-prone development and testing headache. Let's start with a
> hypothetical schema that has some design issues. Two tables "widget"
> and "widget_status", where "widget_status" has some kind of information
> about a "widget", and it also stores a timestamp, unfortunately as a
> string:
>
> CREATE TABLE widget (
> id INTEGER PRIMARY KEY,
> name VARCHAR(30) NOT NULL
> )
>
> CREATE TABLE widget_status (
> widget_id INTEGER PRIMARY KEY REFERENCES widget(id),
> status_flag INTEGER NOT NULL,
> modified_date VARCHAR(30)
> )
>
> Let's say that two entirely different changes by two different
> developers want to accomplish two things: 1. convert "modified_date"
> into a new column "modified_timestamp" which is a DATETIME type, not a
> string and 2. merge these two tables into one, as the need to JOIN all
> the time is non-performant and unnecessary. That is, we'll end up with
> this:
>
> CREATE TABLE widget (
> id INTEGER PRIMARY KEY,
> name VARCHAR(30) NOT NULL,
> status_flag INTEGER NOT NULL,
> modified_timestamp DATETIME
> )
>
> Right off, let's keep in mind that when online schema migrations runs,
> the fact that there's a #1 and a #2 migration to the schema is lost.
And this is where you're missing the important piece :) While it *is*
true that the expand phase would combine the *DDL* schema migrations
into just a final table structure as shown above, it is actually *not*
true that the knowledge of how the model schemas changed is lost.
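To make "combine the DDL" concrete, here's a rough sketch of what the
combined expand migration might look like in Alembic -- this is my
illustration, not an actual Nova migration, and the revision ids are
made up:

import sqlalchemy as sa
from alembic import op

revision = '1a2b3c'       # hypothetical
down_revision = None      # hypothetical

def upgrade():
    # Purely additive DDL: no data is moved, and the old
    # widget_status table (and its modified_date column) stays in
    # place until the contract phase.
    op.add_column('widget',
                  sa.Column('modified_timestamp', sa.DateTime(),
                            nullable=True))
    op.add_column('widget',
                  sa.Column('status_flag', sa.Integer(),
                            nullable=True))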
The key to all of this is that each change to the DB schema would be in
a patch that contained corresponding changes to the nova.objects.*
classes that are used to represent the data. And it is *these* object
classes that have object versioning and model changes tracked in them.
So, for instance, let's say the widget was represented by the
nova.objects.widget.Widget class in the following code, before the
patch that added the #1 schema migration:
class Widget(base.NovaPersistentObject):
    VERSION = '1.0'

    fields = {
        'id': fields.IntegerField(),
        'name': fields.StringField(),
        'modified_date': fields.StringField(),
        'status_code': fields.IntegerField()
    }
The above object would have two important methods on it; one that would
know how to save the data in the object's fields to the backend database
and another method (obj_make_compatible()) that would know how to return
the object to callers that expected the object's schema to look a
certain way at a certain version of the object.
Before the patch that landed DB schema migration #1, the Widget object's
obj_make_compatible() method would be blank -- since there are no version
transformations that need to occur. The Widget.save() method would take
the values in the fields and write them to the widget and widget_status
tables in a single transaction.
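For concreteness, here's a sketch of that 1.0 save() -- this is mine,
not actual Nova code; widget_save() and db_api are the hypothetical DB
API used throughout the examples below:

@remotable
def save(self):
    # Writes both tables in a single transaction:
    #
    #   BEGIN;
    #   UPDATE widget SET name = ? WHERE id = ?;
    #   UPDATE widget_status SET status_flag = ?, modified_date = ?
    #    WHERE widget_id = ?;
    #   COMMIT;
    db_api.widget_save(self._context, self.id,
                       self.obj_get_changes())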
Now, along comes the patch that adds DB schema migration #1 that changes
the data type of the underlying widget_status.modified_date from a
VARCHAR to a DATETIME, and simultaneously changes the name of the DB
column from modified_date to modified_timestamp. In this same patch,
there would be a corresponding change to the nova.objects.Widget class
that would make the class definition look something like this:
class Widget(base.NovaPersistentObject):
    VERSION = '1.1'

    fields = {
        'id': fields.IntegerField(),
        'name': fields.StringField(),
        'modified_timestamp': fields.DatetimeField(),
        'status_code': fields.IntegerField()
    }

    @staticmethod
    def _from_db_object(context, widget, db_obj):
        # db_obj is what comes back from the DB API call
        # widget_get(). The SQL for this call will look like this:
        #
        #   SELECT w.id, w.name, ws.modified_date,
        #          w.modified_timestamp,
        #          ws.status_flag AS status_code
        #   FROM widget w JOIN widget_status ws ON w.id = ws.widget_id
        #   WHERE w.id = ?
        #
        # The returned data will look like this:
        #   {'id': 1, 'name': 'my widget',
        #    'modified_date': None or string,
        #    'modified_timestamp': None or datetime,
        #    'status_code': 42}
        for f in ('id', 'status_code', 'name'):
            widget[f] = db_obj[f]
        ts = db_obj['modified_timestamp']
        if ts is None:
            # This row's data hasn't been migrated yet; parse the
            # legacy string (assuming a YYYYMMDD format) into a
            # datetime.
            ts = datetime.datetime.strptime(db_obj['modified_date'],
                                            '%Y%m%d')
        widget['modified_timestamp'] = ts
        return widget

    def obj_make_compatible(self, primitive, target_version):
        super(Widget, self).obj_make_compatible(primitive,
                                                target_version)
        # nova.utils.convert_version_to_tuple turns '1.0' into (1, 0)
        target_version = utils.convert_version_to_tuple(target_version)
        if target_version < (1, 1):
            # Older compute nodes don't know about the
            # modified_timestamp field, only the
            # modified_date field, so "translate" the
            # returned result.
            dstring = str(primitive['modified_timestamp'])
            primitive['modified_date'] = dstring
            del primitive['modified_timestamp']

    @remotable
    def save(self):
        # Above, the _from_db_object() method ensured that the
        # translation occurred between the still-existing-but-old
        # modified_date DB column and the new modified_timestamp
        # column. This method calls widget_save() in the DB API,
        # which will hit both the widget and widget_status tables:
        #
        #   BEGIN;
        #   UPDATE widget SET modified_timestamp = ?, name = ?
        #    WHERE id = ?;
        #   UPDATE widget_status SET status_flag = ?
        #    WHERE widget_id = ?;
        #   COMMIT;
        db_api.widget_save(self._context, self.id,
                           self.obj_get_changes())
Over time, the modified_timestamp DB column will be populated and there
will be no need to run a single data migration in any SQLAlchemy-migrate
or Alembic script...
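To make "over time" concrete: under this scheme, merely reading and
re-saving a widget migrates its row forward, with no migration script
involved. (Widget.get_by_id() is a hypothetical query method here.)

widget = Widget.get_by_id(context, 1)  # parses modified_date into
                                       # the modified_timestamp field
widget.save()                          # writes modified_timestamp
                                       # back to the database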
OK, now here comes another developer and she adds DB schema migration #2,
which adds the status_flag column to the widget table (the expand half of
the table merge). Note that the widget_status table will still be around!
The contract phase, which would remove the widget_status table, isn't
going to be run until the DBA has determined that all data migrations
that happen in the Nova objects have occurred...
The Widget class would look like this in the patch that adds the #2 DB
schema migration:
class Widget(base.NovaPersistentObject):
    # The object's fields don't change in this patch, so VERSION
    # stays at 1.1.
    VERSION = '1.1'

    fields = {
        'id': fields.IntegerField(),
        'name': fields.StringField(),
        'modified_timestamp': fields.DatetimeField(),
        'status_code': fields.IntegerField()
    }

    @staticmethod
    def _from_db_object(context, widget, db_obj):
        # db_obj is what comes back from the DB API call
        # widget_get(). The SQL for this call will look like this:
        #
        #   SELECT w.id, w.name, ws.modified_date,
        #          w.modified_timestamp,
        #          w.status_flag AS status_code,
        #          ws.status_flag AS ws_status_code
        #   FROM widget w JOIN widget_status ws ON w.id = ws.widget_id
        #   WHERE w.id = ?
        #
        # The returned data will look like this:
        #   {'id': 1, 'name': 'my widget',
        #    'modified_date': '20150601',
        #    'modified_timestamp': None or datetime,
        #    'status_code': None or integer,
        #    'ws_status_code': None or integer}
        for f in ('id', 'name'):
            widget[f] = db_obj[f]
        ts = db_obj['modified_timestamp']
        if ts is None:
            ts = datetime.datetime.strptime(db_obj['modified_date'],
                                            '%Y%m%d')
        widget['modified_timestamp'] = ts
        sc = db_obj['status_code']
        if sc is None:
            # The new widget.status_flag column hasn't been populated
            # for this row yet, so fall back to the old value in the
            # widget_status table.
            sc = db_obj['ws_status_code']
        widget['status_code'] = sc
        return widget

    @remotable
    def save(self):
        # Above, the _from_db_object() method ensured that the
        # translation occurred between the still-existing-but-old
        # modified_date DB column and the new modified_timestamp
        # column, as well as between the old widget_status.status_flag
        # column and the new widget.status_flag column. This method
        # calls widget_save() in the DB API, which will only now hit
        # the widget table:
        #
        #   BEGIN;
        #   UPDATE widget SET modified_timestamp = ?, name = ?,
        #          status_flag = ?
        #    WHERE id = ?;
        #   COMMIT;
        db_api.widget_save(self._context, self.id,
                           self.obj_get_changes())
In this way, data migrations occur *over time* in the nova.objects
classes, and the contract phase can be delayed indefinitely until such a
point that the DBA has determined all data has been properly migrated.
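For symmetry with the expand sketch earlier, the eventual contract
migration is then trivial -- again, a hypothetical Alembic sketch with
made-up revision ids:

import sqlalchemy as sa
from alembic import op

revision = '4d5e6f'        # hypothetical
down_revision = '1a2b3c'   # hypothetical

def upgrade():
    # Safe only once the DBA has verified that every row has been
    # migrated through the objects layer. Dropping widget_status
    # also removes the legacy modified_date and status_flag copies.
    op.drop_table('widget_status')
    op.alter_column('widget', 'status_flag',
                    existing_type=sa.Integer(), nullable=False)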
Hope this helps explain things a bit more, Mike!
Best,
-jay