[openstack-dev] [nova][heat] sqlalchemy-migrate tool to alembic
Mike Bayer
mbayer at redhat.com
Thu May 21 18:50:01 UTC 2015
On 5/21/15 12:28 PM, Jay Pipes wrote:
> Hi Mike,
>
> I had some initial concerns around the online db schema migration work
> as you do below. However, once I realized one big thing, those
> concerns were allayed.
>
> Here's the big thing that you're missing: no data migrations are
> allowed any more in DB migration scripts.
>
> Yes, that sounds exceedingly weird, I know, but hear me out in the
> comments inline... :) I've trimmed your original email for brevity.
>
>
> And this is where you're missing the important piece :) While it *is*
> true that the expand phase would combine the *DDL* schema migrations
> into just a final table structure as shown above, it is actually *not*
> true that the knowledge is lost of how the model schemas changed.
>
> The key to all of this is that each change to the DB schema would be
> in a patch that contained corresponding changes to the nova.objects.*
> classes that are used to represent the data. And it is *these* object
> classes that have object versioning and model changes tracked in them.
Ah. OK, I didn't even have to read the rest of that to understand
(well, at least to a minimal degree, as I think more about it) what you
mean. You can do data migrations via the API, which is versioned, and
in between expand and contract, both APIs are available. (Goes to
read the next several paragraphs to confirm that's what you mean).
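As an aside, here is how I'm picturing the DDL half of this in Alembic
terms, since that's where this thread started; the two-revision layout
below is purely my own sketch of the expand/contract split, not anything
from the spec:

    # hypothetical "expand" revision for migration #1 - purely additive
    # DDL, safe to run while old code still reads/writes modified_date
    from alembic import op
    import sqlalchemy as sa

    def upgrade():
        op.add_column(
            'widgets',
            sa.Column('modified_timestamp', sa.DateTime(), nullable=True))

    # ...and much later, a separate hypothetical "contract" revision,
    # run only once every row has been rewritten through the objects
    # layer:
    #
    #     def upgrade():
    #         op.drop_column('widgets', 'modified_date')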
>
> So, for instance, let's say the widget class was represented in a
> nova.objects.widget.Widget model in the following code, before the
> patch that added the #1 schema migration:
> ...
> Over time, the modified_timestamp DB field will be populated and there
> will be no need to run a single data migration in any
> SQLAlchemy-migrate or Alembic script...
OK, I think my understanding is correct, though I'm not sure what "over
time" means... is there a "migrate the data via API version 1.0 -> 2.0"
step somewhere? Or is the idea that "over time" is definitely finished
before "contract" is ever run?
>
> OK, now here comes another developer and she adds DB schema migration
> #2 which adds the status_code field to the widgets table. Note that
> the widget_status table will still be around! The contract phase,
> which would remove the widget_status table isn't going to be run until
> the DBA has determined that all data migrations that happen in the
> Nova objects have occurred...
OK, not knowing the specifics: you just said "all data migrations that
happen in the Nova objects", so there *are* data migrations that
someone has to run, somewhere. They are between API version numbers,
and the headache Nova has is that it has to constantly update these
objects to maintain cross-API-version compatibility whenever the
schema changes. OK.
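If I understand the oslo.versionedobjects side correctly, that
compatibility lives in the obj_make_compatible() hook; for the
hypothetical Widget it would be something like this (sketch only; real
code compares parsed version tuples rather than strings):

    # inside the Widget class: strip fields that a peer still on
    # version 1.0 doesn't know about before the object goes over RPC
    def obj_make_compatible(self, primitive, target_version):
        super(Widget, self).obj_make_compatible(primitive, target_version)
        if target_version < '1.1' and 'status_code' in primitive:
            del primitive['status_code']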
> The Widget class would look like this in the patch that adds the #2 DB
> schema migration:
>
>
> class Widget(base.NovaPersistentObject):
>     VERSION = '1.1'
>     fields = {
>         'id': fields.IntegerField(),
>         'name': fields.StringField(),
>         'modified_timestamp': fields.DatetimeField(),
>         'status_code': fields.IntegerField()
>     }
>
>     @staticmethod
>     def _from_db_object(context, widget, db_obj):
>         # db_obj is what comes back from the DB API call
>         # widget_get(). The SQL for this call will look like this:
>         #
>         # SELECT id, name, modified_date, modified_timestamp,
>         #        w.status_code, ws.status_code AS ws_status_code
>         # FROM widgets w JOIN widget_status ws ON id = widget_id
>         # WHERE w.id = ?
>         #
>         # The returned data will look like this:
>         # {'id': 1, 'name': 'my widget',
>         #  'modified_date': '20150601',
>         #  'modified_timestamp': None or datetime,
>         #  'status_code': None or integer,
>         #  'ws_status_code': None or integer}
>         for f in ('id', 'name'):
>             widget[f] = db_obj[f]
>         ts = db_obj['modified_timestamp']
>         if ts is None:
>             # fall back to the old string column, parsing it to datetime
>             ts = datetime.datetime.strptime(db_obj['modified_date'],
>                                             '%Y%m%d')
>         widget['modified_timestamp'] = ts
>         sc = db_obj['status_code']
>         if sc is None:
>             # fall back to the value from the old widget_status table
>             sc = db_obj['ws_status_code']
>         widget['status_code'] = sc
OK now I... well, I thought I really got it, but it still seems like
I'm missing something, because this still seems very wrong: not only
can you no longer do data migrations in the traditional way, you have
to write *all of your model code to be compatible with all versions of
the schema at all times*. That makes moving tables, or even columns,
around very difficult (if not impossible?). For example:
Your _from_db_object() above calls into a method that has to emit a JOIN
from "widget" to "widget_status". Meanwhile, the migration we really
want to do is that we "DROP" widget_status in the "contract" phase. My
understanding is, a Nova application that is in the "migrate" stage can
be sent through the "contract" stage with no code changes, correct?
Then how can you possibly write the model code that queries for "widget
JOIN widget_status" when on one side of "contract", "widget_status"
exists, and on the other, "widget_status" does *not* exist? The Nova
objects layer doesn't do the SQL queries, it calls into
sqlalchemy/api.py ultimately.
Put another way, if I want to do a migration that someday, "over time",
eventually, etc., will drop the "widget_status" table and merge it into
the "widget" table, how do I write the "get_widget()" DB API call ?
Same for just dropping a column. If we have to write code that will
return the value of the old table / old column while it exists, how does
that part work? Somewhere, there has to be code that in one case
says: "query(Widget).join(WidgetStatus)" and in other case says
"query(Widget)". You have to maintain *both* SQL queries at the same
time?
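The only answer I can come up with myself is runtime inspection inside
sqlalchemy/api.py, along the lines of the sketch below; the models and
the get_session() helper are stand-ins, and a real version would cache
the table check rather than inspect on every call:

    from sqlalchemy import inspect

    def widget_get(context, widget_id):
        session = get_session()   # stand-in for the existing helper
        if 'widget_status' in inspect(session.get_bind()).get_table_names():
            # pre-"contract": the old table still exists, emit the JOIN
            query = session.query(Widget).join(WidgetStatus)
        else:
            # post-"contract": widget_status is gone
            query = session.query(Widget)
        return query.filter(Widget.id == widget_id).one()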
Also, I'd note that the "online schema migrations" concept that other
projects are undoubtedly going to look at apparently has an *enormous*
dependency on the fact that a project uses oslo.objects to maintain all
data *completely*. This is not at all mentioned in the blueprint, which
IMO is a pretty giant omission.