[openstack-dev] Online Migrations.
Mike Bayer
mbayer at redhat.com
Mon Jun 15 19:23:39 UTC 2015
On 6/15/15 2:21 PM, Dan Smith wrote:
> Tying this to the releases is less desirable from my perspective. It
> means that landing a thing requires more than six months of developer
> and reviewer context. We have that right now, and we get along, but it's
> much harder to plan, execute, and clean up those sorts of longer-lived
> changes. It also means that CDers have to wait for the contract to be
> landed well after they should have been able to clean up their database,
> and may imply that people _have_ to do a contract at some point,
> depending on how it's exposed.
>
> The goal for this was to separate the three phases. Tying one of them to
> the releases kinda hampers the utility of it to some degree, IMHO.
> Making it declarative (even when part of what is declared are the
> condition(s) upon which a particular contraction can proceed) is much
> more desirable to me.
All of these things are true.
But I don't see how this part of things is going to be solved unless you
do something like #1, though maybe not as complicated as that.
Here's the deal. If I write a program that says this:

    class MyThing(Model):
        __tablename__ = 'thing'
        x = Column()
        y = Column()

Then I say:

    print(session.query(MyThing))

and it's going to run "SELECT x, y FROM thing".
If you want MyThing to have "y" there, but the program runs in some kind
of mode that doesn't include "y" anymore, you can do something like this:

    class MyThing(Model):
        __tablename__ = 'thing'
        x = Column()
        if we_have_column('thing', 'y'):
            y = Column()

Note that the above is totally pseudocode. If you want it to look like
"y = RemovedColumn()", there is probably a way to make that work too:
there's a declared "y = something()" in your model, but the MyThing
class does not actually get a "y" in it, and that "y" is instead
recorded in some other collection like
MyThing.columns_we_have_removed (again, also pseudocode).
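To make the "y = RemovedColumn()" idea a little more concrete, here is a minimal stdlib-only sketch. It does not use SQLAlchemy at all; `Column`, `RemovedColumn`, `Model` and `columns_we_have_removed` are all hypothetical names. The point it shows is that "y" stays visible in the source, but a class-creation hook strips it from the class, records it in `columns_we_have_removed`, and the generated SELECT never mentions it. It assumes the decision "y is removed" is already known when the class is defined.

```python
# Hypothetical, stdlib-only sketch; this is NOT SQLAlchemy's declarative API.


class Column:
    """Marker for a column the model actually maps."""


class RemovedColumn(Column):
    """Marker for a column that is still declared but must not be mapped."""


class Model:
    __tablename__ = None

    def __init_subclass__(cls, **kw):
        super().__init_subclass__(**kw)
        cls.columns = []
        cls.columns_we_have_removed = []
        # Copy the items first, since we delete attributes while looping.
        for name, value in list(vars(cls).items()):
            if isinstance(value, RemovedColumn):
                # The class does not actually get "y"; remember it elsewhere.
                delattr(cls, name)
                cls.columns_we_have_removed.append(name)
            elif isinstance(value, Column):
                cls.columns.append(name)

    @classmethod
    def select_sql(cls):
        return "SELECT %s FROM %s" % (", ".join(cls.columns), cls.__tablename__)


class MyThing(Model):
    __tablename__ = "thing"
    x = Column()
    y = RemovedColumn()


print(MyThing.select_sql())             # SELECT x FROM thing
print(MyThing.columns_we_have_removed)  # ['y']
```

This only covers the model-definition side; it says nothing about when or how the removal is decided, which is the open question further down.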
Alternatively, you can have MyThing with .x and .y and then try to mess
around with your Query() objects so that they skip "y" when this
condition occurs, which at the basic level looks like:

    session.query(MyThing).options(defer('y'))

With this approach, you'd probably want to use a new API I've added in
1.0 that allows for on-query-construction events, which can add these
deferral rules. Hacking this into model_query() is going to be more
difficult and hardcoded, and also isn't going to accommodate things like
lazy loads, joins, eager loads, etc. In any case, doing this correctly
for intercepted queries is possible but can be difficult and error-prone
in some cases, as it has to search for every entity in the query
(aliased, joined, subqueried, etc.) that might be referring to
"thing.y". Something also has to be worked out for the persistence
side: "y" needs to be excluded from INSERT statements, and even from
UPDATE statements if some logic is setting a value for it. Or you could
build up some SQL execution events using the SQLAlchemy event API to
just scrub these columns out when the SQL is emitted, but then we have
to parse and rewrite SQL.
Either way, you can have all of that. But what is not clear here is:
when is the decision made that we no longer have "y"?
Is it made:
1. At runtime? e.g. your Nova service is running, it's doing "SELECT x,
y FROM thing", then some magic thing happens somewhere and the app
suddenly sees "hey, 'y' is gone!" and changes all queries to "SELECT x
FROM thing". What would this magic thing be? Are you going to reflect
the table schema on every query? (You definitely aren't.)
So I don't know that this is possible.
2. At application start time? e.g. the Nova service starts up, and
something happens before "MyThing" is first declared so that MyThing
knows "y" is no longer there for this run (or something happens that
impacts all the queries and persistence operations, which is less
desirable).
#2 is much more feasible. But still, how does it work? How do we know
that "y" is there on one run and not there on another? Do we:
2a. When the app starts up, we run reflection queries against the DB
(e.g. what autogenerate / OSM does, looking in the schema catalogs).
This is doable, but can get expensive on startup if we really have lots
of columns/tables to worry about. It also means that either the changes
to the queries happen entirely at query time (intricate, difficult-ish),
or they happen at model definition time (simple, easy), which requires
the app to be connected to the database before it imports the models;
that is the complete opposite of how Nova's api.py is constructed right
now. Plus the feature needs to accommodate Cells, where there's a
totally different database involved (maybe this has to happen at query
time for that reason alone).
2b. In a config file somewhere? Some kind of directive that says "hey,
we have now dropped thing.y"? What would that look like?
2c. Based on some kind of version number in the database? Not too much
different from #2a.
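Options 2a, 2b and 2c differ only in where the "is y still there?" answer comes from; whichever source is used, the result could feed something like the we_have_column() pseudocode above. Here's a hedged stdlib-only sketch of all three sources side by side, using SQLite so it runs standalone. The function names, the `[removed_columns]` config section, the `schema_version` table and the `DROPPED_AT_VERSION` mapping are all made up for illustration; against a real database, 2a would more likely go through SQLAlchemy's `inspect(engine).get_columns(...)` than SQLite's `PRAGMA table_info`.

```python
import configparser
import sqlite3


def removed_by_reflection(conn, expected):
    """2a: ask the schema catalog which expected columns are missing."""
    removed = set()
    for table, cols in expected.items():
        # Table names come from trusted model definitions; row[1] is the column name.
        present = {row[1] for row in conn.execute("PRAGMA table_info(%s)" % table)}
        removed.update((table, c) for c in cols if c not in present)
    return removed


def removed_by_config(text):
    """2b: an explicit directive, e.g. "thing = y" under [removed_columns]."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("removed_columns"):
        return set()
    return {
        (table, col.strip())
        for table, cols in parser.items("removed_columns")
        for col in cols.split(",")
        if col.strip()
    }


# 2c: hypothetical mapping of schema version -> columns dropped at that version.
DROPPED_AT_VERSION = {42: {("thing", "y")}}


def removed_by_version(conn):
    """2c: read a version number stored in the database itself."""
    (version,) = conn.execute("SELECT version FROM schema_version").fetchone()
    removed = set()
    for v, cols in DROPPED_AT_VERSION.items():
        if version >= v:
            removed |= cols
    return removed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thing (x INTEGER)")
conn.execute("CREATE TABLE schema_version (version INTEGER)")
conn.execute("INSERT INTO schema_version VALUES (42)")

print(removed_by_reflection(conn, {"thing": ["x", "y"]}))   # {('thing', 'y')}
print(removed_by_config("[removed_columns]\nthing = y\n"))  # {('thing', 'y')}
print(removed_by_version(conn))                             # {('thing', 'y')}
```

Note that 2a and 2c both need a database connection before the models are defined, which is the ordering problem with Nova's api.py mentioned above; 2b does not, at the cost of an operator having to keep the config in sync with the actual schema.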
>
> That said, I still think we should get the original thing merged. Even
> if we did contractions purely with the manual migrations for the
> foreseeable future, that'd be something we could deal with.
>
> --Dan
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev