[openstack-dev] Online Migrations.

Mike Bayer mbayer at redhat.com
Mon Jun 15 19:23:39 UTC 2015



On 6/15/15 2:21 PM, Dan Smith wrote:
> Tying this to the releases is less desirable from my perspective. It
> means that landing a thing requires more than six months of developer
> and reviewer context. We have that right now, and we get along, but it's
> much harder to plan, execute, and cleanup those sorts of longer-lived
> changes. It also means that CDers have to wait for the contract to be
> landed well after they should have been able to clean up their database,
> and may imply that people _have_ to do a contract at some point,
> depending on how it's exposed.
>
> The goal for this was to separate the three phases. Tying one of them to
> the releases kinda hampers the utility of it to some degree, IMHO.
> Making it declarative (even when part of what is declared are the
> condition(s) upon which a particular contraction can proceed) is much
> more desirable to me.
All of these things are true.

But I don't see how this part of things is going to be solved unless you 
otherwise do something like #1, though maybe not as complicated as that.

Here's the deal.  If I write a program that says this:


class MyThing(Model):
     __tablename__ = 'thing'
     x = Column()
     y = Column()

then I say:

print(session.query(MyThing))


it's going to run "SELECT x, y FROM thing"

If you want MyThing to have "y" there, but the program runs in some kind 
of mode that doesn't include "y" anymore, you can do something like this:


class MyThing(Model):
     __tablename__ = 'thing'
     x = Column()

     if we_have_column('thing', 'y'):
         y = Column()

Note that the above is totally pseudocode.  If you want it to read like 
"y = RemovedColumn()", there is probably a way to make that work too: 
the model declares "y = something()", but the mapped MyThing does not 
actually get a "y", and the "y" is instead recorded in some other 
collection like MyThing.columns_we_have_removed (again, also pseudocode).
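
The conditional declaration above can be made concrete. This is only a minimal sketch, assuming SQLAlchemy 1.4 or later. The `id` primary key, the column types and the `DROPPED_COLUMNS` set are invented for illustration, and `we_have_column` is the hypothetical helper from the pseudocode, backed here by a hard-coded set rather than any real source of truth:

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

# Hypothetical: (table, column) pairs known to be gone from the database.
# Where this set comes from is exactly the open question in this thread.
DROPPED_COLUMNS = {("thing", "y")}


def we_have_column(table_name, column_name):
    """Return True if the column is still expected to exist."""
    return (table_name, column_name) not in DROPPED_COLUMNS


Base = declarative_base()


class MyThing(Base):
    __tablename__ = "thing"

    id = Column(Integer, primary_key=True)  # added so the class can be mapped
    x = Column(String(50))

    # Evaluated once, while the class body executes at import time.
    if we_have_column("thing", "y"):
        y = Column(String(50))


mapped_columns = MyThing.__table__.columns.keys()
print(mapped_columns)
```

Because the check runs when the class body executes, whatever feeds `we_have_column` has to be available before the models module is imported.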

Alternatively, you can have MyThing with .x and .y and then try to mess 
around with your Query() objects so that they skip "y" when this 
condition occurs, which at the basic level looks like:

session.query(MyThing).options(defer('y'))
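
To see what the deferral buys, here is a small sketch, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. The schema is created by hand without "y" to simulate a contraction that has already run; the `id` column and the types are illustrative additions:

```python
from sqlalchemy import Column, Integer, String, create_engine, text
from sqlalchemy.exc import OperationalError
from sqlalchemy.orm import Session, declarative_base, defer

Base = declarative_base()


class MyThing(Base):
    __tablename__ = "thing"

    id = Column(Integer, primary_key=True)
    x = Column(String(50))
    y = Column(String(50))  # still declared, but already dropped from the DB


engine = create_engine("sqlite://")

# Simulate a contracted schema: the table exists, "y" does not.
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE thing (id INTEGER PRIMARY KEY, x VARCHAR(50))"))
    conn.execute(text("INSERT INTO thing (id, x) VALUES (1, 'hello')"))

session = Session(engine)

# Without the option, the SELECT names thing.y and the database rejects it.
try:
    session.query(MyThing).all()
    plain_query_failed = False
except OperationalError:
    plain_query_failed = True
session.rollback()

# With defer(), "y" is left out of the SELECT entirely.
rows = session.query(MyThing).options(defer(MyThing.y)).all()
loaded_x = [row.x for row in rows]
print(plain_query_failed, loaded_x)
session.close()
```

Touching `row.y` afterwards would still trigger a load of the missing column and fail, so this only covers the initial SELECT.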

With this approach, you'd probably want to use a new API I've added in 
1.0 that allows for on-query-construction events which can add these 
deferral rules.   Hacking this into model_query() is going to be more 
difficult / hardcoded, and it also isn't going to accommodate things 
like lazy loads, joins, eager loads, etc.   In any case, doing this 
correctly for intercepted queries is doable but might be difficult and 
error prone in some cases, as it has to search for all entities in the 
query (aliased, joined, subqueried, etc.) that might be referring to 
"thing.y".

Also, something has to be worked out for the persistence side; "y" needs 
to be excluded from INSERT statements, and even from UPDATE statements 
if some logic is setting a value for it.   Or you could build up some 
SQL execution events using the SQLAlchemy event API to just scrub these 
columns out when the SQL is emitted, but then we have to parse and 
rewrite SQL.
Either way, you can have all of that.   What is not clear here is when 
that decision is made, that we no longer have "y".

Is it made:

1. at runtime?  e.g. your nova service is running, it's doing "SELECT x, 
y FROM thing", then some magic thing happens somewhere and the app 
suddenly sees, hey, "y" is gone!  Change all queries to "SELECT x FROM 
thing".     What would this magic thing be?   Are you going to run a 
reflection of the table schema on every query?  (You definitely aren't.)   
So I don't know that this is possible.

2. at application start time?   e.g. nova service starts up, something 
happens before "MyThing" is first declared where MyThing knows that "y" 
is no longer there for this run (or something that will impact all the 
queries and persistence operations, less desirable).

#2 is much more possible.  But still, how does it run?   How do we know 
that "y" is there on one run, and is not there on another?   Do we:

2a.  When the app starts up, we run reflection queries against the DB 
(e.g. what autogenerate / OSM does, looking in schema catalogs).    
This is doable, but can get expensive on startup if we really have lots 
of columns/tables to worry about.   It also forces a choice: either the 
changes to the queries happen totally at query time (intricate, 
difficult-ish), or they happen at model definition time (simple, easy), 
which means the app needs to be connected to the database before it 
imports the models, and this is the complete opposite of how Nova's 
api.py is constructed right now.   Plus the feature needs to accommodate 
Cells, where there's a totally different database happening (maybe this 
has to be query time for that reason alone).

2b. In a config file somewhere?   Some kind of directive that says, "hey, 
we have now dropped thing.y"?   What would that look like?

2c. Based on some kind of version number in the database?   Not too much 
different from #2a.
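
For option 2a, the startup reflection could look roughly like the sketch below. It assumes SQLAlchemy 1.4 or later and an in-memory SQLite database standing in for the real one; `find_missing_columns` is a made-up helper name, not part of any library:

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, create_engine, inspect, text,
)

# What the application's models declare.
metadata = MetaData()
Table(
    "thing",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("x", String(50)),
    Column("y", String(50)),
)


def find_missing_columns(engine, metadata):
    """Return (table, column) pairs declared in metadata but absent in the DB."""
    inspector = inspect(engine)
    missing = set()
    for table in metadata.sorted_tables:
        # One catalog lookup per table; this is the startup cost noted in 2a.
        present = {col["name"] for col in inspector.get_columns(table.name)}
        missing.update(
            (table.name, column.name)
            for column in table.columns
            if column.name not in present
        )
    return missing


engine = create_engine("sqlite://")
with engine.begin() as conn:
    # The database has already been contracted: no "y".
    conn.execute(text("CREATE TABLE thing (id INTEGER PRIMARY KEY, x VARCHAR(50))"))

missing = find_missing_columns(engine, metadata)
print(missing)
```

The resulting set is the kind of input a model-definition-time check would need, which is why the database connection would have to exist before the models are imported. The same set could just as well be loaded from a config directive (2b) instead of being reflected.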






>
> That said, I still think we should get the original thing merged. Even
> if we did contractions purely with the manual migrations for the
> foreseeable future, that'd be something we could deal with.
>
> --Dan



