[placement] How do we migrate DB manipulating data?

Jay Pipes jaypipes at gmail.com
Fri Dec 21 14:21:02 UTC 2018


On 12/19/2018 02:22 AM, TETSURO NAKAMURA wrote:
> Hi,
> 
> I'd like to discuss how we can have DB upgrade migration method (with 
> data manipulation) in placement.
> 
> ---
> 
> Background
> ==========
> 
> https://bugs.launchpad.net/nova/+bug/1803925
> 
> * In Rocky, to have nested resource provider feature, we expanded the DB 
> to have root provider id column.

Technically, we did this in Queens. The commit was in Sept 2016, more 
than two years ago:

https://github.com/openstack/nova/commit/b10f11d7e8e1afb7a12a470f92c42bf3c23eca95

>      * The root provider id shouldn't be None, and for root providers it 
> should be the same value as its resource provider id.
> 
> * In Rocky, the code is built in a backward-compatible way, doing 
> online migration.
>      * For each request listing/showing resource providers, we look at 
> the root provider id, and if it is stale and empty, we assume the 
> resource provider is a root and set it to the resource provider id.
>      * Those providers that are not called in the Rocky cycle will still 
> have an empty value for the root provider id.
> 
> * In Stein or later, we want a way to be sure that all the root 
> provider ids contain some non-None value.

To be more succinct, we want to be able to modify the root_provider_id 
column's nullability constraint to be NOT NULL.

>      * This is because we have a lot of TODOs in code which we want to 
> clean up once we are sure all the root provider ids have non-None value 
> in the DB.

++

> * In Stein, we are already ready to use alembic to manage DB schema 
> changes in placement.
> 
> Question
> ========
> 
> How should we copy the resource provider id to root provider id if the 
> root provider id is None?
> 
> Options
> =======
> 
> 1. Do it in the alembic script in the same way as the schema expansion
>      * This is done in https://review.openstack.org/#/c/619126/ and it 
> raised several concerns.
>          * We don't want the data manipulation migration to be 
> inter-mixed with schema changes.
>          * For cases skipping one release in an FFU fashion, there would 
> be a lot of rows to be changed.
> 
> 2. Have two stable version branches for alembic to separate schema 
> changes and data manipulation
>      * Neutron has been managing two branches in alembic already.
>          * 
> https://specs.openstack.org/openstack/neutron-specs/specs/liberty/online-schema-migrations.html 
> 
>      * Developers would specify which branch a new version goes on via 
> a new CLI option, something like: `placement-db-manage revision -m 
> "description of revision" (--schema-change|--data-manipulate)`
>      * We should be careful about dependency management across the two 
> branches.
>      * https://alembic.sqlalchemy.org/en/latest/branches.html
> 
> 3. Bring in the online-migration batch CLI command
>      * Follow the traditional way to manage DB data in Nova
>          * `placement-manage db online-data-migration [--max-count]`
>          * I'm looking into this in 
> https://review.openstack.org/#/c/624942/.
> 
> 4. Any other ideas?

I think #2 is the best of the options, but not in order to separate data 
migrations from schema migrations.

Rather, just separate expand from contract migrations, as the original 
Neutron work did. Some data migrations can be done in an expand 
migration. Others must be done in a contract migration because the 
underlying software that uses the data may make assumptions that break 
after that data migration is run.

The data migration of setting NULL root_provider_id to the value of the 
provider's id is a good example of a data migration that can run in an 
expand migration. The underlying code that uses root_provider_id handles 
cases where root_provider_id is NULL (it defaults those NULL values to 
the provider's id value).
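To make that concrete, here is a minimal sketch (an in-memory SQLite 
table standing in for resource_providers) of the expand-phase data 
migration as a single idempotent statement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resource_providers "
             "(id INTEGER PRIMARY KEY, root_provider_id INTEGER)")
# One already-migrated provider (id=2) and two with the stale NULL value.
conn.executemany("INSERT INTO resource_providers VALUES (?, ?)",
                 [(1, None), (2, 2), (3, None)])

# The expand-phase data migration: one statement, safe to re-run,
# because the running code already treats NULL as "root is myself".
conn.execute("UPDATE resource_providers SET root_provider_id = id "
             "WHERE root_provider_id IS NULL")

rows = conn.execute("SELECT id, root_provider_id "
                    "FROM resource_providers ORDER BY id").fetchall()
# Every provider now carries a non-NULL root_provider_id.
```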

The only thing that would need to go into a contract migration is the 
schema migration to change the resource_providers.root_provider_id 
column constraint from NULL to NOT NULL, since that cannot run against 
the database unless all records have had their NULL root_provider_id 
column set with a value.
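Sketched against the same kind of stand-in table (SQLite cannot change a 
column's nullability in place, so the rebuild below emulates what would 
be a single ALTER TABLE on MySQL or PostgreSQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# State after the expand-phase data migration: no NULLs remain.
conn.execute("CREATE TABLE resource_providers "
             "(id INTEGER PRIMARY KEY, root_provider_id INTEGER)")
conn.execute("INSERT INTO resource_providers VALUES (1, 1), (2, 2)")

# The contract step: enforce NOT NULL. SQLite requires a table rebuild;
# MySQL/PostgreSQL would do this in one ALTER TABLE statement, which
# fails unless every row already has a value -- exactly the guarantee
# we want before cleaning up the TODOs in the code.
conn.executescript("""
    CREATE TABLE rp_new (id INTEGER PRIMARY KEY,
                         root_provider_id INTEGER NOT NULL);
    INSERT INTO rp_new SELECT id, root_provider_id FROM resource_providers;
    DROP TABLE resource_providers;
    ALTER TABLE rp_new RENAME TO resource_providers;
""")

# The constraint now rejects any row with a NULL root_provider_id.
try:
    conn.execute("INSERT INTO resource_providers VALUES (3, NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```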

The primary problem I have with the way Nova's online database migration 
functionality works is that the original idea of separating expanding 
and contracting schema migrations became muddied with the concept of 
whether the migration might cause some performance degradation of the 
database.

Because a certain OpenStack deployer (RAX) was having issues running 
migrations against their database infrastructure, we started equating 
*all* schema and data migrations with the foregone conclusion of "this 
will take down the control plane for a long time!". Which simply isn't 
the case.

We've added a large amount of code in the nova-manage 
online_data_migration command to deal with data migrations in a batched 
manner when running a simple single SQL statement against a database 
with even a huge number of rows would have taken milliseconds and caused 
zero disruption. We've foregone the simple solution and always go with 
the complex solution, and I think we're worse off for that in Nova.
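For contrast, here is roughly what the batched style costs in mechanism. 
The migrate_batch helper and max_count knob are hypothetical stand-ins 
for the nova-manage online_data_migration pattern, run against the same 
kind of in-memory SQLite table:

```python
import sqlite3

def migrate_batch(conn, max_count):
    """One pass of an online_data_migration-style batch: fix at most
    max_count rows and return how many were actually changed."""
    cur = conn.execute(
        "UPDATE resource_providers SET root_provider_id = id "
        "WHERE id IN (SELECT id FROM resource_providers "
        "             WHERE root_provider_id IS NULL LIMIT ?)",
        (max_count,))
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resource_providers "
             "(id INTEGER PRIMARY KEY, root_provider_id INTEGER)")
conn.executemany("INSERT INTO resource_providers VALUES (?, NULL)",
                 [(i,) for i in range(1, 8)])

# The operator loops until a pass reports zero rows changed --
# three round trips here, versus the single UPDATE statement above.
passes = 0
while migrate_batch(conn, 3):
    passes += 1
```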

Ultimately, I think that both schema and data migrations should be 
triggered upon startup of a placement-api worker before it starts 
accepting connections (as noted in cdent's response here).

However, that preference does not preclude the #2 solution above from 
being used (the placement-api service could simply automatically call 
the expand phase migrations on startup and optionally call the contract 
phase migrations on startup or some other trigger).
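If the two-branch layout from option #2 were in place, that wiring could 
be as small as the following sketch; the expand@head/contract@head labels 
use alembic's named-branch syntax, and the config path and function names 
are just placeholders:

```python
from alembic import command
from alembic.config import Config

def run_expand_migrations(config_path="alembic.ini"):
    # Called by the placement-api worker before it accepts connections.
    cfg = Config(config_path)
    command.upgrade(cfg, "expand@head")

def run_contract_migrations(config_path="alembic.ini"):
    # Triggered by the operator (or optionally also at startup).
    cfg = Config(config_path)
    command.upgrade(cfg, "contract@head")
```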

So, in short, my vote is #2 but just call the two phases expand and 
contract, and don't differentiate between "schema migration" and "data 
migration".

Best,
-jay



More information about the openstack-discuss mailing list