[placement] How do we migrate DB manipulating data?
Jay Pipes
jaypipes at gmail.com
Fri Dec 21 14:21:02 UTC 2018
On 12/19/2018 02:22 AM, TETSURO NAKAMURA wrote:
> Hi,
>
> I'd like to discuss how we can have DB upgrade migration method (with
> data manipulation) in placement.
>
> ---
>
> Background
> ==========
>
> https://bugs.launchpad.net/nova/+bug/1803925
>
> * In Rocky, to have nested resource provider feature, we expanded the DB
> to have root provider id column.
Technically, we did this in Queens. The commit was in Sept 2016, more
than two years ago:
https://github.com/openstack/nova/commit/b10f11d7e8e1afb7a12a470f92c42bf3c23eca95
> * The root provider id shouldn't be None and for root providers it
> should be the same value of its resource provider id.
>
> * In Rocky, the code is built in a backward-compatible way, doing online
> migration.
>     * For each request that lists/shows resource providers, we look at
> the root provider id, and if it is empty we assume the resource
> provider is a root and set it to the resource provider's own id.
>     * Providers that were never touched during the Rocky cycle will
> still have an empty root provider id.
>
> * In Stein or later, we want a way to be sure that every root provider
> id contains a non-None value.
To be more succinct, we want to be able to modify the root_provider_id
column's nullability constraint to be NOT NULL.
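To make the target state concrete, here is a minimal sqlite3 sketch (a simplified stand-in schema, not the real placement table definition) of what the NOT NULL constraint buys us: once it is in place, the database itself rejects rows with an empty root provider id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for resource_providers after the constraint change.
conn.execute(
    "CREATE TABLE resource_providers ("
    "id INTEGER PRIMARY KEY, root_provider_id INTEGER NOT NULL)"
)
# A root provider carries its own id as root_provider_id.
conn.execute(
    "INSERT INTO resource_providers (id, root_provider_id) VALUES (1, 1)"
)
# A NULL root_provider_id is now impossible to store.
try:
    conn.execute(
        "INSERT INTO resource_providers (id, root_provider_id) "
        "VALUES (2, NULL)"
    )
except sqlite3.IntegrityError as exc:
    print(exc)  # NOT NULL constraint failed: resource_providers.root_provider_id
```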
> * This is because we have a lot of TODOs in code which we want to
> clean up once we are sure all the root provider ids have non-None value
> in the DB.
++
> * In Stein, we are already set up to use alembic to manage DB schema
> changes in placement.
>
> Question
> ========
>
> How should we copy the resource provider id to the root provider id when
> the root provider id is None?
>
> Options
> =======
>
> 1. Do it in the alembic script in the same way as the schema expansion
>     * This was done in https://review.openstack.org/#/c/619126/ and
> raised several concerns.
>         * We don't want data manipulation migrations to be
> intermixed with schema changes.
>         * For fast-forward upgrades (FFU) that skip a release,
> there could be a lot of rows to change.
>
> 2. Have two stable version branches for alembic to separate schema
> changes and data manipulation
> * Neutron has been managing two branches in alembic already.
> *
> https://specs.openstack.org/openstack/neutron-specs/specs/liberty/online-schema-migrations.html
>
>     * Developers would specify which branch a new version goes on via
> a new CLI option, something like: `placement-db-manage revision -m
> "description of revision" (--schema-change|--data-manipulate)`
>     * We should be careful about dependency management across the two
> branches.
> * https://alembic.sqlalchemy.org/en/latest/branches.html
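As a rough sketch of option 2 (every name here is an assumption, not an agreed interface), such a CLI flag could map onto alembic's Python API by targeting one of two branch heads:

```python
from alembic import command
from alembic.config import Config

def create_revision(message, data_manipulate=False):
    """Create a new revision on one of two hypothetical branches.

    The branch labels 'schema-change' and 'data-manipulate' are
    illustrative; the real labels would be whatever placement picks.
    """
    cfg = Config("alembic.ini")  # path to the alembic config is assumed
    branch = "data-manipulate" if data_manipulate else "schema-change"
    # 'branch@head' tells alembic to extend that branch's current head,
    # keeping the two version histories separate.
    command.revision(cfg, message=message, head=f"{branch}@head")
```

This only works once each branch's root revision declares its `branch_labels`, and cross-branch dependencies still have to be spelled out with `depends_on`, which is the dependency-management care mentioned above.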
>
> 3. Bring the online-migration batch CLI command
> * Follow the traditional way to manage DB data in Nova
> * `placement-manage db online-data-migration [--max-count]`
> * I'm looking into this in
> https://review.openstack.org/#/c/624942/.
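The batch shape of option 3 can be sketched with plain sqlite3 (the table and function names are simplified stand-ins; Nova's real online data migrations return a (found, done) pair per batch in a similar way):

```python
import sqlite3

def migrate_root_provider_ids(conn, max_count):
    """Backfill up to max_count NULL root_provider_id rows.

    Returns (found, done) in the style of Nova's online data
    migrations; callers loop until found == 0.
    """
    rows = conn.execute(
        "SELECT id FROM resource_providers "
        "WHERE root_provider_id IS NULL LIMIT ?",
        (max_count,),
    ).fetchall()
    for (rp_id,) in rows:
        # A provider with no recorded root is its own root.
        conn.execute(
            "UPDATE resource_providers "
            "SET root_provider_id = ? WHERE id = ?",
            (rp_id, rp_id),
        )
    return len(rows), len(rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource_providers "
    "(id INTEGER PRIMARY KEY, root_provider_id INTEGER)"
)
conn.executemany(
    "INSERT INTO resource_providers VALUES (?, ?)",
    [(1, 1), (2, None), (3, None), (4, None)],
)
# Run batches of 2 until nothing is left to migrate.
while migrate_root_provider_ids(conn, max_count=2)[0]:
    pass
print(conn.execute(
    "SELECT COUNT(*) FROM resource_providers "
    "WHERE root_provider_id IS NULL"
).fetchone()[0])  # 0
```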
>
> 4. Any other ideas?
I think #2 is the best of the options, but not to separate data from
schema migrations.
Rather, just separate expand from contract migrations, as the original
Neutron work did. Some data migrations can be done in an expand
migration. Others must be done in a contract migration because the
underlying software that uses the data may make assumptions that break
after that data migration is run.
The data migration of setting NULL root_provider_id to the value of the
provider's id is a good example of a data migration that can run in an
expand migration. The underlying code that uses root_provider_id handles
cases where root_provider_id is NULL (it defaults those NULL values to
the provider's id value).
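Concretely, that expand-phase data migration is a single statement. A sqlite3 sketch with a simplified schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource_providers "
    "(id INTEGER PRIMARY KEY, root_provider_id INTEGER)"
)
# Rows 2 and 3 were never listed/shown in Rocky, so their
# root_provider_id is still NULL.
conn.executemany(
    "INSERT INTO resource_providers VALUES (?, ?)",
    [(1, 1), (2, None), (3, None)],
)
# The entire backfill: a provider with no recorded root is its own
# root, so copy its id into root_provider_id.
conn.execute(
    "UPDATE resource_providers "
    "SET root_provider_id = id WHERE root_provider_id IS NULL"
)
print(conn.execute(
    "SELECT id, root_provider_id FROM resource_providers ORDER BY id"
).fetchall())  # [(1, 1), (2, 2), (3, 3)]
```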
The only thing that would need to go into a contract migration is the
schema migration to change the resource_providers.root_provider_id
column constraint from NULL to NOT NULL, since that cannot run against
the database unless all records have had their NULL root_provider_id
column set with a value.
The primary problem I have with the way Nova's online database migration
functionality works is that the original idea of separating expanding
and contracting schema migrations became muddied with the concept of
whether the migration might cause some performance degradation of the
database.
Because a certain OpenStack deployer (RAX) was having issues running
migrations against their database infrastructure, we started equating
*all* schema and data migrations with the foregone conclusion of "this
will take down the control plane for a long time!". Which simply isn't
the case.
We've added a large amount of code in the nova-manage
online_data_migration command to deal with data migrations in a batched
manner, in cases where running a single simple SQL statement against a
database with even a huge number of rows would have taken milliseconds
and caused zero disruption. We've foregone the simple solution and
always gone with the complex solution, and I think we're worse off for
that in Nova.
Ultimately, I think that both schema and data migrations should be
triggered upon startup of a placement-api worker before it starts
accepting connections (as noted in cdent's response here).
However, that preference does not preclude the #2 solution above from
being used (the placement-api service could simply automatically call
the expand phase migrations on startup and optionally call the contract
phase migrations on startup or some other trigger).
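A minimal sketch of that startup hook, assuming an expand/contract branch naming and a standard alembic.ini (none of these names are an actual placement interface):

```python
from alembic import command
from alembic.config import Config

def run_migrations_on_startup(config_path="alembic.ini",
                              run_contract=False):
    cfg = Config(config_path)  # alembic config location is assumed
    # Expand migrations are always safe to run before the worker
    # starts accepting connections.
    command.upgrade(cfg, "expand@head")
    if run_contract:
        # Contract migrations (e.g. adding NOT NULL) only succeed once
        # every row has been backfilled, so they stay behind a flag or
        # some other operator-driven trigger.
        command.upgrade(cfg, "contract@head")
```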
So, in short, my vote is #2 but just call the two phases expand and
contract, and don't differentiate between "schema migration" and "data
migration".
Best,
-jay