[placement] How do we migrate DB manipulating data?
Hi,

I'd like to discuss how we can have a DB upgrade migration method (with data manipulation) in placement.

---

Background
==========

https://bugs.launchpad.net/nova/+bug/1803925

* In Rocky, to have the nested resource provider feature, we expanded the DB to have a root provider id column.
* The root provider id shouldn't be None, and for root providers it should be the same value as their resource provider id.
* In Rocky, the code was built in a backward compatible way, doing online migration.
* For each request of listing/showing resource providers, we look at the root provider id and if it is still None, we assume the resource provider is a root and set the same value as its resource provider id.
* Those providers that are not called in the Rocky cycle will still have an empty value for the root provider id.
* In Stein or later, we want a way to be sure that all the root provider ids contain some non-None value.
* This is because we have a lot of TODOs in code which we want to clean up once we are sure all the root provider ids have a non-None value in the DB.
* In Stein, we are already ready to use alembic to manage DB schema changes in placement.

Question
========

How should we copy the resource provider id to the root provider id if the root provider id is None?

Options
=======

1. Do it in the alembic script in the same way as the schema expansion
    * This was done in https://review.openstack.org/#/c/619126/ and brought up several concerns.
    * We don't want the data manipulation migration to be inter-mixed with schema changes.
    * For cases skipping one release in an FFU fashion, there would be a lot of rows to be changed.

2. Have two stable version branches for alembic to separate schema changes and data manipulation
    * Neutron has been managing two branches in alembic already.
        * https://specs.openstack.org/openstack/neutron-specs/specs/liberty/online-sch...
    * Developers would specify on which branch to create a new version via a new CLI option, something like:
      `placement-db-manage revision -m "description of revision" (--schema-change|--data-manipulate)`
    * We should be careful about dependency management across the two branches.
        * https://alembic.sqlalchemy.org/en/latest/branches.html

3. Bring the online-migration batch CLI command
    * Follow the traditional way to manage DB data in Nova.
    * `placement-manage db online-data-migration [--max-count]`
    * I'm looking into this in https://review.openstack.org/#/c/624942/.

4. Any other ideas?

---

Since it looks like this is going to have an impact both on operators who upgrade and on developers who add new features to placement, it would be nice to gather more ideas and reach a consensus before we go further.

Thanks,

--
Tetsuro Nakamura <nakamura.tetsuro@lab.ntt.co.jp>
NTT Network Service Systems Laboratories
TEL:0422 59 6914(National)/+81 422 59 6914(International)
3-9-11, Midori-Cho Musashino-Shi, Tokyo 180-8585 Japan
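For reference, the backfill in question boils down to a single statement. A minimal sketch using SQLAlchemy; the connection URL is a placeholder and the table/column names follow the placement schema discussed in this thread:

```python
import sqlalchemy as sa


def backfill_root_provider_id(engine):
    """Copy each provider's own id into root_provider_id wherever it is NULL."""
    with engine.begin() as conn:
        conn.execute(sa.text(
            "UPDATE resource_providers "
            "SET root_provider_id = id "
            "WHERE root_provider_id IS NULL"
        ))


if __name__ == "__main__":
    # Placeholder URL; point this at the placement database in practice.
    engine = sa.create_engine("mysql+pymysql://user:secret@localhost/placement")
    backfill_root_provider_id(engine)
```

The options above differ mainly in where and when this statement gets run.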
On Wed, 19 Dec 2018, TETSURO NAKAMURA wrote:
1. Do it in the alembic script in the same way as the schema expansion
2. Have two stable version branches for alembic to separate schema changes and data manipulation
3. Bring the online-migration batch CLI command
4. Any other ideas?
Since it looks like it's going to have an impact both on operators who upgrade and developers who add new features in placement, it is nice to seek more ideas and to have a consensus before we go further.
As I said on the review of https://review.openstack.org/#/c/624942/ I'm somewhat reluctant to add yet another command, simply because each moving part is another thing to know about and to manage, and sometimes, yet another command to run. In that sense, of the options presented, I prefer option 1, but Dan makes good points on https://review.openstack.org/#/c/619126/ .

The idealist in me thinks that we ought to be able to:

* ensure any migrations we do (schema or data) that need to be done from the command line are quick and light
* enable anything that is neither quick nor light on an as-needed basis in the running code

Where this latter option with root id went wrong is that the implementation was incomplete: it didn't cover all the cases where a root id would be involved, or perhaps we added cases later that didn't account for that possibility. In either case we didn't make good enough tests.

However: the command that's been implemented in 624942 is pretty clean, and if it is what people think is the best solution, using it is likely the shortest path for us to have something that works and keep the most people happy.

Unless we hear from other people we should probably just go with that, but we should wait until January to decide since not many people are around now.

For the record, my real preference is that we have neither a 'db sync' nor a 'db online-data-migration' command and simply do those things when the server process starts (but before it listens on a socket). There's a WIP for that in https://review.openstack.org/#/c/619050/ . I've not pursued that too aggressively because people don't seem that into it, but, to me, having placement self-contained with as few additional processes as possible is the way to have the most flexibility. However, my position doesn't take into account the many diverse ways that people deploy their clouds.

Which is why we need input from the larger community on these kinds of decisions.

Thanks.

--
Chris Dent                 ٩◔̯◔۶           https://anticdent.org/
freenode: cdent                            tw: @anticdent
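A rough sketch of that migrate-at-startup idea, assuming an Alembic-managed schema; the script location, URL handling, and `make_app` callable are hypothetical stand-ins rather than placement's actual API, and only the Alembic calls themselves are real library calls:

```python
from alembic import command
from alembic.config import Config


def run_migrations(db_url, script_location):
    """Upgrade the schema (and any data migrations defined there) to head."""
    cfg = Config()
    cfg.set_main_option("script_location", script_location)
    cfg.set_main_option("sqlalchemy.url", db_url)
    command.upgrade(cfg, "head")  # cheap no-op when the DB is already at head


def app_factory(make_app, db_url, script_location="placement/db/alembic"):
    # Hypothetical factory: migrate first, then build the real WSGI
    # application, so the process never serves a request against a stale schema.
    run_migrations(db_url, script_location)
    return make_app()
```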
On 12/21/2018 05:18 AM, Chris Dent wrote:
On Wed, 19 Dec 2018, TETSURO NAKAMURA wrote:
1. Do it in the alembic script in the same way as the schema expansion
2. Have two stable version branches for alembic to separate schema changes and data manipulation
3. Bring the online-migration batch CLI command
4. Any other ideas?
Since it looks like it's going to have an impact both on operators who upgrade and developers who add new features in placement, it is nice to seek more ideas and to have a consensus before we go further.
As I said on the review of https://review.openstack.org/#/c/624942/ I'm somewhat reluctant to add yet another command, simply because each moving part is another thing to know about and to manage, and sometimes, yet another command to run. In that sense of the options presented, I prefer option 1, but Dan makes good points on https://review.openstack.org/#/c/619126/ . The idealist in me thinks that we ought to be able to:
* ensure any migrations we do (schema or data) that need to be done from the command line are quick and light
* enable anything that is neither quick nor light on an as needed basis in the running code
Where this latter option with root id went wrong is that the implementation was incomplete: It didn't cover all the cases where a root id would be involved, or perhaps we added cases later that didn't account for that possibility. In either case we didn't make good enough tests.
However: the command that's been implemented in 624942 is pretty clean, and if it is what people think is the best solution, using it is likely the shortest path for us to have something that works and keep the most people happy.
Unless we hear from other people we should probably just go with that, but we should wait until January to decide since not many people are around now.
For the record, my real preference is that we have neither of 'db sync' nor 'db online-data-migration' commands and simply do those things when the server process starts itself (but before listening on a socket).
+1000

This is how Swift works [1] and is the least operational burden of all approaches I've seen.
There's a WIP for that in https://review.openstack.org/#/c/619050/ . I've not pursued that too aggressively because people don't seem that into it, but, to me, having placement self contained with as few additional processes is the way to have the most flexibility. However, my position doesn't take into account the many diverse ways that people deploy their clouds.
I haven't looked at the WIP patch for this yet. I will try to do that soon.
Which is why we need input from the larger community on these kinds of decisions.
Agreed.

Best,
-jay

[1] examples: https://github.com/openstack/swift/blob/a7aa2329584f1d02f7d1fa205d56aeadffdc...
On 12/19/2018 02:22 AM, TETSURO NAKAMURA wrote:
Hi,
I'd like to discuss how we can have a DB upgrade migration method (with data manipulation) in placement.
---
Background
==========
https://bugs.launchpad.net/nova/+bug/1803925
* In Rocky, to have the nested resource provider feature, we expanded the DB to have a root provider id column.
Technically, we did this in Queens. The commit was in Sept 2016, more than two years ago: https://github.com/openstack/nova/commit/b10f11d7e8e1afb7a12a470f92c42bf3c23...
* The root provider id shouldn't be None and for root providers it should be the same value as its resource provider id.
* In Rocky, the code was built in a backward compatible way, doing online migration.
* For each request of listing/showing resource providers, we look at the root provider id and if it is still None, we assume the resource provider is a root and set the same value as its resource provider id.
* Those providers that are not called in the Rocky cycle will still have an empty value for the root provider id.
* In Stein or later, we want a way to be sure that all the root provider ids contain some non-None value.
To be more succinct, we want to be able to modify the root_provider_id column's nullability constraint to be NOT NULL.
* This is because we have a lot of TODOs in code which we want to clean up once we are sure all the root provider ids have a non-None value in the DB.
++
* In Stein, we are already ready to use alembic to manage DB schema changes in placement.
Question
========
How should we copy the resource provider id to the root provider id if the root provider id is None?
Options
=======
1. Do it in the alembic script in the same way as the schema expansion (first sketch after this list)
    * This was done in https://review.openstack.org/#/c/619126/ and brought up several concerns.
    * We don't want the data manipulation migration to be inter-mixed with schema changes.
    * For cases skipping one release in an FFU fashion, there would be a lot of rows to be changed.
2. Have two stable version branches for alembic to separate schema changes and data manipulation (second sketch after this list)
    * Neutron has been managing two branches in alembic already.
        * https://specs.openstack.org/openstack/neutron-specs/specs/liberty/online-sch...
    * Developers would specify on which branch to create a new version via a new CLI option, something like:
      `placement-db-manage revision -m "description of revision" (--schema-change|--data-manipulate)`
    * We should be careful about dependency management across the two branches.
        * https://alembic.sqlalchemy.org/en/latest/branches.html
3. Bring the online-migration batch CLI command (third sketch after this list)
    * Follow the traditional way to manage DB data in Nova.
    * `placement-manage db online-data-migration [--max-count]`
    * I'm looking into this in https://review.openstack.org/#/c/624942/.
4. Any other ideas?
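Option 1 above, sketched as an Alembic revision with the data manipulation sitting directly in upgrade(); revision identifiers are invented for illustration:

```python
"""set root_provider_id where NULL

Revision ID: aaaa00000001 (invented for this sketch)
Revises: <the earlier schema-expansion revision>
"""
from alembic import op

revision = 'aaaa00000001'
down_revision = None  # would point at the previous revision in a real tree


def upgrade():
    # One statement over every stale row; on an FFU-style jump across releases
    # this can be a very large transaction, which is one of the concerns above.
    op.execute(
        "UPDATE resource_providers SET root_provider_id = id "
        "WHERE root_provider_id IS NULL"
    )
```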
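Option 2, sketched with Alembic branch labels: revisions created on a separate data-manipulation branch keep their own head, and cross-branch ordering is declared with depends_on. The branch name and how the proposed flags would map onto it are illustrative only:

```python
"""copy id into root_provider_id

Revision ID: bbbb00000001 (invented for this sketch)
Branch Labels: data_manipulation
"""
from alembic import op

revision = 'bbbb00000001'
down_revision = None                    # first revision on this branch
branch_labels = ('data_manipulation',)  # e.g. what --data-manipulate could map to
depends_on = None                       # cross-branch dependencies would go here


def upgrade():
    op.execute(
        "UPDATE resource_providers SET root_provider_id = id "
        "WHERE root_provider_id IS NULL"
    )
```

Upgrading only that branch would then be `alembic upgrade data_manipulation@head`, or whatever `placement-db-manage` ends up wrapping around it.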
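Option 3, sketched loosely after nova's online data migrations: process at most max_count rows per call and report how many were found and migrated, so the CLI can loop until nothing is left. Function and parameter names are illustrative:

```python
import sqlalchemy as sa


def set_root_provider_ids(engine, max_count=50):
    """Backfill up to max_count rows; return (found, done) for the caller."""
    with engine.begin() as conn:
        rows = conn.execute(sa.text(
            "SELECT id FROM resource_providers "
            "WHERE root_provider_id IS NULL LIMIT :max"
        ), {"max": max_count}).fetchall()
        done = 0
        for (rp_id,) in rows:
            conn.execute(sa.text(
                "UPDATE resource_providers SET root_provider_id = :rp_id "
                "WHERE id = :rp_id AND root_provider_id IS NULL"
            ), {"rp_id": rp_id})
            done += 1
    return len(rows), done
```

A `placement-manage db online-data-migration` wrapper would call this in a loop until found comes back as 0, which is where the `[--max-count]` semantics come from.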
I think #2 is the best of the options, but not to separate data from schema migrations. Rather, just separate expand from contract migrations, as the original Neutron work did. Some data migrations can be done in an expand migration. Others must be done in a contract migration because the underlying software that uses the data may make assumptions that break after that data migration is run.

The data migration of setting NULL root_provider_id to the value of the provider's id is a good example of a data migration that can run in an expand migration. The underlying code that uses root_provider_id handles cases where root_provider_id is NULL (it defaults those NULL values to the provider's id value). The only thing that would need to go into a contract migration is the schema migration to change the resource_providers.root_provider_id column constraint from NULL to NOT NULL, since that cannot run against the database unless all records have had their NULL root_provider_id column set with a value.

The primary problem I have with the way Nova's online database migration functionality works is that the original idea of separating expanding and contracting schema migrations became muddied with the concept of whether the migration might cause some performance degradation of the database. Because a certain OpenStack deployer (RAX) was having issues running migrations against their database infrastructure, we started equating *all* schema and data migrations with the foregone conclusion of "this will take down the control plane for a long time!". Which simply isn't the case. We've added a large amount of code in the nova-manage online_data_migration command to deal with data migrations in a batched manner, when running a simple single SQL statement against a database with even a huge number of rows would have taken milliseconds and caused zero disruption. We've foregone the simple solution and always go with the complex solution, and I think we're worse off for that in Nova.

Ultimately, I think that both schema and data migrations should be triggered upon startup of a placement-api worker before it starts accepting connections (as noted in cdent's response here). However, that preference does not preclude the #2 solution above from being used (the placement-api service could simply automatically call the expand phase migrations on startup and optionally call the contract phase migrations on startup or some other trigger).

So, in short, my vote is #2, but just call the two phases expand and contract, and don't differentiate between "schema migration" and "data migration".

Best,
-jay
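A hedged sketch of the contract-phase schema change described above, i.e. tightening the column once every row has a value; revision identifiers are invented and the column type is assumed to match resource_providers.id:

```python
from alembic import op
import sqlalchemy as sa

revision = 'cccc00000001'   # invented for this sketch
down_revision = None        # would follow the expand/data revisions in a real tree


def upgrade():
    # Safe only once no row has a NULL root_provider_id.
    op.alter_column(
        'resource_providers', 'root_provider_id',
        existing_type=sa.Integer(),
        nullable=False,
    )
```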