[nova] [placement] [packaging] placement extraction check in meeting
As discussed in the recent pupdate [1] there will be a meeting this Wednesday at 1700 UTC to discuss the current state of the placement extraction and get some idea on the critical items that need to be addressed to feel comfy.
If you're interested in this topic, meet near that time in the #openstack-placement IRC channel and someone will produce links for a hangout, etherpad, whatever is required.
Thanks.
[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-January/001666.h...
--
Chris Dent ٩◔̯◔۶ https://anticdent.org/
freenode: cdent tw: @anticdent
On 1/14/2019 12:01 PM, Chris Dent wrote:
As discussed in the recent pupdate [1] there will be a meeting this Wednesday at 1700 UTC to discuss the current state of the placement extraction and get some idea on the critical items that need to be addressed to feel comfy.
If you're interested in this topic, meet near that time in the #openstack-placement IRC channel and someone will produce links for a hangout, etherpad, whatever is required.
Thanks.
[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-January/001666.h...
Here is my attempt at summarizing the call we had. Notes are in the etherpad [1].
Deployment tools:
* Lee is working on TripleO support for extracted placement and estimates 3 more weeks for just deploy (base install) support to be done, and at least 3 more weeks for upgrade support after that. Read Lee's status update for details [2].
* If nova were to go ahead and drop placement code and require extracted placement before TripleO is ready, they would have to pin nova to a git SHA before that which would delay their Stein release.
* Having the extraction span release boundaries would ease the upgrade pain for TripleO.
Nested providers / reshaper / VGPU:
* VGPU reshaper work for nested resource providers in libvirt and xenapi drivers has stalled and there is still hesitation to move forward with extracting placement before that reshape flow, including in an upgrade, is tested to know that nova does not need any last minute data migrations which require direct access to the placement database. In other words, we have not yet confirmed that the placement reshaper API will be fully sufficient until a real driver is using it.
* Matt (me!) has agreed to rebase and address the comments on the libvirt patch [3] to try and push that forward.
* We still need someone to write a functional test which creates a server with a flat resource structure, reshapes that to nested, and then creates another server against the same provider tree.
Data migration:
* The only placement-specific online data migration in nova is "create_incomplete_consumers" and we agreed to copy that into placement and add a placement-status upgrade check for it. The data migration code will build on top of Tetsuro's work [4]. Matt is signed up to work on both of those commands.
Miscellaneous:
* Placement release notes will start at the current release and reference the nova release notes for anything older (Ocata->Rocky).
* Chris is already working on some other things like docs and small governance changes (os-traits), but those are all on hold until the placement code in nova is dropped which is dependent on the deployment tooling support and reshaper changes above.
* We agreed to checkpoint again in three weeks so Wednesday February 6 at let's say the same time, 1700 UTC.
[1] https://etherpad.openstack.org/p/placement-extract-stein-5
[2] http://lists.openstack.org/pipermail/openstack-discuss/2019-January/001783.h...
[3] https://review.openstack.org/#/c/599208/
[4] https://review.openstack.org/#/c/624942/
--
Thanks,
Matt
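To make the data migration item above concrete, here is a minimal sketch of what a "create_incomplete_consumers"-style backfill does: find allocations whose consumer has no consumer record and create one with placeholder ownership. The two-table schema, the sentinel IDs, and the (found, done) return value are simplified assumptions for illustration, not the actual nova or placement code.

```python
import sqlite3

# Hypothetical sentinel values; a real deployment would read these from config.
INCOMPLETE_PROJECT = "00000000-0000-0000-0000-000000000000"
INCOMPLETE_USER = "00000000-0000-0000-0000-000000000000"


def create_schema(conn):
    """Create a heavily simplified stand-in for the real tables (assumption)."""
    conn.executescript("""
        CREATE TABLE consumers (
            uuid TEXT PRIMARY KEY,
            project_id TEXT NOT NULL,
            user_id TEXT NOT NULL
        );
        CREATE TABLE allocations (
            id INTEGER PRIMARY KEY,
            consumer_id TEXT NOT NULL,
            resource_class TEXT NOT NULL,
            used INTEGER NOT NULL
        );
    """)


def create_incomplete_consumers(conn, max_count):
    """Backfill consumer rows for up to max_count consumers that lack one.

    Returns (found, done), so a caller can loop until found is 0.
    """
    rows = conn.execute(
        """
        SELECT DISTINCT a.consumer_id
        FROM allocations AS a
        LEFT JOIN consumers AS c ON c.uuid = a.consumer_id
        WHERE c.uuid IS NULL
        LIMIT ?
        """,
        (max_count,),
    ).fetchall()
    for (consumer_id,) in rows:
        conn.execute(
            "INSERT INTO consumers (uuid, project_id, user_id) VALUES (?, ?, ?)",
            (consumer_id, INCOMPLETE_PROJECT, INCOMPLETE_USER),
        )
    conn.commit()
    return len(rows), len(rows)
```

Processing in batches of max_count is what lets this kind of migration run "online": an operator can call it repeatedly while the service is up until it reports nothing left to do.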
On Wed, Jan 16, 2019 at 8:33 PM Matt Riedemann <mriedemos@gmail.com> wrote:
On 1/14/2019 12:01 PM, Chris Dent wrote:
As discussed in the recent pupdate [1] there will be a meeting this Wednesday at 1700 UTC to discuss the current state of the placement extraction and get some idea on the critical items that need to be addressed to feel comfy.
If you're interested in this topic, meet near that time in the #openstack-placement IRC channel and someone will produce links for a hangout, etherpad, whatever is required.
Thanks.
[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-January/001666.h...
Here is my attempt at summarizing the call we had. Notes are in the etherpad [1].
Thanks for the notes. I apologize, but I had other commitments at that time and was unable to attend the call.
Deployment tools:
* Lee is working on TripleO support for extracted placement and estimates 3 more weeks for just deploy (base install) support to be done, and at least 3 more weeks for upgrade support after that. Read Lee's status update for details [2]. * If nova were to go ahead and drop placement code and require extracted placement before TripleO is ready, they would have to pin nova to a git SHA before that which would delay their Stein release. * Having the extraction span release boundaries would ease the upgrade pain for TripleO.
That sounds like a reasonable trade-off IMHO.
Nested providers / reshaper / VGPU:
* VGPU reshaper work for nested resource providers in libvirt and xenapi drivers has stalled and there is still hesitation to move forward with extracting placement before that reshape flow, including in an upgrade, is tested to know that nova does not need any last minute data migrations which require direct access to the placement database. In other words, we have not yet confirmed that the placement reshaper API will be fully sufficient until a real driver is using it. * Matt (me!) has agreed to rebase and address the comments on the libvirt patch [3] to try and push that forward.
Thanks for that, Matt. The rebase should be fairly straightforward since it's just a matter of splitting methods into smaller ones, but the devil can be in the details and some UTs could require extra work.
* We still need someone to write a functional test which creates a server with a flat resource structure, reshapes that to nested, and then creates another server against the same provider tree.
I won't be on perpetual PTO like I was in December/early January, and I hope to finish all my internal/customer duties next week (or the customer wouldn't be happy). So, if you still trust me to have time for upstream, this will be my priority.
Data migration:
* The only placement-specific online data migration in nova is "create_incomplete_consumers" and we agreed to copy that into placement and add a placement-status upgrade check for it. The data migration code will build on top of Tetsuro's work [4]. Matt is signed up to work on both of those commands.
Miscellaneous:
* Placement release notes will start at the current release and reference the nova release notes for anything older (Ocata->Rocky). * Chris is already working on some other things like docs and small governance changes (os-traits), but those are all on hold until the placement code in nova is dropped which is dependent on the deployment tooling support and reshaper changes above. * We agreed to checkpoint again in three weeks so Wednesday February 6 at let's say the same time, 1700 UTC.
Worth adding it to the agendas, then.
-Sylvain
[1] https://etherpad.openstack.org/p/placement-extract-stein-5
[2] http://lists.openstack.org/pipermail/openstack-discuss/2019-January/001783.h...
[3] https://review.openstack.org/#/c/599208/
[4] https://review.openstack.org/#/c/624942/
--
Thanks,
Matt
On Wed, Jan 16, 2019 at 8:29 PM, Matt Riedemann <mriedemos@gmail.com> wrote:
Nested providers / reshaper / VGPU:
* VGPU reshaper work for nested resource providers in libvirt and xenapi drivers has stalled and there is still hesitation to move forward with extracting placement before that reshape flow, including in an upgrade, is tested to know that nova does not need any last minute data migrations which require direct access to the placement database. In other words, we have not yet confirmed that the placement reshaper API will be fully sufficient until a real driver is using it. * Matt (me!) has agreed to rebase and address the comments on the libvirt patch [3] to try and push that forward. * We still need someone to write a functional test which creates a server with a flat resource structure, reshapes that to nested, and then creates another server against the same provider tree.
There is a functional test [1] that uses a fake virt driver and simulates a reshape. My first attempt was to add an extra instance creation after the end of the reshape. But this test reshapes the provider tree in such a way that the resulting tree uses a sharing disk provider and no longer has inventory on the compute node RP (CPU and memory moved under NUMA). Unfortunately nova does not yet support scheduling against such a tree.
Shall I try to add a new functional test with the fake virt driver, or try to add a functional test with the libvirt driver on top of the VGPU reshaper patch?
Cheers,
gibi
[1] nova.tests.functional.test_servers.ProviderTreeTests#test_reshape
On 1/17/2019 5:16 AM, Balázs Gibizer wrote:
There is a functional test [1] that uses a fake virt driver and simulates a reshape. My first attempt was to add an extra instance creation after the end of the reshape. But this test reshapes the provider tree in such a way that the resulting tree uses a sharing disk provider and no longer has inventory on the compute node RP (CPU and memory moved under NUMA). Unfortunately nova does not yet support scheduling against such a tree.
That's probably the one I mentioned on the call then. It uses a fake virt driver but stubs out the update_provider_tree method (from what I remember) and wouldn't be an easy fit for doing what I think we need to do for a new functional test.
Shall I try to add a new functional test with the fake virt driver, or try to add a functional test with the libvirt driver on top of the VGPU reshaper patch?
I'm personally OK with a fake virt driver (it could even be special purpose like some of our fake virt drivers for testing things like live migration rollback and resize failure/reschedule). Writing anything on top of the libvirt driver is still going to require stubbing out large parts of the libvirt driver code, which essentially makes it a fake driver. I know we have some functional tests for the libvirt driver that stub other stuff (Stephen is familiar with these) so it might be possible, but if I were going to write a new test I'd just use a fake virt driver and have the test be more like our traditional functional tests where we use the API to create a server, then reshape to nested, and then schedule another server to the nested resource class and assert everything is OK, since I think what we're really trying to test here is the API and scheduler interaction more than the virt driver itself. -- Thanks, Matt
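The test flow described above (create a server against a flat provider, reshape to nested, then schedule a second server) can be sketched with a toy in-memory model. Everything here is hypothetical: ProviderTree, claim, and reshape_vgpu_to_child are illustrative stand-ins, not nova's functional test fixtures, its fake virt drivers, or the placement API.

```python
class ProviderTree:
    """Toy in-memory stand-in for a resource provider tree (hypothetical)."""

    def __init__(self):
        self.parents = {}      # provider name -> parent name (None for a root)
        self.inventories = {}  # provider name -> {resource class: total}
        self.allocations = {}  # consumer -> {provider name: {resource class: used}}

    def add_provider(self, name, parent=None, inventory=None):
        self.parents[name] = parent
        self.inventories[name] = dict(inventory or {})

    def used(self, provider, resource_class):
        return sum(
            alloc.get(provider, {}).get(resource_class, 0)
            for alloc in self.allocations.values()
        )

    def claim(self, consumer, request):
        """Allocate each requested class from the first provider with room."""
        picked = {}
        for resource_class, amount in request.items():
            for provider in sorted(self.inventories):
                total = self.inventories[provider].get(resource_class, 0)
                if total - self.used(provider, resource_class) >= amount:
                    picked.setdefault(provider, {})[resource_class] = amount
                    break
            else:
                raise RuntimeError("no capacity for %s" % resource_class)
        self.allocations[consumer] = picked
        return picked


def reshape_vgpu_to_child(tree, root, child):
    """Move VGPU inventory and its existing allocations from root to a child."""
    total = tree.inventories[root].pop("VGPU")
    tree.add_provider(child, parent=root, inventory={"VGPU": total})
    for alloc in tree.allocations.values():
        amount = alloc.get(root, {}).pop("VGPU", None)
        if amount is not None:
            alloc.setdefault(child, {})["VGPU"] = amount


tree = ProviderTree()
tree.add_provider("compute1", inventory={"VCPU": 4, "VGPU": 2})
tree.claim("server1", {"VCPU": 1, "VGPU": 1})            # flat layout
reshape_vgpu_to_child(tree, "compute1", "compute1_pgpu0")
second = tree.claim("server2", {"VCPU": 1, "VGPU": 1})   # nested layout
```

The assertions a real test would care about map onto this model directly: the first server's VGPU allocation follows the inventory to the child provider, and the second server lands on the same tree with its resources split between root and child.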
On Thu, Jan 17, 2019 at 4:05 PM, Matt Riedemann <mriedemos@gmail.com> wrote:
On 1/17/2019 5:16 AM, Balázs Gibizer wrote:
There is a functional test [1] that uses a fake virt driver and simulates a reshape. My first attempt was to add an extra instance creation after the end of the reshape. But this test reshapes the provider tree in such a way that the resulting tree uses a sharing disk provider and no longer has inventory on the compute node RP (CPU and memory moved under NUMA). Unfortunately nova does not yet support scheduling against such a tree.
That's probably the one I mentioned on the call then. It uses a fake virt driver but stubs out the update_provider_tree method (from what I remember) and wouldn't be an easy fit for doing what I think we need to do for a new functional test.
Yes, it is the one.
Shall I try to add a new functional test with the fake virt driver, or try to add a functional test with the libvirt driver on top of the VGPU reshaper patch?
I'm personally OK with a fake virt driver (it could even be special purpose like some of our fake virt drivers for testing things like live migration rollback and resize failure/reschedule). Writing anything on top of the libvirt driver is still going to require stubbing out large parts of the libvirt driver code, which essentially makes it a fake driver. I know we have some functional tests for the libvirt driver that stub other stuff (Stephen is familiar with these) so it might be possible, but if I were going to write a new test I'd just use a fake virt driver and have the test be more like our traditional functional tests where we use the API to create a server, then reshape to nested, and then schedule another server to the nested resource class and assert everything is OK, since I think what we're really trying to test here is the API and scheduler interaction more than the virt driver itself.
I managed to hack together a functional test [1] that exercises the VGPU reshape code in the libvirt driver (thanks to fakelibvirt.py) with instances booted both before and after the reshape.
Cheers,
gibi
[1] https://review.openstack.org/#/c/631559
--
Thanks,
Matt
On Wed, 16 Jan 2019, Matt Riedemann wrote:
Here is my attempt at summarizing the call we had. Notes are in the etherpad [1].
Thanks for writing this up, this aligns pretty well with what I recall. Some additional notes/comments within.
Deployment tools:
* Lee is working on TripleO support for extracted placement and estimates 3 more weeks for just deploy (base install) support to be done, and at least 3 more weeks for upgrade support after that. Read Lee's status update for details [2]. * If nova were to go ahead and drop placement code and require extracted placement before TripleO is ready, they would have to pin nova to a git SHA before that which would delay their Stein release. * Having the extraction span release boundaries would ease the upgrade pain for TripleO.
Can you (or Dan?) clarify if spanning the release boundaries is useful specifically for tooling that chooses to upgrade everything at once and thus is forced to run Stein nova with Stein placement? And if someone were able/willing to run Rocky nova with Stein placement (briefly) the challenges are less of a concern? I'm not asking because I disagree with the assertion, I just want to be sure I understand (and by proxy our adoring readers do as well) what "ease" really means in this context as the above bullet doesn't really explain it.
* Placement release notes will start at the current release and reference the nova release notes for anything older (Ocata->Rocky).
This is ready to go with https://review.openstack.org/#/c/631308/ and https://review.openstack.org/#/c/618708/ . Both need one more +2.
[1] https://etherpad.openstack.org/p/placement-extract-stein-5
-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent
On 1/17/2019 6:07 AM, Chris Dent wrote:
Deployment tools:
* Lee is working on TripleO support for extracted placement and estimates 3 more weeks for just deploy (base install) support to be done, and at least 3 more weeks for upgrade support after that. Read Lee's status update for details [2]. * If nova were to go ahead and drop placement code and require extracted placement before TripleO is ready, they would have to pin nova to a git SHA before that which would delay their Stein release. * Having the extraction span release boundaries would ease the upgrade pain for TripleO.
Can you (or Dan?) clarify if spanning the release boundaries is useful specifically for tooling that chooses to upgrade everything at once and thus is forced to run Stein nova with Stein placement?
And if someone were able/willing to run Rocky nova with Stein placement (briefly) the challenges are less of a concern?
I'm not asking because I disagree with the assertion, I just want to be sure I understand (and by proxy our adoring readers do as well) what "ease" really means in this context as the above bullet doesn't really explain it.
I didn't go into details on that point because honestly I also could use some written words explaining the differences for TripleO in doing the upgrade and migration in-step with the Stein upgrade versus upgrading to Stein and then upgrading to Train, and how the migration with that is any less painful. I know Dan talked about it on the call, but I can't say I followed it all well enough to be able to summarize the pros/cons (which is why I didn't in my summary email). This might already be something I know about, but the lights just aren't turning on right now. -- Thanks, Matt
On Thu, 2019-01-17 at 09:09 -0600, Matt Riedemann wrote:
On 1/17/2019 6:07 AM, Chris Dent wrote:
Deployment tools:
* Lee is working on TripleO support for extracted placement and estimates 3 more weeks for just deploy (base install) support to be done, and at least 3 more weeks for upgrade support after that. Read Lee's status update for details [2]. * If nova were to go ahead and drop placement code and require extracted placement before TripleO is ready, they would have to pin nova to a git SHA before that which would delay their Stein release. * Having the extraction span release boundaries would ease the upgrade pain for TripleO.
Can you (or Dan?) clarify if spanning the release boundaries is useful specifically for tooling that chooses to upgrade everything at once and thus is forced to run Stein nova with Stein placement?
And if someone were able/willing to run Rocky nova with Stein placement (briefly) the challenges are less of a concern?
I'm not asking because I disagree with the assertion, I just want to be sure I understand (and by proxy our adoring readers do as well) what "ease" really means in this context as the above bullet doesn't really explain it.
I didn't go into details on that point because honestly I also could use some written words explaining the differences for TripleO in doing the upgrade and migration in-step with the Stein upgrade versus upgrading to Stein and then upgrading to Train, and how the migration with that is any less painful. I know Dan talked about it on the call, but I can't say I followed it all well enough to be able to summarize the pros/cons (which is why I didn't in my summary email). This might already be something I know about, but the lights just aren't turning on right now.
A general question on this topic: is there any update on support for deploying and upgrading to extracted placement with other deployment tools? The main ones beyond TripleO that come to mind are kolla-ansible, juju, openstack-ansible, and openstack-helm. There are obviously others, but before we remove the code in nova I assume we will want to ensure that other tools beyond devstack, grenade and TripleO can actually deploy Stein with extracted placement and ideally upgrade. Was this covered in the placement extraction meeting?
On 1/17/2019 10:32 AM, Sean Mooney wrote:
A general question on this topic: is there any update on support for deploying and upgrading to extracted placement with other deployment tools?
The main ones beyond TripleO that come to mind are kolla-ansible, juju, openstack-ansible, and openstack-helm.
There are obviously others, but before we remove the code in nova I assume we will want to ensure that other tools beyond devstack, grenade and TripleO can actually deploy Stein with extracted placement and ideally upgrade.
Was this covered in the placement extraction meeting?
Chris has links for this in the etherpad and mentions it in the placement update emails. Off the top of my head, I want to say kolla can deploy and is working on upgrades from source tarballs (until debs are available). OSA has a change up for install which isn't merged yet. I don't know about juju or helm. -- Thanks, Matt
On 17-01-19 11:32:55, Matt Riedemann wrote:
On 1/17/2019 10:32 AM, Sean Mooney wrote:
A general question on this topic: is there any update on support for deploying and upgrading to extracted placement with other deployment tools?
The main ones beyond TripleO that come to mind are kolla-ansible, juju, openstack-ansible, and openstack-helm.
There are obviously others, but before we remove the code in nova I assume we will want to ensure that other tools beyond devstack, grenade and TripleO can actually deploy Stein with extracted placement and ideally upgrade.
Was this covered in the placement extraction meeting?
Chris has links for this in the etherpad and mentions it in the placement update emails. Off the top of my head, I want to say kolla can deploy and is working on upgrades from source tarballs (until debs are available). OSA has a change up for install which isn't merged yet. I don't know about juju or helm.
I think I called this out during the meeting but only the core kolla change introducing the new placement images has landed so far. The kolla-ansible change required to deploy the extracted placement service hasn't but looks almost ready to go: WIP: Split placement from nova https://review.openstack.org/#/c/613629/ Cheers, -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76
On 17-01-19 09:09:24, Matt Riedemann wrote:
On 1/17/2019 6:07 AM, Chris Dent wrote:
Deployment tools:
* Lee is working on TripleO support for extracted placement and estimates 3 more weeks for just deploy (base install) support to be done, and at least 3 more weeks for upgrade support after that. Read Lee's status update for details [2]. * If nova were to go ahead and drop placement code and require extracted placement before TripleO is ready, they would have to pin nova to a git SHA before that which would delay their Stein release. * Having the extraction span release boundaries would ease the upgrade pain for TripleO.
Can you (or Dan?) clarify if spanning the release boundaries is useful specifically for tooling that chooses to upgrade everything at once and thus is forced to run Stein nova with Stein placement?
And if someone were able/willing to run Rocky nova with Stein placement (briefly) the challenges are less of a concern?
I'm not asking because I disagree with the assertion, I just want to be sure I understand (and by proxy our adoring readers do as well) what "ease" really means in this context as the above bullet doesn't really explain it.
I didn't go into details on that point because honestly I also could use some written words explaining the differences for TripleO in doing the upgrade and migration in-step with the Stein upgrade versus upgrading to Stein and then upgrading to Train, and how the migration with that is any less painful.
AFAIK it wouldn't make the migration itself any less painful but having an overlap release would provide additional development and validation time. Time that is currently lacking given the very late breaking way upgrades are developed by TripleO, often only stabilising after the official upstream release is out. Anyway, I think this was Dan's point here but I'm happy to be corrected.
I know Dan talked about it on the call, but I can't say I followed it all well enough to be able to summarize the pros/cons (which is why I didn't in my summary email). This might already be something I know about, but the lights just aren't turning on right now.
Cheers, -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76
A couple of quick status updates below.
On 1/16/2019 1:29 PM, Matt Riedemann wrote:
Nested providers / reshaper / VGPU:
* We still need someone to write a functional test which creates a server with a flat resource structure, reshapes that to nested, and then creates another server against the same provider tree.
Gibi started this: https://review.openstack.org/#/c/631559/
Data migration:
* The only placement-specific online data migration in nova is "create_incomplete_consumers" and we agreed to copy that into placement and add a placement-status upgrade check for it. The data migration code will build on top of Tetsuro's work [4]. Matt is signed up to work on both of those commands.
The create_incomplete_consumers data migration was copied to placement and is merged:
https://review.openstack.org/#/c/631604/
And the upgrade check patch is up for review:
https://review.openstack.org/#/c/631671/
--
Thanks,
Matt
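For readers wondering what an upgrade check for this migration might look at, here is a rough sketch: count the consumers referenced by allocations that have no consumer record, and warn if any remain. The Code enum, the table layout, and the message text are assumptions made for illustration; this is not the patch under review.

```python
import enum
import sqlite3


class Code(enum.IntEnum):
    """Stand-in result codes for an upgrade check (hypothetical)."""
    SUCCESS = 0
    WARNING = 1
    FAILURE = 2


def count_missing_consumers(conn):
    """Count distinct consumers that have allocations but no consumer row."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT a.consumer_id)
        FROM allocations AS a
        LEFT JOIN consumers AS c ON c.uuid = a.consumer_id
        WHERE c.uuid IS NULL
        """
    ).fetchone()
    return row[0]


def check_incomplete_consumers(conn):
    """Return (code, details); warn while the data migration is unfinished."""
    missing = count_missing_consumers(conn)
    if missing:
        return Code.WARNING, (
            "%d consumer(s) have allocations but no consumer record; "
            "run the online data migration before upgrading." % missing
        )
    return Code.SUCCESS, None
```

A check like this only reads data, which keeps it safe to run against a live deployment before the upgrade starts.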
On 1/16/2019 1:29 PM, Matt Riedemann wrote:
Nested providers / reshaper / VGPU:
* Matt (me!) has agreed to rebase and address the comments on the libvirt patch [3] to try and push that forward.
Done. https://review.openstack.org/#/c/599208/ -- Thanks, Matt
participants (6)
- Balázs Gibizer
- Chris Dent
- Lee Yarwood
- Matt Riedemann
- Sean Mooney
- Sylvain Bauza