[nova][ptg] Continue QoS port support
Hi Novas!

Here are my summarized plans for Ussuri for continuing the work on supporting QoS neutron ports (those that have a bandwidth resource request) in nova.

What is missing:

* Tempest coverage is missing for the migrate and resize support that was merged in Train. This work is already underway and bugs have been caught [1][2].
* Support for evacuate, live migrate, and unshelve. The work is described in [3][4] and the first set of patches for the evacuation support is up for review [5].
* Support for cross-cell resize with QoS ports needs some work. Matt already prepared the cross-cell resize code in a way that no new RPC change will be needed [6], and I have a plan for what to do [7].
* InstancePCIRequest persists parent_ifname during migration, but that change is not rolled back if the migration fails. This is ugly but I think it does not cause any issues [8]. I will look into this to remove the ugliness.

The bandwidth support for the nova-manage heal_allocations tool was merged in Train. Originally I planned to backport that to Stein, but that patch has grown so big and incorporated so many refactors along the way that I'm no longer sure it is reasonable to backport it. I'm now thinking about keeping it as-is and suggesting that operators install Train nova in a virtualenv to run heal allocations for bandwidth-aware servers if needed in Stein. I do have to run some manual tests to see if it actually works.

Any feedback is welcome!

cheers,
gibi

[1] https://bugs.launchpad.net/nova/+bug/1849695
[2] https://bugs.launchpad.net/nova/+bug/1849657
[3] https://blueprints.launchpad.net/nova/+spec/support-move-ops-with-qos-ports-...
[4] https://specs.openstack.org/openstack/nova-specs/specs/ussuri/approved/suppo...
[5] https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:mas...
[6] https://review.opendev.org/#/c/635080/43/nova/compute/manager.py@5375
[7] https://review.opendev.org/#/c/633293/49/nova/compute/manager.py@4742
[8] https://review.opendev.org/#/c/688387/6/nova/compute/manager.py@3404
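For readers unfamiliar with the term, a QoS port in this thread is a neutron port whose resource_request field asks placement for bandwidth. The following is a minimal Python sketch of what such a port can look like and how it could be detected. The port dicts, the physnet trait name, and the helper name has_bandwidth_request are illustrative assumptions, not code taken from nova.

```python
# Illustrative sketch only; not nova code.
# Bandwidth resource classes that a QoS port can request from placement.
BANDWIDTH_RESOURCE_CLASSES = {
    "NET_BW_EGR_KILOBIT_PER_SEC",
    "NET_BW_IGR_KILOBIT_PER_SEC",
}


def has_bandwidth_request(port):
    """Return True if the port dict carries a bandwidth resource request."""
    request = port.get("resource_request") or {}
    resources = request.get("resources") or {}
    return any(rc in BANDWIDTH_RESOURCE_CLASSES for rc in resources)


# Hypothetical example ports; IDs and the physnet trait are made up.
qos_port = {
    "id": "port-with-qos",
    "resource_request": {
        "resources": {
            "NET_BW_EGR_KILOBIT_PER_SEC": 1000,
            "NET_BW_IGR_KILOBIT_PER_SEC": 1000,
        },
        "required": ["CUSTOM_PHYSNET_PHYSNET0", "CUSTOM_VNIC_TYPE_NORMAL"],
    },
}
plain_port = {"id": "port-without-qos", "resource_request": None}
```

A port without a QoS minimum bandwidth rule has an empty resource_request, so the helper returns False for it.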
On 10/25/2019 5:17 AM, Balázs Gibizer wrote:
The bandwidth support for the nova-manage heal_allocations tool was merged in Train. Originally I planned to backport that to Stein, but that patch has grown so big and incorporated so many refactors along the way that I'm no longer sure it is reasonable to backport it. I'm now thinking about keeping it as-is and suggesting that operators install Train nova in a virtualenv to run heal allocations for bandwidth-aware servers if needed in Stein.
I think that's reasonable. Trying to backport that to Stein would be a challenge, both for you doing it and for the stable cores reviewing it.

--
Thanks,
Matt
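The virtualenv approach discussed above could look roughly like the following Python sketch, which only builds and prints the commands rather than running them. It is untested against a real Stein deployment (as noted in the thread, manual testing is still pending). The virtualenv path, the config file path, the helper name build_heal_commands, and the assumption that Train corresponds to nova 20.x on PyPI are illustrative assumptions.

```python
import shlex


def build_heal_commands(venv_dir, nova_conf, dry_run=True):
    """Build the commands for healing allocations from a Train virtualenv.

    Nothing is executed here; the caller decides how to run the commands.
    """
    pip = f"{venv_dir}/bin/pip"
    nova_manage = f"{venv_dir}/bin/nova-manage"

    heal = [
        nova_manage,
        "--config-file", nova_conf,  # config of the existing deployment
        "placement", "heal_allocations",
        "--verbose",
    ]
    if dry_run:
        # Report what would be healed without writing to placement.
        heal.append("--dry-run")

    return [
        ["python3", "-m", "venv", venv_dir],
        # Assumption: Train nova is the 20.x series on PyPI.
        [pip, "install", "nova>=20.0.0,<21.0.0"],
        heal,
    ]


# Hypothetical paths, for illustration only.
for cmd in build_heal_commands("/opt/train-nova-venv", "/etc/nova/nova.conf"):
    print(shlex.join(cmd))
```

Starting with a dry run seems prudent here, since the Train tool would be operating on a Stein deployment's database and placement service.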
On Fri, Oct 25, 2019 at 10:17, Balázs Gibizer <balazs.gibizer@est.tech> wrote:
Hi Novas!
Here are my summarized plans for Ussuri for continuing the work on supporting QoS neutron ports (those that have a bandwidth resource request) in nova.
As we are at milestone 1 I thought it would be good to summarize the progress so far.
What is missing:

* Tempest coverage is missing for the migrate and resize support that was merged in Train. This work is already underway and bugs have been caught [1][2].
These bugs are fixed now. But tempest patches [9][10] are still open and need reviews.
* Support evacuate, live migrate, unshelve. The work is described in [3][4] and the first set of patches for the evacuation support is up for review [5]
The evacuate support was merged before M1. I've just finished up the last pieces of the live migration support, so that is complete now, but code review needs to continue. My next step (probably after the vacation period) is to look into the unshelve support.
* Support for cross-cell resize with QoS ports needs some work. Matt already prepared the cross-cell resize code in a way that no new RPC change will be needed [6], and I have a plan for what to do [7].
My goal here is to have the cross cell resize support of qos ready and proposed before M2.
* InstancePCIRequest persists parent_ifname during migration, but that change is not rolled back if the migration fails. This is ugly but I think it does not cause any issues [8]. I will look into this to remove the ugliness.
This is still open but so far has not caused any problems.
The bandwidth support for the nova-manage heal_allocations tool was merged in Train. Originally I planned to backport that to Stein, but that patch has grown so big and incorporated so many refactors along the way that I'm no longer sure it is reasonable to backport it. I'm now thinking about keeping it as-is and suggesting that operators install Train nova in a virtualenv to run heal allocations for bandwidth-aware servers if needed in Stein. I do have to run some manual tests to see if it actually works.
As we agreed with Matt in October, it is reasonable to document my proposal. I would like to run some manual tests before I write that doc.

In the meantime Eric made really good progress on using the request group to resource provider mapping from placement instead of re-calculating it in nova. As [11] is merged, only one place remains where the re-calculation is still needed, in order to support revert resize. The next possible step here is to transform the resize flow in nova to use multiple neutron port bindings and thereby keep the mapping information in the inactive port binding on the source host. This task is on my TODO list but I'm not sure if it will fit the Ussuri timeline.

Thanks to everybody who helped make this nice progress!

Cheers,
gibi

[9] https://review.opendev.org/#/c/690934
[10] https://review.opendev.org/#/c/694539
[11] https://review.opendev.org/#/c/696992
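The request group to resource provider mapping mentioned above can be illustrated with a small Python sketch. It assumes an allocation request shaped like the ones placement returns at microversion 1.34, where each allocation request carries a mappings dict from request group to the resource providers that satisfy it. The UUIDs, the group key format, and the helper name providers_for_group are made up for illustration; this is not the nova implementation.

```python
def providers_for_group(allocation_request, group_key):
    """Return the resource provider UUIDs mapped to one request group.

    Reads the mapping that placement already computed instead of
    re-deriving it from the allocations.
    """
    mappings = allocation_request.get("mappings") or {}
    return list(mappings.get(group_key, []))


# Hypothetical allocation request; UUIDs and the group key are invented.
allocation_request = {
    "allocations": {
        "compute-rp-uuid": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}},
        "nic-rp-uuid": {"resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000}},
    },
    "mappings": {
        "": ["compute-rp-uuid"],          # unsuffixed (unnumbered) group
        "port-group": ["nic-rp-uuid"],    # group created for a QoS port
    },
}

nic_providers = providers_for_group(allocation_request, "port-group")
```

With the mapping available from placement, the provider backing a port's bandwidth request can be looked up directly, which is the re-calculation that the work described above removes from nova.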
On 12/14/2019 5:25 AM, Balázs Gibizer wrote:
The next possible step here is to transform the resize flow in nova to use multiple neutron port bindings and thereby keep the mapping information in the inactive port binding on the source host. This task is on my TODO list but I'm not sure if it will fit the Ussuri timeline.
Cross-cell resize uses multiple port bindings so maybe you can borrow from some of that code.

--
Thanks,
Matt
Participants (2): Balázs Gibizer, Matt Riedemann