[nova][neutron][ptg] How to increase the minimum bandwidth guarantee of a running instance
Hi,

[This is a topic from the PTG etherpad [0]. As the PTG time is intentionally kept short, let's try to discuss it or even conclude it before the PTG.]

As a next step in the minimum bandwidth QoS support I would like to solve the use case where a running instance has some ports with minimum bandwidth, but then the user wants to change (e.g. increase) the minimum bandwidth used by the instance.

I see two generic ways to solve the use case:

Option A - interface attach
---------------------------

Attach a new port with minimum bandwidth to the instance to increase the instance's overall bandwidth guarantee.

This only impacts Nova's interface attach code path:
1) The interface attach code path needs to read the port's resource request.
2) Call Placement GET /allocation_candidates?in_tree=<compute RP of the instance>.
3a) If placement returns candidates, then select one, modify the current allocation of the instance accordingly, and continue the existing interface attach code path.
3b) If placement returns no candidates, then there is no free resource left on the instance's current host to resize the allocation locally.

(A rough sketch of this placement flow is attached at the end of this mail.)

Option B - QoS rule update
--------------------------

Allow changing the minimum bandwidth guarantee of a port that is already bound to the instance.

Today Neutron rejects such a QoS rule update. If we want to support such an update then:
* either Neutron should call the placement allocation_candidates API and then update the instance's allocation, similarly to what Nova does in Option A,
* or Neutron should tell Nova that the resource request of the port has been changed, and then Nova needs to call Placement and update the instance's allocation.

Option A and Option B are not mutually exclusive, but still I would like to see what the preference of the community is. Which direction should we move forward?

Both options have the limitation that if the instance's current host does not have enough free resources for the requested change, then Nova will not do a full scheduling and move the instance to another host where the resource is available. This seems a hard problem to me. Do you have any idea how we can remove / ease this limitation without boiling the ocean?

For example: does it make sense to implement a bandwidth weigher in the scheduler so instances can be spread by free bandwidth during creation?

Cheers,
gibi

[0] https://etherpad.opendev.org/p/nova-victoria-ptg
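To make the Option A flow above concrete, here is a rough Python sketch of steps 2), 3a) and 3b). The endpoint, the token, the resource class, and the use of a single un-numbered request group are simplifying assumptions for illustration; this is not the actual Nova code.

import requests

PLACEMENT = "http://placement.example.com"  # assumed endpoint
HEADERS = {
    "X-Auth-Token": "<token>",                   # assumed pre-fetched token
    "OpenStack-API-Version": "placement 1.31",   # in_tree needs >= 1.31
}

def grow_allocation_for_port(instance_uuid, compute_rp_uuid, port_resources):
    """Try to grow the instance allocation in place for a newly attached port.

    port_resources is the port's resource request in a simplified form,
    e.g. {"NET_BW_EGR_KILOBIT_PER_SEC": 1000}.
    """
    # 2) Ask placement for candidates restricted to the instance's host.
    resources = ",".join("%s:%s" % (rc, amt) for rc, amt in port_resources.items())
    resp = requests.get(
        PLACEMENT + "/allocation_candidates",
        params={"resources": resources, "in_tree": compute_rp_uuid},
        headers=HEADERS)
    candidates = resp.json()

    if not candidates["allocation_requests"]:
        # 3b) No free resource left on the current host.
        raise RuntimeError("cannot resize the allocation locally")

    # 3a) Select a candidate and merge it into the current allocation.
    current = requests.get(
        PLACEMENT + "/allocations/" + instance_uuid, headers=HEADERS).json()
    selected = candidates["allocation_requests"][0]["allocations"]
    for rp_uuid, alloc in selected.items():
        rp = current["allocations"].setdefault(rp_uuid, {"resources": {}})
        for rc, amt in alloc["resources"].items():
            rp["resources"][rc] = rp["resources"].get(rc, 0) + amt

    payload = {
        "allocations": {rp: {"resources": a["resources"]}
                        for rp, a in current["allocations"].items()},
        "project_id": current["project_id"],
        "user_id": current["user_id"],
        # the consumer generation makes concurrent updates fail safely
        "consumer_generation": current["consumer_generation"],
    }
    requests.put(PLACEMENT + "/allocations/" + instance_uuid,
                 json=payload, headers=HEADERS).raise_for_status()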
Hi,

Thx for starting this thread. I can share some thoughts from the Neutron point of view.

On Tue, May 19, 2020 at 04:08:18PM +0200, Balázs Gibizer wrote:
> Hi,
>
> [snip]
>
> Option B - QoS rule update
> --------------------------
>
> Allow changing the minimum bandwidth guarantee of a port that is already bound to the instance.
>
> Today Neutron rejects such a QoS rule update. If we want to support such an update then:
> * either Neutron should call the placement allocation_candidates API and then update the instance's allocation, similarly to what Nova does in Option A,
> * or Neutron should tell Nova that the resource request of the port has been changed, and then Nova needs to call Placement and update the instance's allocation.
In this case, if you update a QoS rule, don't forget that a policy with this rule can already be used by many ports. So we will need to find all of them and call placement for each. What if that is fine for some ports but not for all?
> Option A and Option B are not mutually exclusive, but still I would like to see what the preference of the community is. Which direction should we move forward?
There is also a 3rd possible option, very similar to Option B: changing the QoS policy of the port. It's basically almost the same as Option B, but that way you always have only one port to update (unless the policy is associated with a network). For that reason it may be a bit easier to do.
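For illustration, such a policy change on a bound port could look roughly like this with openstacksdk; the cloud, policy and port names are made up, and the placement-side allocation update is exactly the part that does not happen today:

import openstack

conn = openstack.connect(cloud="mycloud")  # assumed clouds.yaml entry

# Two pre-created policies, e.g. with 1 Gbps and 10 Gbps
# minimum-bandwidth rules (hypothetical names).
policy_10g = conn.network.find_qos_policy("qos-minbw-10g")
port = conn.network.find_port("my-instance-port")

# Swap the whole policy on the bound port instead of editing a rule
# of the policy in place.
conn.network.update_port(port, qos_policy_id=policy_10g.id)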
--
Slawek Kaplonski
Senior software engineer
Red Hat
On Tue, 2020-05-19 at 21:55 +0200, Slawek Kaplonski wrote:
> Hi,
>
> Thx for starting this thread. I can share some thoughts from the Neutron point of view.
>
> On Tue, May 19, 2020 at 04:08:18PM +0200, Balázs Gibizer wrote:
>> [snip]
>>
>> Option A - interface attach
>> ---------------------------
>>
>> Attach a new port with minimum bandwidth to the instance to increase the instance's overall bandwidth guarantee.
>>
>> This only impacts Nova's interface attach code path: [snip]

Currently we don't support attaching a port with a resource request. If we were to do that I would prefer to make it more generic, e.g. also support attaching SRIOV devices. I don't think we should ever support this for the use case of changing QoS policies or bandwidth allocations, but I think it is a good feature in its own right.
>> Option B - QoS rule update
>> --------------------------
>>
>> Allow changing the minimum bandwidth guarantee of a port that is already bound to the instance.
>> [snip]
>
> In this case, if you update a QoS rule, don't forget that a policy with this rule can already be used by many ports. So we will need to find all of them and call placement for each. What if that is fine for some ports but not for all?

I think if we went with a QoS rule update we would not actually modify the rule itself; that would break too many things. Instead we would change which QoS rule is applied to the port. E.g. if you have a 1Gbps rule and a 10Gbps rule then we could support swapping between the rules, but we should not support changing the 1Gbps rule into a 2Gbps rule.

Neutron should ideally do the placement check and allocation update as part of the QoS rule update API action and raise an exception if it could not.
>> Option A and Option B are not mutually exclusive, but still I would like to see what the preference of the community is. Which direction should we move forward?
>
> There is also a 3rd possible option, very similar to Option B: changing the QoS policy of the port. It's basically almost the same as Option B, but that way you always have only one port to update (unless the policy is associated with a network). For that reason it may be a bit easier to do.

Yes, that is what I was suggesting above, and it is one of the options we discussed when first designing the minimum bandwidth policy. This, I think, is the optimal solution, and I don't think we should do Option A or B, although A could be done as a separate feature, just not as the way we recommend to update QoS policies.
>> Both options have the limitation that if the instance's current host does not have enough free resources for the requested change, then Nova will not do a full scheduling and move the instance to another host where the resource is available. This seems a hard problem to me.

I honestly don't think it is; we considered this during the design of the feature with the intent of one day supporting it. Option C was how I always assumed it would work. Support for attaching and detaching ports or other things with resource requests is a separate topic, as it also applies to GPU hotplug, SRIOV ports and Cyborg, so I would ignore that for now and focus on what is basically a QoS resize action where we are swapping between predefined QoS policies.
>> Do you have any idea how we can remove / ease this limitation without boiling the ocean?
>>
>> For example: does it make sense to implement a bandwidth weigher in the scheduler so instances can be spread by free bandwidth during creation?

We discussed this briefly in the past. I always believed it was a good idea, but it would require the allocation candidates and the provider summaries to be passed to the weighers. We have other use cases that could benefit from that too, but I think in the past it was seen as too much work when we did not even have the basic support working yet. Now I think it would be a reasonable next step, and as I said we will need the ability to weigh based on allocation candidates for other features in the future too, so this might be a nice time to introduce it.
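As a rough sketch of the idea: if the provider summaries returned by GET /allocation_candidates were made available to a weigher, spreading by free bandwidth boils down to something like the following. The data shapes mirror placement's response; the function names are made up and this deliberately ignores the real Nova weigher API.

# Rank allocation candidates by the free NET_BW_* capacity of the
# providers they would consume from, most free capacity first.

BW_CLASSES = ("NET_BW_EGR_KILOBIT_PER_SEC", "NET_BW_IGR_KILOBIT_PER_SEC")

def free_bandwidth(provider_summaries, rp_uuids):
    """Sum free bandwidth over the providers used by one candidate."""
    total = 0
    for rp_uuid in rp_uuids:
        for rc, usage in provider_summaries[rp_uuid]["resources"].items():
            if rc in BW_CLASSES:
                total += usage["capacity"] - usage["used"]
    return total

def spread_by_free_bandwidth(allocation_requests, provider_summaries):
    """Order candidates so the one with the most free bandwidth wins."""
    return sorted(
        allocation_requests,
        key=lambda ar: free_bandwidth(provider_summaries, ar["allocations"]),
        reverse=True)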
On Tue, May 19, 2020 at 23:48, Sean Mooney <smooney@redhat.com> wrote:
> [snip]
>
> I think if we went with a QoS rule update we would not actually modify the rule itself; that would break too many things. Instead we would change which QoS rule is applied to the port. E.g. if you have a 1Gbps rule and a 10Gbps rule then we could support swapping between the rules, but we should not support changing the 1Gbps rule into a 2Gbps rule.
>
> Neutron should ideally do the placement check and allocation update as part of the QoS rule update API action and raise an exception if it could not.
My mistake. I don't want to allow changing a rule; I want to allow changing which rule is assigned to a bound port. As Sean described, this direction might require Neutron to call GET /allocation_candidates and then update the instance allocation in placement as a result. However, it would create a situation where the instance's allocation is managed both from Nova and from Neutron.
Cheers, gibi
On Wed, May 20, 2020 at 13:50, Balázs Gibizer <balazs.gibizer@est.tech> wrote:
> [snip]
> My mistake. I don't want to allow changing a rule; I want to allow changing which rule is assigned to a bound port. As Sean described, this direction might require Neutron to call GET /allocation_candidates and then update the instance allocation in placement as a result. However, it would create a situation where the instance's allocation is managed both from Nova and from Neutron.
I've thought more about Option B (i.e. allowing to _replace_ the rule assigned to a bound port) and probably found one more general limitation.

From the placement perspective there could be two resource providers (RP1, RP2) on the same host that are connected to the same physnet (e.g. having the same CUSTOM_PHYSNET_XXX trait). Both can have independent bandwidth inventories and different bandwidth usages. Let's assume that the port is currently allocating from RP1 and then the user requests an increase of the bandwidth allocation of the port via a QoS min bw rule replacement. Let's assume that RP1 does not have enough free bandwidth resource to accommodate the change but RP2 has. From the placement perspective we could remove the existing bw allocation from RP1 and add the new, increased bw allocation to RP2. *BUT* we cannot simply do that from the networking perspective, as RP1 and RP2 represent two different PFs (or OVS bridges), so the allocation move would require the vif to be moved too in the networking backend. Do I understand correctly that this is a valid limitation from the networking perspective?

Also I would like to tease out the Neutron team's opinion about the option of implementing Option B on the Neutron side. E.g.:

* The user requests a min bw rule replacement
* Neutron reads the current allocation of the port.device_id (i.e. the instance UUID) from placement
* Neutron calculates the difference between the bw resource request of the old min bw rule and the new min bw rule
* Neutron adds this difference to the bw allocation of the RP indicated by the value of port.binding_profile['allocation'] (which is an RP UUID) and then PUTs the new instance allocation back to placement. If the PUT /allocations call succeeds then the rule replacement is accepted, and if the PUT /allocations call fails then the rule replacement is rejected to the end user.

(A rough sketch of this sequence is attached at the end of this mail.)

I'm asking this because moving the instance allocation management part of the above algorithm to the Nova side would require additional logic:

* a new port-resource-request-changed event for os-server-external-events to notify Nova about the change. This is a small inconvenience. BUT we also need
* a way for Neutron to provide both the old and the new resource request of the port (or the diff) so that Nova can use that towards placement. Please note that the current tag field in the os-server-external-events request is only a string used to communicate the port_id, so it is not really useful for carrying structured data.

Cheers,
gibi
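Here is a rough Python sketch of the drafted sequence; the endpoint, the token, the single egress resource class, and the error handling are simplifying assumptions, and real code would also have to deal with rollback and with concurrent allocation changes:

import requests

PLACEMENT = "http://placement.example.com"  # assumed endpoint
HEADERS = {"X-Auth-Token": "<token>",       # assumed pre-fetched token
           "OpenStack-API-Version": "placement 1.28"}

def replace_min_bw_rule(port, old_kbps, new_kbps,
                        rc="NET_BW_EGR_KILOBIT_PER_SEC"):
    """Apply the old-rule/new-rule difference to the instance allocation."""
    consumer = port["device_id"]                     # the instance UUID
    rp_uuid = port["binding:profile"]["allocation"]  # RP the port allocates from

    # Read the current allocation of the instance from placement.
    current = requests.get(
        PLACEMENT + "/allocations/" + consumer, headers=HEADERS).json()

    # Add the difference between the two rules to the port's RP.
    resources = current["allocations"][rp_uuid]["resources"]
    resources[rc] = resources.get(rc, 0) + (new_kbps - old_kbps)

    payload = {
        "allocations": {rp: {"resources": a["resources"]}
                        for rp, a in current["allocations"].items()},
        "project_id": current["project_id"],
        "user_id": current["user_id"],
        # fails with 409 if the allocation changed concurrently
        "consumer_generation": current["consumer_generation"],
    }
    resp = requests.put(PLACEMENT + "/allocations/" + consumer,
                        json=payload, headers=HEADERS)
    if resp.status_code != 204:
        # Not enough free resource on the RP (or a generation conflict):
        # reject the rule replacement towards the end user.
        raise RuntimeError("placement rejected the new allocation")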
On Fri, May 29, 2020 at 16:29, Balázs Gibizer <balazs.gibizer@est.tech> wrote:
> [snip]
> Also I would like to tease out the Neutron team's opinion about the option of implementing Option B on the Neutron side. E.g.:
>
> * The user requests a min bw rule replacement
> * Neutron reads the current allocation of the port.device_id (i.e. the instance UUID) from placement
> * Neutron calculates the difference between the bw resource request of the old min bw rule and the new min bw rule
> * Neutron adds this difference to the bw allocation of the RP indicated by the value of port.binding_profile['allocation'] (which is an RP UUID) and then PUTs the new instance allocation back to placement. If the PUT /allocations call succeeds then the rule replacement is accepted, and if the PUT /allocations call fails then the rule replacement is rejected to the end user.
At the PTG we agreed that:

* There will be an RFE on Neutron to allow an in-place minimum bandwidth allocation change based on the above drafted sequence.
* Triggering a resize due to an interface attach or a port.resource_request change seems insane. In the future we might look at it from a different perspective, i.e. what if a resize could take a new parameter that indicates that the resize is not due to a flavor change but due to a bandwidth change.

Cheers,
gibi
On Tue, May 19, 2020 at 23:48, Sean Mooney <smooney@redhat.com> wrote:
> [snip]
>
> Currently we don't support attaching a port with a resource request. If we were to do that I would prefer to make it more generic, e.g. also support attaching SRIOV devices. I don't think we should ever support this for the use case of changing QoS policies or bandwidth allocations, but I think it is a good feature in its own right.
For me, supporting interface attach with a resource request is a different feature from supporting interface attach with vnic_type direct or direct_physical. However, supporting increasing the minimum bandwidth of an instance by attaching new SRIOV ports with a bigger QoS rule would require both features to be implemented. So yes, in the end I would need both.
> [snip]
Cheers, gibi
Participants (3):
- Balázs Gibizer
- Sean Mooney
- Slawek Kaplonski