[openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

Balázs Gibizer balazs.gibizer at ericsson.com
Tue Oct 9 09:40:24 UTC 2018


Hi,

Setup
-----

nested allocation: an allocation that contains resources from one or
more nested RPs. (If you have a better term for this, please suggest it.)

If an instance has a nested allocation, it means that the compute it
allocates from has a nested RP tree. BUT if a compute has a nested RP
tree, it does not automatically mean that an instance allocating from
that compute has a nested allocation (e.g. bandwidth inventory will be
on nested RPs, but not every instance will require bandwidth).
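A minimal sketch of that distinction, assuming the instance allocation
is available as a plain dict mapping RP UUID to {"resources": {...}}.
The function name, the UUID strings and the BANDWIDTH resource class
are hypothetical placeholders, and sharing RPs are ignored on purpose:

```python
def is_nested_allocation(allocations, compute_rp_uuid):
    """Return True if any part of the allocation is against an RP other
    than the root compute RP.

    `allocations` is assumed to map RP UUID -> {"resources": {...}}.
    Sharing RPs are deliberately ignored, so every non-root RP is
    treated as a nested one.
    """
    return any(rp_uuid != compute_rp_uuid for rp_uuid in allocations)


# Hypothetical data: the compute has a nested RP for a NIC, but only the
# second instance asks for bandwidth.
plain_instance = {
    "compute-1": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}},
}
bandwidth_instance = {
    "compute-1": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}},
    "compute-1-nic": {"resources": {"BANDWIDTH": 1000}},
}
```

Both instances run on the same compute with the same nested RP tree,
but only bandwidth_instance has a nested allocation.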

Afaiu, as soon as we have NUMA modelling in place, even the most trivial
servers will have nested allocations, as CPU and MEMORY inventory will
be moved to the nested NUMA RPs. But NUMA is still in the future.

Sidenote: there is an edge case reported by bauzas where an instance
allocates _only_ from nested RPs. This was discussed last Friday and
it resulted in a new patch[0], but I would like to keep that discussion
separate from this one if possible.

Sidenote: the current problem is related not just to nested RPs
but to sharing RPs as well. However, I'm not aiming to implement sharing
support in Nova right now, so I will also try to keep the sharing
discussion separate if possible.

There was already some discussion at Monday's scheduler meeting, but
I could not attend.
http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20


The meat
--------

Both live-migrate[1] and evacuate[2] have an optional force flag in the
nova REST API. The documentation says: "Force <the action> by not
verifying the provided destination host by the scheduler."

Nova implements this statement by not calling the scheduler if
force=True, BUT it still tries to manage allocations in placement.

To have an allocation on the destination host, Nova blindly copies the
instance allocation from the source host to the destination host during
these operations. Nova can do that because 1) the whole allocation is
against a single RP (the compute RP) and 2) Nova knows both the source
compute RP and the destination compute RP.
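The blind copy can be sketched roughly as below. This is only an
illustration of the idea, not the actual Nova code: the function name
and the dict shape (RP UUID -> {"resources": {...}}) are assumptions.
The explicit check makes assumption 1) visible:

```python
import copy


def blind_copy(source_allocations, source_rp_uuid, dest_rp_uuid):
    """Re-target a single-RP allocation from the source compute RP to
    the destination compute RP without asking the scheduler.
    """
    # Assumption 1): the whole allocation is against the compute RP.
    if set(source_allocations) != {source_rp_uuid}:
        raise ValueError(
            "allocation is not solely against the source compute RP")
    # Assumption 2): both compute RPs are known, so the resources can
    # simply be moved under the destination RP's UUID.
    return {dest_rp_uuid: copy.deepcopy(source_allocations[source_rp_uuid])}
```

As soon as the source allocation spans more than one RP, there is no
single destination RP to copy to, so this sketch raises instead of
guessing.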

However, as soon as we bring nested allocations into the picture, that
blind copy will not be feasible. Possible cases:
0) The instance has a non-nested allocation on the source and would need
a non-nested allocation on the destination. This works with the blind
copy today.
1) The instance has a nested allocation on the source and would need a
nested allocation on the destination as well.
2) The instance has a non-nested allocation on the source and would
need a nested allocation on the destination.
3) The instance has a nested allocation on the source and would need a
non-nested allocation on the destination.

Nova cannot generate nested allocations easily without reimplementing 
some of the placement allocation candidate (a_c) code. However I don't 
like the idea of duplicating some of the a_c code in Nova.

Nova cannot detect what kind of allocation (nested or non-nested) an 
instance would need on the destination without calling placement a_c. 
So knowing when to call placement is a chicken and egg problem.

Possible solutions:
A) fail fast
------------
0) Nova can detect that the source allocation is non-nested and try
the blind copy, and it will succeed.
1) Nova can detect that the source allocation is nested and fail the
operation.
2) Nova only sees a non-nested source allocation. Even if the dest RP
tree is nested, it does not mean that the allocation will be nested. We
cannot fail fast. Nova can try the blind copy and allocate every
resource from the root RP of the destination. If the instance requires
a nested allocation instead, the claim will fail in placement. So nova
can fail the operation a bit later than in 1).
3) Nova can detect that the source allocation is nested and fail the
operation. However, an enhanced blind copy that tries to allocate
everything from the root RP on the destination would have worked.

B) Guess when to ignore the force flag and call the scheduler
-------------------------------------------------------------
0) keep the blind copy as it works
1) Nova detects that the source allocation is nested. It ignores the
force flag and calls the scheduler, which will call placement a_c. The
move operation can succeed.
2) Nova only sees a non-nested source allocation, so it will fall back
to the blind copy and fail at the claim on the destination.
3) Nova detects that the source allocation is nested. It ignores the
force flag and calls the scheduler, which will call placement a_c. The
move operation can succeed.

This solution would be against the API doc, which states that nova does
not call the scheduler if the operation is forced. However, in the case
of forced live-migration, Nova already verifies the target host from a
couple of perspectives in [3].
This solution is already proposed for live-migrate in [4] and for
evacuate in [5], so the complexity of the solution can be seen in the
reviews.
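The guess in B) boils down to a decision like the one sketched below.
This is a simplified illustration and not the code in [4] or [5]; the
function name and the allocation dict shape (RP UUID ->
{"resources": {...}}) are assumptions:

```python
def should_call_scheduler(force, source_allocations, source_rp_uuid):
    """Decide whether a move operation goes through the scheduler.

    Without force the scheduler is always called. With force the flag
    is honoured only while the source allocation is non-nested, because
    only then is the blind copy at least possible.
    """
    if not force:
        return True
    # Any RP other than the source compute RP is treated as nested
    # (sharing RPs are ignored in this sketch).
    return any(rp != source_rp_uuid for rp in source_allocations)
```

Note that cases 0) and 2) are indistinguishable to this function, as
both have a non-nested source allocation. That is why case 2) still
ends up with the blind copy and fails at the claim on the destination.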

C) Remove the force flag from the API in a new microversion
-----------------------------------------------------------
0)-3): all cases would call the scheduler to verify the target host and
generate the nested (or non-nested) allocation.
We would still need an agreed behavior (from A), B), D)) for the old
microversions, as today's code creates inconsistent allocations in #1)
and #3) by ignoring the resources from the nested RPs.

D) Do not manage allocations in placement for forced operation
--------------------------------------------------------------
The force flag is considered a last resort tool for the admin to move
VMs around. The API doc has a fat warning about the danger of it. So
Nova can simply skip the resource allocation task if force=True. Nova
would delete the source allocation and would not create any allocation
on the destination host.

This is a simple but dangerous solution, but it is what the force flag
is all about: moving the server against all the built-in safeties. (If
the admin needs the safeties, she can set force=False and still specify
the destination host.)

I'm open to any suggestions.

Cheers,
gibi

[0] https://review.openstack.org/#/c/608298/
[1] https://developer.openstack.org/api-ref/compute/#live-migrate-server-os-migratelive-action
[2] https://developer.openstack.org/api-ref/compute/#evacuate-server-evacuate-action
[3] https://github.com/openstack/nova/blob/c5a7002bd571379818c0108296041d12bc171728/nova/conductor/tasks/live_migrate.py#L97
[4] https://review.openstack.org/#/c/605785
[5] https://review.openstack.org/#/c/606111



