[openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations
Balázs Gibizer
balazs.gibizer at ericsson.com
Tue Oct 9 09:40:24 UTC 2018
Hi,
Setup
-----
nested allocation: an allocation that contains resources from one or
more nested RPs. (If you have a better term for this, please suggest
it.) If an instance has a nested allocation, it means that the compute
it allocates from has a nested RP tree. BUT if a compute has a nested
RP tree, it does not automatically mean that an instance allocating
from that compute has a nested allocation (e.g. bandwidth inventory
will be on nested RPs, but not every instance will require bandwidth).
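
To make the definition concrete, here is a minimal sketch in Python. It
assumes an allocation is a dict keyed by RP UUID (roughly the shape
placement uses) and that a hypothetical parent_of mapping describes the RP
tree. The function name, the UUID strings and the resource class names are
illustrative only; this is not Nova code.

```python
def is_nested_allocation(allocations, parent_of):
    """Return True if any allocated RP has a parent, i.e. is a nested RP.

    allocations: dict of rp_uuid -> {"resources": {resource_class: amount}}
    parent_of:   hypothetical dict of rp_uuid -> parent rp_uuid
                 (None for a root RP)
    """
    return any(parent_of.get(rp_uuid) is not None for rp_uuid in allocations)


# A compute with a nested RP tree: bandwidth inventory lives on a child RP.
parent_of = {"compute-rp": None, "bandwidth-rp": "compute-rp"}

# This instance needs no bandwidth, so its allocation is non-nested even
# though the compute it allocates from has a nested RP tree.
plain = {"compute-rp": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}}}

# This instance needs bandwidth, so its allocation is nested.
with_bw = {
    "compute-rp": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}},
    "bandwidth-rp": {"resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000}},
}

print(is_nested_allocation(plain, parent_of))
print(is_nested_allocation(with_bw, parent_of))
```

The check only looks at the RPs the instance actually allocates from, which
is why a nested RP tree on the compute does not by itself make the
allocation nested.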
Afaiu, as soon as we have NUMA modelling in place, even the most trivial
servers will have nested allocations, as CPU and MEMORY inventory will
be moved to the nested NUMA RPs. But NUMA is still in the future.
Sidenote: there is an edge case reported by bauzas where an instance
allocates _only_ from nested RPs. This was discussed last Friday and
it resulted in a new patch[0], but I would like to keep that discussion
separate from this one if possible.
Sidenote: the current problem is somewhat related not just to nested RPs
but to sharing RPs as well. However, I'm not aiming to implement sharing
support in Nova right now, so I will also try to keep the sharing
discussion separate if possible.
There was already some discussion at Monday's scheduler meeting, but
I could not attend.
http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
The meat
--------
Both live-migrate[1] and evacuate[2] have an optional force flag in the
nova REST API. The documentation says: "Force <the action> by not
verifying the provided destination host by the scheduler."
Nova implements this statement by not calling the scheduler if
force=True, BUT it still tries to manage allocations in placement.
To have an allocation on the destination host, Nova blindly copies the
instance allocation from the source host to the destination host during
these operations. Nova can do that because 1) the whole allocation is
against a single RP (the compute RP) and 2) Nova knows both the source
compute RP and the destination compute RP.
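
The blind copy described above can be sketched as follows. This is a
simplified illustration, not the actual Nova implementation: the function
name and the dict shape (RP UUID -> resources) are assumptions made for
the example.

```python
import copy


def blind_copy_allocation(source_allocations, source_rp, dest_rp):
    """Re-key a single-RP allocation from the source compute RP to the
    destination compute RP without asking the scheduler or placement a_c.

    Hypothetical helper: it only works under the two conditions above,
    i.e. the whole allocation is against the source compute RP and both
    compute RPs are known.
    """
    if set(source_allocations) != {source_rp}:
        raise ValueError(
            "blind copy assumes the whole allocation is on the source "
            "compute RP"
        )
    return {dest_rp: copy.deepcopy(source_allocations[source_rp])}


source = {"source-compute-rp": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}}}
dest = blind_copy_allocation(source, "source-compute-rp", "dest-compute-rp")
print(dest)
```

The guard in the sketch is exactly the assumption that stops holding once
part of the allocation lives on a nested RP.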
However, as soon as we bring nested allocations into the picture, that
blind copy will not be feasible. Possible cases:
0) The instance has a non-nested allocation on the source and would need
a non-nested allocation on the destination. This works with the blind
copy today.
1) The instance has a nested allocation on the source and would need a
nested allocation on the destination as well.
2) The instance has a non-nested allocation on the source and would
need a nested allocation on the destination.
3) The instance has a nested allocation on the source and would need a
non-nested allocation on the destination.
Nova cannot generate nested allocations easily without reimplementing
some of the placement allocation candidate (a_c) code. However, I don't
like the idea of duplicating some of the a_c code in Nova.
Nova also cannot detect what kind of allocation (nested or non-nested)
an instance would need on the destination without calling placement a_c.
So knowing when to call placement is a chicken-and-egg problem.
Possible solutions:
A) fail fast
------------
0) Nova can detect that the source allocation is non-nested and try
the blind copy, which will succeed.
1) Nova can detect that the source allocation is nested and fail the
operation.
2) Nova only sees a non-nested source allocation. Even if the dest RP
tree is nested, it does not mean that the allocation will be nested, so
we cannot fail fast. Nova can try the blind copy and allocate every
resource from the root RP of the destination. If the instance requires
a nested allocation instead, the claim will fail in placement. So Nova
can fail the operation, just a bit later than in 1).
3) Nova can detect that the source allocation is nested and fail the
operation. However, an enhanced blind copy that tries to allocate
everything from the root RP on the destination would have worked.
B) Guess when to ignore the force flag and call the scheduler
-------------------------------------------------------------
0) keep the blind copy as it works
1) Nova detects that the source allocation is nested. It ignores the
force flag and calls the scheduler, which will call placement a_c. The
move operation can succeed.
2) Nova only sees a non-nested source allocation, so it will fall back
to the blind copy and fail at the claim on the destination.
3) Nova detects that the source allocation is nested. It ignores the
force flag and calls the scheduler, which will call placement a_c. The
move operation can succeed.
This solution would go against the API doc, which states that nova does
not call the scheduler if the operation is forced. However, in the case
of forced live-migration, Nova already verifies the target host from a
couple of perspectives in [3].
This solution is already proposed for live-migrate in [4] and for
evacuate in [5], so the complexity of the solution can be seen in the
reviews.
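
Options A and B differ only in what happens when the source allocation is
nested, which a small decision sketch makes visible. The function and the
outcome strings below are hypothetical and only restate the case lists
above. Note that dest_needs_nested is something Nova cannot know up front;
it is a parameter here only to show the eventual outcome of each case.

```python
def forced_move_outcome(option, source_nested, dest_needs_nested):
    """Outcome of a forced move under option "A" (fail fast) or option "B"
    (ignore the force flag and call the scheduler)."""
    if option not in ("A", "B"):
        raise ValueError("only options A and B are modelled here")
    if source_nested:
        # Cases 1 and 3: the nested source allocation is all Nova can see.
        return "fail fast" if option == "A" else "call scheduler"
    # Cases 0 and 2: Nova sees a non-nested source and tries the blind copy.
    # Placement rejects the claim if the destination needs a nested one.
    if dest_needs_nested:
        return "claim fails in placement"
    return "blind copy succeeds"


# (source_nested, dest_needs_nested) for cases 0) to 3)
cases = [(False, False), (True, True), (False, True), (True, False)]
for number, (src, dst) in enumerate(cases):
    print(number,
          forced_move_outcome("A", src, dst),
          "|",
          forced_move_outcome("B", src, dst))
```

Case 2 ends the same way under both options, because neither of them can
tell from the source allocation that the destination will need a nested
one.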
C) Remove the force flag from the API in a new microversion
-----------------------------------------------------------
0)-3): all cases would call the scheduler to verify the target host and
generate the nested (or non-nested) allocation.
We would still need an agreed behavior (from A), B), D)) for the old
microversions, as today's code creates inconsistent allocations in #1)
and #3) by ignoring the resources from the nested RP.
D) Do not manage allocations in placement for forced operation
--------------------------------------------------------------
The force flag is considered a last-resort tool for the admin to move
VMs around. The API doc has a fat warning about the danger of it. So
Nova can simply skip the resource allocation task if force=True. Nova
would delete the source allocation and would not create any allocation
on the destination host.
This is a simple but dangerous solution, but that is what the force flag
is all about: moving the server against all the built-in safeties. (If
the admin needs the safeties, she can set force=False and still specify
the destination host.)
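
A minimal sketch of what option D means for the bookkeeping, using a
hypothetical in-memory dict as a stand-in for placement (consumer UUID ->
allocations). The helper names are made up for the illustration and do not
correspond to Nova or placement functions.

```python
def forced_move_option_d(placement, instance_uuid):
    """Option D: delete the source allocation and create nothing on the
    destination."""
    placement.pop(instance_uuid, None)


def used(placement, rp_uuid, resource_class):
    """Sum what the stand-in placement believes is consumed on one RP."""
    return sum(
        allocs.get(rp_uuid, {}).get("resources", {}).get(resource_class, 0)
        for allocs in placement.values()
    )


placement = {"instance-1": {"source-rp": {"resources": {"VCPU": 2}}}}
forced_move_option_d(placement, "instance-1")

# The instance now runs on the destination, but no RP accounts for it.
print(used(placement, "source-rp", "VCPU"))
print(used(placement, "dest-rp", "VCPU"))
```

After the move the recorded usage is zero on both hosts, so nothing in
this model reflects the resources the instance consumes on the
destination.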
I'm open to any suggestions.
Cheers,
gibi
[0] https://review.openstack.org/#/c/608298/
[1] https://developer.openstack.org/api-ref/compute/#live-migrate-server-os-migratelive-action
[2] https://developer.openstack.org/api-ref/compute/#evacuate-server-evacuate-action
[3] https://github.com/openstack/nova/blob/c5a7002bd571379818c0108296041d12bc171728/nova/conductor/tasks/live_migrate.py#L97
[4] https://review.openstack.org/#/c/605785
[5] https://review.openstack.org/#/c/606111