[openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

Balázs Gibizer balazs.gibizer at ericsson.com
Tue Oct 9 15:04:23 UTC 2018



On Tue, Oct 9, 2018 at 4:39 PM, Eric Fried <openstack at fried.cc> wrote:
> IIUC, the primary thing the force flag was intended to do - allow an
> instance to land on the requested destination even if that means
> oversubscription of the host's resources - doesn't happen anymore since
> we started making the destination claim in placement.

Can we still do that simply by not creating an allocation in placement 
during the move? (See option #D.)
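
To make the question concrete, here is a rough sketch of the bookkeeping 
under option #D. It is only a toy model: the dict layout and the helper 
names (forced_move_without_allocation, usage) are made up for 
illustration and are not nova or placement code.

```python
# Toy model of placement's view:
#   {consumer: {resource_provider: {resource_class: amount}}}
# The layout and helper names are illustrative assumptions, not nova code.

def forced_move_without_allocation(allocations, instance):
    """Option #D: drop the source allocation, create nothing on the destination."""
    remaining = dict(allocations)
    remaining.pop(instance, None)
    return remaining


def usage(allocations, rp):
    """Sum what placement believes is consumed on one resource provider."""
    total = {}
    for per_rp in allocations.values():
        for rc, amount in per_rp.get(rp, {}).items():
            total[rc] = total.get(rc, 0) + amount
    return total


allocations = {
    "inst-a": {"source-compute": {"VCPU": 4, "MEMORY_MB": 8192}},
    "inst-b": {"dest-compute": {"VCPU": 6, "MEMORY_MB": 4096}},
}
after = forced_move_without_allocation(allocations, "inst-a")

# Placement no longer counts inst-a anywhere, so dest-compute looks exactly
# as busy as before the move even though it now also runs inst-a.
print(usage(after, "dest-compute"))
```

In this model the destination's recorded usage does not change, which is 
how the host could end up oversubscribed without placement noticing.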

> 
> IOW, since pike, you don't actually see a difference in behavior by
> using the force flag or not. (If you do, it's more likely a bug than
> what you were expecting.)

There is still a difference between force=True and force=False today. 
With force=False, nova calls placement a_c and placement tries to 
satisfy the requested resources, required traits, and aggregate 
membership. With force=True, the nova conductor takes the resource 
allocation from the source host and copies it blindly to the 
destination without checking any traits or aggregate membership. So 
force=True still ignores a lot of rules and safeties.
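
A minimal sketch of the two paths, to show where they diverge. The data 
layout and the function names (fits, checked_claim, forced_claim) are 
assumptions made for this example, not the real nova or placement 
interfaces. The capacity check on the forced path stands in for the 
destination claim in placement mentioned above.

```python
# Toy stand-ins for placement data; every name and the dict layout are
# assumptions made for illustration.
DEST = {
    "name": "dest-compute",
    "free": {"VCPU": 8, "MEMORY_MB": 16384},
    "traits": {"HW_CPU_X86_AVX"},
    "aggregates": {"agg-1"},
}


def fits(dest, resources):
    """Capacity check, standing in for the claim made in placement."""
    return all(dest["free"].get(rc, 0) >= amount
               for rc, amount in resources.items())


def checked_claim(dest, resources, required_traits, member_of):
    """force=False: resources, required traits and aggregates are all checked."""
    has_traits = required_traits <= dest["traits"]
    in_aggregate = not member_of or bool(member_of & dest["aggregates"])
    if fits(dest, resources) and has_traits and in_aggregate:
        return {dest["name"]: dict(resources)}
    return None


def forced_claim(dest, source_allocation, source_rp):
    """force=True: re-key the source compute RP's resources onto the
    destination; traits and aggregate membership are never consulted."""
    resources = source_allocation[source_rp]
    if not fits(dest, resources):
        return None
    return {dest["name"]: dict(resources)}


source_allocation = {"source-compute": {"VCPU": 2, "MEMORY_MB": 4096}}
resources = source_allocation["source-compute"]

# The destination lacks the required trait: the checked path rejects the
# host, while the forced path still produces an allocation for it.
print(checked_claim(DEST, resources, {"HW_CPU_X86_AVX2"}, {"agg-1"}))
print(forced_claim(DEST, source_allocation, "source-compute"))
```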

> 
> So there's no reason to keep it around. We can remove it in a new
> microversion (or not); but even in the current microversion we need not
> continue making convoluted attempts to observe it.

If we remove it in a new microversion (option #C), then we still need 
to define how to behave in the old microversions when a nested 
allocation would be needed. I don't fully get what you mean by 'not 
continue making convoluted attempts to observe it.'

> 
> What that means is that we should simplify everything down to ignore the
> force flag and always call GET /a_c. Problem solved - for nested and/or
> sharing, NUMA or not, root resources or no, on the source and/or
> destination.

If you do the force flag removal in a new microversion, that also means 
(at least to me) that you should not change the behavior of the force 
flag in the old microversions.
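
A sketch of what I mean by keeping the old microversions untouched. 
FORCE_REMOVED_IN is a placeholder, as no microversion number has been 
agreed on, and resolve_force is a hypothetical helper, not existing 
nova code.

```python
# FORCE_REMOVED_IN is a placeholder value; no microversion has been agreed on.
FORCE_REMOVED_IN = (2, 999)


def parse_microversion(value):
    """Turn a string such as '2.30' into a comparable (major, minor) tuple."""
    major, minor = value.split(".")
    return int(major), int(minor)


def resolve_force(requested_microversion, body):
    """Return the effective force flag, or raise if the request is invalid."""
    version = parse_microversion(requested_microversion)
    if version >= FORCE_REMOVED_IN:
        # New microversion: the flag no longer exists in the API.
        if "force" in body:
            raise ValueError("'force' is not accepted at this microversion")
        return False
    # Old microversions: the flag keeps whatever meaning it has today.
    return bool(body.get("force", False))


print(resolve_force("2.30", {"host": "dest-compute", "force": True}))
print(resolve_force("2.999", {"host": "dest-compute"}))
```

Whatever we pick from A), B) or D) would then live behind the last 
branch, and the new microversion would always go through the scheduler.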

Cheers,
gibi

> 
> -efried
> 
> On 10/09/2018 04:40 AM, Balázs Gibizer wrote:
>>  Hi,
>> 
>>  Setup
>>  -----
>> 
>>  nested allocation: an allocation that contains resources from one or
>>  more nested RPs. (If you have a better term for this, please suggest
>>  one.)
>> 
>>  If an instance has a nested allocation, it means that the compute it
>>  allocates from has a nested RP tree. BUT if a compute has a nested RP
>>  tree, it does not automatically mean that an instance allocating from
>>  that compute has a nested allocation (e.g. bandwidth inventory will be
>>  on nested RPs, but not every instance will require bandwidth).
>> 
>>  Afaiu, as soon as we have NUMA modelling in place, even the most
>>  trivial servers will have nested allocations, as CPU and MEMORY
>>  inventory will be moved to the nested NUMA RPs. But NUMA is still in
>>  the future.
>> 
>>  Sidenote: there is an edge case reported by bauzas when an instance
>>  allocates _only_ from nested RPs. This was discussed last Friday and
>>  it resulted in a new patch[0], but I would like to keep that discussion
>>  separate from this if possible.
>> 
>>  Sidenote: the current problem is somewhat related not just to nested
>>  RPs but to sharing RPs as well. However, I'm not aiming to implement
>>  sharing support in Nova right now, so I will also try to keep the
>>  sharing discussion separate if possible.
>> 
>>  There was already some discussion at Monday's scheduler meeting, but
>>  I could not attend.
>>  http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
>> 
>> 
>>  The meat
>>  --------
>> 
>>  Both live-migrate[1] and evacuate[2] have an optional force flag in the
>>  nova REST API. The documentation says: "Force <the action> by not
>>  verifying the provided destination host by the scheduler."
>> 
>>  Nova implements this statement by not calling the scheduler if
>>  force=True, BUT it still tries to manage allocations in placement.
>> 
>>  To have an allocation on the destination host, Nova blindly copies the
>>  instance allocation from the source host to the destination host during
>>  these operations. Nova can do that because 1) the whole allocation is
>>  against a single RP (the compute RP) and 2) Nova knows both the source
>>  compute RP and the destination compute RP.
>> 
>>  However, as soon as we bring nested allocations into the picture, that
>>  blind copy will not be feasible. Possible cases:
>>  0) The instance has a non-nested allocation on the source and would
>>  need a non-nested allocation on the destination. This works with the
>>  blind copy today.
>>  1) The instance has a nested allocation on the source and would need a
>>  nested allocation on the destination as well.
>>  2) The instance has a non-nested allocation on the source and would
>>  need a nested allocation on the destination.
>>  3) The instance has a nested allocation on the source and would need a
>>  non-nested allocation on the destination.
>> 
>>  Nova cannot generate nested allocations easily without reimplementing
>>  some of the placement allocation candidate (a_c) code. However, I don't
>>  like the idea of duplicating some of the a_c code in Nova.
>> 
>>  Nova cannot detect what kind of allocation (nested or non-nested) an
>>  instance would need on the destination without calling placement a_c.
>>  So knowing when to call placement is a chicken-and-egg problem.
>> 
>>  Possible solutions:
>>  A) fail fast
>>  ------------
>>  0) Nova can detect that the source allocation is non-nested and try
>>  the blind copy, and it will succeed.
>>  1) Nova can detect that the source allocation is nested and fail the
>>  operation.
>>  2) Nova only sees a non-nested source allocation. Even if the dest RP
>>  tree is nested, it does not mean that the allocation will be nested. We
>>  cannot fail fast. Nova can try the blind copy and allocate every
>>  resource from the root RP of the destination. If the instance requires
>>  a nested allocation instead, the claim will fail in placement. So nova
>>  can fail the operation a bit later than in 1).
>>  3) Nova can detect that the source allocation is nested and fail the
>>  operation. However, an enhanced blind copy that tries to allocate
>>  everything from the root RP on the destination would have worked.
>> 
>>  B) Guess when to ignore the force flag and call the scheduler
>>  -------------------------------------------------------------
>>  0) Keep the blind copy as it works.
>>  1) Nova detects that the source allocation is nested, ignores the
>>  force flag, and calls the scheduler, which will call placement a_c. The
>>  move operation can succeed.
>>  2) Nova only sees a non-nested source allocation, so it will fall back
>>  to the blind copy and fail at the claim on the destination.
>>  3) Nova detects that the source allocation is nested, ignores the
>>  force flag, and calls the scheduler, which will call placement a_c. The
>>  move operation can succeed.
>> 
>>  This solution would be against the API doc, which states that nova does
>>  not call the scheduler if the operation is forced. However, in the case
>>  of force live-migration Nova already verifies the target host from a
>>  couple of perspectives in [3].
>>  This solution is already proposed for live-migrate in [4] and for
>>  evacuate in [5], so the complexity of the solution can be seen in the
>>  reviews.
>> 
>>  C) Remove the force flag from the API in a new microversion
>>  -----------------------------------------------------------
>>  0)-3): all cases would call the scheduler to verify the target host and
>>  generate the nested (or non-nested) allocation.
>>  We would still need an agreed behavior (from A), B), D)) for the old
>>  microversions, as today's code creates inconsistent allocations in #1)
>>  and #3) by ignoring the resources from the nested RP.
>> 
>>  D) Do not manage allocations in placement for forced operation
>>  --------------------------------------------------------------
>>  The force flag is considered a last-resort tool for the admin to move
>>  VMs around. The API doc has a fat warning about the danger of it. So
>>  Nova can simply skip the resource allocation task if force=True. Nova
>>  would delete the source allocation and not create any allocation on
>>  the destination host.
>> 
>>  This is a simple but dangerous solution, but it is what the force flag
>>  is all about: moving the server against all the built-in safeties. (If
>>  the admin needs the safeties, she can set force=False and still specify
>>  the destination host.)
>> 
>>  I'm open to any suggestions.
>> 
>>  Cheers,
>>  gibi
>> 
>>  [0] https://review.openstack.org/#/c/608298/
>>  [1] https://developer.openstack.org/api-ref/compute/#live-migrate-server-os-migratelive-action
>>  [2] https://developer.openstack.org/api-ref/compute/#evacuate-server-evacuate-action
>>  [3] https://github.com/openstack/nova/blob/c5a7002bd571379818c0108296041d12bc171728/nova/conductor/tasks/live_migrate.py#L97
>>  [4] https://review.openstack.org/#/c/605785
>>  [5] https://review.openstack.org/#/c/606111
>> 
>> 
>>  
>> __________________________________________________________________________
>>  OpenStack Development Mailing List (not for usage questions)
>>  Unsubscribe: 
>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> 
> 



