[openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

Balázs Gibizer balazs.gibizer at ericsson.com
Tue Oct 9 15:08:57 UTC 2018



On Tue, Oct 9, 2018 at 4:56 PM, Sylvain Bauza <sylvain.bauza at gmail.com> 
wrote:
> 
> 
> On Tue, Oct 9, 2018 at 4:39 PM, Eric Fried <openstack at fried.cc> 
> wrote:
>> IIUC, the primary thing the force flag was intended to do - allow an
>> instance to land on the requested destination even if that means
>> oversubscription of the host's resources - doesn't happen anymore 
>> since
>> we started making the destination claim in placement.
>> 
>> IOW, since pike, you don't actually see a difference in behavior by
>> using the force flag or not. (If you do, it's more likely a bug than
>> what you were expecting.)
>> 
>> So there's no reason to keep it around. We can remove it in a new
>> microversion (or not); but even in the current microversion we need 
>> not
>> continue making convoluted attempts to observe it.
>> 
>> What that means is that we should simplify everything down to ignore 
>> the
>> force flag and always call GET /a_c. Problem solved - for nested 
>> and/or
>> sharing, NUMA or not, root resources or no, on the source and/or
>> destination.
>> 
> 
> 
> While I tend to agree with Eric here (and I commented on the review 
> saying we should signal the new behaviour with a microversion), I 
> still think we need to properly advertise this, so I'm adding 
> openstack-operators@ accordingly.

Question for you as well: if we remove (or change) the force flag in a 
new microversion then how should the old microversions behave when 
nested allocations would be required?

Cheers,
gibi

> Disclaimer: since we have gaps in OSC, the current OSC behaviour when 
> you run "openstack server live-migrate <target>" is to *force* the 
> destination by not calling the scheduler. Yeah, it sucks.
> 
> Operators, what are the exact cases (for those running clouds newer 
> than Mitaka, i.e. Newton and above) where you make use of the --force 
> option for live migration with a microversion newer than or equal to 
> 2.29?
> In general, even in the case of an emergency, you still want to make 
> sure you don't throw your compute under the bus by massively 
> migrating instances that would create an undetected snowball effect 
> by having this compute refuse new instances. Or are you disabling the 
> target compute service first and then throwing your pet instances up 
> there?
> 
> -Sylvain
> 
> 
> 
>> -efried
>> 
>> On 10/09/2018 04:40 AM, Balázs Gibizer wrote:
>> > Hi,
>> >
>> > Setup
>> > -----
>> >
>> > nested allocation: an allocation that contains resources from one 
>> > or more nested RPs. (If you have a better term for this, please 
>> > suggest it.)
>> >
>> > If an instance has a nested allocation, it means that the compute 
>> > it allocates from has a nested RP tree. BUT if a compute has a 
>> > nested RP tree, it does not automatically mean that the instance 
>> > allocating from that compute has a nested allocation (e.g. 
>> > bandwidth inventory will be on a nested RP but not every instance 
>> > will require bandwidth).
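
To make the difference concrete, here is a rough sketch of the two 
shapes in the placement allocation format (the RP UUIDs and the 
bandwidth resource class below are made up for illustration only):

    # Non-nested: every resource is held against the compute root RP.
    non_nested_allocation = {
        "allocations": {
            "<compute-rp-uuid>": {
                "resources": {"VCPU": 2, "MEMORY_MB": 2048, "DISK_GB": 20},
            },
        },
    }

    # Nested: part of the resources come from a child RP under the
    # compute RP (e.g. a bandwidth providing PF RP).
    nested_allocation = {
        "allocations": {
            "<compute-rp-uuid>": {
                "resources": {"VCPU": 2, "MEMORY_MB": 2048, "DISK_GB": 20},
            },
            "<pf-child-rp-uuid>": {
                "resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000},
            },
        },
    }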
>> >
>> > Afaiu, as soon as we have NUMA modelling in place, even the most 
>> > trivial servers will have nested allocations, as CPU and MEMORY 
>> > inventory will be moved to the nested NUMA RPs. But NUMA is still 
>> > in the future.
>> >
>> > Sidenote: there is an edge case reported by bauzas where an 
>> > instance allocates _only_ from nested RPs. This was discussed last 
>> > Friday and resulted in a new patch[0], but I would like to keep 
>> > that discussion separate from this one if possible.
>> >
>> > Sidenote: the current problem is related not just to nested RPs 
>> > but to sharing RPs as well. However, I'm not aiming to implement 
>> > sharing support in Nova right now, so I also try to keep the 
>> > sharing discussion separate if possible.
>> >
>> > There was already some discussion in Monday's scheduler meeting, 
>> > but I could not attend:
>> > http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
>> >
>> >
>> > The meat
>> > --------
>> >
>> > Both live-migrate[1] and evacuate[2] have an optional force flag 
>> > on the nova REST API. The documentation says: "Force <the action> 
>> > by not verifying the provided destination host by the scheduler."
>> >
>> > Nova implements this statement by not calling the scheduler if 
>> > force=True BUT still trying to manage allocations in placement.
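
For reference, a forced live migration request body looks roughly like 
this on the REST API (on a new enough microversion; the host name 
below is made up):

    # POST /servers/{server_id}/action
    body = {
        "os-migrateLive": {
            "host": "dest-compute-1",
            "block_migration": "auto",
            "force": True,  # skip scheduler verification of the host
        }
    }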
>> >
>> > To have an allocation on the destination host, Nova blindly copies 
>> > the instance allocation from the source host to the destination 
>> > host during these operations. Nova can do that as 1) the whole 
>> > allocation is against a single RP (the compute RP) and 2) Nova 
>> > knows both the source compute RP and the destination compute RP.
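
The blind copy really is as simple as it sounds; roughly (a simplified 
sketch, error handling and consumer generation handling omitted):

    def blind_copy(source_allocs, source_rp_uuid, dest_rp_uuid):
        # Take the resources the instance holds against the source
        # compute RP and claim the same amounts against the destination
        # compute RP.
        resources = source_allocs["allocations"][source_rp_uuid]["resources"]
        return {"allocations": {dest_rp_uuid: {"resources": resources}}}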
>> >
>> > However, as soon as we bring nested allocations into the picture, 
>> > that blind copy will not be feasible. Possible cases:
>> > 0) The instance has a non-nested allocation on the source and 
>> > would need a non-nested allocation on the destination. This works 
>> > with the blind copy today.
>> > 1) The instance has a nested allocation on the source and would 
>> > need a nested allocation on the destination as well.
>> > 2) The instance has a non-nested allocation on the source and 
>> > would need a nested allocation on the destination.
>> > 3) The instance has a nested allocation on the source and would 
>> > need a non-nested allocation on the destination.
>> >
>> > Nova cannot generate nested allocations easily without 
>> > reimplementing some of the placement allocation candidate (a_c) 
>> > code. However, I don't like the idea of duplicating some of the 
>> > a_c code in Nova.
>> >
>> > Nova cannot detect what kind of allocation (nested or non-nested) 
>> > an instance would need on the destination without calling 
>> > placement a_c. So knowing when to call placement is a chicken and 
>> > egg problem.
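
(Detecting whether an *existing* allocation is nested is easy, by the 
way; something along these lines, assuming we already know the root 
compute RP uuid of the host in question:

    def is_nested_allocation(allocs, compute_rp_uuid):
        # The allocation is nested if it touches any RP other than the
        # compute root RP itself.
        return any(
            rp_uuid != compute_rp_uuid
            for rp_uuid in allocs["allocations"]
        )

This is what the "Nova can detect ..." cases below rely on.)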
>> >
>> > Possible solutions:
>> > A) fail fast
>> > ------------
>> > 0) Nova can detect that the source allocation is non-nested, try 
>> > the blind copy, and it will succeed.
>> > 1) Nova can detect that the source allocation is nested and fail 
>> > the operation.
>> > 2) Nova only sees a non-nested source allocation. Even if the dest 
>> > RP tree is nested, it does not mean that the allocation will be 
>> > nested. We cannot fail fast. Nova can try the blind copy and 
>> > allocate every resource from the root RP of the destination. If 
>> > the instance requires a nested allocation instead, the claim will 
>> > fail in placement. So Nova can fail the operation a bit later than 
>> > in 1).
>> > 3) Nova can detect that the source allocation is nested and fail 
>> > the operation. However, an enhanced blind copy that tries to 
>> > allocate everything from the root RP on the destination would have 
>> > worked.
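
So on the source side A) would boil down to something like this, 
reusing the helpers sketched above (a sketch only; MigrationPreCheckError 
is just one plausible exception choice here):

    from nova import exception

    def fail_fast_or_blind_copy(source_allocs, source_rp_uuid,
                                dest_rp_uuid):
        if is_nested_allocation(source_allocs, source_rp_uuid):
            # 1) and 3): refuse the forced move up front.
            raise exception.MigrationPreCheckError(
                reason="forced move with a nested source allocation is "
                       "not supported")
        # 0) and 2): attempt the blind copy; in case 2) the claim will
        # still fail later in placement anyway.
        return blind_copy(source_allocs, source_rp_uuid, dest_rp_uuid)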
>> >
>> > B) Guess when to ignore the force flag and call the scheduler
>> > -------------------------------------------------------------
>> > 0) Keep the blind copy as it works.
>> > 1) Nova detects that the source allocation is nested. It ignores 
>> > the force flag and calls the scheduler, which will call placement 
>> > a_c. The move operation can succeed.
>> > 2) Nova only sees a non-nested source allocation, so it will fall 
>> > back to the blind copy and fail at the claim on the destination.
>> > 3) Nova detects that the source allocation is nested. It ignores 
>> > the force flag and calls the scheduler, which will call placement 
>> > a_c. The move operation can succeed.
>> >
>> > This solution would go against the API doc, which states that Nova 
>> > does not call the scheduler if the operation is forced. However, 
>> > in case of forced live-migration Nova already verifies the target 
>> > host from a couple of perspectives in [3].
>> > This solution is already proposed for live-migrate in [4] and for 
>> > evacuate in [5], so the complexity of the solution can be seen in 
>> > the reviews.
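
In code B) is roughly the same check as in A), just with a different 
action on the nested branch (again only a sketch; select_dest_via_scheduler 
is a stand-in name for the real conductor call that runs the scheduler 
restricted to the requested destination):

    def blind_copy_or_reschedule(source_allocs, source_rp_uuid,
                                 dest_rp_uuid, select_dest_via_scheduler):
        if is_nested_allocation(source_allocs, source_rp_uuid):
            # 1) and 3): ignore the force flag and let the scheduler
            # (and therefore placement GET /allocation_candidates)
            # build the destination allocation.
            return select_dest_via_scheduler()
        # 0) and 2): keep today's behaviour; case 2) still fails at the
        # claim on the destination.
        return blind_copy(source_allocs, source_rp_uuid, dest_rp_uuid)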
>> >
>> > C) Remove the force flag from the API in a new microversion
>> > -----------------------------------------------------------
>> > 0)-3): all cases would call the scheduler to verify the target 
>> > host and generate the nested (or non-nested) allocation.
>> > We would still need an agreed behavior (from A), B), D)) for the 
>> > old microversions, as today's code creates inconsistent 
>> > allocations in #1) and #3) by ignoring the resources from the 
>> > nested RPs.
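
With C) the destination allocation would then always come from 
placement, i.e. from a query along the lines of the sketch below (the 
resource amounts, placement_url and token are placeholders; in reality 
this goes through the scheduler and its placement report client):

    import requests

    placement_url = "http://placement.example.com"  # placeholder
    token = "<auth-token>"                           # placeholder

    # Placement returns candidate allocations (nested or not) for the
    # requested resources; the scheduler then restricts the candidates
    # to the requested destination host.
    resp = requests.get(
        placement_url + "/allocation_candidates",
        params={"resources": "VCPU:2,MEMORY_MB:2048,DISK_GB:20"},
        headers={"X-Auth-Token": token,
                 "OpenStack-API-Version": "placement 1.29"},
    )
    candidates = resp.json()["allocation_requests"]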
>> >
>> > D) Do not manage allocations in placement for forced operations
>> > ---------------------------------------------------------------
>> > The force flag is considered a last-resort tool for the admin to 
>> > move VMs around. The API doc has a fat warning about the danger of 
>> > it. So Nova can simply skip the resource allocation task if 
>> > force=True. Nova would delete the source allocation and not create 
>> > any allocation on the destination host.
>> >
>> > This is a simple but dangerous solution, but it is what the force 
>> > flag is all about: move the server despite all the built-in 
>> > safeties. (If the admin needs the safeties, she can set force=False 
>> > and still specify the destination host.)
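
On the placement side D) would shrink the forced move down to a single 
call: delete the instance's (the consumer's) allocation and create 
nothing on the destination (sketch; placement_url, token and 
instance_uuid are placeholders):

    import requests

    placement_url = "http://placement.example.com"  # placeholder
    token = "<auth-token>"                           # placeholder
    instance_uuid = "<instance-uuid>"                # placeholder consumer

    # Drop the instance's allocation entirely; the destination host
    # ends up with no allocation for this instance at all.
    requests.delete(
        placement_url + "/allocations/" + instance_uuid,
        headers={"X-Auth-Token": token},
    )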
>> >
>> > I'm open to any suggestions.
>> >
>> > Cheers,
>> > gibi
>> >
>> > [0] https://review.openstack.org/#/c/608298/
>> > [1] https://developer.openstack.org/api-ref/compute/#live-migrate-server-os-migratelive-action
>> > [2] https://developer.openstack.org/api-ref/compute/#evacuate-server-evacuate-action
>> > [3] https://github.com/openstack/nova/blob/c5a7002bd571379818c0108296041d12bc171728/nova/conductor/tasks/live_migrate.py#L97
>> > [4] https://review.openstack.org/#/c/605785
>> > [5] https://review.openstack.org/#/c/606111



