[openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations
Balázs Gibizer
balazs.gibizer at ericsson.com
Tue Oct 9 15:44:40 UTC 2018
On Tue, Oct 9, 2018 at 5:32 PM, Sylvain Bauza <sylvain.bauza at gmail.com>
wrote:
>
>
> On Tue, Oct 9, 2018 at 17:09, Balázs Gibizer
> <balazs.gibizer at ericsson.com> wrote:
>>
>>
>> On Tue, Oct 9, 2018 at 4:56 PM, Sylvain Bauza
>> <sylvain.bauza at gmail.com>
>> wrote:
>> >
>> >
>> > On Tue, Oct 9, 2018 at 16:39, Eric Fried <openstack at fried.cc>
>> > wrote:
>> >> IIUC, the primary thing the force flag was intended to do - allow
>> >> an instance to land on the requested destination even if that
>> >> means oversubscription of the host's resources - doesn't happen
>> >> anymore since we started making the destination claim in
>> >> placement.
>> >>
>> >> IOW, since pike, you don't actually see a difference in behavior
>> >> by using the force flag or not. (If you do, it's more likely a bug
>> >> than what you were expecting.)
>> >>
>> >> So there's no reason to keep it around. We can remove it in a new
>> >> microversion (or not); but even in the current microversion we
>> >> need not continue making convoluted attempts to observe it.
>> >>
>> >> What that means is that we should simplify everything down to
>> >> ignore the force flag and always call GET /a_c. Problem solved -
>> >> for nested and/or sharing, NUMA or not, root resources or no, on
>> >> the source and/or destination.
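
"GET /a_c" above refers to placement's GET /allocation_candidates endpoint. As a rough sketch of what such a request looks like (the endpoint and the resources query format are from the placement API; the service URL is made up for illustration):

```python
from urllib.parse import urlencode


def allocation_candidates_url(placement_root, resources):
    """Build a placement GET /allocation_candidates request URL.

    resources maps resource class names to requested amounts, which
    placement expects as a comma-separated RC:amount list.
    """
    query = urlencode({
        "resources": ",".join(
            f"{rc}:{amount}" for rc, amount in sorted(resources.items())),
    })
    return f"{placement_root}/allocation_candidates?{query}"


# Hypothetical service root; only the path and query shape matter here.
url = allocation_candidates_url(
    "http://placement.example/placement",
    {"VCPU": 2, "MEMORY_MB": 2048},
)
```

Placement answers such a request with candidate allocations that already respect the RP tree shape, which is exactly what the blind copy discussed below cannot do.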
>> >>
>> >
>> >
>> > While I tend to agree with Eric here (and I commented on the review
>> > accordingly by saying we should signal the new behaviour by a
>> > microversion), I still think we need to properly advertise this,
>> > adding openstack-operators@ accordingly.
>>
>> Question for you as well: if we remove (or change) the force flag in
>> a
>> new microversion then how should the old microversions behave when
>> nested allocations would be required?
>>
>
> In that case (i.e. old microversions with either "force=None and
> target" or "force=True"), we should IMHO not create any allocation
> for the migration. Thoughts?
Do you mean implementing option #D) for the old microversions?
Cheers,
gibi
>
>> Cheers,
>> gibi
>>
>> > Disclaimer: since we have gaps on OSC, the current OSC behaviour
>> > when you "openstack server live-migrate <target>" is to *force*
>> > the destination by not calling the scheduler. Yeah, it sucks.
>> >
>> > Operators, what are the exact cases (for those running clouds newer
>> > than Mitaka, i.e. Newton and above) where you make use of the
>> > --force option for live migration with a microversion newer than or
>> > equal to 2.29? In general, even in the case of an emergency, you
>> > still want to make sure you don't throw your compute under the bus
>> > by massively migrating instances, creating an undetected snowball
>> > effect that leaves this compute refusing new instances. Or are you
>> > disabling the target compute service first and then throwing your
>> > pet instances up there?
>> >
>> > -Sylvain
>> >
>> >
>> >
>> >> -efried
>> >>
>> >> On 10/09/2018 04:40 AM, Balázs Gibizer wrote:
>> >> > Hi,
>> >> >
>> >> > Setup
>> >> > -----
>> >> >
>> >> > nested allocation: an allocation that contains resources from
>> >> > one or more nested RPs. (If you have a better term for this then
>> >> > please suggest it.)
>> >> >
>> >> > If an instance has a nested allocation it means that the
>> >> > compute it allocates from has a nested RP tree. BUT if a compute
>> >> > has a nested RP tree it does not automatically mean that an
>> >> > instance allocating from that compute has a nested allocation
>> >> > (e.g. bandwidth inventory will be on a nested RP but not every
>> >> > instance will require bandwidth).
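
To make the distinction concrete, a placement allocation can be pictured as a mapping from resource provider UUID to the resources consumed from that provider. This is an illustrative sketch only; the UUID placeholders and the bandwidth resource class name are made up:

```python
# Non-nested allocation: everything comes from the root compute RP.
flat_allocation = {
    "compute-rp-uuid": {
        "resources": {"VCPU": 2, "MEMORY_MB": 2048, "DISK_GB": 20},
    },
}

# Nested allocation: the same compute, but part of the resources come
# from a child RP in the compute's tree (e.g. bandwidth on a NIC RP).
nested_allocation = {
    "compute-rp-uuid": {
        "resources": {"VCPU": 2, "MEMORY_MB": 2048, "DISK_GB": 20},
    },
    "nic-child-rp-uuid": {
        "resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000},
    },
}
```

The nested case spans more than one provider of the same compute's tree, which is what makes the blind copy discussed below break down.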
>> >> >
>> >> > Afaiu, as soon as we have NUMA modelling in place even the most
>> >> > trivial servers will have nested allocations, as CPU and MEMORY
>> >> > inventory will be moved to the nested NUMA RPs. But NUMA is
>> >> > still in the future.
>> >> >
>> >> > Sidenote: there is an edge case reported by bauzas when an
>> >> > instance allocates _only_ from nested RPs. This was discussed
>> >> > last Friday and resulted in a new patch[0], but I would like to
>> >> > keep that discussion separate from this one if possible.
>> >> >
>> >> > Sidenote: the current problem is somewhat related not just to
>> >> > nested RPs but to sharing RPs as well. However I'm not aiming to
>> >> > implement sharing support in Nova right now, so I also try to
>> >> > keep the sharing discussion separate if possible.
>> >> >
>> >> > There was already some discussion at Monday's scheduler meeting
>> >> > but I could not attend:
>> >> > http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
>> >> >
>> >> >
>> >> > The meat
>> >> > --------
>> >> >
>> >> > Both live-migrate[1] and evacuate[2] have an optional force
>> >> > flag on the nova REST API. The documentation says: "Force <the
>> >> > action> by not verifying the provided destination host by the
>> >> > scheduler."
>> >> >
>> >> > Nova implements this statement by not calling the scheduler if
>> >> > force=True BUT still tries to manage allocations in placement.
>> >> >
>> >> > To have an allocation on the destination host Nova blindly
>> >> > copies the instance allocation from the source host to the
>> >> > destination host during these operations. Nova can do that as
>> >> > 1) the whole allocation is against a single RP (the compute RP)
>> >> > and 2) Nova knows both the source compute RP and the destination
>> >> > compute RP.
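
The blind copy described above boils down to re-keying the allocation from the source compute RP UUID to the destination compute RP UUID. A sketch, not the actual conductor code; the names are made up:

```python
def blind_copy(allocation, source_rp, dest_rp):
    """Re-key a single-RP allocation from the source to the destination.

    This only works when the whole allocation is against the one source
    compute RP; with a nested allocation there is no single key to swap.
    """
    if set(allocation) != {source_rp}:
        raise ValueError("allocation is not solely against the source RP")
    return {dest_rp: allocation[source_rp]}


src = {"src-compute-rp": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}}}
dst = blind_copy(src, "src-compute-rp", "dst-compute-rp")
```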
>> >> >
>> >> > However as soon as we bring nested allocations into the picture
>> >> > that blind copy will not be feasible. Possible cases:
>> >> > 0) The instance has a non-nested allocation on the source and
>> >> > would need a non-nested allocation on the destination. This
>> >> > works with the blind copy today.
>> >> > 1) The instance has a nested allocation on the source and would
>> >> > need a nested allocation on the destination as well.
>> >> > 2) The instance has a non-nested allocation on the source and
>> >> > would need a nested allocation on the destination.
>> >> > 3) The instance has a nested allocation on the source and would
>> >> > need a non-nested allocation on the destination.
>> >> >
>> >> > Nova cannot generate nested allocations easily without
>> >> > reimplementing some of the placement allocation candidate (a_c)
>> >> > code. However I don't like the idea of duplicating some of the
>> >> > a_c code in Nova.
>> >> >
>> >> > Nova cannot detect what kind of allocation (nested or
>> >> > non-nested) an instance would need on the destination without
>> >> > calling placement a_c. So knowing when to call placement is a
>> >> > chicken and egg problem.
>> >> >
>> >> > Possible solutions:
>> >> > A) fail fast
>> >> > ------------
>> >> > 0) Nova can detect that the source allocation is non-nested,
>> >> > try the blind copy, and it will succeed.
>> >> > 1) Nova can detect that the source allocation is nested and
>> >> > fail the operation.
>> >> > 2) Nova only sees a non-nested source allocation. Even if the
>> >> > dest RP tree is nested it does not mean that the allocation will
>> >> > be nested. We cannot fail fast. Nova can try the blind copy and
>> >> > allocate every resource from the root RP of the destination. If
>> >> > the instance requires a nested allocation instead, the claim
>> >> > will fail in placement. So nova can fail the operation a bit
>> >> > later than in 1).
>> >> > 3) Nova can detect that the source allocation is nested and
>> >> > fail the operation. However an enhanced blind copy that tries to
>> >> > allocate everything from the root RP on the destination would
>> >> > have worked.
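
The "detect that the source allocation is nested" step used in A) (and B) below) can be a simple check on the allocation's keys, assuming Nova knows the root compute RP UUID. A sketch with made-up names:

```python
def is_nested(allocation, compute_root_rp_uuid):
    """True if the allocation consumes resources from any RP other
    than the root compute RP."""
    return any(rp != compute_root_rp_uuid for rp in allocation)


flat = {"compute-rp": {"resources": {"VCPU": 2}}}
nested = {
    "compute-rp": {"resources": {"VCPU": 2}},
    "nic-child-rp": {"resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000}},
}
```

Note that this only classifies the *source* allocation; as case 2) shows, it cannot predict whether the destination would need a nested allocation.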
>> >> >
>> >> > B) Guess when to ignore the force flag and call the scheduler
>> >> > -------------------------------------------------------------
>> >> > 0) Keep the blind copy as it works.
>> >> > 1) Nova detects that the source allocation is nested, ignores
>> >> > the force flag and calls the scheduler, which will call
>> >> > placement a_c. The move operation can succeed.
>> >> > 2) Nova only sees a non-nested source allocation so it will
>> >> > fall back to the blind copy and fail at the claim on the
>> >> > destination.
>> >> > 3) Nova detects that the source allocation is nested, ignores
>> >> > the force flag and calls the scheduler, which will call
>> >> > placement a_c. The move operation can succeed.
>> >> >
>> >> > This solution would go against the API doc, which states that
>> >> > nova does not call the scheduler if the operation is forced.
>> >> > However in the case of forced live-migration Nova already
>> >> > verifies the target host from a couple of perspectives in [3].
>> >> > This solution is already proposed for live-migrate in [4] and
>> >> > for evacuate in [5], so the complexity of the solution can be
>> >> > seen in the reviews.
>> >> >
>> >> > C) Remove the force flag from the API in a new microversion
>> >> > -----------------------------------------------------------
>> >> > 0)-3): all cases would call the scheduler to verify the target
>> >> > host and generate the nested (or non-nested) allocation.
>> >> > We would still need an agreed behavior (from A), B), D)) for
>> >> > the old microversions, as today's code creates inconsistent
>> >> > allocations in #1) and #3) by ignoring the resources from the
>> >> > nested RPs.
>> >> >
>> >> > D) Do not manage allocations in placement for forced operations
>> >> > ---------------------------------------------------------------
>> >> > The force flag is considered a last-resort tool for the admin
>> >> > to move VMs around. The API doc has a fat warning about its
>> >> > danger. So Nova can simply skip the resource allocation task if
>> >> > force=True. Nova would delete the source allocation and not
>> >> > create any allocation on the destination host.
>> >> >
>> >> > This is a simple but dangerous solution, but that is what the
>> >> > force flag is all about: moving the server against all the
>> >> > built-in safeties. (If the admin needs the safeties she can set
>> >> > force=False and still specify the destination host.)
>> >> >
>> >> > I'm open to any suggestions.
>> >> >
>> >> > Cheers,
>> >> > gibi
>> >> >
>> >> > [0] https://review.openstack.org/#/c/608298/
>> >> > [1] https://developer.openstack.org/api-ref/compute/#live-migrate-server-os-migratelive-action
>> >> > [2] https://developer.openstack.org/api-ref/compute/#evacuate-server-evacuate-action
>> >> > [3] https://github.com/openstack/nova/blob/c5a7002bd571379818c0108296041d12bc171728/nova/conductor/tasks/live_migrate.py#L97
>> >> > [4] https://review.openstack.org/#/c/605785
>> >> > [5] https://review.openstack.org/#/c/606111
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > __________________________________________________________________________
>> >> > OpenStack Development Mailing List (not for usage questions)
>> >> > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> >> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev