[openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations
Sylvain Bauza
sylvain.bauza at gmail.com
Tue Oct 9 15:32:12 UTC 2018
On Tue, Oct 9, 2018 at 17:09, Balázs Gibizer <balazs.gibizer at ericsson.com> wrote:
>
>
> On Tue, Oct 9, 2018 at 4:56 PM, Sylvain Bauza <sylvain.bauza at gmail.com>
> wrote:
> >
> >
> > On Tue, Oct 9, 2018 at 16:39, Eric Fried <openstack at fried.cc> wrote:
> >> IIUC, the primary thing the force flag was intended to do - allow an
> >> instance to land on the requested destination even if that means
> >> oversubscription of the host's resources - doesn't happen anymore since
> >> we started making the destination claim in placement.
> >>
> >> IOW, since Pike, you don't actually see a difference in behavior by
> >> using the force flag or not. (If you do, it's more likely a bug than
> >> what you were expecting.)
> >>
> >> So there's no reason to keep it around. We can remove it in a new
> >> microversion (or not); but even in the current microversion we need not
> >> continue making convoluted attempts to observe it.
> >>
> >> What that means is that we should simplify everything down to ignore the
> >> force flag and always call GET /a_c. Problem solved - for nested and/or
> >> sharing, NUMA or not, root resources or no, on the source and/or
> >> destination.
> >>
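Eric's proposal boils down to "always ask placement". A minimal Python sketch of one way a caller could do that while still honouring a requested destination, assuming a flat resource request and a caller-supplied map from each RP to its root compute RP. `build_a_c_query` and `candidates_on_host` are hypothetical helpers, not Nova code; the candidate shape (`allocation_requests` entries whose `allocations` are keyed by RP UUID) follows the placement allocation candidates response, and granular groups, traits and sharing RPs are deliberately left out.

```python
from typing import Dict, List


def build_a_c_query(resources: Dict[str, int]) -> str:
    """Build a GET /allocation_candidates path for a flat resource request.

    Hypothetical helper: real Nova derives the query from the RequestSpec
    (granular groups, traits, limits), all of which are left out here.
    """
    spec = ",".join(f"{rc}:{amount}" for rc, amount in sorted(resources.items()))
    return f"/allocation_candidates?resources={spec}"


def candidates_on_host(
    allocation_requests: List[dict],
    root_of: Dict[str, str],
    dest_root_rp: str,
) -> List[dict]:
    """Keep only candidates whose RPs all belong to the destination's tree.

    root_of maps RP UUID -> root (compute node) RP UUID and is assumed to be
    supplied by the caller. Sharing RPs live outside the tree and would need
    extra handling that this sketch does not attempt.
    """
    return [
        req
        for req in allocation_requests
        if all(root_of.get(rp) == dest_root_rp for rp in req["allocations"])
    ]


# Made-up RP names: "dest" is the requested target, "dest-bw" a nested
# bandwidth RP under it, "other" an unrelated compute node.
reqs = [
    {"allocations": {"dest": {"resources": {"VCPU": 2}},
                     "dest-bw": {"resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000}}}},
    {"allocations": {"other": {"resources": {"VCPU": 2}}}},
]
roots = {"dest": "dest", "dest-bw": "dest", "other": "other"}
print(build_a_c_query({"VCPU": 2, "MEMORY_MB": 2048}))
# -> /allocation_candidates?resources=MEMORY_MB:2048,VCPU:2
print(len(candidates_on_host(reqs, roots, "dest")))  # -> 1
```

The nested candidate on the requested host survives the filter, so nested and non-nested destinations are handled the same way, which is the point of always going through a_c.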
> >
> >
> > While I tend to agree with Eric here (and I commented on the review
> > accordingly by saying we should signal the new behaviour by a
> > microversion), I still think we need to properly advertise this,
> > adding openstack-operators@ accordingly.
>
> Question for you as well: if we remove (or change) the force flag in a
> new microversion then how should the old microversions behave when
> nested allocations would be required?
>
>
In that case (i.e. old microversions with either "force=None and target" or
"force=True"), we should IMHO not create any allocation for the migration.
Thoughts?
> Cheers,
> gibi
>
> > Disclaimer: since we have gaps in OSC, the current OSC behaviour
> > when you run "openstack server live-migrate <target>" is to *force* the
> > destination by not calling the scheduler. Yeah, it sucks.
> >
> > Operators, what are the exact cases (for those running clouds newer
> > than Mitaka, i.e. Newton and above) in which you use the --force
> > option for live migration with a microversion greater than or equal to
> > 2.29? In general, even in an emergency, you still want to make
> > sure you don't throw your compute under the bus by massively
> > migrating instances, which could create an undetected snowball effect
> > by having this compute refuse new instances. Or are you disabling
> > the target compute service first and throwing your pet instances up
> > there?
> >
> > -Sylvain
> >
> >
> >
> >> -efried
> >>
> >> On 10/09/2018 04:40 AM, Balázs Gibizer wrote:
> >> > Hi,
> >> >
> >> > Setup
> >> > -----
> >> >
> >> > nested allocation: an allocation that contains resources from one or
> >> > more nested RPs. (If you have a better term for this, please suggest
> >> > one.)
> >> >
> >> > If an instance has a nested allocation, it means that the compute it
> >> > allocates from has a nested RP tree. BUT if a compute has a nested RP
> >> > tree, it does not automatically mean that an instance allocating from
> >> > that compute has a nested allocation (e.g. bandwidth inventory will be
> >> > on nested RPs, but not every instance will require bandwidth).
> >> >
> >> > AFAIU, as soon as we have NUMA modelling in place, even the most trivial
> >> > servers will have nested allocations, as CPU and MEMORY inventory will
> >> > be moved to the nested NUMA RPs. But NUMA is still in the future.
> >> >
> >> > Sidenote: there is an edge case reported by bauzas where an instance
> >> > allocates _only_ from nested RPs. This was discussed last Friday and
> >> > resulted in a new patch [0], but I would like to keep that discussion
> >> > separate from this one if possible.
> >> >
> >> > Sidenote: the current problem is somewhat related not just to nested RPs
> >> > but to sharing RPs as well. However, I'm not aiming to implement sharing
> >> > support in Nova right now, so I also try to keep the sharing discussion
> >> > separate if possible.
> >> >
> >> > There was already some discussion at Monday's scheduler meeting, but
> >> > I could not attend:
> >> > http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
> >> >
> >> >
> >> > The meat
> >> > --------
> >> >
> >> > Both live-migrate [1] and evacuate [2] have an optional force flag on the
> >> > nova REST API. The documentation says: "Force <the action> by not
> >> > verifying the provided destination host by the scheduler."
> >> >
> >> > Nova implements this statement by not calling the scheduler if
> >> > force=True, BUT it still tries to manage allocations in placement.
> >> >
> >> > To have an allocation on the destination host, Nova blindly copies the
> >> > instance allocation from the source host to the destination host during
> >> > these operations. Nova can do that because 1) the whole allocation is
> >> > against a single RP (the compute RP) and 2) Nova knows both the source
> >> > compute RP and the destination compute RP.
> >> >
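The blind copy described above is essentially a re-keying of one RP's resources. A minimal Python sketch, assuming allocations are held in the dict shape placement uses for a consumer (`{rp_uuid: {"resources": {resource_class: amount}}}`); `blind_copy` is a hypothetical helper, not the actual Nova function. It makes the two preconditions explicit and shows why a nested allocation breaks them.

```python
from typing import Dict

# {rp_uuid: {"resources": {resource_class: amount}}}
Allocations = Dict[str, Dict[str, Dict[str, int]]]


def blind_copy(src: Allocations, src_compute_rp: str, dest_compute_rp: str) -> Allocations:
    """Re-key a single-RP source allocation onto the destination compute RP.

    Hypothetical helper. It only works under the two preconditions above:
    the whole allocation is against the source compute RP, and both
    compute RP UUIDs are known.
    """
    if set(src) != {src_compute_rp}:
        raise ValueError("allocation is not solely against the source compute RP")
    return {dest_compute_rp: {"resources": dict(src[src_compute_rp]["resources"])}}


flat = {"src-cn": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}}}
print(blind_copy(flat, "src-cn", "dest-cn"))
# -> {'dest-cn': {'resources': {'VCPU': 2, 'MEMORY_MB': 2048}}}
```

With a second, nested RP in `src` the copy has no destination RP to map it to, so the sketch refuses instead of silently dropping those resources.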
> >> > However, as soon as we bring nested allocations into the picture, that
> >> > blind copy will not be feasible. Possible cases:
> >> > 0) The instance has a non-nested allocation on the source and would need
> >> > a non-nested allocation on the destination. This works with the blind
> >> > copy today.
> >> > 1) The instance has a nested allocation on the source and would need a
> >> > nested allocation on the destination as well.
> >> > 2) The instance has a non-nested allocation on the source and would
> >> > need a nested allocation on the destination.
> >> > 3) The instance has a nested allocation on the source and would need a
> >> > non-nested allocation on the destination.
> >> >
> >> > Nova cannot easily generate nested allocations without reimplementing
> >> > some of the placement allocation candidate (a_c) code, and I don't
> >> > like the idea of duplicating a_c code in Nova.
> >> >
> >> > Nova cannot detect what kind of allocation (nested or non-nested) an
> >> > instance would need on the destination without calling placement a_c.
> >> > So knowing when to call placement is a chicken-and-egg problem.
> >> >
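The four cases and the chicken-and-egg remark can be made concrete with a small Python sketch. `is_nested` and `move_case` are hypothetical helpers, and like the mail they assume no sharing RPs: an allocation counts as nested if any part of it is against an RP other than the compute (root) RP, which also covers the "only nested RPs" edge case from the sidenote.

```python
from typing import Dict

Allocations = Dict[str, Dict[str, Dict[str, int]]]


def is_nested(allocs: Allocations, compute_rp: str) -> bool:
    """True if any part of the allocation is against an RP other than the
    compute (root) RP. Sharing RPs are out of scope, as in the mail."""
    return any(rp != compute_rp for rp in allocs)


def move_case(src_nested: bool, dest_nested: bool) -> int:
    """Map (nested on source?, nested on destination?) to cases 0)-3)."""
    return {
        (False, False): 0,
        (True, True): 1,
        (False, True): 2,
        (True, False): 3,
    }[(src_nested, dest_nested)]


src = {"cn1": {"resources": {"VCPU": 2}},
       "cn1-bw": {"resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000}}}
print(is_nested(src, "cn1"))  # -> True
# dest_nested has no local answer: only placement a_c can tell us, which is
# exactly the chicken-and-egg problem.
```

Nova can always evaluate the first argument of `move_case` from the existing allocation; the second one is what it cannot know without asking placement.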
> >> > Possible solutions:
> >> > A) fail fast
> >> > ------------
> >> > 0) Nova can detect that the source allocation is non-nested and try
> >> > the blind copy, and it will succeed.
> >> > 1) Nova can detect that the source allocation is nested and fail the
> >> > operation.
> >> > 2) Nova only sees a non-nested source allocation. Even if the dest RP
> >> > tree is nested, it does not mean that the allocation will be nested, so
> >> > we cannot fail fast. Nova can try the blind copy and allocate all
> >> > resources from the root RP of the destination. If the instance requires
> >> > a nested allocation instead, the claim will fail in placement. So Nova
> >> > can fail the operation a bit later than in 1).
> >> > 3) Nova can detect that the source allocation is nested and fail the
> >> > operation. However, an enhanced blind copy that tries to allocate
> >> > everything from the root RP on the destination would have worked.
> >> >
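Option A as a decision procedure, sketched in Python under the same assumptions as before (placement-style allocation dicts, no sharing RPs). `forced_move_fail_fast` is a hypothetical name; the placement claim itself is not modelled, only the point at which each case would fail.

```python
from typing import Dict

Allocations = Dict[str, Dict[str, Dict[str, int]]]


def forced_move_fail_fast(src: Allocations, src_compute_rp: str,
                          dest_compute_rp: str) -> Allocations:
    """Option A: reject nested sources up front, otherwise blind-copy
    everything onto the destination root RP."""
    if any(rp != src_compute_rp for rp in src):
        # Cases 1) and 3): nested source allocation, fail immediately.
        raise RuntimeError("forced move of a nested allocation is not supported")
    # Cases 0) and 2): blind copy. In case 2) placement would reject the
    # claim later, because the needed inventory sits on a nested RP.
    resources = dict(src.get(src_compute_rp, {}).get("resources", {}))
    return {dest_compute_rp: {"resources": resources}}


try:
    forced_move_fail_fast({"cn1": {"resources": {"VCPU": 2}},
                           "cn1-numa0": {"resources": {"MEMORY_MB": 512}}},
                          "cn1", "cn2")
except RuntimeError as exc:
    print(exc)  # -> forced move of a nested allocation is not supported
```

The sketch makes the weakness of case 3) visible: the nested source is rejected even though copying everything onto `dest_compute_rp` might have been accepted by placement.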
> >> > B) Guess when to ignore the force flag and call the scheduler
> >> > -------------------------------------------------------------
> >> > 0) Keep the blind copy, as it works.
> >> > 1) Nova detects that the source allocation is nested, ignores the force
> >> > flag, and calls the scheduler, which will call placement a_c. The move
> >> > operation can succeed.
> >> > 2) Nova only sees a non-nested source allocation, so it will fall back
> >> > to the blind copy and fail at the claim on the destination.
> >> > 3) Nova detects that the source allocation is nested, ignores the force
> >> > flag, and calls the scheduler, which will call placement a_c. The move
> >> > operation can succeed.
> >> >
> >> > This solution would go against the API doc, which states that nova does
> >> > not call the scheduler if the operation is forced. However, in the case
> >> > of forced live-migration, Nova already verifies the target host from a
> >> > couple of perspectives in [3].
> >> > This solution is already proposed for live-migrate in [4] and for
> >> > evacuate in [5], so the complexity of the solution can be seen in the
> >> > reviews.
> >> >
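Option B only changes which path is taken, so it reduces to a strategy switch. A minimal Python sketch; `pick_strategy` and its string return values are hypothetical labels, not Nova's conductor API, and the nested check is the same no-sharing-RPs assumption used above.

```python
from typing import Dict

Allocations = Dict[str, Dict[str, Dict[str, int]]]


def pick_strategy(src: Allocations, src_compute_rp: str, force: bool) -> str:
    """Option B: silently ignore force=True when the source is nested."""
    if not force:
        return "scheduler"
    if any(rp != src_compute_rp for rp in src):
        # Cases 1) and 3): go through the scheduler, i.e. placement a_c.
        return "scheduler"
    # Cases 0) and 2): keep the blind copy; case 2) still fails at the
    # claim on the destination.
    return "blind_copy"


flat = {"cn1": {"resources": {"VCPU": 2}}}
nested = {"cn1": {"resources": {"VCPU": 2}},
          "cn1-bw": {"resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000}}}
print(pick_strategy(flat, "cn1", force=True))    # -> blind_copy
print(pick_strategy(nested, "cn1", force=True))  # -> scheduler
```

Case 2) still ends up on the blind-copy branch, because a non-nested source gives no hint that the destination needs a nested allocation.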
> >> > C) Remove the force flag from the API in a new microversion
> >> > -----------------------------------------------------------
> >> > 0)-3): all cases would call the scheduler to verify the target host and
> >> > generate the nested (or non-nested) allocation.
> >> > We would still need an agreed behavior (from A), B), D)) for the old
> >> > microversions, as today's code creates inconsistent allocations in #1)
> >> > and #3) by ignoring the resources from the nested RPs.
> >> >
> >> > D) Do not manage allocations in placement for forced operation
> >> > --------------------------------------------------------------
> >> > The force flag is considered a last-resort tool for the admin to move
> >> > VMs around. The API doc has a fat warning about the danger of it. So
> >> > Nova can simply skip the resource allocation task if force=True. Nova
> >> > would delete the source allocation and not create any allocation on
> >> > the destination host.
> >> >
> >> > This is a simple but dangerous solution, but it is what the force flag
> >> > is all about: moving the server past all the built-in safeties. (If
> >> > the admin needs the safeties, she can set force=False and still specify
> >> > the destination host.)
> >> >
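What makes option D dangerous is easiest to see in the per-RP usage placement would report afterwards. A minimal Python sketch, assuming allocations are tracked per consumer in placement's dict shape; `rp_usage` and `forced_move_option_d` are hypothetical helpers used only to illustrate the bookkeeping.

```python
from collections import Counter, defaultdict
from typing import Dict

Allocations = Dict[str, Dict[str, Dict[str, int]]]


def rp_usage(by_consumer: Dict[str, Allocations]) -> Dict[str, Counter]:
    """Sum the allocated amounts per RP across all consumers."""
    usage: Dict[str, Counter] = defaultdict(Counter)
    for allocs in by_consumer.values():
        for rp, alloc in allocs.items():
            usage[rp].update(alloc["resources"])
    return usage


def forced_move_option_d(by_consumer: Dict[str, Allocations],
                         instance: str) -> Dict[str, Allocations]:
    """Option D: delete the instance's allocation, create none on the target."""
    return {c: a for c, a in by_consumer.items() if c != instance}


before = {"vm1": {"src-cn": {"resources": {"VCPU": 2}}}}
after = forced_move_option_d(before, "vm1")
print(rp_usage(before)["src-cn"]["VCPU"])  # -> 2
print(rp_usage(after)["dest-cn"]["VCPU"])  # -> 0, yet vm1 now runs on dest-cn
```

After the forced move placement accounts for the instance nowhere, so the destination looks emptier than it is and can be overcommitted by later scheduling decisions.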
> >> > I'm open to any suggestions.
> >> >
> >> > Cheers,
> >> > gibi
> >> >
> >> > [0] https://review.openstack.org/#/c/608298/
> >> > [1] https://developer.openstack.org/api-ref/compute/#live-migrate-server-os-migratelive-action
> >> > [2] https://developer.openstack.org/api-ref/compute/#evacuate-server-evacuate-action
> >> > [3] https://github.com/openstack/nova/blob/c5a7002bd571379818c0108296041d12bc171728/nova/conductor/tasks/live_migrate.py#L97
> >> > [4] https://review.openstack.org/#/c/605785
> >> > [5] https://review.openstack.org/#/c/606111
> >> >
> >> >
> >> >
> >>
> __________________________________________________________________________
> >> > OpenStack Development Mailing List (not for usage questions)
> >> > Unsubscribe:
> >> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> >> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >> >