[openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations
Sylvain Bauza
sbauza at redhat.com
Tue Oct 9 15:35:09 UTC 2018
> Shit, I forgot to add openstack-operators at ...
> Operators, see my question for you here :
>
>
>> On Tue, Oct 9, 2018 at 16:39, Eric Fried <openstack at fried.cc> wrote:
>>
>>> IIUC, the primary thing the force flag was intended to do - allow an
>>> instance to land on the requested destination even if that means
>>> oversubscription of the host's resources - doesn't happen anymore since
>>> we started making the destination claim in placement.
>>>
>>> IOW, since Pike, you don't actually see a difference in behavior by
>>> using the force flag or not. (If you do, it's more likely a bug than
>>> what you were expecting.)
>>>
>>> So there's no reason to keep it around. We can remove it in a new
>>> microversion (or not); but even in the current microversion we need not
>>> continue making convoluted attempts to observe it.
>>>
>>> What that means is that we should simplify everything down to ignore the
>>> force flag and always call GET /a_c. Problem solved - for nested and/or
>>> sharing, NUMA or not, root resources or no, on the source and/or
>>> destination.
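
A minimal sketch of what "always call GET /a_c" could look like on the request side. The helper name `allocation_candidates_path` is hypothetical and this is not Nova's code; it only builds the `resources` query parameter of placement's GET /allocation_candidates. Restricting the candidates to the requested destination host is deliberately left out here.

```python
from urllib.parse import urlencode


def allocation_candidates_path(resources):
    """Build a GET /allocation_candidates path (hypothetical helper).

    `resources` maps resource class names to requested amounts.
    """
    # Sort for a stable, reproducible query string.
    amounts = ",".join(
        f"{rc}:{amount}" for rc, amount in sorted(resources.items())
    )
    # Keep ':' and ',' unescaped so the query stays readable.
    return "/allocation_candidates?" + urlencode(
        {"resources": amounts}, safe=":,"
    )


path = allocation_candidates_path({"VCPU": 2, "MEMORY_MB": 2048})
print(path)
```

Placement then decides whether the resulting allocation is nested or not, so the caller never has to guess.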
>>>
>>>
>>
>> While I tend to agree with Eric here (and I commented on the review
>> accordingly by saying we should signal the new behaviour by a
>> microversion), I still think we need to properly advertise this, adding
>> openstack-operators@ accordingly.
>> Disclaimer: since we have gaps in OSC, the current OSC behaviour when you
>> run "openstack server live-migrate <target>" is to *force* the destination
>> by not calling the scheduler. Yeah, it sucks.
>>
>> Operators, what are the exact cases (for those running clouds newer than
>> Mitaka, i.e. Newton and above) where you make use of the --force option for
>> live migration with a microversion greater than or equal to 2.29?
>> In general, even in an emergency, you still want to make sure you don't
>> throw your compute under the bus by massively migrating instances onto it,
>> which would create an undetected snowball effect where this compute
>> refuses new instances. Or are you disabling the target compute service
>> first and then throwing your pet instances up there?
>>
>> -Sylvain
>>
>>
>>
>>> -efried
>>>
>>> On 10/09/2018 04:40 AM, Balázs Gibizer wrote:
>>> > Hi,
>>> >
>>> > Setup
>>> > -----
>>> >
>>> > nested allocation: an allocation that contains resources from one or
>>> > more nested RPs. (If you have a better term for this, please suggest
>>> > one.)
>>> >
>>> > If an instance has a nested allocation, it means that the compute it
>>> > allocates from has a nested RP tree. BUT if a compute has a nested RP
>>> > tree, it does not automatically mean that an instance allocating from
>>> > that compute has a nested allocation (e.g. bandwidth inventory will be
>>> > on nested RPs, but not every instance will require bandwidth).
>>> >
>>> > Afaiu, as soon as we have NUMA modelling in place, even the most trivial
>>> > servers will have nested allocations, as CPU and MEMORY inventory will
>>> > be moved to the nested NUMA RPs. But NUMA is still in the future.
>>> >
>>> > Sidenote: there is an edge case reported by bauzas where an instance
>>> > allocates _only_ from nested RPs. This was discussed last Friday and
>>> > it resulted in a new patch[0], but I would like to keep that discussion
>>> > separate from this one if possible.
>>> >
>>> > Sidenote: the current problem is related not just to nested RPs
>>> > but to sharing RPs as well. However, I'm not aiming to implement sharing
>>> > support in Nova right now, so I will also try to keep the sharing
>>> > discussion separate if possible.
>>> >
>>> > There was already some discussion at Monday's scheduler meeting, but
>>> > I could not attend.
>>> > http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
>>> >
>>> >
>>> > The meat
>>> > --------
>>> >
>>> > Both live-migrate[1] and evacuate[2] have an optional force flag in the
>>> > nova REST API. The documentation says: "Force <the action> by not
>>> > verifying the provided destination host by the scheduler."
>>> >
>>> > Nova implements this statement by not calling the scheduler if
>>> > force=True BUT it still tries to manage allocations in placement.
>>> >
>>> > To have an allocation on the destination host, Nova blindly copies the
>>> > instance allocation from the source host to the destination host during
>>> > these operations. Nova can do that as 1) the whole allocation is
>>> > against a single RP (the compute RP) and 2) Nova knows both the source
>>> > compute RP and the destination compute RP.
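
The blind copy described above can be sketched roughly as follows. This is an illustration only, not Nova's actual code: the function name `blind_copy_allocation` and the RP identifiers are hypothetical, and the dict layout (resource provider UUID mapped to a `resources` dict) is assumed to mirror the shape of a placement allocation.

```python
import copy


def blind_copy_allocation(source_allocs, source_rp_uuid, dest_rp_uuid):
    """Re-key a single-provider allocation from the source compute RP
    to the destination compute RP (hypothetical helper)."""
    if set(source_allocs) != {source_rp_uuid}:
        # The copy is only well defined when the whole allocation is
        # against the source compute RP.
        raise ValueError("allocation is not against the single source RP")
    return {dest_rp_uuid: copy.deepcopy(source_allocs[source_rp_uuid])}


source = {"src-compute-rp": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}}}
dest = blind_copy_allocation(source, "src-compute-rp", "dst-compute-rp")
print(dest)
```

The guard makes both preconditions explicit: once the allocation spans more than the compute RP, there is no single key to rewrite.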
>>> >
>>> > However, as soon as we bring nested allocations into the picture, that
>>> > blind copy will not be feasible. Possible cases:
>>> > 0) The instance has a non-nested allocation on the source and would need
>>> > a non-nested allocation on the destination. This works with the blind
>>> > copy today.
>>> > 1) The instance has a nested allocation on the source and would need a
>>> > nested allocation on the destination as well.
>>> > 2) The instance has a non-nested allocation on the source and would
>>> > need a nested allocation on the destination.
>>> > 3) The instance has a nested allocation on the source and would need a
>>> > non-nested allocation on the destination.
>>> >
>>> > Nova cannot generate nested allocations easily without reimplementing
>>> > some of the placement allocation candidate (a_c) code. However I don't
>>> > like the idea of duplicating some of the a_c code in Nova.
>>> >
>>> > Nova cannot detect what kind of allocation (nested or non-nested) an
>>> > instance would need on the destination without calling placement a_c.
>>> > So knowing when to call placement is a chicken and egg problem.
>>> >
>>> > Possible solutions:
>>> > A) fail fast
>>> > ------------
>>> > 0) Nova can detect that the source allocation is non-nested and try
>>> > the blind copy, and it will succeed.
>>> > 1) Nova can detect that the source allocation is nested and fail the
>>> > operation.
>>> > 2) Nova only sees a non-nested source allocation. Even if the dest RP
>>> > tree is nested, it does not mean that the allocation will be nested. We
>>> > cannot fail fast. Nova can try the blind copy and allocate every
>>> > resource from the root RP of the destination. If the instance requires
>>> > a nested allocation instead, the claim will fail in placement. So nova
>>> > can fail the operation a bit later than in 1).
>>> > 3) Nova can detect that the source allocation is nested and fail the
>>> > operation. However, an enhanced blind copy that tries to allocate
>>> > everything from the root RP on the destination would have worked.
>>> >
>>> > B) Guess when to ignore the force flag and call the scheduler
>>> > -------------------------------------------------------------
>>> > 0) keep the blind copy as it works
>>> > 1) Nova detects that the source allocation is nested. It ignores the
>>> > force flag and calls the scheduler, which will call placement a_c. The
>>> > move operation can succeed.
>>> > 2) Nova only sees a non-nested source allocation, so it will fall back
>>> > to the blind copy and fail at the claim on the destination.
>>> > 3) Nova detects that the source allocation is nested. It ignores the
>>> > force flag and calls the scheduler, which will call placement a_c. The
>>> > move operation can succeed.
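
The decision logic of option B can be sketched as below. All names (`is_nested`, `choose_path`, the RP identifiers) are hypothetical and this is not the code proposed in the reviews; it assumes an allocation counts as nested when it touches any RP other than the root compute RP, and it ignores sharing providers entirely.

```python
def is_nested(allocs, compute_rp_uuid):
    """True if the allocation uses any RP besides the root compute RP."""
    return any(rp_uuid != compute_rp_uuid for rp_uuid in allocs)


def choose_path(source_allocs, source_compute_rp_uuid, force):
    """Return which code path a move operation would take under option B."""
    if not force:
        return "scheduler"
    if is_nested(source_allocs, source_compute_rp_uuid):
        # Cases 1) and 3): ignore the force flag and let the scheduler
        # call placement a_c.
        return "scheduler"
    # Cases 0) and 2): only the source is visible, so fall back to the
    # blind copy. Case 2) then fails at the claim on the destination.
    return "blind_copy"


flat = {"compute-rp": {"resources": {"VCPU": 1}}}
nested = {
    "compute-rp": {"resources": {"VCPU": 1}},
    "nic-rp": {"resources": {"CUSTOM_BANDWIDTH": 100}},
}
print(choose_path(flat, "compute-rp", force=True))
print(choose_path(nested, "compute-rp", force=True))
```

The sketch makes the weakness of option B visible: cases 0) and 2) are indistinguishable from the source side, so the choice is still a guess.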
>>> >
>>> > This solution would be against the API doc, which states that nova does
>>> > not call the scheduler if the operation is forced. However, in the case
>>> > of force live-migration Nova already verifies the target host from a
>>> > couple of perspectives in [3].
>>> > This solution is already proposed for live-migrate in [4] and for
>>> > evacuate in [5], so the complexity of the solution can be seen in the
>>> > reviews.
>>> >
>>> > C) Remove the force flag from the API in a new microversion
>>> > -----------------------------------------------------------
>>> > 0)-3): all cases would call the scheduler to verify the target host and
>>> > generate the nested (or non-nested) allocation.
>>> > We would still need an agreed behavior (from A), B), D)) for the old
>>> > microversions, as today's code creates inconsistent allocations in #1)
>>> > and #3) by ignoring the resources from the nested RP.
>>> >
>>> > D) Do not manage allocations in placement for forced operation
>>> > --------------------------------------------------------------
>>> > The force flag is considered a last-resort tool for the admin to move
>>> > VMs around. The API doc has a fat warning about the danger of it. So
>>> > Nova can simply skip the resource allocation task if force=True. Nova
>>> > would delete the source allocation and not create any allocation
>>> > on the destination host.
>>> >
>>> > This is a simple but dangerous solution, but it is what the force flag
>>> > is all about: moving the server against all the built-in safeties. (If
>>> > the admin needs the safeties, she can set force=False and still specify
>>> > the destination host.)
>>> >
>>> > I'm open to any suggestions.
>>> >
>>> > Cheers,
>>> > gibi
>>> >
>>> > [0] https://review.openstack.org/#/c/608298/
>>> > [1] https://developer.openstack.org/api-ref/compute/#live-migrate-server-os-migratelive-action
>>> > [2] https://developer.openstack.org/api-ref/compute/#evacuate-server-evacuate-action
>>> > [3] https://github.com/openstack/nova/blob/c5a7002bd571379818c0108296041d12bc171728/nova/conductor/tasks/live_migrate.py#L97
>>> > [4] https://review.openstack.org/#/c/605785
>>> > [5] https://review.openstack.org/#/c/606111
>>> >
>>> >
>>> >
>>> __________________________________________________________________________
>>> > OpenStack Development Mailing List (not for usage questions)
>>> > Unsubscribe:
>>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>> >