<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Tue, Oct 9, 2018 at 17:09, Balázs Gibizer <<a href="mailto:balazs.gibizer@ericsson.com">balazs.gibizer@ericsson.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

<br>

On Tue, Oct 9, 2018 at 4:56 PM, Sylvain Bauza <<a href="mailto:sylvain.bauza@gmail.com" target="_blank">sylvain.bauza@gmail.com</a>> <br>

wrote:<br>

> <br>

> <br>

> On Tue, Oct 9, 2018 at 16:39, Eric Fried <openstack@fried.cc> wrote:<br>

>> IIUC, the primary thing the force flag was intended to do - allow an<br>

>> instance to land on the requested destination even if that means<br>

>> oversubscription of the host's resources - doesn't happen anymore <br>

>> since<br>

>> we started making the destination claim in placement.<br>

>> <br>

>> IOW, since Pike, you don't actually see a difference in behavior by<br>

>> using the force flag or not. (If you do, it's more likely a bug than<br>

>> what you were expecting.)<br>

>> <br>

>> So there's no reason to keep it around. We can remove it in a new<br>

>> microversion (or not); but even in the current microversion we need <br>

>> not<br>

>> continue making convoluted attempts to observe it.<br>

>> <br>

>> What that means is that we should simplify everything down to ignore <br>

>> the<br>

>> force flag and always call GET /a_c. Problem solved - for nested <br>

>> and/or<br>

>> sharing, NUMA or not, root resources or no, on the source and/or<br>

>> destination.<br>

>> <br>

> <br>

> <br>

> While I tend to agree with Eric here (and I commented on the review <br>

> accordingly by saying we should signal the new behaviour by a <br>

> microversion), I still think we need to properly advertise this, <br>

> adding openstack-operators@ accordingly.<br>

<br>

Question for you as well: if we remove (or change) the force flag in a <br>

new microversion then how should the old microversions behave when <br>

nested allocations would be required?<br>

<br></blockquote><div><br></div><div>In that case (i.e. old microversions with either "force=None and target" or "force=True"), we should IMHO not allocate any migration.</div><div>Thoughts?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Cheers,<br>

gibi<br>

<br>

> Disclaimer : since we have gaps on OSC, the current OSC behaviour <br>

> when you "openstack server live-migrate <target>" is to *force* the <br>

> destination by not calling the scheduler. Yeah, it sucks.<br>

> <br>

> Operators, what are the exact cases (for those running clouds newer <br>

> than Mitaka, i.e. Newton and above) when you make use of the --force <br>

> option for live migration with a microversion greater than or equal to 2.29?<br>

> In general, even in the case of an emergency, you still want to make <br>

> sure you don't throw your compute under the bus by massively <br>

> migrating instances that would create an undetected snowball effect <br>

> by having this compute refuse new instances. Or are you disabling <br>

> the target compute service first and then throwing your pet instances <br>

> onto it?<br>

> <br>

> -Sylvain<br>

> <br>

> <br>

> <br>

>> -efried<br>

>> <br>

>> On 10/09/2018 04:40 AM, Balázs Gibizer wrote:<br>

>> > Hi,<br>

>> ><br>

>> > Setup<br>

>> > -----<br>

>> ><br>

>> > nested allocation: an allocation that contains resources from one <br>

>> or<br>

>> > more nested RPs. (If you have a better term for this then please <br>

>> suggest it.)<br>

>> ><br>

>> > If an instance has a nested allocation it means that the compute it<br>

>> > allocates from has a nested RP tree. BUT if a compute has a <br>

>> nested RP<br>

>> > tree it does not automatically mean that an instance allocating <br>

>> from<br>

>> > that compute has a nested allocation (e.g. bandwidth inventory <br>

>> will be<br>

>> > on nested RPs but not every instance will require bandwidth).<br>

>> ><br>

>> > Afaiu, as soon as we have NUMA modelling in place the most trivial<br>

>> > servers will have nested allocations as CPU and MEMORY inventory <br>

>> will<br>

>> > be moved to the nested NUMA RPs. But NUMA is still in the future.<br>

>> ><br>

>> > Sidenote: there is an edge case reported by bauzas when an instance<br>

>> > allocates _only_ from nested RPs. This was discussed last <br>

>> Friday and<br>

>> > it resulted in a new patch[0] but I would like to keep that <br>

>> discussion<br>

>> > separate from this if possible.<br>

>> ><br>

>> > Sidenote: the current problem is related not just to nested <br>

>> RPs<br>

>> > but to sharing RPs as well. However I'm not aiming to implement <br>

>> sharing<br>

>> > support in Nova right now so I will also try to keep the sharing <br>

>> discussion<br>

>> > separate if possible.<br>

>> ><br>

>> > There was already some discussion at Monday's scheduler <br>

>> meeting but<br>

>> > I could not attend.<br>

>> > <br>

>> <a href="http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20" rel="noreferrer" target="_blank">http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20</a><br>

>> ><br>

>> ><br>

>> > The meat<br>

>> > --------<br>

>> ><br>

>> > Both live-migrate[1] and evacuate[2] have an optional force flag on <br>

>> the<br>

>> > nova REST API. The documentation says: "Force <the action> by not<br>

>> > verifying the provided destination host by the scheduler."<br>

>> ><br>

>> > Nova implements this statement by not calling the scheduler if<br>

>> > force=True BUT still tries to manage allocations in placement.<br>

>> ><br>

>> > To have an allocation on the destination host Nova blindly copies the<br>

>> > instance allocation from the source host to the destination host <br>

>> during<br>

>> > these operations. Nova can do that as 1) the whole allocation is<br>

>> > against a single RP (the compute RP) and 2) Nova knows both the <br>

>> source<br>

>> > compute RP and the destination compute RP.<br>

>> ><br>

>> > However as soon as we bring nested allocations into the picture <br>

>> that<br>

>> > blind copy will not be feasible. Possible cases:<br>

>> > 0) The instance has a non-nested allocation on the source and would <br>

>> need<br>

>> > a non-nested allocation on the destination. This works with the blind <br>

>> copy<br>

>> > today.<br>

>> > 1) The instance has a nested allocation on the source and would <br>

>> need a<br>

>> > nested allocation on the destination as well.<br>

>> > 2) The instance has a non-nested allocation on the source and would<br>

>> > need a nested allocation on the destination.<br>

>> > 3) The instance has a nested allocation on the source and would <br>

>> need a<br>

>> > non-nested allocation on the destination.<br>
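A rough sketch of the four cases above, in case it helps the discussion. It assumes an allocation is a dict keyed by RP uuid, as in the placement allocations body, and that sharing RPs are out of the picture. The helper names and the example data are hypothetical, not Nova code. Note that dest_nested is exactly the input Nova cannot know without calling placement a_c.<br>

```python
def is_nested(allocations, root_rp_uuid):
    """True if the allocation uses any RP other than the compute root RP.

    Assumes `allocations` is a dict keyed by RP uuid and that sharing RPs
    are not involved (they are kept out of scope in this thread).
    """
    return any(rp_uuid != root_rp_uuid for rp_uuid in allocations)


def classify_case(source_nested, dest_nested):
    """Return the case number 0)-3) from the list above."""
    cases = {
        (False, False): 0,  # blind copy works today
        (True, True): 1,
        (False, True): 2,
        (True, False): 3,
    }
    return cases[(source_nested, dest_nested)]


# Hypothetical example data: bandwidth is allocated from a nested RP.
flat = {"compute-rp": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}}}
with_bw = {
    "compute-rp": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}},
    "nic-rp": {"resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000}},
}
```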

>> ><br>

>> > Nova cannot generate nested allocations easily without <br>

>> reimplementing<br>

>> > some of the placement allocation candidate (a_c) code. However I <br>

>> don't<br>

>> > like the idea of duplicating some of the a_c code in Nova.<br>

>> ><br>

>> > Nova cannot detect what kind of allocation (nested or non-nested) <br>

>> an<br>

>> > instance would need on the destination without calling placement <br>

>> a_c.<br>

>> > So knowing when to call placement is a chicken and egg problem.<br>

>> ><br>

>> > Possible solutions:<br>

>> > A) fail fast<br>

>> > ------------<br>

>> > 0) Nova can detect that the source allocation is non-nested and <br>

>> try<br>

>> > the blind copy and it will succeed.<br>

>> > 1) Nova can detect that the source allocation is nested and fail the<br>

>> > operation.<br>

>> > 2) Nova only sees a non-nested source allocation. Even if the dest <br>

>> RP<br>

>> > tree is nested it does not mean that the allocation will be <br>

>> nested. We<br>

>> > cannot fail fast. Nova can try the blind copy and allocate every<br>

>> > resource from the root RP of the destination. If the instance <br>

>> requires<br>

>> > a nested allocation instead, the claim will fail in placement. So <br>

>> Nova can<br>

>> > fail the operation a bit later than in 1).<br>

>> > 3) Nova can detect that the source allocation is nested and fail <br>

>> the<br>

>> > operation. However, an enhanced blind copy that tries to allocate<br>

>> > everything from the root RP on the destination would have worked.<br>
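For illustration only, option A could look roughly like the sketch below. It assumes the same dict-keyed-by-RP-uuid allocation shape as the placement allocations body; the exception and function names are hypothetical and nothing here is taken from the Nova tree.<br>

```python
class NestedSourceAllocation(Exception):
    """Hypothetical error raised when a forced move is rejected up front."""


def blind_copy_or_fail(source_allocs, source_root, dest_root):
    """Fail fast on a nested source allocation, otherwise blind copy."""
    # Cases 1) and 3): the source allocation touches an RP other than the
    # compute root RP, so the forced operation is refused immediately.
    if any(rp_uuid != source_root for rp_uuid in source_allocs):
        raise NestedSourceAllocation("forced move with a nested allocation")
    # Cases 0) and 2): re-key the single root allocation to the destination.
    # In case 2) it is the later claim in placement that would fail, not
    # this function.
    return {dest_root: source_allocs[source_root]}
```

With a flat source allocation this returns the same resources keyed by the destination root RP; with a nested one it raises before anything is sent to placement.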

>> ><br>

>> > B) Guess when to ignore the force flag and call the scheduler<br>

>> > -------------------------------------------------------------<br>

>> > 0) keep the blind copy as it works<br>

>> > 1) Nova detects that the source allocation is nested, ignores the <br>

>> force<br>

>> > flag and calls the scheduler, which will call placement a_c. The move<br>

>> > operation can succeed.<br>

>> > 2) Nova only sees a non-nested source allocation so it will fall <br>

>> back<br>

>> > to the blind copy and fail at the claim on the destination.<br>

>> > 3) Nova detects that the source allocation is nested, ignores the <br>

>> force<br>

>> > flag and calls the scheduler, which will call placement a_c. The move<br>

>> > operation can succeed.<br>
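The guess in option B boils down to one predicate. A minimal sketch, again assuming allocations are dicts keyed by RP uuid; the function name is hypothetical and this is not what the patches in review implement line for line.<br>

```python
def should_call_scheduler(force, source_allocs, source_root):
    """Decide whether a move with a requested target goes via the scheduler."""
    # Without force the scheduler always verifies the requested destination.
    if not force:
        return True
    # With force, only a nested source allocation (cases 1 and 3) overrides
    # the flag. Cases 0 and 2 fall back to the blind copy, and case 2 then
    # fails at the claim on the destination.
    return any(rp_uuid != source_root for rp_uuid in source_allocs)
```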

>> ><br>

>> > This solution would be against the API doc that states nova does <br>

>> not<br>

>> > call the scheduler if the operation is forced. However, in the case of a <br>

>> forced<br>

>> > live migration Nova already verifies the target host from a couple of<br>

>> > perspectives in [3].<br>

>> > This solution is already proposed for live-migrate in [4] and for<br>

>> > evacuate in [5] so the complexity of the solution can be seen in <br>

>> the<br>

>> > reviews.<br>

>> ><br>

>> > C) Remove the force flag from the API in a new microversion<br>

>> > -----------------------------------------------------------<br>

>> > 0)-3): all cases would call the scheduler to verify the target <br>

>> host and<br>

>> > generate the nested (or non-nested) allocation.<br>

>> > We would still need an agreed behavior (from A), B), D)) for the <br>

>> old<br>

>> > microversions as today's code creates inconsistent allocations <br>

>> in #1)<br>

>> > and #3) by ignoring the resources from the nested RP.<br>

>> ><br>

>> > D) Do not manage allocations in placement for forced operation<br>

>> > --------------------------------------------------------------<br>

>> > The force flag is considered a last-resort tool for the admin to <br>

>> move<br>

>> > VMs around. The API doc has a fat warning about the danger of it. <br>

>> So<br>

>> > Nova can simply skip the resource allocation task if force=True. Nova<br>

>> > would delete the source allocation and not create any <br>

>> allocation<br>

>> > on the destination host.<br>

>> ><br>

>> > This is a simple but dangerous solution, but it is what the force <br>

>> flag<br>

>> > is all about: moving the server against all the built-in safeties. <br>

>> (If<br>

>> > the admin needs the safeties she can set force=False and still <br>

>> specify<br>

>> > the destination host)<br>

>> ><br>

>> > I'm open to any suggestions.<br>

>> ><br>

>> > Cheers,<br>

>> > gibi<br>

>> ><br>

>> > [0] <a href="https://review.openstack.org/#/c/608298/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/608298/</a><br>

>> > [1]<br>

>> > <br>

>> <a href="https://developer.openstack.org/api-ref/compute/#live-migrate-server-os-migratelive-action" rel="noreferrer" target="_blank">https://developer.openstack.org/api-ref/compute/#live-migrate-server-os-migratelive-action</a><br>

>> > [2]<br>

>> > <br>

>> <a href="https://developer.openstack.org/api-ref/compute/#evacuate-server-evacuate-action" rel="noreferrer" target="_blank">https://developer.openstack.org/api-ref/compute/#evacuate-server-evacuate-action</a><br>

>> > [3]<br>

>> > <br>

>> <a href="https://github.com/openstack/nova/blob/c5a7002bd571379818c0108296041d12bc171728/nova/conductor/tasks/live_migrate.py#L97" rel="noreferrer" target="_blank">https://github.com/openstack/nova/blob/c5a7002bd571379818c0108296041d12bc171728/nova/conductor/tasks/live_migrate.py#L97</a><br>

>> > [4] <a href="https://review.openstack.org/#/c/605785" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/605785</a><br>

>> > [5] <a href="https://review.openstack.org/#/c/606111" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/606111</a><br>

>> ><br>

>> ><br>

>> > <br>

>> __________________________________________________________________________<br>

>> > OpenStack Development Mailing List (not for usage questions)<br>

>> > Unsubscribe: <br>

>> <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>

>> > <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

>> ><br>

>> <br>


<br>

<br>


</blockquote></div></div>