<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">Le mar. 9 oct. 2018 à 16:39, Eric Fried <openstack@fried.cc> a écrit :<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">IIUC, the primary thing the force flag was intended to do - allow an<br>
instance to land on the requested destination even if that means<br>
oversubscription of the host's resources - doesn't happen anymore since<br>
we started making the destination claim in placement.<br>
<br>
IOW, since pike, you don't actually see a difference in behavior by<br>
using the force flag or not. (If you do, it's more likely a bug than<br>
what you were expecting.)<br>
<br>
So there's no reason to keep it around. We can remove it in a new<br>
microversion (or not); but even in the current microversion we need not<br>
continue making convoluted attempts to observe it.<br>
<br>
What that means is that we should simplify everything down to ignore the<br>
force flag and always call GET /a_c. Problem solved - for nested and/or<br>
sharing, NUMA or not, root resources or no, on the source and/or<br>
destination.<br>

While I tend to agree with Eric here (and I commented on the review
accordingly, saying we should signal the new behaviour with a
microversion), I still think we need to advertise this properly, so I am
adding openstack-operators@ to the thread.

Disclaimer: since we have gaps in OSC, the current OSC behaviour when you
run "openstack server live-migrate <target>" is to *force* the destination
by not calling the scheduler. Yeah, it sucks.

Operators, for those of you running clouds newer than Mitaka (i.e. Newton
and above), what are the exact cases where you use the --force option for
live migration with a microversion greater than or equal to 2.29?

In general, even in an emergency, you still want to make sure you don't
throw a compute under the bus by massively migrating instances onto it and
creating an undetected snowball effect where that compute starts refusing
new instances. Or do you disable the target compute service first and then
throw your pet instances up there?

-Sylvain

> -efried
>
> On 10/09/2018 04:40 AM, Balázs Gibizer wrote:
>> Hi,
>>
>> Setup
>> -----
>>
>> nested allocation: an allocation that contains resources from one or
>> more nested RPs. (If you have a better term for this, please suggest it.)
>>
>> If an instance has a nested allocation it means that the compute it
>> allocates from has a nested RP tree. BUT if a compute has a nested RP
>> tree it does not automatically mean that an instance allocating from
>> that compute has a nested allocation (e.g. bandwidth inventory will be
>> on nested RPs but not every instance will require bandwidth).
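>>
>> To make that concrete, here is a rough illustration of a non-nested
>> allocation vs. one that also consumes bandwidth from a nested RP. The
>> UUIDs are made up and the dicts only approximate the shape of the
>> placement allocations payload:
>>
>>   # Non-nested: every resource comes from the single compute node RP.
>>   flat_allocation = {
>>       "c1d2...-compute-node-rp": {
>>           "resources": {"VCPU": 2, "MEMORY_MB": 4096, "DISK_GB": 20},
>>       },
>>   }
>>
>>   # Nested: the same instance also allocates bandwidth from a child RP
>>   # (e.g. a NIC/physnet RP hanging off the compute node RP).
>>   nested_allocation = {
>>       "c1d2...-compute-node-rp": {
>>           "resources": {"VCPU": 2, "MEMORY_MB": 4096, "DISK_GB": 20},
>>       },
>>       "a9b8...-nic-bandwidth-rp": {
>>           "resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000},
>>       },
>>   }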
>>
>> Afaiu, as soon as we have NUMA modelling in place even the most trivial
>> servers will have nested allocations, as CPU and MEMORY inventory will
>> be moved to the nested NUMA RPs. But NUMA is still in the future.
>>
>> Sidenote: there is an edge case reported by bauzas where an instance
>> allocates _only_ from nested RPs. This was discussed last Friday and
>> resulted in a new patch[0], but I would like to keep that discussion
>> separate from this one if possible.
>>
>> Sidenote: the current problem is somewhat related not just to nested RPs
>> but to sharing RPs as well. However, I'm not aiming to implement sharing
>> support in Nova right now, so I also try to keep the sharing discussion
>> separate if possible.
>>
>> There was already some discussion at Monday's scheduler meeting, but
>> I could not attend:
>> http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
>>
>>
>> The meat
>> --------
>>
>> Both live-migrate[1] and evacuate[2] have an optional force flag in the
>> nova REST API. The documentation says: "Force <the action> by not
>> verifying the provided destination host by the scheduler."
>>
>> Nova implements this statement by not calling the scheduler if
>> force=True BUT still trying to manage allocations in placement.
>>
>> To have an allocation on the destination host Nova blindly copies the
>> instance allocation from the source host to the destination host during
>> these operations. Nova can do that because 1) the whole allocation is
>> against a single RP (the compute RP) and 2) Nova knows both the source
>> compute RP and the destination compute RP.
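>>
>> As a rough sketch of that blind copy (function and variable names are
>> just illustrative here, not the actual conductor code), assuming the
>> allocation shape shown earlier:
>>
>>   def blindly_copy_allocation(source_allocs, source_rp_uuid, dest_rp_uuid):
>>       # Re-key the flat allocation from the source compute RP to the
>>       # destination compute RP. This only works while everything is
>>       # allocated against the single root compute RP.
>>       resources = source_allocs[source_rp_uuid]["resources"]
>>       return {dest_rp_uuid: {"resources": dict(resources)}}
>>
>> Nova then claims the result against the destination in placement.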
>>
>> However, as soon as we bring nested allocations into the picture that
>> blind copy will not be feasible. Possible cases:
>> 0) The instance has a non-nested allocation on the source and would need
>> a non-nested allocation on the destination. This works with the blind
>> copy today.
>> 1) The instance has a nested allocation on the source and would need a
>> nested allocation on the destination as well.
>> 2) The instance has a non-nested allocation on the source and would
>> need a nested allocation on the destination.
>> 3) The instance has a nested allocation on the source and would need a
>> non-nested allocation on the destination.
>>
>> Nova cannot generate nested allocations easily without reimplementing
>> some of the placement allocation candidate (a_c) code, and I don't
>> like the idea of duplicating some of the a_c code in Nova.
>>
>> Nova cannot detect what kind of allocation (nested or non-nested) an
>> instance would need on the destination without calling placement a_c.
>> So knowing when to call placement is a chicken-and-egg problem.
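>>
>> What Nova can detect cheaply is whether the *source* allocation is
>> nested, which is what the options below rely on. A minimal sketch (the
>> helper name is mine, not the actual Nova code), assuming the allocation
>> shape shown earlier:
>>
>>   def source_allocation_is_nested(source_allocs, compute_rp_uuid):
>>       # Nested means some resource comes from an RP other than the
>>       # root compute node RP of the source host.
>>       return any(rp_uuid != compute_rp_uuid for rp_uuid in source_allocs)
>>
>> This only tells us about the source side; it says nothing about what the
>> destination tree needs, which is the chicken-and-egg part.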
>>
>> Possible solutions:
>>
>> A) Fail fast
>> ------------
>> 0) Nova can detect that the source allocation is non-nested, try
>> the blind copy, and it will succeed.
>> 1) Nova can detect that the source allocation is nested and fail the
>> operation.
>> 2) Nova only sees a non-nested source allocation. Even if the dest RP
>> tree is nested it does not mean that the allocation will be nested. We
>> cannot fail fast. Nova can try the blind copy and allocate every
>> resource from the root RP of the destination. If the instance requires a
>> nested allocation instead, the claim will fail in placement. So Nova can
>> fail the operation, just a bit later than in 1).
>> 3) Nova can detect that the source allocation is nested and fail the
>> operation. However, an enhanced blind copy that tries to allocate
>> everything from the root RP on the destination would have worked.
>>
>> B) Guess when to ignore the force flag and call the scheduler
>> -------------------------------------------------------------
>> 0) Keep the blind copy, as it works.
>> 1) Nova detects that the source allocation is nested, ignores the force
>> flag and calls the scheduler, which will call placement a_c. The move
>> operation can succeed.
>> 2) Nova only sees a non-nested source allocation, so it will fall back
>> to the blind copy and fail at the claim on the destination.
>> 3) Nova detects that the source allocation is nested, ignores the force
>> flag and calls the scheduler, which will call placement a_c. The move
>> operation can succeed.
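>>
>> Put together, option B boils down to roughly the following decision. It
>> just combines the two sketches above and is not the code proposed in
>> [4]/[5]:
>>
>>   def destination_allocation_or_none(source_allocs, source_compute_rp,
>>                                      dest_compute_rp):
>>       # Returns a blindly copied allocation for the destination, or None
>>       # when the source allocation is nested and the force flag should
>>       # be ignored so that the scheduler / placement a_c builds the
>>       # destination allocation instead.
>>       if any(rp != source_compute_rp for rp in source_allocs):
>>           return None  # nested source: go through the scheduler
>>       resources = source_allocs[source_compute_rp]["resources"]
>>       return {dest_compute_rp: {"resources": dict(resources)}}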
>>
>> This solution would go against the API doc, which states that Nova does
>> not call the scheduler if the operation is forced. However, in the case
>> of a forced live-migration Nova already verifies the target host from a
>> couple of perspectives in [3].
>> This solution is already proposed for live-migrate in [4] and for
>> evacuate in [5], so the complexity of the solution can be seen in the
>> reviews.
>>
>> C) Remove the force flag from the API in a new microversion
>> -----------------------------------------------------------
>> 0)-3): all cases would call the scheduler to verify the target host and
>> generate the nested (or non-nested) allocation.
>> We would still need an agreed behavior (from A), B), D)) for the old
>> microversions, as today's code creates inconsistent allocations in 1)
>> and 3) by ignoring the resources from the nested RPs.
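>>
>> For reference, "call the scheduler to verify the target host and
>> generate the allocation" ultimately means a placement request along the
>> lines of GET /allocation_candidates?resources=VCPU:2,MEMORY_MB:4096,...
>> and claiming one of the returned candidates. A candidate may span nested
>> RPs, e.g. (illustrative shape only, made-up UUIDs):
>>
>>   candidate = {
>>       "allocations": {
>>           "c1d2...-compute-node-rp": {
>>               "resources": {"VCPU": 2, "MEMORY_MB": 4096},
>>           },
>>           "a9b8...-nic-bandwidth-rp": {
>>               "resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000},
>>           },
>>       },
>>   }
>>
>> which is exactly the nested shape the blind copy cannot produce.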
>>
>> D) Do not manage allocations in placement for forced operations
>> ----------------------------------------------------------------
>> The force flag is considered a last-resort tool for the admin to move
>> VMs around. The API doc has a fat warning about the danger of it. So
>> Nova could simply skip the resource allocation task if force=True: Nova
>> would delete the source allocation and not create any allocation
>> on the destination host.
>>
>> This is a simple but dangerous solution, but that is what the force flag
>> is all about: move the server against all the built-in safeties. (If
>> the admin needs the safeties she can set force=False and still specify
>> the destination host.)
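>>
>> In placement terms option D is little more than dropping the consumer's
>> allocation and writing nothing for the destination. A sketch, assuming
>> "placement" is a keystoneauth1 Adapter/Session pointed at the placement
>> service (the function name is mine; the endpoint is the real placement
>> API):
>>
>>   def drop_allocations(placement, instance_uuid):
>>       # DELETE /allocations/{consumer_uuid} removes the source
>>       # allocation; no allocation is written for the destination, so
>>       # the forced host is simply not tracked in placement (and can be
>>       # oversubscribed).
>>       resp = placement.delete('/allocations/%s' % instance_uuid)
>>       return resp.status_code == 204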
>>
>> I'm open to any suggestions.
>>
>> Cheers,
>> gibi
>>
>> [0] https://review.openstack.org/#/c/608298/
>> [1] https://developer.openstack.org/api-ref/compute/#live-migrate-server-os-migratelive-action
>> [2] https://developer.openstack.org/api-ref/compute/#evacuate-server-evacuate-action
>> [3] https://github.com/openstack/nova/blob/c5a7002bd571379818c0108296041d12bc171728/nova/conductor/tasks/live_migrate.py#L97
>> [4] https://review.openstack.org/#/c/605785
>> [5] https://review.openstack.org/#/c/606111