<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Wed, Oct 10, 2018 at 12:32, Balázs Gibizer <<a href="mailto:balazs.gibizer@ericsson.com">balazs.gibizer@ericsson.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
Thanks for all the feedback. I feel the following consensus is forming:<br>
<br>
1) remove the force flag in a new microversion. I've proposed a spec <br>
about that API change [1]<br>
<br></blockquote><div>Thanks, will look at it.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
2) in the old microversions change the blind allocation copy to gather <br>
every resource from the nested source RPs too and try to allocate them <br>
from the destination root RP. When the destination needs a nested <br>
allocation, putting this allocation to placement will fail and nova will <br>
fail the migration / evacuation. However it will succeed if the server <br>
does not need a nested allocation on either the source or the destination <br>
host (a.k.a. the legacy case), or if the server has a nested allocation <br>
on the source host but does not need a nested allocation on the <br>
destination host (for example the dest host does not have a nested RP <br>
tree yet).<br>
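<br>
A minimal sketch of the enhanced blind copy described in 2), for illustration only. The function name and the RP identifiers are hypothetical, and the dict shape (RP UUID mapped to a "resources" dict) is assumed to mirror the placement allocation format; this is not the actual Nova code.<br>

```python
from collections import defaultdict


def flatten_to_dest_root(source_allocations, dest_root_rp_uuid):
    """Sum every resource class across all source RPs (root and nested)
    and request the totals from the destination root RP only."""
    totals = defaultdict(int)
    for rp_alloc in source_allocations.values():
        for resource_class, amount in rp_alloc["resources"].items():
            totals[resource_class] += amount
    return {dest_root_rp_uuid: {"resources": dict(totals)}}


# Hypothetical source allocation: compute root RP plus one nested RP.
source = {
    "src-compute-rp": {"resources": {"VCPU": 2, "MEMORY_MB": 4096}},
    "src-nic-rp": {"resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000}},
}
dest_allocations = flatten_to_dest_root(source, "dest-compute-rp")
```

If the destination root RP has no inventory for one of the summed resource classes, placement would be expected to reject the allocation, which is the failure path described above.<br>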
<br></blockquote><div><br></div><div>Cool with me.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I will start implementing #2) as part of the <br>
use-nested-allocation-candidate bp soon and will continue with #1) <br>
later in the cycle.<br>
<br>
Nothing is set in stone yet so feedback is still very appreciated.<br>
<br>
Cheers,<br>
gibi<br>
<br>
[1] <a href="https://review.openstack.org/#/c/609330/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/609330/</a><br>
<br>
On Tue, Oct 9, 2018 at 11:40 AM, Balázs Gibizer <br>
<<a href="mailto:balazs.gibizer@ericsson.com" target="_blank">balazs.gibizer@ericsson.com</a>> wrote:<br>
> Hi,<br>
> <br>
> Setup<br>
> -----<br>
> <br>
> nested allocation: an allocation that contains resources from one or <br>
> more nested RPs. (If you have a better term for this then please <br>
> suggest it.)<br>
> <br>
> If an instance has a nested allocation it means that the compute it <br>
> allocates from has a nested RP tree. BUT if a compute has a nested <br>
> RP tree it does not automatically mean that the instance allocating <br>
> from that compute has a nested allocation (e.g. bandwidth inventory <br>
> will be on nested RPs but not every instance will require bandwidth).<br>
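> <br>
> The definition above can be sketched as a simple check. The helper name and RP identifiers are hypothetical, and the sketch assumes no sharing RPs are involved, so any RP other than the compute root RP counts as nested.<br>

```python
def is_nested_allocation(allocations, compute_root_rp_uuid):
    """Return True if any resource is allocated from an RP other than
    the compute root RP (assumption: no sharing providers)."""
    return any(rp_uuid != compute_root_rp_uuid for rp_uuid in allocations)


# The compute has a nested RP tree, but this instance uses only the root RP.
legacy = {"compute-rp": {"resources": {"VCPU": 1, "MEMORY_MB": 512}}}

# This instance also consumes bandwidth from a nested RP.
with_bandwidth = {
    "compute-rp": {"resources": {"VCPU": 1, "MEMORY_MB": 512}},
    "nic-rp": {"resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000}},
}

legacy_is_nested = is_nested_allocation(legacy, "compute-rp")
bandwidth_is_nested = is_nested_allocation(with_bandwidth, "compute-rp")
```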
> <br>
> Afaiu, as soon as we have NUMA modelling in place even the most trivial <br>
> servers will have nested allocations as CPU and MEMORY inventory <br>
> will be moved to the nested NUMA RPs. But NUMA is still in the future.<br>
> <br>
> Sidenote: there is an edge case reported by bauzas where an instance <br>
> allocates _only_ from nested RPs. This was discussed last Friday <br>
> and it resulted in a new patch[0] but I would like to keep that <br>
> discussion separate from this if possible.<br>
> <br>
> Sidenote: the current problem is somewhat related not just to nested RPs <br>
> but to sharing RPs as well. However I'm not aiming to implement <br>
> sharing support in Nova right now so I will also try to keep the sharing <br>
> discussion separate if possible.<br>
> <br>
> There was already some discussion at Monday's scheduler meeting <br>
> but I could not attend.<br>
> <a href="http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20" rel="noreferrer" target="_blank">http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20</a><br>
> <br>
> <br>
> The meat<br>
> --------<br>
> <br>
> Both live-migrate[1] and evacuate[2] have an optional force flag on <br>
> the nova REST API. The documentation says: "Force <the action> by not <br>
> verifying the provided destination host by the scheduler."<br>
> <br>
> Nova implements this statement by not calling the scheduler if <br>
> force=True BUT it still tries to manage allocations in placement.<br>
> <br>
> To have allocation on the destination host Nova blindly copies the <br>
> instance allocation from the source host to the destination host <br>
> during these operations. Nova can do that as 1) the whole allocation <br>
> is against a single RP (the compute RP) and 2) Nova knows both the <br>
> source compute RP and the destination compute RP.<br>
> <br>
> However as soon as we bring nested allocations into the picture that <br>
> blind copy will not be feasible. Possible cases:<br>
> 0) The instance has a non-nested allocation on the source and would <br>
> need a non-nested allocation on the destination. This works with the <br>
> blind copy today.<br>
> 1) The instance has a nested allocation on the source and would need <br>
> a nested allocation on the destination as well.<br>
> 2) The instance has a non-nested allocation on the source and would <br>
> need a nested allocation on the destination.<br>
> 3) The instance has a nested allocation on the source and would need <br>
> a non nested allocation on the destination.<br>
> <br>
> Nova cannot generate nested allocations easily without reimplementing <br>
> some of the placement allocation candidate (a_c) code. However I <br>
> don't like the idea of duplicating some of the a_c code in Nova.<br>
> <br>
> Nova cannot detect what kind of allocation (nested or non-nested) an <br>
> instance would need on the destination without calling placement a_c. <br>
> So knowing when to call placement is a chicken and egg problem.<br>
> <br>
> Possible solutions:<br>
> A) fail fast<br>
> ------------<br>
> 0) Nova can detect that the source allocation is non-nested and try <br>
> the blind copy and it will succeed.<br>
> 1) Nova can detect that the source allocation is nested and fail the <br>
> operation.<br>
> 2) Nova only sees a non-nested source allocation. Even if the dest RP <br>
> tree is nested it does not mean that the allocation will be nested. <br>
> We cannot fail fast. Nova can try the blind copy and allocate every <br>
> resource from the root RP of the destination. If the instance <br>
> requires a nested allocation instead, the claim will fail in placement. <br>
> So nova can fail the operation a bit later than in 1).<br>
> 3) Nova can detect that the source allocation is nested and fail the <br>
> operation. However an enhanced blind copy that tries to allocate <br>
> everything from the root RP on the destination would have worked.<br>
> <br>
> B) Guess when to ignore the force flag and call the scheduler<br>
> -------------------------------------------------------------<br>
> 0) keep the blind copy as it works<br>
> 1) Nova detects that the source allocation is nested. It ignores the <br>
> force flag and calls the scheduler that will call placement a_c. The move <br>
> operation can succeed.<br>
> 2) Nova only sees a non-nested source allocation so it will fall back <br>
> to the blind copy and fail at the claim on the destination.<br>
> 3) Nova detects that the source allocation is nested. It ignores the <br>
> force flag and calls the scheduler that will call placement a_c. The move <br>
> operation can succeed.<br>
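> <br>
> The decision in option B can be sketched roughly as below. This is only an illustration; the function and its arguments are hypothetical and are not taken from the proposed patches. It assumes a source allocation is nested whenever it references an RP other than the source compute root RP.<br>

```python
def should_call_scheduler(force, source_allocations, source_root_rp_uuid):
    """Option B: honor force=True only when the source allocation is
    non-nested; otherwise ignore the flag and go through the scheduler."""
    if not force:
        # Non-forced operations always go through the scheduler.
        return True
    # Forced: fall back to the blind copy unless the source is nested.
    return any(rp != source_root_rp_uuid for rp in source_allocations)


non_nested = {"src-root": {"resources": {"VCPU": 1}}}
nested = {
    "src-root": {"resources": {"VCPU": 1}},
    "src-nic": {"resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000}},
}

cases_0_and_2 = should_call_scheduler(True, non_nested, "src-root")
cases_1_and_3 = should_call_scheduler(True, nested, "src-root")
```

> Note that case 2) still takes the blind copy path here, so it would fail at the claim on the destination as described above.<br>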
> <br>
> This solution would be against the API doc that states nova does not <br>
> call the scheduler if the operation is forced. However in case of <br>
> forced live-migration Nova already verifies the target host from <br>
> a couple of perspectives in [3].<br>
> This solution is already proposed for live-migrate in [4] and for <br>
> evacuate in [5] so the complexity of the solution can be seen in the <br>
> reviews.<br>
> <br>
> C) Remove the force flag from the API in a new microversion<br>
> -----------------------------------------------------------<br>
> 0)-3): all cases would call the scheduler to verify the target host <br>
> and generate the nested (or non-nested) allocation.<br>
> We would still need an agreed behavior (from A), B), D)) for the old <br>
> microversions as today's code creates an inconsistent allocation in <br>
> #1) and #3) by ignoring the resources from the nested RPs.<br>
> <br>
> D) Do not manage allocations in placement for forced operation<br>
> --------------------------------------------------------------<br>
> The force flag is considered a last resort tool for the admin to move <br>
> VMs around. The API doc has a fat warning about the danger of it. So <br>
> Nova can simply skip the resource allocation task if force=True. Nova <br>
> would delete the source allocation and would not create any allocation <br>
> on the destination host.<br>
> <br>
> This is a simple but dangerous solution, yet it is what the force flag <br>
> is all about: moving the server against all the built-in safeties. (If <br>
> the admin needs the safeties she can set force=False and still <br>
> specify the destination host.)<br>
> <br>
> I'm open to any suggestions.<br>
> <br>
> Cheers,<br>
> gibi<br>
> <br>
> [0] <a href="https://review.openstack.org/#/c/608298/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/608298/</a><br>
> [1] <br>
> <a href="https://developer.openstack.org/api-ref/compute/#live-migrate-server-os-migratelive-action" rel="noreferrer" target="_blank">https://developer.openstack.org/api-ref/compute/#live-migrate-server-os-migratelive-action</a><br>
> [2] <br>
> <a href="https://developer.openstack.org/api-ref/compute/#evacuate-server-evacuate-action" rel="noreferrer" target="_blank">https://developer.openstack.org/api-ref/compute/#evacuate-server-evacuate-action</a><br>
> [3] <br>
> <a href="https://github.com/openstack/nova/blob/c5a7002bd571379818c0108296041d12bc171728/nova/conductor/tasks/live_migrate.py#L97" rel="noreferrer" target="_blank">https://github.com/openstack/nova/blob/c5a7002bd571379818c0108296041d12bc171728/nova/conductor/tasks/live_migrate.py#L97</a><br>
> [4] <a href="https://review.openstack.org/#/c/605785" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/605785</a><br>
> [5] <a href="https://review.openstack.org/#/c/606111" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/606111</a><br>
> <br>
<br>
<br>
</blockquote></div></div>