<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, May 30, 2018 at 1:06 PM, Balázs Gibizer <span dir="ltr"><<a href="mailto:balazs.gibizer@ericsson.com" target="_blank">balazs.gibizer@ericsson.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-HOEnZb"><div class="gmail-h5"><br>
<br>
On Tue, May 29, 2018 at 3:12 PM, Sylvain Bauza <<a href="mailto:sbauza@redhat.com" target="_blank">sbauza@redhat.com</a>> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
<br>
On Tue, May 29, 2018 at 2:21 PM, Balázs Gibizer <<a href="mailto:balazs.gibizer@ericsson.com" target="_blank">balazs.gibizer@ericsson.com</a>> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
<br>
On Tue, May 29, 2018 at 1:47 PM, Sylvain Bauza <<a href="mailto:sbauza@redhat.com" target="_blank">sbauza@redhat.com</a>> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
<br>
Le mar. 29 mai 2018 à 11:02, Balázs Gibizer <<a href="mailto:balazs.gibizer@ericsson.com" target="_blank">balazs.gibizer@ericsson.com</a>> a écrit :<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
<br>
On Tue, May 29, 2018 at 9:38 AM, Sylvain Bauza <<a href="mailto:sbauza@redhat.com" target="_blank">sbauza@redhat.com</a>><br>
wrote:<br>
><br>
><br>
> On Tue, May 29, 2018 at 3:08 AM, TETSURO NAKAMURA<br>
> <<a href="mailto:nakamura.tetsuro@lab.ntt.co.jp" target="_blank">nakamura.tetsuro@lab.ntt.co.jp</a>> wrote:<br>
><br>
>> > In that situation, say for example with VGPU inventories, that<br>
>> would mean<br>
>> > that the compute node would stop reporting inventories for its<br>
>> root RP, but<br>
>> > would rather report inventories for at least one single child RP.<br>
>> > In that model, do we reconcile the allocations that were already<br>
>> made<br>
>> > against the "root RP" inventory?<br>
>><br>
>> It would be nice to see Eric and Jay comment on this,<br>
>> but if I'm not mistaken, when the virt driver stops reporting<br>
>> inventories for its root RP, placement would try to delete that<br>
>> inventory inside and raise InventoryInUse exception if any<br>
>> allocations still exist on that resource.<br>
>><br>
>> ```<br>
>> update_from_provider_tree() (nova/compute/resource_tracker.py)<br>
>> + _set_inventory_for_provider() (nova/scheduler/client/report.py)<br>
>> + put() - PUT /resource_providers/<rp_uuid>/inventories with<br>
>> new inventories (scheduler/client/report.py)<br>
>> + set_inventories() (placement/handler/inventory.py)<br>
>> + _set_inventory()<br>
>> (placement/objects/resource_provider.py)<br>
>> + _delete_inventory_from_provider()<br>
>> (placement/objects/resource_provider.py)<br>
>> -> raise exception.InventoryInUse<br>
>> ```<br>
>><br>
>> So we need some trick, something like deleting VGPU allocations<br>
>> before upgrading and setting the allocations again for the newly<br>
>> created child after upgrading?<br>
>><br>
><br>
> I wonder if we should keep the existing inventory in the root RP, and<br>
> somehow just reserve the remaining resources (so Placement wouldn't pass<br>
> that root RP for queries, but would still have allocations). But<br>
> then, where and how to do this? By the resource tracker?<br>
><br>
<br>
AFAIK it is the virt driver that decides to model the VGPU resource at a<br>
different place in the RP tree so I think it is the responsibility of<br>
the same virt driver to move any existing allocation from the old place<br>
to the new place during this change.<br>
<br>
Cheers,<br>
gibi<br>
</blockquote>
<br>
Why not, instead of moving the allocation, have the virt driver update the root RP by setting the reserved value to the total size?<br>
<br>
That way, the virt driver wouldn't need to touch any allocation but could rather continue to provide inventories...<br>
<br>
Thoughts?<br>
</blockquote>
<br>
Keeping the old allocation on the old RP and adding a similar-sized reservation on the new RP feels hackish, as those are not really reserved GPUs but GPUs used from the old RP. If somebody sums up the total reported GPUs in this setup via the placement API, she will get more GPUs in total than what is physically visible to the hypervisor, as the GPUs that are part of the old allocation are reported twice, in two different total values. Could we just report fewer GPU inventories to the new RP while the old RP still has GPU allocations?<br>
<br>
</blockquote>
<br>
<br>
We could keep the old inventory in the root RP for the previous vGPU type already supported in Queens and just add other inventories for the other vGPU types now supported. That looks like possibly the simplest option, as the virt driver knows that.<br>
</blockquote>
<br></div></div>
That works for me. Can we somehow deprecate the previous, already supported vGPU types to eventually get rid of the split inventory?<span class="gmail-"><br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Some alternatives from my jetlagged brain:<br>
<br>
a) Implement a move inventory/allocation API in placement. Given a resource class, a source RP uuid and a destination RP uuid, placement moves the inventory and allocations of that resource class from the source RP to the destination RP. Then the virt driver can call this API to move the allocation. This has an impact on the fast forward upgrade as it needs a running virt driver to do the allocation move.<br>
<br>
</blockquote>
<br>
Instead of having the virt driver do that (TBH, I don't like that, given both the Xen and libvirt drivers have the same problem), we could write a nova-manage upgrade command that would call the Placement API, sure.<br>
</blockquote>
<br></span>
nova-manage is another possible way, similar to my idea c), but there I imagined the logic in placement-manage instead of nova-manage.<span class="gmail-"><br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
b) For this I assume that live migrating an instance having a GPU allocation on the old RP will allocate a GPU for that instance from the new RP. In the virt driver, do not report GPUs to the new RP while there are allocations for such GPUs in the old RP. Let the deployer live migrate away the instances. When the virt driver detects that there are no more GPU allocations on the old RP, it can delete the inventory from the old RP and report it to the new RP.<br>
<br>
</blockquote>
<br>
For the moment, vGPUs don't support live migration, even within QEMU. I haven't checked, but IIUC when you live-migrate an instance that has vGPUs, it will just migrate it without recreating the vGPUs.<br>
</blockquote>
<br></span>
If there is no live migration support for vGPUs then this option can be ignored.<span class="gmail-"><br>
<br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Now, the problem is with the VGPU allocation: we should delete it then. Maybe a new bug report?<br>
</blockquote>
<br></span>
Sounds like a bug report to me :)<span class="gmail-"><br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
c) For this I assume that there is no support for live migration of an instance having a GPU. If there is a GPU allocation in the old RP, then the virt driver does not report GPU inventory to the new RP; it just creates the new nested RPs. Provide a placement-manage command to do the inventory + allocation copy from the old RP to the new RP.<br>
<br>
</blockquote>
<br>
What's the difference from the first alternative?<br>
</blockquote>
<br></span>
I think after you mentioned nova-manage for the first alternative, the difference became only whether to do it from nova-manage or from placement-manage. The placement-manage solution has the benefit of being a pure DB operation, moving inventory and allocations between two RPs, while nova-manage would need to call a new placement API.<span class="gmail-"><br>
<br></span></blockquote><div><br><br></div><div>After considering the whole approach and discussing with a couple of folks over IRC, here is what I feel is the best approach for a seamless upgrade:<br></div><div> - VGPU inventory will be kept on the root RP (for the first type) in Queens so that a compute service upgrade won't impact the DB<br></div><div> - during Queens, operators can run a DB online migration script (like the ones we currently have in <a href="https://github.com/openstack/nova/blob/c2f42b0/nova/cmd/manage.py#L375">https://github.com/openstack/nova/blob/c2f42b0/nova/cmd/manage.py#L375</a>) that will create a new resource provider for the first type and move the inventory and allocations to it.<br></div><div> - it's the responsibility of the virt driver code to check whether a child RP named after the first type already exists, to know whether to update the inventory against the root RP or the child RP.<br><br></div><div>Does it work for folks?<br></div><div>PS: we already have the plumbing in place in nova-manage and we're still managing full Nova resources. I know we plan to move Placement out of the nova tree, but for the Rocky timeframe, I feel we can consider nova-manage as the best and quickest approach for the data upgrade.<br><br></div><div>-Sylvain<br><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">
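</span></blockquote><div><br></div><div>To make the "move the inventory and allocations" step concrete, here is a rough sketch (plain Python; the dict shapes loosely mirror the placement inventories/allocations JSON, and the helper name is hypothetical, not actual nova-manage code):<br></div>

```python
# Hypothetical sketch of the online migration's data move: shift one
# resource class (e.g. VGPU) from the root RP to a new child RP.
# Not nova code; the structures only mimic the placement API JSON.

def move_resource_class(allocations, inventories, rc, root_rp, child_rp):
    """Rewrite every allocation of `rc` from root_rp to child_rp and
    move the matching inventory record."""
    for alloc in allocations.values():
        resources = alloc["allocations"].get(root_rp)
        if resources and rc in resources:
            # Re-home the allocated amount under the child RP.
            alloc["allocations"].setdefault(child_rp, {})[rc] = resources.pop(rc)
            if not resources:  # nothing left against the root RP
                del alloc["allocations"][root_rp]
    # Move the inventory record itself to the child RP.
    inventories.setdefault(child_rp, {})[rc] = inventories[root_rp].pop(rc)


# One instance holding a VGPU allocation against the root RP (Queens layout).
allocations = {"inst-1": {"allocations": {"root-rp": {"VGPU": 1, "VCPU": 2}}}}
inventories = {"root-rp": {"VGPU": {"total": 4}, "VCPU": {"total": 16}}}
move_resource_class(allocations, inventories, "VGPU", "root-rp", "child-rp")
```

<div>After the move, the instance's VGPU allocation lives on the child RP while its VCPU allocation stays on the root RP; the real migration script would of course do this transactionally in the API DB.<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">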
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Anyway, looks like it's pretty simple to just keep the inventory for the already existing vGPU type in the root RP, and just add nested RPs for other vGPU types.<br>
Oh, and btw, we could possibly have the same problem when we implement the NUMA spec that I need to rework: <a href="https://review.openstack.org/#/c/552924/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/552924/</a><br>
</blockquote>
<br></span>
If we want to move the VCPU resources from the root to the nested NUMA RP then yes, that feels like the same problem.<br>
<br>
gibi<div class="gmail-HOEnZb"><div class="gmail-h5"><br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
-Sylvain<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Cheers,<br>
gibi<br>
<br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
> -Sylvain<br>
><br>
<br>
<br>
____________________________________________________________________________<br>
OpenStack Development Mailing List (not for usage questions)<br>
Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
</blockquote></blockquote>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
<br>
</div></div></blockquote></div><br></div></div>