<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 31, 2018 at 5:00 PM, Jay Pipes <span dir="ltr"><<a href="mailto:jaypipes@gmail.com" target="_blank">jaypipes@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 05/31/2018 05:10 AM, Sylvain Bauza wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

After considering the whole approach and discussing it with a couple of folks over IRC, here is what I feel is the best approach for a seamless upgrade:<br>

  - VGPU inventory will be kept on the root RP (for the first type) in Queens so that a compute service upgrade won't impact the DB<br>

  - during Queens, operators can run a DB online migration script (like the ones we currently have in <a href="https://github.com/openstack/nova/blob/c2f42b0/nova/cmd/manage.py#L375" rel="noreferrer" target="_blank">https://github.com/openstack/n<wbr>ova/blob/c2f42b0/nova/cmd/mana<wbr>ge.py#L375</a>) that will create a new resource provider for the first type and move the inventory and allocations to it.<br>

  - the virt driver code is responsible for checking whether a child RP named after the first type already exists, so that it knows whether to update the inventory against the root RP or the child RP.<br>
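<br>
To make the three points above concrete, here is a minimal sketch using plain dictionaries instead of the real placement or nova objects. The function names, the provider layout and the "nvidia-35" type name are assumptions made up for illustration, not existing nova code:<br>
<pre>
```python
def migrate_first_type(providers, root, first_type):
    """Hypothetical online migration: move VGPU data from the root RP
    to a new child RP named after the first vGPU type."""
    root_rp = providers[root]
    if first_type in providers or "VGPU" not in root_rp["inventory"]:
        return False  # already migrated, or nothing to move
    child = {"parent": root, "inventory": {}, "allocations": {}}
    # Move the inventory record from the root RP to the child RP.
    child["inventory"]["VGPU"] = root_rp["inventory"].pop("VGPU")
    # Move the VGPU part of every allocation; other resources stay on root.
    for consumer, resources in list(root_rp["allocations"].items()):
        if "VGPU" in resources:
            child["allocations"][consumer] = {"VGPU": resources.pop("VGPU")}
    providers[first_type] = child
    return True


def vgpu_inventory_target(providers, root, first_type):
    """Hypothetical virt driver check: report VGPU inventory against the
    child RP if it already exists, otherwise against the root RP."""
    child = providers.get(first_type)
    if child is not None and child["parent"] == root:
        return first_type
    return root
```
</pre>
Before the migration runs, vgpu_inventory_target() returns the root RP name; afterwards it returns the child RP name, so the driver never needs to know when the operator ran the script.<br>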

<br>

Does that work for folks?<br>

</blockquote>

<br></span>

No, sorry, that doesn't work for me. It seems overly complex and fragile, especially considering that VGPUs are not moveable anyway (no support for live migrating them). Same goes for CPU pinning, NUMA topologies, PCI passthrough devices, SR-IOV PF/VFs and all the other "must have" features that have been added to the virt driver over the last 5 years.<br>

<br>

My feeling is that we should not attempt to "migrate" any allocations or inventories between root or child providers within a compute node, period.<br>

<br></blockquote><div><br></div><div>I don't understand why you're talking about *moving* an instance. My concern is about upgrading a compute node to Rocky when it already hosts instances that use vGPUs.<br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

The virt drivers should simply error out of update_provider_tree() if there are ANY existing VMs on the host AND the virt driver wishes to begin tracking resources with nested providers.<br>
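<br>
Roughly, the check described above could look like the sketch below. It is a standalone illustration, not the real update_provider_tree() signature; the exception class and the function name are hypothetical:<br>
<pre>
```python
class NestedTrackingBlocked(Exception):
    """Raised when nested providers are requested on a host that still has VMs."""


def ensure_host_empty_for_nested(instances_on_host, wants_nested):
    """Hypothetical guard a virt driver could run before building child providers."""
    # Only refuse when BOTH conditions hold: VMs exist AND nesting is requested.
    if wants_nested and instances_on_host:
        raise NestedTrackingBlocked(
            "%d instance(s) still on host; evacuate them before enabling "
            "nested resource providers" % len(instances_on_host))
```
</pre>
An empty host, or a driver that keeps reporting everything on the root provider, passes the check untouched.<br>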

<br>

The upgrade operation should look like this:<br>

<br>

1) Upgrade placement<br>

2) Upgrade nova-scheduler<br>

3) Start a loop over the compute nodes. For each compute node:<br>

 3a) disable nova-compute service on node (to take it out of scheduling)<br>

 3b) evacuate all existing VMs off of node<br>

 3c) upgrade compute node (on restart, the compute node will see no<br>

     VMs running on the node and will construct the provider tree inside<br>

     update_provider_tree() with an appropriate set of child providers<br>

     and inventories on those child providers)<br>

 3d) enable nova-compute service on node<br>
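<br>
The ordering of the steps above can be sketched as a small planner. It only builds a list of step descriptions and calls no real OpenStack tooling; the function name and the node names are made up for illustration:<br>
<pre>
```python
def rolling_upgrade_plan(compute_nodes):
    """Return the ordered list of upgrade steps for the given compute nodes."""
    # Steps 1 and 2: control-plane services go first, exactly once.
    steps = ["upgrade placement", "upgrade nova-scheduler"]
    # Step 3: each compute node is fully handled before the next one starts.
    for node in compute_nodes:
        steps.extend([
            "disable nova-compute on %s" % node,   # 3a
            "evacuate all VMs off %s" % node,      # 3b
            "upgrade %s" % node,                   # 3c
            "enable nova-compute on %s" % node,    # 3d
        ])
    return steps
```
</pre>
For two nodes this yields ten steps, with the first node re-enabled before the second one is disabled.<br>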

<br>

Which is virtually identical to the "normal" upgrade process whenever there are significant changes to the compute node -- such as upgrading libvirt or the kernel. Nested resource tracking is another such significant change and should be dealt with in a similar way, IMHO.<br>

<br></blockquote><div><br></div><div>Upgrading to Rocky for vGPUs doesn't require also upgrading libvirt or the kernel. So why should operators need to "evacuate" (I understood that as "migrate") instances if they don't need to upgrade their host OS?<br><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Best,<br>

-jay<div class="HOEnZb"><div class="h5"><br>

<br>

______________________________<wbr>______________________________<wbr>______________<br>

OpenStack Development Mailing List (not for usage questions)<br>


</div></div></blockquote></div><br></div></div>