11 Nov
2024
1:13 p.m.
Hello Mickael,

There was a mailing thread about mdevs and recreating them a while back [1]. We are not yet on Dalmatian for Nova, so we have not tested that feature, but we are using a script similar to the one linked in that thread.

/Tobias

[1] https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/message/A5G2IGCWGEM6BEQU52UMA45WISTBU2AA/

On Fri, Nov 08, 2024 at 11:18:50AM UTC, melanie witt wrote:
> On 11/8/24 02:04, Mickael Razzouk wrote:
> > Hello, I am trying a few things with vGPU on Nova and I am struggling with the mdev management Nova does.
> >
> > My environment:
> > - Nvidia A2 pGPU
> > - OpenStack Caracal
> >
> > What I did:
> > - installed the NVIDIA GRID host driver on the compute node
> > - enabled the virtual functions of the pGPU
> > - added the desired mdev type and the PCI addresses of the virtual functions to nova.conf
> > - restarted Nova
> > - created a flavor with the property resources:VGPU=1
> >
> > (basically followed this documentation: https://docs.openstack.org/nova/latest/admin/virtual-gpu.html)
> >
> > The only difference is that I did not put the pGPU PCI address in nova.conf but the addresses of its virtual functions, so in my case 0000:41:00.4 and not 0000:41:00.0 (the RPs were not detected otherwise).
> >
> > From my understanding of this spec (https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/add-support-for-vgpu.html) and some experimentation, when a vGPU is requested by the end user:
> > - Nova looks at the mdevs already running and uses them if available.
> > - If there is no mdev available, Nova looks at the RP tree to find mdev-capable PCI devices that have the specific mdev type available.
> > - If such a PCI device is available, Nova creates the mdev with a UUID and creates the domain XML with this UUID inside.
> > - If no such device is available, Nova returns "no host available" as an error.
> >
> > My questions are:
> >
> > - Mdevs are not persistent across reboots, and when a reboot happens, Nova crashes at boot because the mdevs are missing. Just recreating the mdevs does not fix the issue, as they need to have the same UUIDs.
> > One could fix the issue by making the mdevs persistent with a tool like mdevctl, but since Nova creates the mdevs the first time, it seems to me that Nova tries to manage them. Is it normal for it to require manual intervention afterward?
> > Or am I missing something?
>
> I can't answer both of your questions but I can say that for 2024.1
> (Caracal), no you are not missing something -- mdevs are not persisted
> across reboot. See https://bugs.launchpad.net/nova/+bug/1900800 for details.
>
> However in 2024.2 (Dalmatian) we added support for persistent mdevs (note
> this requires libvirt >= 7.3.0):
>
> https://docs.openstack.org/releasenotes/nova/2024.2.html
>
> HTH,
> -melwitt
>
> > - In the Nova configuration, is it normal to diverge from the documentation and enter the VF PCI addresses so that the RPs are detected?
> >
> > Thank you in advance for your help.
>
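For Caracal and earlier, where mdevs are not persisted, the workaround discussed above is to recreate each mdev with its original UUID before nova-compute starts. The sketch below illustrates that idea; it is not the script from the linked thread and not part of Nova. It assumes the standard kernel mdev sysfs interface (writing a UUID to `/sys/class/mdev_bus/<parent>/mdev_supported_types/<type>/create`) and that libvirt domain XML references an mdev as `<hostdev type='mdev'>` with `<source><address uuid='...'/></source>`. The functions `mdev_uuids_from_domain_xml` and `recreate_mdevs`, and the `mdev_map` input mapping each UUID to its parent PCI address and mdev type (for example, recorded before the reboot), are hypothetical.

```python
"""Hypothetical helper: recreate mdevs with their original UUIDs after a reboot.

Assumption (not provided by Nova): the caller supplies, for every mdev UUID,
the parent PCI address (e.g. a VF such as 0000:41:00.4) and the mdev type it
was created with. Must run as root, before nova-compute starts.
"""
import os
import xml.etree.ElementTree as ET

SYSFS_ROOT = "/sys"


def mdev_uuids_from_domain_xml(xml_text):
    """Return the mdev UUIDs referenced by a libvirt domain XML document."""
    root = ET.fromstring(xml_text)
    uuids = []
    for hostdev in root.iter("hostdev"):
        if hostdev.get("type") != "mdev":
            continue  # skip plain PCI passthrough and other host devices
        address = hostdev.find("./source/address")
        if address is not None and address.get("uuid"):
            uuids.append(address.get("uuid"))
    return uuids


def recreate_mdevs(mdev_map, sysfs_root=SYSFS_ROOT):
    """Recreate each missing mdev and return the UUIDs that were created.

    mdev_map: {uuid: (parent_pci_address, mdev_type)}  (hypothetical input)
    """
    created = []
    for uuid, (parent, mdev_type) in mdev_map.items():
        # An mdev that already exists appears under /sys/bus/mdev/devices/<uuid>.
        if os.path.exists(os.path.join(sysfs_root, "bus", "mdev", "devices", uuid)):
            continue
        create_path = os.path.join(
            sysfs_root, "class", "mdev_bus", parent,
            "mdev_supported_types", mdev_type, "create",
        )
        # Writing a UUID to the type's "create" file asks the kernel to
        # create an mdev with exactly that UUID on the given parent device.
        with open(create_path, "w") as f:
            f.write(uuid)
        created.append(uuid)
    return created
```

The `sysfs_root` parameter exists only so the logic can be exercised against a scratch directory; on a real host the default `/sys` is used. The mdev type passed in should be the one configured in nova.conf, and each UUID should go back to the same parent device so that it still matches the resource provider Nova allocated it from. On Dalmatian with libvirt >= 7.3.0, the persistent mdev support mentioned above should make this unnecessary.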