Help with vGPU's using an NV L4 card on U24.04 (6.8)
Hello All,

I'm trying to get vGPUs to work with OpenStack and I believe I am almost there; however, I can't seem to get them to register with placement.

Here is what I have so far:

    OS: Ubuntu 24.04
    Kernel: 6.8.0-62-generic
    Nvidia GRID Version: 570.158.02-570.158.01-573.39
    OpenStack Deployment: Kolla w/ Docker
    OpenStack Version: 2025.1

I have the SR-IOV devices enabled with the systemd service, and can see the vGPU types:

    # cat /sys/bus/pci/devices/0000:25:00.0/virtfn0/nvidia/creatable_vgpu_types
    ID  : vGPU Name
    908 : NVIDIA L4-1B
    909 : NVIDIA L4-2B
    910 : NVIDIA L4-1Q
    911 : NVIDIA L4-2Q
    912 : NVIDIA L4-3Q
    913 : NVIDIA L4-4Q
    914 : NVIDIA L4-6Q
    915 : NVIDIA L4-8Q
    916 : NVIDIA L4-12Q
    917 : NVIDIA L4-24Q
    918 : NVIDIA L4-1A
    919 : NVIDIA L4-2A
    920 : NVIDIA L4-3A
    921 : NVIDIA L4-4A
    922 : NVIDIA L4-6A
    923 : NVIDIA L4-8A
    924 : NVIDIA L4-12A
    925 : NVIDIA L4-24A

My Kolla node nova custom config (for nova-compute) is:

    [DEFAULT]
    debug = true
    verbose = true

    [devices]
    enabled_mdev_types = nvidia-918

    [mdev_nvidia-918]
    max_instances = 8

    [libvirt]
    live_migration_downtime = 500000
    live_migration_downtime_steps = 3
    live_migration_downtime_delay = 3

I've also created flavors, classes, and traits with:

    openstack flavor create l4-1a --ram 8192 --disk 40 --vcpus 4
    openstack flavor set l4-1a --property "resources:VGPU=1" --property "trait:CUSTOM_NVIDIA_918=required"
    openstack resource class create CUSTOM_NVIDIA_918
    openstack trait create CUSTOM_NVIDIA_918

However, I can't seem to get placement to show any vGPUs or any of the traits I registered.

This just returns an empty line:

    # openstack allocation candidate list --resource VGPU=1

and this just shows the standard vCPU/Memory/Disk:

    # openstack resource provider inventory list <Host UUID>

Have I missed a step somewhere? Do I need to prepare the devices further before nova can pick them up?

Thanks for any and all help!
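One point that may matter for the placement check above: as far as I understand the Nova virtual GPU guide, VGPU inventory is reported on child resource providers (one per GPU device) rather than on the compute node's root provider, so listing only the host UUID can show just VCPU/MEMORY_MB/DISK_GB even on a working setup. Below is a minimal Python sketch, not a definitive tool, that scans already-collected inventory dumps for a VGPU row. It assumes each provider's inventory was dumped with the `-f json` output format and that rows carry `resource_class` and `total` keys. The function name and the sample provider names are hypothetical.

```python
import json


def providers_with_vgpu(inventories):
    """Return {provider_name: total} for providers whose inventory lists VGPU.

    `inventories` maps a provider name to the parsed JSON list produced by
    `openstack resource provider inventory list <uuid> -f json`.
    Assumption: each row has "resource_class" and "total" keys.
    """
    found = {}
    for name, rows in inventories.items():
        for row in rows:
            if row.get("resource_class") == "VGPU":
                found[name] = row.get("total")
    return found


# Hypothetical sample data: a root provider plus one child provider.
sample = {
    "compute01": json.loads(
        '[{"resource_class": "VCPU", "total": 64},'
        ' {"resource_class": "MEMORY_MB", "total": 257000}]'
    ),
    "compute01_pci_0000_25_00_4": json.loads(
        '[{"resource_class": "VGPU", "total": 1}]'
    ),
}

print(providers_with_vgpu(sample))
```

To collect the child providers in the first place, I believe `openstack resource provider list --in-tree <Host UUID>` (with a recent enough placement API version) lists the whole provider tree. Treat that flag as something to verify against your osc-placement version.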
You're missing the PCI device address profiles per the vGPU documentation. As an example (from the documentation):

    [devices]
    enabled_mdev_types = nvidia-35, nvidia-36

    [mdev_nvidia-35]
    device_addresses = 0000:84:00.0,0000:85:00.0

    [mdev_nvidia-36]
    device_addresses = 0000:86:00.0

https://docs.openstack.org/nova/latest/admin/virtual-gpu.html

Karl Kloppenborg
Chief Technology Officer, resetdata.com

On Thursday, 26 June 2025 at 1:09 pm, Tyler Wilson <tyler@ghosty.pw> wrote:
> I'm trying to get vGPU's to work with openstack and I believe I am almost there, however I can't seem to get them to register to placement. [...]
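To compare the two configuration styles discussed in this thread, here is a small Python sketch that reports, for each enabled mdev type, whether its `[mdev_<type>]` group sets `device_addresses`, `max_instances`, or both. It uses the standard-library `configparser` as a stand-in. Nova itself reads its config with oslo.config, so parsing edge cases may differ. The function name `mdev_type_report` is hypothetical.

```python
import configparser


def mdev_type_report(conf_text):
    """Map each enabled mdev type to the options set in its [mdev_<type>] group."""
    # interpolation=None so that a literal '%' in a value is not treated specially.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(conf_text)
    enabled = cfg.get("devices", "enabled_mdev_types", fallback="")
    report = {}
    for mdev_type in (t.strip() for t in enabled.split(",")):
        if not mdev_type:
            continue
        group = "mdev_" + mdev_type
        report[mdev_type] = {
            # fallback=None also covers the case where the group is absent.
            "device_addresses": cfg.get(group, "device_addresses", fallback=None),
            "max_instances": cfg.get(group, "max_instances", fallback=None),
        }
    return report


# The nova-compute fragment from the original post.
posted_conf = """
[DEFAULT]
debug = true
verbose = true

[devices]
enabled_mdev_types = nvidia-918

[mdev_nvidia-918]
max_instances = 8
"""

for name, opts in mdev_type_report(posted_conf).items():
    print(name, opts)
```

Running it against the override file that Kolla actually renders into the nova-compute container is a quick way to confirm the `[devices]` and `[mdev_*]` groups made it through the config merge.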
Hello,

Thanks for the reply. I can certainly add the device addresses; however, further down the same document there is a caveat stating that with the newer kernel-variant driver, which uses purely SR-IOV virtual function devices, device_addresses is no longer used. (I confirmed that I am using a card that supports this, and the 6.8 kernel also supports it.)

From the documentation:

    When using recent nVidia GPU architectures like Ampere or newer GPUs which have SR-IOV feature, Nova can't know how many vGPUs can be used by a specific type. You then need to create virtual functions and then provide the list of the virtual functions per GPUs that can be used by setting device_addresses.

    Changed in version 29.0.0: By the 2024.1 Caracal release, if you use those hardware, you need to provide a new configuration option named max_instances in the related mdev type group (eg. mdev_nvidia-35) where the value of that option would be the number of vGPUs that the type can create.

    As an example for the A40-2Q nVidia GPU type <https://docs.nvidia.com/vgpu/16.0/grid-vgpu-user-guide/index.html#vgpu-types-nvidia-a40> which can create up to 24 vGPUs, please provide the below configuration:

        [devices]
        enabled_mdev_types = nvidia-558

        [mdev_nvidia-558]
        max_instances = 24

    As a side note, you can see that we don't use device_addresses in the mdev_nvidia-558 section, as we don't need to tell which exact virtual functions we want to use for that type.

To use device_addresses with these VFs, would I map them in a giant list, or does it just need the host GPU PCI ID? Or is there another step I'm missing to account for this newer setup?

Thanks!

On Wed, Jun 25, 2025 at 8:57 PM Karl Kloppenborg <kkloppenborg@resetdata.com.au> wrote:
> You're missing the PCI device address profiles per the vGPU documentation: [...]
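On the question of listing the virtual functions one by one: the thread does not settle whether Nova needs `device_addresses` when `max_instances` is set, so the following is only a sketch of how such a list could be built if you want to try it. It walks the `virtfnN` symlinks under a physical function in sysfs (the same layout as the `creatable_vgpu_types` path shown earlier) and can keep only the VFs that advertise a given vGPU type ID. The helper names `parse_type_ids` and `vf_addresses` are hypothetical.

```python
import os
import re


def parse_type_ids(text):
    """Parse 'ID : vGPU Name' lines into {id: name}, skipping the header."""
    ids = {}
    for line in text.splitlines():
        left, _, right = line.partition(":")
        if left.strip().isdigit():
            ids[int(left.strip())] = right.strip()
    return ids


def vf_addresses(pf_dir, vgpu_type_id=None):
    """Return PCI addresses of the VFs under `pf_dir`, ordered by VF index.

    If `vgpu_type_id` is given, keep only VFs whose
    nvidia/creatable_vgpu_types file lists that ID.
    """
    entries = [e for e in os.listdir(pf_dir) if re.fullmatch(r"virtfn\d+", e)]
    entries.sort(key=lambda e: int(e[len("virtfn"):]))
    addrs = []
    for entry in entries:
        # virtfnN is a symlink to the VF's own device directory,
        # whose basename is the VF's PCI address.
        vf_path = os.path.realpath(os.path.join(pf_dir, entry))
        if vgpu_type_id is not None:
            types_file = os.path.join(vf_path, "nvidia", "creatable_vgpu_types")
            try:
                with open(types_file) as f:
                    if vgpu_type_id not in parse_type_ids(f.read()):
                        continue
            except OSError:
                continue  # file missing or unreadable: skip this VF
        addrs.append(os.path.basename(vf_path))
    return addrs


if __name__ == "__main__":
    pf = "/sys/bus/pci/devices/0000:25:00.0"  # PF address taken from the thread
    if os.path.isdir(pf):
        print("device_addresses = " + ",".join(vf_addresses(pf, 918)))
```

The printed line has the same shape as the `device_addresses` entries in the documentation example, so it can be pasted into the `[mdev_nvidia-918]` group for a test run.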
participants (2)
- Karl Kloppenborg
- Tyler Wilson