Hello,

Thanks for the reply. I can certainly add the device addresses; however, further down the same document there is a caveat stating that device_addresses is no longer used with the newer kernel driver variant that relies purely on SR-IOV virtual function devices. (I confirmed that my card supports this and that I am running the 6.8 kernel, which also supports it.)

When using recent nVidia GPU architectures like Ampere or newer GPUs which have SR-IOV feature, Nova can’t know how many vGPUs can be used by a specific type. You then need to create virtual functions and then provide the list of the virtual functions per GPUs that can be used by setting device_addresses.
Changed in version 29.0.0: By the 2024.1 Caracal release, if you use those hardware, you need to provide a new configuration option named max_instances in the related mdev type group (eg. mdev_nvidia-35) where the value of that option would be the number of vGPUs that the type can create.

As an example for the A40-2Q nVidia GPU type which can create up to 24 vGPUs, please provide the below configuration :
```
[devices]
enabled_mdev_types = nvidia-558

[mdev_nvidia-558]
max_instances = 24
```
As a side note, you can see that we don’t use device_addresses in the mdev_nvidia-558 section, as we don’t need to tell which exact virtual functions we want to use for that type.

To use device_addresses with these VFs, would I map them all in one giant list, or does it just need the host GPU's PCI ID? Or is there another step I'm missing to account for this newer setup?
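
In case a full VF list does turn out to be necessary, the addresses could be collected from sysfs rather than typed by hand. This is only a minimal sketch: it assumes the standard Linux SR-IOV layout, where each virtfnN entry under the physical function is a symlink to the VF's own PCI device directory. The function name list_vf_addresses is made up for illustration, and 0000:25:00.0 is the physical function address from the original message below.

```shell
# Print the PCI addresses of all VFs under a physical function directory
# as one comma-separated line (the format device_addresses expects).
# Assumption: virtfn* entries are symlinks to the VF device directories.
list_vf_addresses() {
    pf_dir="$1"
    addrs=""
    for link in "$pf_dir"/virtfn*; do
        # Skip the unexpanded glob when no VFs exist.
        [ -e "$link" ] || continue
        addr=$(basename "$(readlink -f "$link")")
        addrs="${addrs:+$addrs,}$addr"
    done
    printf '%s\n' "$addrs"
}

# Hypothetical usage:
#   echo "device_addresses = $(list_vf_addresses /sys/bus/pci/devices/0000:25:00.0)"
```

Whether nova still wants this list on a release that supports max_instances is exactly the open question above, so treat the output as something to try, not as a confirmed requirement.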

Thanks!

On Wed, Jun 25, 2025 at 8:57 PM Karl Kloppenborg <kkloppenborg@resetdata.com.au> wrote:

You’re missing the PCI device address profiles per the vGPU documentation:

As an example (from the documentation)
[devices]
enabled_mdev_types = nvidia-35, nvidia-36

[mdev_nvidia-35]
device_addresses = 0000:84:00.0,0000:85:00.0

[mdev_nvidia-36]
device_addresses = 0000:86:00.0
https://docs.openstack.org/nova/latest/admin/virtual-gpu.html

Karl Kloppenborg

Chief Technology Officer

m: +61 437 239 565
resetdata.com


From: Tyler Wilson <tyler@ghosty.pw>
Date: Thursday, 26 June 2025 at 1:09 pm
To: openstack-discuss@lists.openstack.org <openstack-discuss@lists.openstack.org>
Subject: Help with vGPU's using an NV L4 card on U24.04 (6.8)

Hello All,

I'm trying to get vGPUs to work with OpenStack and I believe I am
almost there; however, I can't seem to get them to register with
placement.

Here is what I have so far:

OS: Ubuntu 24.04
Kernel: 6.8.0-62-generic
Nvidia GRID Version: 570.158.02-570.158.01-573.39
Openstack Deployment: Kolla w/ Docker
Openstack Version: 2025.1

I have the SR-IOV devices enabled with the systemd service, and I can
see the vGPU types:

# cat /sys/bus/pci/devices/0000:25:00.0/virtfn0/nvidia/creatable_vgpu_types
ID    : vGPU Name
908   : NVIDIA L4-1B
909   : NVIDIA L4-2B
910   : NVIDIA L4-1Q
911   : NVIDIA L4-2Q
912   : NVIDIA L4-3Q
913   : NVIDIA L4-4Q
914   : NVIDIA L4-6Q
915   : NVIDIA L4-8Q
916   : NVIDIA L4-12Q
917   : NVIDIA L4-24Q
918   : NVIDIA L4-1A
919   : NVIDIA L4-2A
920   : NVIDIA L4-3A
921   : NVIDIA L4-4A
922   : NVIDIA L4-6A
923   : NVIDIA L4-8A
924   : NVIDIA L4-12A
925   : NVIDIA L4-24A
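
The ID for a given profile can be pulled out of that table with a one-line filter. This is a sketch only: it assumes the "ID : vGPU Name" layout shown above, and that the mdev type name is "nvidia-" followed by the ID, which is how the config in this message names it. The function name mdev_type_for is made up for illustration.

```shell
# Read a creatable_vgpu_types table on stdin and print the mdev type
# name (nvidia-<ID>) for the vGPU profile given as the first argument.
mdev_type_for() {
    # Split each line on the colon and any surrounding spaces,
    # then match the profile name exactly.
    awk -F' *: *' -v name="$1" '$2 == name { print "nvidia-" $1 }'
}

# Hypothetical usage:
#   mdev_type_for "NVIDIA L4-1A" \
#       < /sys/bus/pci/devices/0000:25:00.0/virtfn0/nvidia/creatable_vgpu_types
```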

My Kolla node's custom nova config (for nova-compute) is:

[DEFAULT]
debug = true
verbose = true

[devices]
enabled_mdev_types = nvidia-918

[mdev_nvidia-918]
max_instances = 8

[libvirt]
live_migration_downtime = 500000
live_migration_downtime_steps = 3
live_migration_downtime_delay = 3
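
A quick sanity check that every type listed in enabled_mdev_types has a matching [mdev_<type>] group (the group the documentation says should carry max_instances) could look like the following. This is a minimal sketch, not a nova tool: the function name check_mdev_groups is made up, and it assumes the option sits on a single line and that section headers have no trailing whitespace.

```shell
# Report any type in enabled_mdev_types that lacks an [mdev_<type>] group.
# Returns 0 when every type has a group, 1 otherwise.
check_mdev_groups() {
    conf="$1"
    status=0
    # Extract the option value and split it on commas and spaces.
    types=$(sed -n 's/^enabled_mdev_types *= *//p' "$conf" | tr ',' ' ')
    for t in $types; do
        # -x: match the whole line, -F: treat the header as a fixed string.
        if ! grep -qxF -- "[mdev_${t}]" "$conf"; then
            echo "missing group [mdev_${t}]"
            status=1
        fi
    done
    return "$status"
}

# Hypothetical usage (the path is a placeholder, not a known Kolla location):
#   check_mdev_groups /path/to/nova.conf
```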

I've also created flavors, classes, and traits with:

openstack flavor create l4-1a --ram 8192 --disk 40 --vcpus 4
openstack flavor set l4-1a --property "resources:VGPU=1" --property "trait:CUSTOM_NVIDIA_918=required"
openstack resource class create CUSTOM_NVIDIA_918
openstack trait create CUSTOM_NVIDIA_918

However, I can't seem to get placement to show any vGPUs or any of the
traits I registered.

This just returns an empty line:
# openstack allocation candidate list --resource VGPU=1

And this shows only the standard vCPU/memory/disk inventory:
# openstack resource provider inventory list <Host UUID>

Have I missed a step somewhere? Do I need to prepare the devices
further before nova can pick them up?

Thanks for any and all help!