Cyborg nova reports mdev-capable resource is not available

Sylvain Bauza sbauza at redhat.com
Thu Sep 21 15:48:48 UTC 2023


On Thu, Sep 21, 2023 at 17:27, Karl Kloppenborg <
kkloppenborg at resetdata.com.au> wrote:

> Hi Cyborg Team!
>
> Karl from Helm Team.
>
>
>
> When creating a VM with the correct flavor, the mdev gets created by the
> Cyborg agent and I can see it in nodedev-list --cap mdev.
>
> However Nova then fails with:
>
> nova.virt.libvirt.driver [<removed> - - default default] Searching for
> available mdevs... _get_existing_mdevs_not_assigned
> /var/lib/openstack/lib/python3.10/site-packages/nova/virt/libvirt/driver.py:8357
>
> 2023-09-21 14:34:47.808 1901814 INFO nova.virt.libvirt.driver [<removed> -
> - default default] Available mdevs at: set().
>
> 2023-09-21 14:34:47.809 1901814 DEBUG nova.virt.libvirt.driver [<removed>
> - - default default] No available mdevs where found. Creating an new one...
> _allocate_mdevs
> /var/lib/openstack/lib/python3.10/site-packages/nova/virt/libvirt/driver.py:8496
>
> 2023-09-21 14:34:47.809 1901814 DEBUG nova.virt.libvirt.driver [<removed>
> - - default default] Attempting to create new mdev...
> _create_new_mediated_device
> /var/lib/openstack/lib/python3.10/site-packages/nova/virt/libvirt/driver.py:8385
>
> 2023-09-21 14:34:48.455 1901814 INFO nova.virt.libvirt.driver [<removed> -
> - default default] Failed to create mdev. No free space found among the
> following devices: ['pci_0000_4b_03_1', … <truncated list>].
>
> 2023-09-21 14:34:48.456 1901814 ERROR nova.compute.manager [<removed> - -
> default default] [instance: 2026e2a2-b17a-43ab-adcb-62a907f58b51] Instance
> failed to spawn: nova.exception.ComputeResourcesUnavailable: Insufficient
> compute resources: mdev-capable resource is not available.
>
>
>

I don't exactly remember how Cyborg passes the devices to nova/libvirt, but
this exception means that none of the available GPUs has either an existing
free mdev or the capacity to create a new one.
You should first check sysfs to verify the state of your GPU devices and see
how much vGPU capacity you have left.
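
Something along these lines (untested, and assuming the mdev types are
registered under /sys/class/mdev_bus; on A40s running in SR-IOV mode the
types live under the virtual functions, so adjust the glob accordingly)
should show the remaining capacity per device and type:

  for dev in /sys/class/mdev_bus/*; do
    for type in "$dev"/mdev_supported_types/*; do
      # name and available_instances are standard mdev sysfs attributes
      echo "$(basename "$dev") $(basename "$type") ($(cat "$type"/name)):" \
           "$(cat "$type"/available_instances) instance(s) left"
    done
  done

If available_instances is 0 everywhere (an A40-48Q mdev uses the whole 48 GB
of framebuffer, so at most one per physical GPU), the "No free space found"
message is expected and the existing mdevs have to be freed first.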

-Sylvain

> Once this happens, ARQ removes the mdev and cleans up.
>
>
>
> I’ve got Cyborg 2023.2 running and have a device profile like so:
>
> karl at Karls-Air ~ % openstack accelerator device profile show e2b07e11-fe69-4f33-83fc-0f9e38adb7ae
>
> +-------------+---------------------------------------------------------------------------+
> | Field       | Value                                                                     |
> +-------------+---------------------------------------------------------------------------+
> | created_at  | 2023-09-21 13:30:05+00:00                                                 |
> | updated_at  | None                                                                      |
> | uuid        | e2b07e11-fe69-4f33-83fc-0f9e38adb7ae                                      |
> | name        | VGPU_A40-Q48                                                              |
> | groups      | [{'resources:VGPU': '1', 'trait:CUSTOM_NVIDIA_2235_A40_48Q': 'required'}] |
> | description | None                                                                      |
> +-------------+---------------------------------------------------------------------------+
>
> karl at Karls-Air ~ %
>
>
>
> I can see the allocation candidate:
>
> karl at Karls-Air ~ % openstack allocation candidate list --resource VGPU=1 | grep A40
>
> |  41 | VGPU=1     | 229bf15f-5689-3d2c-b37b-5c8439ea6a71 | VGPU=0/1                | OWNER_CYBORG,CUSTOM_NVIDIA_2235_A40_48Q |
>
> karl at Karls-Air ~ %
>
>
>
>
>
> Am I missing something critical here? I cannot seem to figure this out…
> have I got a PCI address wrong, or something?
>
>
>
> Any help from the Cyborg or Nova teams would be fantastic.
>
>
>
> Thanks,
> Karl.
>
>
>