Re: Reply: Experience with VGPUs

Danny Webb Danny.Webb at thehutgroup.com
Tue Jan 17 11:52:04 UTC 2023


Sorry, I meant to say vGPU requires a homogeneous implementation.
________________________________
From: Danny Webb <Danny.Webb at thehutgroup.com>
Sent: 17 January 2023 11:50
To: Dmitriy Rabotyagov <noonedeadpunk at gmail.com>
Cc: openstack-discuss <openstack-discuss at lists.openstack.org>
Subject: Re: Reply: Experience with VGPUs

MIG allows for a limited variation of instance types on the same card, unlike vGPU which requires a heterogeneous implementation. See https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#supported-profiles for more details.
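
As a rough illustration of mixing instance sizes on a single A100 (a sketch only; the exact profile names, whether a reset is needed, and accepted arguments depend on the GPU model and driver version):

# nvidia-smi -i 0 -mig 1                                  # enable MIG mode on GPU 0
# nvidia-smi mig -i 0 -lgip                               # list the GPU instance profiles this card supports
# nvidia-smi mig -i 0 -cgi 1g.10gb,2g.20gb,3g.40gb -C     # create a mixed set of GPU instances plus compute instances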
________________________________
From: Dmitriy Rabotyagov <noonedeadpunk at gmail.com>
Sent: 17 January 2023 11:16
Cc: openstack-discuss <openstack-discuss at lists.openstack.org>
Subject: Re: Reply: Experience with VGPUs

Oh, wait a second, can you have multiple different types on one GPU? I
don't think you can, or maybe that's limited to MIG mode only - I'm
using mostly vGPUs, so I'm not 100% sure about MIG mode.
But at least on vGPU, once you create one type, all the others become
unavailable. So originally, each command reports the type as available:
# cat /sys/bus/pci/devices/0000\:84\:00.1/mdev_supported_types/nvidia-699/available_instances
1
# cat /sys/bus/pci/devices/0000\:84\:00.2/mdev_supported_types/nvidia-699/available_instances
1
# cat /sys/bus/pci/devices/0000\:84\:00.2/mdev_supported_types/nvidia-700/available_instances
1

BUT, once you create an mdev of a specific type, the rest no longer
report as available:
# echo $(uuidgen) > /sys/bus/pci/devices/0000\:84\:00.1/mdev_supported_types/nvidia-699/create
# cat /sys/bus/pci/devices/0000\:84\:00.1/mdev_supported_types/nvidia-699/available_instances
0
# cat /sys/bus/pci/devices/0000\:84\:00.2/mdev_supported_types/nvidia-699/available_instances
1
# cat /sys/bus/pci/devices/0000\:84\:00.2/mdev_supported_types/nvidia-700/available_instances
0
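
For reference, a quick way to dump the availability of every type a device exposes (a small sketch against the same sysfs paths as above):

# loop over all supported mdev types on the device and print name + availability
for t in /sys/bus/pci/devices/0000:84:00.1/mdev_supported_types/*; do
    echo "$(basename "$t") ($(cat "$t/name")): $(cat "$t/available_instances")"
done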

Please correct me if I'm wrong here and Nvidia has changed this in
recent drivers, or if this applies only to vGPUs and is not the case
for MIG mode.

Tue, 17 Jan 2023, 03:37 Alex Song (宋文平) <songwenping at inspur.com>:
>
>
> Hi, Ulrich:
>
> Sean is an expert on vGPU management from the Nova side. To complete the picture, here are the usage steps if you are using Nova to manage MIGs, for example:
> 1. Divide the A100 (80G) GPU into 1g.10gb*1 + 2g.20gb*1 + 3g.40gb*1 (one 1g.10gb, one 2g.20gb and one 3g.40gb).
> 2. Add the device config in nova.conf:
> [devices]
> enabled_mdev_types = nvidia-699,nvidia-700,nvidia-701
> [mdev_nvidia-699]
> device_addresses = 0000:84:00.1
> [mdev_nvidia-700]
> device_addresses = 0000:84:00.2
> [mdev_nvidia-701]
> device_addresses = 0000:84:00.3
> 3. Configure the flavor metadata with VGPU:1 and create a VM using that flavor; the VM will randomly be allocated one MIG from [1g.10gb, 2g.20gb, 3g.40gb] (see the sketch below).
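>
> For illustration, step 3 from the CLI could look roughly like this (a sketch; the flavor, image and network names are placeholders):
> $ openstack flavor create --vcpus 4 --ram 8192 --disk 40 mig.small
> $ openstack flavor set mig.small --property resources:VGPU=1
> $ openstack server create --flavor mig.small --image <image> --network <net> mig-vm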
> On step 2, if you have two A100 (80G) GPUs on one node using MIG, and the other GPU is divided into 1g.10gb*3 + 4g.40gb*1, the config may look like this:
> [devices]
> enabled_mdev_types = nvidia-699,nvidia-700,nvidia-701,nvidia-702
> [mdev_nvidia-699]
> device_addresses = 0000:84:00.1,0000:3b:00.1
> [mdev_nvidia-700]
> device_addresses = 0000:84:00.2
> [mdev_nvidia-701]
> device_addresses = 0000:84:00.3
> [mdev_nvidia-702]
> device_addresses = 0000:3b:00.3
>
> In our product, we use Cyborg to manage the MIGs. With the legacy style we would also need to configure the MIGs in the same way as Nova, which is difficult to maintain, especially when deploying OpenStack on k8s, so we removed that config, discover the MIGs automatically, and support dividing MIGs through the Cyborg API. By creating a device profile with the vGPU type traits (nvidia-699, nvidia-700), we can specify the MIG size when creating VMs.
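>
> For illustration only, with upstream Cyborg a device profile pinning a specific MIG type might look roughly like this (a sketch; the CUSTOM_NVIDIA_699 trait name is an assumption, and our product's API differs from stock Cyborg):
> $ openstack accelerator device profile create vgpu-1g10gb '[{"resources:VGPU": "1", "trait:CUSTOM_NVIDIA_699": "required"}]'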
>
> Kind regards
>


Danny Webb
Principal OpenStack Engineer
Danny.Webb at thehutgroup.com
www.thg.com <https://www.thg.com>