That's the problem with enabling this feature in Nova: we mostly depend
on a very specific external Linux driver. So, to be clear, if you want
to use vGPUs, please look at the Nvidia documentation *first* :)
About multiple vGPUs, Nvidia says it depends on the GPU architecture
(and this has changed over the last few years):
(quoting Nvidia here)
The supported vGPUs depend on
the architecture of the GPU on which the vGPUs reside:
- For GPUs based on the NVIDIA Volta architecture and later GPU architectures,
all Q-series and C-series vGPUs are supported. On GPUs that support the
Multi-Instance GPU (MIG) feature, both time-sliced and MIG-backed vGPUs are
supported.
- For GPUs based on the NVIDIA Pascal™ architecture, only Q-series
and C-series vGPUs that are allocated all of the physical GPU's frame buffer are
supported.
- For GPUs based on the NVIDIA Maxwell™ graphic architecture,
only Q-series vGPUs that are allocated all of the physical GPU's frame buffer are
supported.
You can assign multiple vGPUs with differing amounts of frame buffer to a single VM,
provided the board type and the series of all the vGPUs is the same. For example, you can
assign an A40-48C vGPU and an A40-16C vGPU to the same VM. However, you cannot assign an
A30-8C vGPU and an A16-8C vGPU to the same VM.
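To make the quoted rule concrete, here is a minimal Python sketch of the "same board type and same series" check. It is only an illustration, not anything Nova or the Nvidia driver provides: the helper names (parse_vgpu_type, can_share_vm) are hypothetical, and it assumes simple time-sliced type names of the form <board>-<size><series> such as A40-48C (MIG-backed names are not handled).

```python
import re
from typing import Iterable, NamedTuple


class VGPUType(NamedTuple):
    board: str            # e.g. "A40"
    frame_buffer_gb: int  # e.g. 48
    series: str           # e.g. "C"


# Assumed naming scheme: <board>-<frame buffer in GB><series letter>.
_NAME_RE = re.compile(r"^(?P<board>[A-Za-z0-9]+)-(?P<fb>\d+)(?P<series>[A-Za-z])$")


def parse_vgpu_type(name: str) -> VGPUType:
    """Split a vGPU type name like 'A40-48C' into board, size and series."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized vGPU type name: {name!r}")
    return VGPUType(match["board"], int(match["fb"]), match["series"].upper())


def can_share_vm(names: Iterable[str]) -> bool:
    """Return True if all vGPU types have the same board and the same series.

    Frame buffer sizes are allowed to differ, as in the quoted rule.
    """
    types = [parse_vgpu_type(name) for name in names]
    return len({(t.board, t.series) for t in types}) <= 1


if __name__ == "__main__":
    print(can_share_vm(["A40-48C", "A40-16C"]))  # same board, same series
    print(can_share_vm(["A30-8C", "A16-8C"]))    # different boards
```

The two calls at the bottom are the two examples from the Nvidia text: the first pair is allowed, the second is not.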
Basically, what changed is that with the latest Volta and Ampere
architectures, Nvidia can now provide several vGPUs that each take a
slice of the frame buffer. Previously, Nvidia could only give a single
VM a vGPU taking the whole pGPU frame buffer, which de facto limited
the instance to one single vGPU (or to a second vGPU attached from
another pGPU, which is non-trivial to schedule).
For that reason, we initially limited VGPU allocation requests to a
maximum of 1 in Nova, since the behavior depended so heavily on the
hardware. I eventually proposed removing that limitation with
https://review.opendev.org/c/openstack/nova/+/845757
but it needs some further work and testing (which is nearly impossible
with upstream CI since the Nvidia drivers are proprietary and
licensed).
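For readers who want to see what a limit of one VGPU per request means in practice, here is a rough Python sketch of that kind of check applied to flavor extra specs. This is not Nova's actual code: the constant and function names are hypothetical, and the only thing taken from Nova is the resources:VGPU extra spec key used to request vGPUs in a flavor.

```python
# Hypothetical constant mirroring the limit of 1 described above.
MAX_VGPU_PER_INSTANCE = 1


def requested_vgpus(extra_specs: dict) -> int:
    """Return how many VGPU resources a flavor asks for (0 if none)."""
    return int(extra_specs.get("resources:VGPU", 0))


def check_vgpu_request(extra_specs: dict) -> int:
    """Reject flavors asking for more than the allowed number of vGPUs."""
    count = requested_vgpus(extra_specs)
    if count > MAX_VGPU_PER_INSTANCE:
        raise ValueError(
            f"flavor requests {count} VGPUs, "
            f"but at most {MAX_VGPU_PER_INSTANCE} is supported"
        )
    return count


if __name__ == "__main__":
    print(check_vgpu_request({"resources:VGPU": "1"}))  # accepted
    try:
        check_vgpu_request({"resources:VGPU": "2"})
    except ValueError as exc:
        print(f"rejected: {exc}")
```

Lifting the limitation would roughly amount to relaxing that kind of check, but the scheduling side (which pGPU each vGPU comes from) is where the hardware dependency really bites.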
Any operator wanting to lift the current limitation would get my full
attention if they volunteered to *test* that patch. Ping me on IRC in
#openstack-nova (bauzas) and we could proceed quickly.