I am using libvirt 7.0 on CentOS 8 Stream, OpenStack Train.

The NVIDIA drivers are installed only on the VMs (not on the compute node).
I am not using any NUMA setting in the flavor.

But do you have the problem only when instantiating the second VM (while everything is OK with the first one using 1 GPU)?
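If so, it may be worth checking which host PCI addresses libvirt actually assigned to each guest. A quick sketch, to be run on the compute node (domain names are whatever `virsh list` reports):

```shell
# For each running libvirt domain, print the host PCI devices
# passed through to it. If the same <address .../> appears under
# two different domains, both VMs were given the same physical GPU,
# which would explain the "Hostdev already exists" error.
for dom in $(virsh list --name); do
  echo "== $dom =="
  virsh dumpxml "$dom" | grep -A4 '<hostdev'
done
```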


Cheers, Massimo

PS: When I configured the GPUs on OpenStack using PCI passthrough, I referred to these guides:

https://docs.openstack.org/nova/pike/admin/pci-passthrough.html
https://gist.github.com/claudiok/890ab6dfe76fa45b30081e58038a9215


On Thu, Jan 20, 2022 at 1:55 PM Satish Patel <satish.txt@gmail.com> wrote:
Thank you!

That is what I’m also trying to do: give each GPU card to its own VM. I have the exact same settings in my nova.conf. What version of libvirt are you running?

Did you install any special NVIDIA driver on your compute node for passthrough? (I doubt it, because it's straightforward.)

Do you have any NUMA setting in your flavor or compute?

Sent from my iPhone

On Jan 20, 2022, at 2:52 AM, Massimo Sgaravatto <massimo.sgaravatto@gmail.com> wrote:


Hi Satish

I am not able to tell what is wrong with your environment, but I can describe my setup.

I have a compute node with 4 Tesla V100S.
They have the same vendor id (10de) and the same product id (1df6) [*]
In nova.conf I defined this stuff in the [pci] section:

[pci]
passthrough_whitelist = {"vendor_id":"10de"}
alias={"name":"V100","product_id":"1df6","vendor_id":"10de","device_type":"type-PCI"}


I then created a flavor with this property:

pci_passthrough:alias='V100:1'

Using this flavor I can instantiate 4 VMs: each one can see a single V100
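In case it's useful, the flavor can be created along these lines (the flavor name and sizes are just examples; "V100:1" requests one device matching the "V100" alias defined in nova.conf):

```shell
# Illustrative only: create a flavor and attach the PCI alias
# defined in the [pci] section of nova.conf. "V100:1" means
# "one device matching the V100 alias".
openstack flavor create --vcpus 8 --ram 32768 --disk 40 gpu.v100
openstack flavor set gpu.v100 --property "pci_passthrough:alias"="V100:1"
```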

Hope this helps

Cheers, Massimo


[*]
# lspci -nnk -d 10de:
60:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:13d6]
Kernel driver in use: vfio-pci
Kernel modules: nouveau
61:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:13d6]
Kernel driver in use: vfio-pci
Kernel modules: nouveau
da:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:13d6]
Kernel driver in use: vfio-pci
Kernel modules: nouveau
db:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:13d6]
Kernel driver in use: vfio-pci
Kernel modules: nouveau
[root@cld-np-gpu-01 ~]# 


On Wed, Jan 19, 2022 at 10:28 PM Satish Patel <satish.txt@gmail.com> wrote:
Hi Massimo,

Ignore my last email. My requirement is to have a single VM with a
single GPU ("tesla-v100:1"), but I would also like to create a second
VM on the same compute node that uses the second GPU. However, the
second VM errors out on creation with the error below; it looks like
it's not allowing me to create a second VM and bind it to the second
GPU card.

error : virDomainDefDuplicateHostdevInfoValidate:1082 : XML error:
Hostdev already exists in the domain configuration

On Wed, Jan 19, 2022 at 3:10 PM Satish Patel <satish.txt@gmail.com> wrote:
>
> Should I create a flavor targeting both GPUs? Is it possible to have a
> single flavor cover both GPUs? End users don't understand which flavor
> to use.
>
> On Wed, Jan 19, 2022 at 1:54 AM Massimo Sgaravatto
> <massimo.sgaravatto@gmail.com> wrote:
> >
> > If I am not wrong those are 2 GPUs
> >
> > "tesla-v100:1" means 1 GPU
> >
> > So e.g. a flavor with {"pci_passthrough:alias": "tesla-v100:2"} will be used to create an instance with 2 GPUs
> >
> > Cheers, Massimo
> >
> > On Tue, Jan 18, 2022 at 11:35 PM Satish Patel <satish.txt@gmail.com> wrote:
> >>
> >> Thank you for the information.  I have a quick question.
> >>
> >> [root@gpu01 ~]# lspci | grep -i nv
> >> 5e:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe
> >> 32GB] (rev a1)
> >> d8:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe
> >> 32GB] (rev a1)
> >>
> >> The above output shows two cards: does that mean there are two
> >> physical cards, or is it just a bus representation?
> >>
> >> Also, I have the following entry in my openstack flavor; does ":1"
> >> mean the first GPU card?
> >>
> >> {"gpu-node": "true", "pci_passthrough:alias": "tesla-v100:1"}
> >>
> >> On Tue, Jan 18, 2022 at 5:55 AM António Paulo <antonio.paulo@cern.ch> wrote:
> >> >
> >> > Hey Satish, Gustavo,
> >> >
> >> > Just to clarify a bit on point 3, you will have to buy a vGPU license
> >> > per card and this gives you access to all the downloads you need through
> >> > NVIDIA's web dashboard -- both the host and guest drivers as well as the
> >> > license server setup files.
> >> >
> >> > Cheers,
> >> > António
> >> >
> >> > On 18/01/22 02:46, Satish Patel wrote:
> >> > > Thank you so much! This is what I was looking for. It is very odd that
> >> > > we buy a pricey card and then have to buy a license to unlock those
> >> > > features.
> >> > >
> >> > > On Mon, Jan 17, 2022 at 2:07 PM Gustavo Faganello Santos
> >> > > <gustavofaganello.santos@windriver.com> wrote:
> >> > >>
> >> > >> Hello, Satish.
> >> > >>
> >> > >> I've been working with vGPU lately and I believe I can answer your
> >> > >> questions:
> >> > >>
> >> > >> 1. As you pointed out in question #2, the pci-passthrough will allocate
> >> > >> the entire physical GPU to one single guest VM, while vGPU allows you to
> >> > >> spawn from 1 to several VMs using the same physical GPU, depending on
> >> > >> the vGPU type you choose (check NVIDIA docs to see which vGPU types the
> >> > >> Tesla V100 supports and their properties);
> >> > >> 2. Correct;
> >> > >> 3. To use vGPU, you need vGPU drivers installed on the platform where
> >> > >> your deployment of OpenStack is running AND in the VMs, so there are two
> >> > >> drivers to be installed in order to use the feature. I believe both of
> >> > >> them have to be purchased from NVIDIA in order to be used, and you would
> >> > >> also have to deploy an NVIDIA licensing server in order to validate the
> >> > >> licenses of the drivers running in the VMs.
> >> > >> 4. You can see what the instructions are for each of these scenarios in
> >> > >> [1] and [2].
> >> > >>
> >> > >> There is also extensive documentation on vGPU at NVIDIA's website [3].
> >> > >>
> >> > >> [1] https://docs.openstack.org/nova/wallaby/admin/virtual-gpu.html
> >> > >> [2] https://docs.openstack.org/nova/wallaby/admin/pci-passthrough.html
> >> > >> [3] https://docs.nvidia.com/grid/13.0/index.html
> >> > >>
> >> > >> Regards,
> >> > >> Gustavo.
> >> > >>
> >> > >> On 17/01/2022 14:41, Satish Patel wrote:
> >> > >>> [Please note: This e-mail is from an EXTERNAL e-mail address]
> >> > >>>
> >> > >>> Folks,
> >> > >>>
> >> > >>> We have a Tesla V100 32G GPU and I’m trying to configure it with OpenStack Wallaby. This is my first time dealing with a GPU, so I have a couple of questions.
> >> > >>>
> >> > >>> 1. What is the difference between passthrough and vGPU? I did google it, but it is not very clear yet.
> >> > >>> 2. If I configure it as passthrough, does it only work with a single VM? (I mean the whole GPU will be allocated to a single VM, correct?)
> >> > >>> 3. Also, some documents say the Tesla V100 supports vGPU, but some folks say you need a license. I have no idea where to get that license. What is the deal here?
> >> > >>> 4. What are the config differences between configuring this card with passthrough vs vGPU?
> >> > >>>
> >> > >>>
> >> > >>> Currently I configured it with passthrough based on one article, and I am able to spin up a VM and see the nvidia card exposed to it (I used iommu and the vfio driver). So if this card supports vGPU, do I need iommu and vfio, or some other driver to make it virtualized?
> >> > >>>
> >> > >>> Sent from my iPhone
> >> > >>>
> >> > >
> >>