Tesla V100 32G GPU with openstack

Satish Patel satish.txt at gmail.com
Thu Jan 20 12:55:01 UTC 2022


Thank you!

That is what I’m also trying to do: give each GPU card to its own VM. I have the exact same settings in my nova.conf. What version of libvirt are you running?

Did you install any special NVIDIA driver on your compute node for passthrough? (I doubt it, since passthrough is straightforward.)

Do you have any NUMA settings in your flavor or on your compute node?

Sent from my iPhone

> On Jan 20, 2022, at 2:52 AM, Massimo Sgaravatto <massimo.sgaravatto at gmail.com> wrote:
> 
> 
> Hi Satish
> 
> I am not able to tell what is wrong with your environment, but I can describe my setup.
> 
> I have a compute node with 4 Tesla V100S.
> They have the same vendor id (10de) and the same product id (1df6) [*]
> In nova.conf I defined this stuff in the [pci] section:
> 
> [pci]
> passthrough_whitelist = {"vendor_id":"10de"}
> alias={"name":"V100","product_id":"1df6","vendor_id":"10de","device_type":"type-PCI"}
> 
> 
> I then created a flavor with this property:
> 
> pci_passthrough:alias='V100:1'
> 
> Using this flavor I can instantiate 4 VMs: each one can see a single V100
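For reference, the same setup expressed as CLI commands (a sketch: the flavor name and sizes are hypothetical, while the alias "V100" is the one defined in the nova.conf [pci] section above):

```shell
openstack flavor create --vcpus 8 --ram 16384 --disk 40 gpu.v100x1
openstack flavor set gpu.v100x1 --property "pci_passthrough:alias"="V100:1"
```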
> 
> Hope this helps
> 
> Cheers, Massimo
> 
> 
> [*]
> # lspci -nnk -d 10de:
> 60:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1)
> Subsystem: NVIDIA Corporation Device [10de:13d6]
> Kernel driver in use: vfio-pci
> Kernel modules: nouveau
> 61:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1)
> Subsystem: NVIDIA Corporation Device [10de:13d6]
> Kernel driver in use: vfio-pci
> Kernel modules: nouveau
> da:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1)
> Subsystem: NVIDIA Corporation Device [10de:13d6]
> Kernel driver in use: vfio-pci
> Kernel modules: nouveau
> db:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1)
> Subsystem: NVIDIA Corporation Device [10de:13d6]
> Kernel driver in use: vfio-pci
> Kernel modules: nouveau
> [root@cld-np-gpu-01 ~]# 
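As an aside, a small self-contained sketch (a hypothetical helper, not from the thread) of extracting the PCI address and vendor/product ids from that kind of `lspci -nn` output, e.g. to double-check what belongs in passthrough_whitelist. In real `lspci -nnk` output the Subsystem/Kernel lines are tab-indented, so the pattern below only matches the device lines:

```python
import re

# Sample device lines as shown above; real input would come from
# subprocess.run(["lspci", "-nn", "-d", "10de:"], ...).
sample = """\
60:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1)
61:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1)
"""

def parse_ids(lspci_output):
    """Return (address, vendor_id, product_id) tuples from `lspci -nn` device lines."""
    # The last [hhhh:hhhh] pair on each line is the vendor:device id;
    # the greedy .* skips the [0302] class code, which has no colon.
    pattern = re.compile(r"^(\S+) .*\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.M)
    return pattern.findall(lspci_output)

for addr, vendor, product in parse_ids(sample):
    print(addr, vendor, product)
```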
> 
> 
>> On Wed, Jan 19, 2022 at 10:28 PM Satish Patel <satish.txt at gmail.com> wrote:
>> Hi Massimo,
>> 
>> Ignore my last email. My requirement is one GPU per VM ("tesla-v100:1"),
>> but I would also like to create a second VM on the same compute node
>> that uses the second GPU. When I create the second VM it errors out
>> with the following message; it looks like it's not letting me create a
>> second VM bound to the second GPU card.
>> 
>> error : virDomainDefDuplicateHostdevInfoValidate:1082 : XML error:
>> Hostdev already exists in the domain configuration
>> 
>> On Wed, Jan 19, 2022 at 3:10 PM Satish Patel <satish.txt at gmail.com> wrote:
>> >
>> > Should I create a flavor that targets both GPUs? Is it possible to
>> > have a single flavor cover both GPUs, since end users don't
>> > understand which flavor to use?
>> >
>> > On Wed, Jan 19, 2022 at 1:54 AM Massimo Sgaravatto
>> > <massimo.sgaravatto at gmail.com> wrote:
>> > >
>> > > If I am not wrong, those are 2 physical GPUs
>> > >
>> > > "tesla-v100:1" means 1 GPU
>> > >
>> > > So e.g. a flavor with "pci_passthrough:alias": "tesla-v100:2" will be used to create an instance with 2 GPUs
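To illustrate the two cases side by side (a sketch: the flavor names are hypothetical, and the alias name "tesla-v100" must match the one defined in your nova.conf [pci] section):

```shell
# one GPU per VM:
openstack flavor set gpu.single --property "pci_passthrough:alias"="tesla-v100:1"
# both GPUs in one VM:
openstack flavor set gpu.double --property "pci_passthrough:alias"="tesla-v100:2"
```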
>> > >
>> > > Cheers, Massimo
>> > >
>> > > On Tue, Jan 18, 2022 at 11:35 PM Satish Patel <satish.txt at gmail.com> wrote:
>> > >>
>> > >> Thank you for the information.  I have a quick question.
>> > >>
>> > >> [root@gpu01 ~]# lspci | grep -i nv
>> > >> 5e:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe
>> > >> 32GB] (rev a1)
>> > >> d8:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe
>> > >> 32GB] (rev a1)
>> > >>
>> > >> The output above shows two cards; does that mean there are
>> > >> physically two, or is that just the bus representation?
>> > >>
>> > >> Also, I have the following entry in my OpenStack flavor; does
>> > >> ":1" mean the first GPU card?
>> > >>
>> > >> {"gpu-node": "true", "pci_passthrough:alias": "tesla-v100:1"}
>> > >>
>> > >> On Tue, Jan 18, 2022 at 5:55 AM António Paulo <antonio.paulo at cern.ch> wrote:
>> > >> >
>> > >> > Hey Satish, Gustavo,
>> > >> >
>> > >> > Just to clarify a bit on point 3, you will have to buy a vGPU license
>> > >> > per card and this gives you access to all the downloads you need through
>> > >> > NVIDIA's web dashboard -- both the host and guest drivers as well as the
>> > >> > license server setup files.
>> > >> >
>> > >> > Cheers,
>> > >> > António
>> > >> >
>> > >> > On 18/01/22 02:46, Satish Patel wrote:
>> > >> > > Thank you so much! This is what I was looking for. It is very odd that
>> > >> > > we buy a pricey card but then we have to buy a license to make those
>> > >> > > features available.
>> > >> > >
>> > >> > > On Mon, Jan 17, 2022 at 2:07 PM Gustavo Faganello Santos
>> > >> > > <gustavofaganello.santos at windriver.com> wrote:
>> > >> > >>
>> > >> > >> Hello, Satish.
>> > >> > >>
>> > >> > >> I've been working with vGPU lately and I believe I can answer your
>> > >> > >> questions:
>> > >> > >>
>> > >> > >> 1. As you pointed out in question #2, the pci-passthrough will allocate
>> > >> > >> the entire physical GPU to one single guest VM, while vGPU allows you to
>> > >> > >> spawn from 1 to several VMs using the same physical GPU, depending on
>> > >> > >> the vGPU type you choose (check NVIDIA docs to see which vGPU types the
>> > >> > >> Tesla V100 supports and their properties);
>> > >> > >> 2. Correct;
>> > >> > >> 3. To use vGPU, you need vGPU drivers installed on the platform where
>> > >> > >> your deployment of OpenStack is running AND in the VMs, so there are two
>> > >> > >> drivers to be installed in order to use the feature. I believe both of
>> > >> > >> them have to be purchased from NVIDIA in order to be used, and you would
>> > >> > >> also have to deploy an NVIDIA licensing server in order to validate the
>> > >> > >> licenses of the drivers running in the VMs.
>> > >> > >> 4. You can see what the instructions are for each of these scenarios in
>> > >> > >> [1] and [2].
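On point 3, a quick hands-on check (a sketch using the kernel's standard mdev sysfs layout; these paths are only populated once the NVIDIA host vGPU driver is installed and loaded) of which vGPU types the card advertises:

```shell
# list the vGPU (mdev) types each physical GPU advertises
ls /sys/class/mdev_bus/*/mdev_supported_types
# human-readable type names and how many instances can still be created
cat /sys/class/mdev_bus/*/mdev_supported_types/*/name
cat /sys/class/mdev_bus/*/mdev_supported_types/*/available_instances
```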
>> > >> > >>
>> > >> > >> There is also extensive documentation on vGPU at NVIDIA's website [3].
>> > >> > >>
>> > >> > >> [1] https://docs.openstack.org/nova/wallaby/admin/virtual-gpu.html
>> > >> > >> [2] https://docs.openstack.org/nova/wallaby/admin/pci-passthrough.html
>> > >> > >> [3] https://docs.nvidia.com/grid/13.0/index.html
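For contrast with the [pci] passthrough settings, the nova.conf side of vGPU per [1] is a single option (a sketch: "nvidia-105" is a placeholder type name; use one your card actually advertises under mdev_supported_types):

```ini
[devices]
enabled_vgpu_types = nvidia-105
```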
>> > >> > >>
>> > >> > >> Regards,
>> > >> > >> Gustavo.
>> > >> > >>
>> > >> > >> On 17/01/2022 14:41, Satish Patel wrote:
>> > >> > >>> Folk,
>> > >> > >>>
>> > >> > >>> We have a Tesla V100 32G GPU and I’m trying to configure it with OpenStack Wallaby. This is my first time dealing with GPUs, so I have a couple of questions.
>> > >> > >>>
>> > >> > >>> 1. What is the difference between passthrough and vGPU? I googled it but it is not very clear to me yet.
>> > >> > >>> 2. If I configure passthrough, does it only work with a single VM? (I mean the whole GPU will get allocated to a single VM, correct?)
>> > >> > >>> 3. Some documents say the Tesla V100 supports vGPU, but some folks say you need a license. I have no idea where to get that license. What is the deal here?
>> > >> > >>> 4. What are the configuration differences between setting this card up with passthrough vs vGPU?
>> > >> > >>>
>> > >> > >>>
>> > >> > >>> Currently I have configured it with passthrough based on one article; I am able to spin up a VM and I can see the NVIDIA card exposed to it (I used IOMMU and the vfio-pci driver). So if this card supports vGPU, do I need IOMMU and vfio, or some other driver, to virtualize it?
>> > >> > >>>
>> > >> > >>> Sent from my iPhone
>> > >> > >>>
>> > >> > >
>> > >>


More information about the openstack-discuss mailing list