Tesla V100 32G GPU with openstack

Massimo Sgaravatto massimo.sgaravatto at gmail.com
Thu Jan 20 13:28:34 UTC 2022


I am using libvirt 7.0 on CentOS 8 Stream, with OpenStack Train.

NVIDIA drivers are installed only in the VMs (not on the compute node), and
I am not using any NUMA setting in the flavor.

But do you have the problem only when instantiating the second VM (i.e.,
everything is OK with the first one using 1 GPU)?
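
If it helps with debugging, you could also compare the host PCI addresses
that libvirt assigned to each domain (a rough check, assuming you can run
virsh directly on the compute node; the instance name below is just a
placeholder):

# virsh list --all
# virsh dumpxml instance-00000001 | grep -A 6 "<hostdev"

If both domains end up with the same <address> inside <source>, then the
same physical GPU was handed to both VMs.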


Cheers, Massimo

PS: When I configured the GPUs on OpenStack using PCI passthrough, I
referred to these guides:

https://docs.openstack.org/nova/pike/admin/pci-passthrough.html
https://gist.github.com/claudiok/890ab6dfe76fa45b30081e58038a9215
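
For reference, the host-side preparation boils down to roughly the following
(a sketch based on my memory of those guides; adjust the PCI IDs, boot-loader
commands and paths to your distro and card):

# enable the IOMMU at boot (Intel; use amd_iommu=on on AMD hosts)
grubby --update-kernel=ALL --args="intel_iommu=on iommu=pt"

# bind the V100S to vfio-pci so nouveau never grabs it
echo "options vfio-pci ids=10de:1df6" > /etc/modprobe.d/vfio.conf
echo "blacklist nouveau" > /etc/modprobe.d/blacklist-nouveau.conf
dracut -f && reboot

# after the reboot, "Kernel driver in use" should report vfio-pci
lspci -nnk -d 10de:1df6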


On Thu, Jan 20, 2022 at 1:55 PM Satish Patel <satish.txt at gmail.com> wrote:

> Thank you!
>
> That is what I’m also trying to do: give each GPU card to a separate VM. I
> have the exact same settings in my nova.conf. What version of libvirt are
> you running?
>
> Did you install any special NVIDIA driver etc. on your compute node for
> passthrough? (I doubt it, because it's straightforward.)
>
> Do you have any NUMA settings in your flavor or on your compute node?
>
> Sent from my iPhone
>
> On Jan 20, 2022, at 2:52 AM, Massimo Sgaravatto <
> massimo.sgaravatto at gmail.com> wrote:
>
> Hi Satish
>
> I am not able to understand what is wrong with your environment, but I can
> describe my setup.
>
> I have a compute node with 4 Tesla V100S cards.
> They all have the same vendor ID (10de) and the same product ID (1df6) [*]
> In nova.conf I defined the following in the [pci] section:
>
> [pci]
> passthrough_whitelist = {"vendor_id":"10de"}
>
> alias={"name":"V100","product_id":"1df6","vendor_id":"10de","device_type":"type-PCI"}
>
>
> I then created a flavor with this property:
>
> pci_passthrough:alias='V100:1'
>
> Using this flavor I can instantiate 4 VMs: each one sees a single V100.
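>
> For reference, the property can be attached to a flavor with something like
> this ("gpu-v100" is just an example flavor name):
>
> openstack flavor set gpu-v100 --property "pci_passthrough:alias"="V100:1"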
>
> Hope this helps
>
> Cheers, Massimo
>
>
> [*]
> # lspci -nnk -d 10de:
> 60:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe
> 32GB] [10de:1df6] (rev a1)
> Subsystem: NVIDIA Corporation Device [10de:13d6]
> Kernel driver in use: vfio-pci
> Kernel modules: nouveau
> 61:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe
> 32GB] [10de:1df6] (rev a1)
> Subsystem: NVIDIA Corporation Device [10de:13d6]
> Kernel driver in use: vfio-pci
> Kernel modules: nouveau
> da:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe
> 32GB] [10de:1df6] (rev a1)
> Subsystem: NVIDIA Corporation Device [10de:13d6]
> Kernel driver in use: vfio-pci
> Kernel modules: nouveau
> db:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe
> 32GB] [10de:1df6] (rev a1)
> Subsystem: NVIDIA Corporation Device [10de:13d6]
> Kernel driver in use: vfio-pci
> Kernel modules: nouveau
> [root@cld-np-gpu-01 ~]#
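>
> FWIW, you can also check the IOMMU grouping (devices that share a group must
> be passed through together) with a quick loop like the one below, and then
> look for the GPU bus addresses in the output:
>
> for g in /sys/kernel/iommu_groups/*; do echo "group ${g##*/}: $(ls $g/devices)"; done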
>
>
> On Wed, Jan 19, 2022 at 10:28 PM Satish Patel <satish.txt at gmail.com>
> wrote:
>
>> Hi Massimo,
>>
>> Ignore my last email. My requirement is to have a single VM with a
>> single GPU ("tesla-v100:1"), but I would also like to create a second VM
>> on the same compute node that uses the second GPU. However, I get the
>> following error when I create the second VM, and the VM errors out. It
>> looks like it's not letting me create a second VM and bind it to the
>> second GPU card.
>>
>> error : virDomainDefDuplicateHostdevInfoValidate:1082 : XML error:
>> Hostdev already exists in the domain configuration
>>
>> On Wed, Jan 19, 2022 at 3:10 PM Satish Patel <satish.txt at gmail.com>
>> wrote:
>> >
>> > Should I create a flavor that targets both GPUs? Is it possible to have
>> > a single flavor cover both GPUs, since end users don't understand which
>> > flavor to use?
>> >
>> > On Wed, Jan 19, 2022 at 1:54 AM Massimo Sgaravatto
>> > <massimo.sgaravatto at gmail.com> wrote:
>> > >
>> > > If I am not wrong, those are 2 physical GPUs.
>> > >
>> > > "tesla-v100:1" means 1 GPU
>> > >
>> > > So, e.g., a flavor with "pci_passthrough:alias": "tesla-v100:2" will be
>> > > used to create an instance with 2 GPUs.
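>> > >
>> > > For instance (just an illustration; "gpu-2x" is a made-up flavor name):
>> > >
>> > > openstack flavor set gpu-2x --property "pci_passthrough:alias"="tesla-v100:2"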
>> > >
>> > > Cheers, Massimo
>> > >
>> > > On Tue, Jan 18, 2022 at 11:35 PM Satish Patel <satish.txt at gmail.com>
>> wrote:
>> > >>
>> > >> Thank you for the information.  I have a quick question.
>> > >>
>> > >> [root@gpu01 ~]# lspci | grep -i nv
>> > >> 5e:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe
>> > >> 32GB] (rev a1)
>> > >> d8:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe
>> > >> 32GB] (rev a1)
>> > >>
>> > >> The above output shows two cards. Does that mean there are physically
>> > >> two cards, or is that just a bus representation?
>> > >>
>> > >> Also, I have the following entry in my OpenStack flavor; does ":1" mean
>> > >> the first GPU card?
>> > >>
>> > >> {"gpu-node": "true", "pci_passthrough:alias": "tesla-v100:1"}
>> > >>
>> > >> On Tue, Jan 18, 2022 at 5:55 AM António Paulo <antonio.paulo at cern.ch>
>> wrote:
>> > >> >
>> > >> > Hey Satish, Gustavo,
>> > >> >
>> > >> > Just to clarify a bit on point 3, you will have to buy a vGPU license
>> > >> > per card and this gives you access to all the downloads you need
>> > >> > through NVIDIA's web dashboard -- both the host and guest drivers as
>> > >> > well as the license server setup files.
>> > >> >
>> > >> > Cheers,
>> > >> > António
>> > >> >
>> > >> > On 18/01/22 02:46, Satish Patel wrote:
>> > >> > > Thank you so much! This is what I was looking for. It is very odd
>> > >> > > that we buy a pricey card but then we have to buy a license to make
>> > >> > > those features available.
>> > >> > >
>> > >> > > On Mon, Jan 17, 2022 at 2:07 PM Gustavo Faganello Santos
>> > >> > > <gustavofaganello.santos at windriver.com> wrote:
>> > >> > >>
>> > >> > >> Hello, Satish.
>> > >> > >>
>> > >> > >> I've been working with vGPU lately and I believe I can answer your
>> > >> > >> questions:
>> > >> > >>
>> > >> > >> 1. As you pointed out in question #2, the pci-passthrough will
>> > >> > >> allocate the entire physical GPU to one single guest VM, while vGPU
>> > >> > >> allows you to spawn from 1 to several VMs using the same physical GPU,
>> > >> > >> depending on the vGPU type you choose (check NVIDIA docs to see which
>> > >> > >> vGPU types the Tesla V100 supports and their properties);
>> > >> > >> 2. Correct;
>> > >> > >> 3. To use vGPU, you need vGPU drivers installed on the platform where
>> > >> > >> your deployment of OpenStack is running AND in the VMs, so there are
>> > >> > >> two drivers to be installed in order to use the feature. I believe
>> > >> > >> both of them have to be purchased from NVIDIA in order to be used, and
>> > >> > >> you would also have to deploy an NVIDIA licensing server in order to
>> > >> > >> validate the licenses of the drivers running in the VMs.
>> > >> > >> 4. You can see what the instructions are for each of these scenarios
>> > >> > >> in [1] and [2].
>> > >> > >>
>> > >> > >> There is also extensive documentation on vGPU at NVIDIA's website [3].
>> > >> > >>
>> > >> > >> [1] https://docs.openstack.org/nova/wallaby/admin/virtual-gpu.html
>> > >> > >> [2] https://docs.openstack.org/nova/wallaby/admin/pci-passthrough.html
>> > >> > >> [3] https://docs.nvidia.com/grid/13.0/index.html
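>> > >> > >>
>> > >> > >> To give a rough idea of the config difference (a sketch only; the exact
>> > >> > >> type name depends on the card and the host driver): once the NVIDIA host
>> > >> > >> driver is installed, the compute node exposes the available vGPU (mdev)
>> > >> > >> types, and you enable one of them in nova.conf, e.g.:
>> > >> > >>
>> > >> > >> # list the mdev types the host driver exposes for each physical GPU
>> > >> > >> ls /sys/class/mdev_bus/*/mdev_supported_types
>> > >> > >>
>> > >> > >> # nova-compute configuration (the type name below is only an example)
>> > >> > >> [devices]
>> > >> > >> enabled_vgpu_types = nvidia-105
>> > >> > >>
>> > >> > >> The flavor then requests "resources:VGPU=1" instead of a PCI alias, as
>> > >> > >> described in [1].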
>> > >> > >>
>> > >> > >> Regards,
>> > >> > >> Gustavo.
>> > >> > >>
>> > >> > >> On 17/01/2022 14:41, Satish Patel wrote:
>> > >> > >>>
>> > >> > >>> Folks,
>> > >> > >>>
>> > >> > >>> We have a Tesla V100 32G GPU and I’m trying to configure it with
>> > >> > >>> OpenStack Wallaby. This is my first time dealing with a GPU, so I
>> > >> > >>> have a couple of questions.
>> > >> > >>>
>> > >> > >>> 1. What is the difference between passthrough vs vGPU? I did google
>> > >> > >>> it, but it's not very clear yet.
>> > >> > >>> 2. If I configure passthrough, does it only work with a single VM?
>> > >> > >>> (I mean, the whole GPU will get allocated to a single VM, correct?)
>> > >> > >>> 3. Also, some documents say the Tesla V100 supports vGPU, but some
>> > >> > >>> folks say you need a license. I have no idea where to get that
>> > >> > >>> license. What is the deal here?
>> > >> > >>> 4. What are the config differences between configuring this card
>> > >> > >>> with passthrough vs vGPU?
>> > >> > >>>
>> > >> > >>>
>> > >> > >>> Currently I have configured it with passthrough based on one
>> > >> > >>> article, I am able to spin up a VM, and I can see the NVIDIA card
>> > >> > >>> exposed to the VM (I used IOMMU and the vfio-based driver). So if
>> > >> > >>> this card supports vGPU, do I need IOMMU and vfio or some other
>> > >> > >>> driver to make it virtualized?
>> > >> > >>>
>> > >> > >>> Sent from my iPhone
>> > >> > >>>
>> > >> > >
>> > >>
>>
>

