Tesla V100 32G GPU with openstack

Satish Patel satish.txt at gmail.com
Thu Jan 20 15:32:03 UTC 2022


Hi Massimo,

My problem got resolved :(  it was very stupid problem. I have
glusterfs mounted on /var/lib/nova and somehow after reboot node that
mount point disappears and /var/lib/nova endup on a local disk which
has only a 50G partition. My flavor has disk 40G so the first vm
always works but second vm i get a strange error :(   after fixing my
mount point everything works :)
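Something like the following small guard in front of nova-compute would have caught this. It's only a sketch; the path is from my setup, and the NOVA_STATE_DIR variable name and the final reporting line are illustrative, not part of any nova tooling:

```shell
#!/bin/sh
# Guard for nova-compute: refuse to continue if /var/lib/nova is not an
# active mount point (e.g. the glusterfs mount vanished after a reboot).
NOVA_STATE_DIR="${NOVA_STATE_DIR:-/var/lib/nova}"

check_mount() {
    # mountpoint(1) exits 0 only if the given path is a mount point
    if mountpoint -q "$1"; then
        echo "OK: $1 is mounted"
    else
        echo "ERROR: $1 is NOT a mount point; not starting nova-compute" >&2
        return 1
    fi
}

# In real use this would gate "systemctl start nova-compute" (or run as a
# systemd ExecStartPre=); here it just reports.
check_mount "$NOVA_STATE_DIR" || echo "fix the mount before starting nova"
```

With the mount missing, the check fails instead of letting instances land silently on the small local disk.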

Normally, when you don't have enough space, you get an error like No
Valid Host, but I was getting a very different error, which misled me
:(

Thank you for your help, though. Now it's time to play with the
InfiniBand configuration (are you guys using InfiniBand in your cloud?)

On Thu, Jan 20, 2022 at 8:28 AM Massimo Sgaravatto
<massimo.sgaravatto at gmail.com> wrote:
>
> I am using libvirt 7.0 on CentOS 8 Stream, OpenStack Train.
>
> The nvidia drivers are installed only on the VMs (not on the compute node).
> I am not using any NUMA setting in the flavor.
>
> But do you have the problem only when instantiating the second VM (while everything is OK with the first one using 1 GPU)?
>
>
> Cheers, Massimo
>
> PS: When I configured the GPUs on openstack using pci passthrough, I referred to these guides:
>
> https://docs.openstack.org/nova/pike/admin/pci-passthrough.html
> https://gist.github.com/claudiok/890ab6dfe76fa45b30081e58038a9215
>
>
> On Thu, Jan 20, 2022 at 1:55 PM Satish Patel <satish.txt at gmail.com> wrote:
>>
>> Thank you!
>>
>> That is what I’m also trying to do: give each GPU card to each VM. I have the exact same settings in my nova.conf. What version of libvirt are you running?
>>
>> Did you install any special nvidia driver etc. on your compute node for passthrough? (I doubt it, because it is straightforward.)
>>
>> Do you have any NUMA settings in your flavor or compute node?
>>
>> Sent from my iPhone
>>
>> On Jan 20, 2022, at 2:52 AM, Massimo Sgaravatto <massimo.sgaravatto at gmail.com> wrote:
>>
>>
>> Hi Satish
>>
>> I am not able to understand what is wrong with your environment, but I can describe my setting.
>>
>> I have a compute node with 4 Tesla V100S.
>> They have the same vendor id (10de) and the same product id (1df6) [*]
>> In nova.conf I defined this stuff in the [pci] section:
>>
>> [pci]
>> passthrough_whitelist = {"vendor_id":"10de"}
>> alias={"name":"V100","product_id":"1df6","vendor_id":"10de","device_type":"type-PCI"}
>>
>>
>> I then created a flavor with this property:
>>
>> pci_passthrough:alias='V100:1'
>>
>> Using this flavor I can instantiate 4 VMs; each one can see a single V100.
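>>
>> For reference, the flavor side of this can be created like so (a
>> sketch; the flavor name and the CPU/RAM/disk sizes are made up, and
>> the alias name "V100" must match the one in the nova.conf [pci]
>> section above):

```shell
openstack flavor create --vcpus 8 --ram 32768 --disk 40 gpu.v100
openstack flavor set gpu.v100 --property "pci_passthrough:alias"="V100:1"
```

>> Changing the count in the property (e.g. "V100:2") requests that many
>> GPUs per instance.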
>>
>> Hope this helps
>>
>> Cheers, Massimo
>>
>>
>> [*]
>> # lspci -nnk -d 10de:
>> 60:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1)
>> Subsystem: NVIDIA Corporation Device [10de:13d6]
>> Kernel driver in use: vfio-pci
>> Kernel modules: nouveau
>> 61:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1)
>> Subsystem: NVIDIA Corporation Device [10de:13d6]
>> Kernel driver in use: vfio-pci
>> Kernel modules: nouveau
>> da:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1)
>> Subsystem: NVIDIA Corporation Device [10de:13d6]
>> Kernel driver in use: vfio-pci
>> Kernel modules: nouveau
>> db:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1)
>> Subsystem: NVIDIA Corporation Device [10de:13d6]
>> Kernel driver in use: vfio-pci
>> Kernel modules: nouveau
>> [root@cld-np-gpu-01 ~]#
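>>
>> For anyone reproducing this: the "Kernel driver in use: vfio-pci"
>> lines above do not happen by themselves. A rough sketch of the usual
>> host preparation (verify the IDs against your own lspci -nn output;
>> the paths are for CentOS/RHEL and an Intel host):

```shell
# /etc/modprobe.d/vfio.conf -- make vfio-pci claim the V100S (10de:1df6)
# before nouveau can bind it:
#   options vfio-pci ids=10de:1df6
#   softdep nouveau pre: vfio-pci

# Enable the IOMMU on the kernel command line, e.g. add
# "intel_iommu=on iommu=pt" to GRUB_CMDLINE_LINUX in /etc/default/grub,
# then regenerate the grub config and reboot:
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
```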
>>
>>
>> On Wed, Jan 19, 2022 at 10:28 PM Satish Patel <satish.txt at gmail.com> wrote:
>>>
>>> Hi Massimo,
>>>
>>> Ignore my last email. My requirement is to have a single VM with a
>>> single GPU ("tesla-v100:1"), but I would like to create a second VM
>>> on the same compute node which uses the second GPU. I get the
>>> following error when I create the second VM, and the VM errors out.
>>> It looks like it's not allowing me to create a second VM and bind it
>>> to the second GPU card.
>>>
>>> error : virDomainDefDuplicateHostdevInfoValidate:1082 : XML error:
>>> Hostdev already exists in the domain configuration
>>>
>>> On Wed, Jan 19, 2022 at 3:10 PM Satish Patel <satish.txt at gmail.com> wrote:
>>> >
>>> > Should I create a flavor targeting both GPUs? Is it possible to
>>> > have a single flavor cover both GPUs, because end users don't
>>> > understand which flavor to use?
>>> >
>>> > On Wed, Jan 19, 2022 at 1:54 AM Massimo Sgaravatto
>>> > <massimo.sgaravatto at gmail.com> wrote:
>>> > >
>>> > > If I am not wrong, those are 2 GPUs.
>>> > >
>>> > > "tesla-v100:1" means 1 GPU.
>>> > >
>>> > > So e.g. a flavor with {"pci_passthrough:alias": "tesla-v100:2"} will be used to create an instance with 2 GPUs.
>>> > >
>>> > > Cheers, Massimo
>>> > >
>>> > > On Tue, Jan 18, 2022 at 11:35 PM Satish Patel <satish.txt at gmail.com> wrote:
>>> > >>
>>> > >> Thank you for the information.  I have a quick question.
>>> > >>
>>> > >> [root@gpu01 ~]# lspci | grep -i nv
>>> > >> 5e:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe
>>> > >> 32GB] (rev a1)
>>> > >> d8:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe
>>> > >> 32GB] (rev a1)
>>> > >>
>>> > >> The above output shows two cards; does that mean there are
>>> > >> physically two, or is that just a bus representation?
>>> > >>
>>> > >> Also, I have the following entry in my openstack flavor; does :1
>>> > >> mean the first GPU card?
>>> > >>
>>> > >> {"gpu-node": "true", "pci_passthrough:alias": "tesla-v100:1"}
>>> > >>
>>> > >> On Tue, Jan 18, 2022 at 5:55 AM António Paulo <antonio.paulo at cern.ch> wrote:
>>> > >> >
>>> > >> > Hey Satish, Gustavo,
>>> > >> >
>>> > >> > Just to clarify a bit on point 3, you will have to buy a vGPU license
>>> > >> > per card and this gives you access to all the downloads you need through
>>> > >> > NVIDIA's web dashboard -- both the host and guest drivers as well as the
>>> > >> > license server setup files.
>>> > >> >
>>> > >> > Cheers,
>>> > >> > António
>>> > >> >
>>> > >> > On 18/01/22 02:46, Satish Patel wrote:
>>> > >> > > Thank you so much! This is what I was looking for. It is very odd that
>>> > >> > > we buy a pricey card but then have to buy a license to make those
>>> > >> > > features available.
>>> > >> > >
>>> > >> > > On Mon, Jan 17, 2022 at 2:07 PM Gustavo Faganello Santos
>>> > >> > > <gustavofaganello.santos at windriver.com> wrote:
>>> > >> > >>
>>> > >> > >> Hello, Satish.
>>> > >> > >>
>>> > >> > >> I've been working with vGPU lately and I believe I can answer your
>>> > >> > >> questions:
>>> > >> > >>
>>> > >> > >> 1. As you pointed out in question #2, the pci-passthrough will allocate
>>> > >> > >> the entire physical GPU to one single guest VM, while vGPU allows you to
>>> > >> > >> spawn from 1 to several VMs using the same physical GPU, depending on
>>> > >> > >> the vGPU type you choose (check NVIDIA docs to see which vGPU types the
>>> > >> > >> Tesla V100 supports and their properties);
>>> > >> > >> 2. Correct;
>>> > >> > >> 3. To use vGPU, you need vGPU drivers installed on the platform where
>>> > >> > >> your deployment of OpenStack is running AND in the VMs, so there are two
>>> > >> > >> drivers to be installed in order to use the feature. I believe both of
>>> > >> > >> them have to be purchased from NVIDIA in order to be used, and you would
>>> > >> > >> also have to deploy an NVIDIA licensing server in order to validate the
>>> > >> > >> licenses of the drivers running in the VMs.
>>> > >> > >> 4. You can see what the instructions are for each of these scenarios in
>>> > >> > >> [1] and [2].
>>> > >> > >>
>>> > >> > >> There is also extensive documentation on vGPU at NVIDIA's website [3].
>>> > >> > >>
>>> > >> > >> [1] https://docs.openstack.org/nova/wallaby/admin/virtual-gpu.html
>>> > >> > >> [2] https://docs.openstack.org/nova/wallaby/admin/pci-passthrough.html
>>> > >> > >> [3] https://docs.nvidia.com/grid/13.0/index.html
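>>> > >> > >>
>>> > >> > >> As a rough sketch of how the two scenarios differ on the
>>> > >> > >> OpenStack side: passthrough uses a [pci] alias in the
>>> > >> > >> flavor, while vGPU uses the [devices] section on the
>>> > >> > >> compute node plus a placement resource in the flavor. The
>>> > >> > >> vGPU type and flavor names below are examples and depend
>>> > >> > >> on your card, driver, and setup:

```shell
# Compute node nova.conf for vGPU (type name is card/driver specific):
#   [devices]
#   enabled_vgpu_types = nvidia-105

# The flavor then requests a vGPU via placement, not a PCI alias:
openstack flavor set gpu.small --property resources:VGPU=1
```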
>>> > >> > >>
>>> > >> > >> Regards,
>>> > >> > >> Gustavo.
>>> > >> > >>
>>> > >> > >> On 17/01/2022 14:41, Satish Patel wrote:
>>> > >> > >>> [Please note: This e-mail is from an EXTERNAL e-mail address]
>>> > >> > >>>
>>> > >> > >>> Folk,
>>> > >> > >>>
>>> > >> > >>> We have a Tesla V100 32G GPU and I’m trying to configure it with OpenStack Wallaby. This is my first time dealing with a GPU, so I have a couple of questions.
>>> > >> > >>>
>>> > >> > >>> 1. What is the difference between passthrough and vGPU? I did google it, but it is not very clear yet.
>>> > >> > >>> 2. If I configure it as passthrough, does it only work with a single VM? (I mean the whole GPU will be allocated to a single VM, correct?)
>>> > >> > >>> 3. Also, some documents say the Tesla V100 supports vGPU, but some folks say you need a license. I have no idea where to get that license. What is the deal here?
>>> > >> > >>> 4. What are the config differences between configuring this card for passthrough vs vGPU?
>>> > >> > >>>
>>> > >> > >>>
>>> > >> > >>> Currently I have configured it with passthrough based on one article, and I am able to spin up a VM and can see the nvidia card exposed to it (I used the iommu and vfio based driver). So if this card supports vGPU, do I still need iommu and vfio, or some other driver, to virtualize it?
>>> > >> > >>>
>>> > >> > >>> Sent from my iPhone
>>> > >> > >>>
>>> > >> > >
>>> > >>


