<div dir="auto">I recall that quite recent NVIDIA driver release notes say they now allow attaching multiple vGPUs to a single VM, but I also recall Sylvain saying that it is not exactly what it sounds like, and that there are severe limitations to this advertised feature.<div dir="auto"><br></div><div dir="auto">Also, I think that in MIG mode it is possible to split a GPU into a subset of supported (but different) flavors, though I have close to no idea how scheduling would be done in this case.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jun 21, 2023, 17:36 Ulrich Schwickerath <<a href="mailto:ulrich.schwickerath@cern.ch">ulrich.schwickerath@cern.ch</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<p>Hi, again,</p>
<p>here's a link to my slides:<br>
</p>
<p><a href="https://cernbox.cern.ch/s/v3YCyJjrZZv55H2" target="_blank" rel="noreferrer">https://cernbox.cern.ch/s/v3YCyJjrZZv55H2</a></p>
<p>Let me know if it works.</p>
<p>Cheers, Ulrich</p>
<p><br>
</p>
<div>On 21/06/2023 16:10, Ulrich
Schwickerath wrote:<br>
</div>
<blockquote type="cite">
<p>Hi, all,</p>
<p>Sylvain explained quite well how to do it technically. We have
a PoC running; however, we still have some stability issues, as
mentioned at the summit. We're running the NVIDIA virtualisation
drivers on the hypervisors and the guests, which requires a
license from NVIDIA. Our configuration is still quite limited in
the sense that all cards in the same hypervisor have to be
configured the same way, that is, with the same MIG
partitioning. Also, it is not possible to attach more than one
device to a single VM.<br>
</p>
<p>As mentioned in the presentation, we are a bit behind with
Nova, and we are fixing this as we speak. Because of that we
had to do a couple of backports in Nova to make it work, which
we hope to be able to get rid of through the ongoing upgrades.<br>
</p>
<p>Let me see if I can make the slides available here. <br>
</p>
<p>Cheers, Ulrich<br>
</p>
<div>On 20/06/2023 19:07, Oliver Weinmann
wrote:<br>
</div>
<blockquote type="cite"> Hi
everyone,
<div><br>
</div>
<div>Jumping into this topic again. Unfortunately I haven't had
time yet to test NVIDIA vGPU in OpenStack, only in VMware
vSphere. What our users complain about most is the
inflexibility, since you have to use the same profile on all
VMs that use the GPU. One user suggested trying SLURM. I know
there is no official OpenStack project for SLURM, but I wonder
if anyone else has tried this approach? If I understood
correctly, this would also not require any NVIDIA subscription,
since you pass the GPU through to a single instance and use
neither vGPU nor MIG.</div>
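<div>For the passthrough-only setup described above, a minimal sketch of the Nova side could look like the following. All values here are assumptions for illustration (the product_id, alias name and flavor name are made up, and older Nova releases use `passthrough_whitelist` instead of `device_spec`):</div>

```ini
# nova.conf on the compute node hosting the GPU (hypothetical values)
[pci]
# Expose the physical GPU to Nova; 10de is NVIDIA's PCI vendor ID,
# the product_id below is a placeholder for your actual card
device_spec = { "vendor_id": "10de", "product_id": "20b0" }
alias = { "vendor_id": "10de", "product_id": "20b0", "device_type": "type-PCI", "name": "gpu" }
```

<div>A flavor would then request a whole card with something like `openstack flavor set gpu.xl --property "pci_passthrough:alias"="gpu:1"`; the guest sees the full GPU, with no vGPU/GRID licensing involved.</div>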
<div><br>
</div>
<div>Cheers,</div>
<div>Oliver<br>
<br>
<div dir="ltr">Sent from my iPhone</div>
<div dir="ltr"><br>
<blockquote type="cite">On 20.06.2023 at 17:34,
Sylvain Bauza <a href="mailto:sbauza@redhat.com" target="_blank" rel="noreferrer"><sbauza@redhat.com></a> wrote:<br>
<br>
</blockquote>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, 20 Jun 2023
at 16:31, Mahendra Paipuri <<a href="mailto:mahendra.paipuri@cnrs.fr" target="_blank" rel="noreferrer">mahendra.paipuri@cnrs.fr</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Thanks Sylvain for the pointers.</p>
<p>One of the questions we have is: can we create
MIG profiles on the host and then attach one or
more profiles to VMs? This bug [1] reports that
once we attach one profile to a VM, the rest of
the MIG profiles become unavailable. From what
you have said about using SR-IOV and VFs, I
guess this should be possible.<br>
</p>
</div>
</blockquote>
<div><br>
</div>
<div>Correct: what you need is to first create the VFs
using sriov-manage, and then you can create the MIG
instances.</div>
<div>Once you create the MIG instances using the
profiles you want, you will see that the related
available_instances for the NVIDIA mdev type (by
looking at sysfs) says that you can have a
single vGPU for this profile.</div>
<div>Then, you can use that mdev type with Nova via
nova.conf.</div>
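<div>A minimal sketch of those steps, assuming an A100 at PCI address 0000:41:00.0. The address, the 1g.10gb profile, the VF address and the nvidia-700 mdev type ID are all illustrative placeholders (list real profiles with `nvidia-smi mig -lgip` and read the actual type IDs from your own sysfs):</div>

```shell
# 1. Create the SR-IOV VFs on the physical GPU
#    (sriov-manage ships with the NVIDIA vGPU host driver)
/usr/lib/nvidia/sriov-manage -e 0000:41:00.0

# 2. Enable MIG mode and create a MIG GPU instance with the chosen profile
nvidia-smi -i 0 -mig 1
nvidia-smi mig -i 0 -cgi 1g.10gb

# 3. On a VF, the mdev type matching that profile now reports
#    available_instances = 1 in sysfs
cat /sys/bus/pci/devices/0000:41:00.4/mdev_supported_types/nvidia-700/available_instances

# 4. Finally, enable that mdev type for Nova in nova.conf:
#    [devices]
#    enabled_mdev_types = nvidia-700
```

<div>These commands require the NVIDIA vGPU host driver and a MIG-capable card, so treat them as a sketch of the sequence rather than a copy-paste recipe.</div>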
<div><br>
</div>
<div>That being said, while the above is simple, the
talk below went into more detail on how to use the
GPU correctly from the host, so please wait :-)</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p>I think you are talking about the "vGPUs with
OpenStack Nova" talk on the OpenInfra stage. I will
look into it once the videos are online. <br>
</p>
</div>
</blockquote>
<div><br>
</div>
<div>Indeed.</div>
<div>-S <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p>[1] <a href="https://bugs.launchpad.net/nova/+bug/2008883" target="_blank" rel="noreferrer">https://bugs.launchpad.net/nova/+bug/2008883</a></p>
<p>Thanks</p>
<p>Regards</p>
<p>Mahendra<br>
</p>
<div>On 20/06/2023 15:47, Sylvain Bauza wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, 20
Jun 2023 at 15:12, PAIPURI Mahendra <<a href="mailto:mahendra.paipuri@cnrs.fr" target="_blank" rel="noreferrer">mahendra.paipuri@cnrs.fr</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div id="m_-7871945658238693092m_7191695452821608857m_2020284182605405898divtagdefaultwrapper" style="font-size:12pt;color:rgb(0,0,0);font-family:Calibri,Helvetica,sans-serif" dir="ltr">
<p>Hello Ulrich,</p>
<p><br>
</p>
<p>I am relaunching this discussion as
I noticed that you gave a talk about
this topic at the OpenInfra Summit in
Vancouver. Is it possible to share
the presentation here? I hope the
talks will be uploaded to YouTube
soon. </p>
<p><br>
</p>
<p>We are mainly interested in using
MIG instances in an OpenStack cloud,
and I could not really find much
information by googling. If you
could share your experiences, that
would be great.</p>
<p><br>
</p>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>Due to scheduling conflicts, I wasn't
able to attend Ulrich's session, but I'm
very keen to hear his feedback.</div>
<div><br>
</div>
<div>FWIW, there was also a short session
on the OpenInfra stage about how to enable
MIG and play with Nova (that one I was
able to attend), and it was quite
seamless. What exact information are you
looking for?</div>
<div>The idea with MIG is that you need to
create SR-IOV VFs on top of the MIG instances
using the sriov-manage script provided by
NVIDIA, so that the mediated devices will
use those VFs as the base PCI devices
for Nova.</div>
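<div>On the Nova side, the sketch below shows how such mdev types might be wired up. The type IDs and VF addresses are placeholders you would read from your own sysfs, and the per-type `[mdev_<type>]` groups are only supported on recent Nova releases, so check the release you run:</div>

```ini
# nova.conf on the compute node (illustrative values)
[devices]
# each MIG profile appears as its own nvidia-<id> mdev type on the VFs
enabled_mdev_types = nvidia-700, nvidia-701

[mdev_nvidia-700]
device_addresses = 0000:41:00.4

[mdev_nvidia-701]
device_addresses = 0000:41:00.5
```

<div>Flavors then request a vGPU with the `resources:VGPU=1` extra spec (or a custom resource class per type, if one is configured).</div>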
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div id="m_-7871945658238693092m_7191695452821608857m_2020284182605405898divtagdefaultwrapper" style="font-size:12pt;color:rgb(0,0,0);font-family:Calibri,Helvetica,sans-serif" dir="ltr">
<p> </p>
<p>Cheers.</p>
<p><br>
</p>
<p>Regards</p>
<p>Mahendra</p>
</div>
<hr style="display:inline-block;width:98%">
<div id="m_-7871945658238693092m_7191695452821608857m_2020284182605405898divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> Ulrich
Schwickerath <<a href="mailto:Ulrich.Schwickerath@cern.ch" target="_blank" rel="noreferrer">Ulrich.Schwickerath@cern.ch</a>><br>
<b>Sent:</b> Monday, 16 January
2023 11:38:08<br>
<b>To:</b> <a href="mailto:openstack-discuss@lists.openstack.org" target="_blank" rel="noreferrer">openstack-discuss@lists.openstack.org</a><br>
<b>Subject:</b> Re: 答复: Experience
with VGPUs</font>
<div> </div>
</div>
<div>
<p>Hi, all,</p>
<p>just to add to the discussion: at
CERN we have recently deployed a
bunch of A100 GPUs in PCI
passthrough mode, and we are now
looking into improving their usage
by using MIG. From the Nova point of
view things seem to work OK: we can
schedule VMs requesting a vGPU, the
client starts up and gets a license
token from our NVIDIA license server
(distributing license keys in our
private cloud is relatively easy in
our case). It's a PoC only for the
time being, and we're not ready to
put it forward as we're facing
issues with CUDA on the client (it
fails immediately in memory
operations with 'not supported';
we're still investigating why this
happens). <br>
</p>
<p>Once we get that working, it would
be nice to have more fine-grained
scheduling so that people can ask
for MIG devices of different sizes.
The other challenge is how to set
limits on GPU resources. Once the
above issues have been sorted out we
may want to look into Cyborg as
well, so we are quite interested in
first experiences with this.</p>
<p>Kind regards, </p>
<p>Ulrich<br>
</p>
<div>On 13.01.23 21:06, Dmitriy
Rabotyagov wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">
<div>That said, the deb/rpm
packages they provide
don't help much, as:
<div dir="auto">* There is no
repo for them, so you need to
download them manually from the
enterprise portal</div>
<div dir="auto">* They can't be
upgraded anyway, as the driver
version is part of the package
name, and each package
conflicts with any other
one. So you need to explicitly
remove the old package and only
then install the new one. And yes,
you must stop all VMs before
upgrading the driver and no, you
can't live-migrate GPU mdev
devices, due to that not being
implemented in QEMU. So
deb/rpm/generic driver doesn't
matter in the end, tbh.</div>
<br>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">пт, 13
янв. 2023 г., 20:56 Cedric
<<a href="mailto:yipikai7@gmail.com" target="_blank" rel="noreferrer">yipikai7@gmail.com</a>>:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="auto"><br>
Ended up with the very
same conclusions as
Dmitriy regarding the use
of NVIDIA vGPU (GRID) for the
vGPU use case with Nova:
it works pretty well, but:<br>
<br>
- respecting the licensing
model is an operational
constraint; note that
guests need to reach a
license server in order to
get a token (either via
the NVIDIA SaaS service or
on-prem)<br>
- drivers for both guest
and hypervisor are not
easy to deploy and
maintain at large scale. A
year ago, the hypervisor
drivers were not packaged
for Debian/Ubuntu, but
built through a bash
script, thus requiring
additional automation
work and careful attention
to kernel updates/reboots
of Nova hypervisors.<br>
<br>
Cheers</div>
<br>
<br>
On Fri, Jan 13, 2023 at 4:21
PM Dmitriy Rabotyagov <<a href="mailto:noonedeadpunk@gmail.com" rel="noreferrer noreferrer
noreferrer noreferrer noreferrer" target="_blank">noonedeadpunk@gmail.com</a>>
wrote:<br>
><br>
> You are saying that the Nvidia GRID drivers are open-sourced, while<br>
> in fact they're super
far from being that. In
order to download<br>
> drivers not only for
hypervisors, but also for
guest VMs you need to<br>
> have an account in
their Enterprise Portal. It
took me roughly 6 weeks<br>
> of discussions with
hardware vendors and Nvidia
support to get a<br>
> proper account there.
And that happened only after
applying for their<br>
> Partner Network (NPN).<br>
> That still doesn't
solve the issue of how to
provide drivers to<br>
> guests, except pre-building a series of images with these drivers<br>
> pre-installed (we ended up making a DIB element for that [1]).<br>
> Not saying about the
need to distribute license
tokens for guests and<br>
> the whole mess with
compatibility between
hypervisor and guest drivers<br>
> (as the guest driver can't be newer than the host one, and HVs can't be too<br>
> new either).<br>
><br>
> It's not that I'm
protecting AMD, but just
saying that Nvidia is not<br>
> that straightforward
either, and at least on
paper AMD vGPUs look<br>
> easier both for
operators and end-users.<br>
><br>
> [1] <a href="https://github.com/citynetwork/dib-elements/tree/main/nvgrid" rel="noreferrer noreferrer
noreferrer noreferrer
noreferrer noreferrer" target="_blank">
https://github.com/citynetwork/dib-elements/tree/main/nvgrid</a><br>
><br>
> ><br>
> > As for AMD cards, AMD stated that some of their MI-series cards support SR-IOV for vGPUs. However, those drivers are never open source, nor provided as closed source to the public; only large cloud providers are able to get them. So I don't really recommend getting AMD cards for vGPU unless you are able to get support from them.<br>
> ><br>
><br>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</blockquote>
</div>
</blockquote></div>