<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>Hi, all,</p>
<p>Sylvain explained quite well how to do it technically. We have a
PoC running; however, we still have some stability issues, as
mentioned at the summit. We're running the NVIDIA virtualisation
drivers on the hypervisors and the guests, which requires a
license from NVIDIA. In our configuration we are still quite
limited in the sense that we have to configure all cards on the
same hypervisor in the same way, that is, with the same MIG
partitioning. Also, it is not possible to attach more than one
device to a single VM.<br>
</p>
<p>As mentioned in the presentation, we are a bit behind with Nova
and in the process of fixing this as we speak. Because of that we
had to do a couple of backports in Nova to make it work, which we
hope to be able to get rid of through the ongoing upgrades.<br>
</p>
<p>Let me see if I can make the slides available here. <br>
</p>
<p>Cheers, Ulrich<br>
</p>
<div class="moz-cite-prefix">On 20/06/2023 19:07, Oliver Weinmann
wrote:<br>
</div>
<blockquote type="cite" cite="mid:2DD18791-4BFD-4FF0-AAAC-77D8C18FB138@me.com">
Hi everyone,
<div><br>
</div>
<div>Jumping into this topic again. Unfortunately I haven’t had
time yet to test NVIDIA vGPU in OpenStack, but I have in VMware vSphere.
What our users complain about most is the inflexibility, since
you have to use the same profile on all VMs that use the GPU.
One user suggested trying SLURM. I know there is no official
OpenStack project for SLURM, but I wonder if anyone else has tried
this approach? If I understood correctly, this would also not
require any NVIDIA subscription, since you pass the GPU through to
a single instance and use neither vGPU nor MIG.</div>
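<div><br>
</div>
<div>For reference, plain PCI passthrough of the whole card in Nova would look
roughly like this; the vendor/product IDs, alias and flavor names below are
only illustrative placeholders I have not tested, and older releases call the
first option passthrough_whitelist instead of device_spec:</div>
<pre>
# nova.conf on the compute node (the alias is also needed on the API node)
[pci]
device_spec = { "vendor_id": "10de", "product_id": "20b5" }
alias = { "vendor_id": "10de", "product_id": "20b5", "device_type": "type-PCI", "name": "a100" }

# flavor that hands one full GPU to a single instance
# openstack flavor set gpu.large --property "pci_passthrough:alias"="a100:1"
</pre>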
<div><br>
</div>
<div>Cheers,</div>
<div>Oliver<br>
<br>
<div dir="ltr">Von meinem iPhone gesendet</div>
<div dir="ltr"><br>
<blockquote type="cite">Am 20.06.2023 um 17:34 schrieb Sylvain
Bauza <a class="moz-txt-link-rfc2396E" href="mailto:sbauza@redhat.com"><sbauza@redhat.com></a>:<br>
<br>
</blockquote>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">Le mar. 20 juin 2023
à 16:31, Mahendra Paipuri <<a href="mailto:mahendra.paipuri@cnrs.fr" moz-do-not-send="true" class="moz-txt-link-freetext">mahendra.paipuri@cnrs.fr</a>>
a écrit :<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>Thanks Sylvain for the pointers.</p>
<p>One of the questions we have is: can we create
MIG profiles on the host and then attach one
or more of those profiles to VMs? This bug [1] reports
that once we attach one profile to a VM, the rest of the
MIG profiles become unavailable. From what you
have said about using SR-IOV and VFs, I guess this
should be possible.<br>
</p>
</div>
</blockquote>
<div><br>
</div>
<div>Correct: what you need is to first create the VFs
using sriov-manage, and then you can create the MIG
instances.</div>
<div>Once you create the MIG instances using the
profiles you want, you will see that the
available_instances value for the related nvidia mdev type (by
looking at sysfs) says that you can have a single
vGPU for this profile.</div>
<div>Then, you can use that mdev type with Nova via
nova.conf.</div>
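<div><br>
</div>
<div>Roughly, the sequence looks like this; the PCI addresses, MIG
profile IDs and mdev type names below are only placeholders that vary
per driver and GPU model, so treat it as an untested sketch:</div>
<pre>
# 1. enable the SR-IOV VFs on the physical GPU (script shipped by the
#    NVIDIA vGPU host driver; 0000:41:00.0 is a placeholder address)
/usr/lib/nvidia/sriov-manage -e 0000:41:00.0

# 2. create the MIG instances with the profiles you want, e.g. two
#    3g.20gb slices (profile IDs differ per GPU model)
nvidia-smi mig -cgi 9,9 -C

# 3. check which mdev type a VF now exposes and how many instances it
#    allows (the type name nvidia-474 is a placeholder)
cat /sys/bus/pci/devices/0000:41:00.4/mdev_supported_types/nvidia-474/name
cat /sys/bus/pci/devices/0000:41:00.4/mdev_supported_types/nvidia-474/available_instances

# 4. then point Nova at that mdev type in nova.conf on the compute node:
#    [devices]
#    enabled_mdev_types = nvidia-474
#    [mdev_nvidia-474]
#    device_addresses = 0000:41:00.4,0000:41:00.5
</pre>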
<div><br>
</div>
<div>That being said, while the above is simple, the
talk below said more about how to correctly use
the GPU on the host, so please wait :-)</div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p>I think you are talking about the "vGPUs with
OpenStack Nova" talk on the OpenInfra stage. I will
look into it once the videos are online. <br>
</p>
</div>
</blockquote>
<div><br>
</div>
<div>Indeed.</div>
<div>-S <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p>[1] <a href="https://bugs.launchpad.net/nova/+bug/2008883" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://bugs.launchpad.net/nova/+bug/2008883</a></p>
<p>Thanks</p>
<p>Regards</p>
<p>Mahendra<br>
</p>
<div>On 20/06/2023 15:47, Sylvain Bauza wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">Le mar. 20
juin 2023 à 15:12, PAIPURI Mahendra <<a href="mailto:mahendra.paipuri@cnrs.fr" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">mahendra.paipuri@cnrs.fr</a>>
a écrit :<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<div id="m_7191695452821608857m_2020284182605405898divtagdefaultwrapper" style="font-size:12pt;color:rgb(0,0,0);font-family:Calibri,Helvetica,sans-serif" dir="ltr">
<p>Hello Ulrich,</p>
<p><br>
</p>
<p>I am relaunching this discussion as I
noticed that you gave a talk about
this topic at the OpenInfra Summit in
Vancouver. Is it possible to share the
presentation here? I hope the talks
will be uploaded to YouTube soon. </p>
<p><br>
</p>
<p>We are mainly interested in using MIG
instances in an OpenStack cloud, and I
could not really find a lot of
information by googling. If you could
share your experiences, that would be
great.</p>
<p><br>
</p>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>Due to scheduling conflicts, I wasn't
able to attend Ulrich's session, but I will
be listening closely to his feedback.</div>
<div><br>
</div>
<div>FWIW, there was also a short session
about how to enable MIG and play with Nova
at the OpenInfra stage (that one I was
able to attend), and it was quite
seamless. What exact information are you
looking for?</div>
<div>The idea with MIG is that you need to
create SR-IOV VFs on top of the MIG instances
using the sriov-manage script provided by NVIDIA,
so that the mediated devices use those
VFs as the base PCI devices for
Nova.</div>
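<div><br>
</div>
<div>On the flavor side, the guest then just asks for a vGPU resource;
a minimal example (the flavor name and sizes are made up):</div>
<pre>
# create a flavor and make it request one vGPU
openstack flavor create --ram 16384 --disk 40 --vcpus 8 gpu.mig
openstack flavor set gpu.mig --property "resources:VGPU=1"
</pre>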
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<div id="m_7191695452821608857m_2020284182605405898divtagdefaultwrapper" style="font-size:12pt;color:rgb(0,0,0);font-family:Calibri,Helvetica,sans-serif" dir="ltr">
<p> </p>
<p>Cheers.</p>
<p><br>
</p>
<p>Regards</p>
<p>Mahendra</p>
</div>
<hr style="display:inline-block;width:98%">
<div id="m_7191695452821608857m_2020284182605405898divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>De :</b> Ulrich
Schwickerath <<a href="mailto:Ulrich.Schwickerath@cern.ch" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">Ulrich.Schwickerath@cern.ch</a>><br>
<b>Envoyé :</b> lundi 16 janvier 2023
11:38:08<br>
<b>À :</b> <a href="mailto:openstack-discuss@lists.openstack.org" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">openstack-discuss@lists.openstack.org</a><br>
<b>Objet :</b> Re: 答复: Experience with
VGPUs</font>
<div> </div>
</div>
<div>
<p>Hi, all,</p>
<p>just to add to the discussion, at
CERN we have recently deployed a bunch
of A100 GPUs in PCI passthrough mode,
and are now looking into improving
their usage by using MIG. From the
Nova point of view things seem to work
OK: we can schedule VMs requesting a
vGPU, the client starts up and gets a
license token from our NVIDIA license
server (distributing license keys in
our private cloud is relatively easy
in our case). It's a PoC only for the
time being, and we're not ready to put
that forward, as we're facing issues
with CUDA on the client (it fails
immediately in memory operations with
'not supported'; we are still investigating
why this happens). <br>
</p>
<p>Once we get that working, it would be
nice to have more fine-grained
scheduling so that people can
ask for MIG devices of different sizes
(see the sketch below). The other
challenge is how to set
limits on GPU resources. Once the
above issues have been sorted out we
may want to look into Cyborg as well,
so we are quite interested in first
experiences with this.</p>
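<p>For what it's worth, a rough idea of what such fine-grained
scheduling could look like on the Nova side, exposing two MIG-backed
mdev types as separate resource classes; the type names, PCI addresses
and class names below are made-up placeholders we have not tested:</p>
<pre>
# hypothetical nova.conf on a compute node carrying two MIG profiles
[devices]
enabled_mdev_types = nvidia-474, nvidia-475

[mdev_nvidia-474]
device_addresses = 0000:41:00.4,0000:41:00.5
mdev_class = CUSTOM_VGPU_3G_20GB

[mdev_nvidia-475]
device_addresses = 0000:41:00.6
mdev_class = CUSTOM_VGPU_1G_5GB

# a flavor would then request e.g. resources:CUSTOM_VGPU_3G_20GB=1
</pre>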
<p>Kind regards, </p>
<p>Ulrich<br>
</p>
<div>On 13.01.23 21:06, Dmitriy
Rabotyagov wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">
<div>That said, the deb/rpm
packages they are providing
don't help much, as:
<div dir="auto">* There is no repo
for them, so you need to
download them manually from the
enterprise portal.</div>
<div dir="auto">* They can't be
upgraded anyway, as the driver
version is part of the package
name, and each package conflicts
with any other one. So you
need to explicitly remove the old
package and only then install the
new one. And yes, you must stop
all VMs before upgrading the driver,
and no, you can't live migrate
GPU mdev devices due to that not
being implemented in QEMU. So
deb/rpm/generic driver doesn't
matter in the end, tbh.</div>
<br>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">пт, 13 янв.
2023 г., 20:56 Cedric <<a href="mailto:yipikai7@gmail.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">yipikai7@gmail.com</a>>:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="auto"><br>
Ended up with the very same
conclusions than Dimitry
regarding the use of Nvidia
Vgrid for the VGPU use case
with Nova, it works pretty
well but:<br>
<br>
- respecting the licensing
model as operationnal
constraints, note that
guests need to reach a
license server in order to
get a token (could be via
the Nvidia SaaS service or
on-prem)<br>
- drivers for both guest and
hypervisor are not easy to
implement and maintain on
large scale. A year ago,
hypervisors drivers were not
packaged to Debian/Ubuntu,
but builded though a bash
script, thus requiering
additional automatisation
work and careful attention
regarding kernel
update/reboot of Nova
hypervisors.<br>
<br>
Cheers</div>
<br>
<br>
On Fri, Jan 13, 2023 at 4:21
PM Dmitriy Rabotyagov <<a href="mailto:noonedeadpunk@gmail.com" rel="noreferrer noreferrer
noreferrer noreferrer" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">noonedeadpunk@gmail.com</a>>
wrote:<br>
><br>
> You are saying that as if the NVIDIA GRID drivers were open-sourced, while<br>
> in fact they're super far from being that. In order to download<br>
> drivers not only for hypervisors, but also for guest VMs, you need to<br>
> have an account in their Enterprise Portal. It took me roughly 6 weeks<br>
> of discussions with hardware vendors and NVIDIA support to get a<br>
> proper account there. And that happened only after applying for their<br>
> Partner Network (NPN).<br>
> That still doesn't solve the issue of how to provide drivers to<br>
> guests, except by pre-building a series of images with these drivers<br>
> pre-installed (we ended up making a DIB element for that [1]).<br>
> Not to mention the need to distribute license tokens to guests and<br>
> the whole mess with compatibility between hypervisor and guest drivers<br>
> (as the guest driver can't be newer than the host one, and HVs can't be<br>
> too new either).<br>
><br>
> It's not that I'm defending AMD, but just<br>
> saying that NVIDIA is not that straightforward<br>
> either, and at least on paper AMD vGPUs look<br>
> easier both for operators and end-users.<br>
><br>
> [1] <a href="https://github.com/citynetwork/dib-elements/tree/main/nvgrid" rel="noreferrer noreferrer
noreferrer noreferrer
noreferrer" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">
https://github.com/citynetwork/dib-elements/tree/main/nvgrid</a><br>
><br>
> ><br>
> > As for AMD cards,
AMD stated that some of their
MI series cards support SR-IOV
for vGPUs. However, those
drivers are neither open source
nor provided as closed source to the
public; only large cloud
providers are able to get
them. So I don't really
recommend getting AMD cards
for vGPU unless you are able
to get support from them.<br>
> ><br>
><br>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</body>
</html>