<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Le mar. 20 juin 2023 à 16:31, Mahendra Paipuri <<a href="mailto:mahendra.paipuri@cnrs.fr">mahendra.paipuri@cnrs.fr</a>> a écrit :<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <p>Thanks Sylvain for the pointers.</p>
> One of the questions we have is: can we create MIG profiles on the
> host and then attach one or more profiles to each VM? This bug [1]
> reports that once we attach one profile to a VM, the rest of the MIG
> profiles become unavailable. From what you have said about using
> SR-IOV and VFs, I guess this should be possible.

Correct. What you need is to first create the VFs using sriov-manage; then you can create the MIG instances.
Once you have created the MIG instances using the profiles you want, you will see (by looking at sysfs) that available_instances for the related nvidia mdev type reports that you can have a single vGPU for this profile.
Then you can use that mdev type with Nova via nova.conf.
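For reference, a minimal sketch of that host-side sequence (the PCI
addresses, the MIG profile ID and the nvidia-700 mdev type below are
hypothetical examples; substitute the values from your own GPU):

    # enable MIG mode and create the SR-IOV VFs for the physical GPU
    nvidia-smi -i 0 -mig 1
    /usr/lib/nvidia/sriov-manage -e 0000:41:00.0

    # create a MIG GPU instance (and its compute instance) from a profile
    nvidia-smi mig -i 0 -cgi 19 -C

    # the matching mdev type on one of the VFs should now report a
    # single available vGPU
    cat /sys/bus/pci/devices/0000:41:00.4/mdev_supported_types/nvidia-700/available_instances

and then point Nova at that mdev type in nova.conf:

    [devices]
    enabled_mdev_types = nvidia-700

    [mdev_nvidia-700]
    device_addresses = 0000:41:00.4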
That being said, while the above is simple, the talk below went into more detail about how to use the GPU correctly from the host side, so please wait :-)

> I think you are talking about the "vGPUs with OpenStack Nova" talk on
> the OpenInfra stage. I will look into it once the videos are online.

Indeed.
-S
> [1] https://bugs.launchpad.net/nova/+bug/2008883
>
> Thanks
>
> Regards
> Mahendra
>
> On 20/06/2023 15:47, Sylvain Bauza wrote:
    <blockquote type="cite">
      
      <div dir="ltr">
        <div dir="ltr"><br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">Le mar. 20 juin 2023
            à 15:12, PAIPURI Mahendra <<a href="mailto:mahendra.paipuri@cnrs.fr" target="_blank">mahendra.paipuri@cnrs.fr</a>>
            a écrit :<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
>>> Hello Ulrich,
>>>
>>> I am relaunching this discussion as I noticed that you gave a talk
>>> about this topic at the OpenInfra Summit in Vancouver. Would it be
>>> possible to share the presentation here? I hope the talks will be
>>> uploaded to YouTube soon.
>>>
>>> We are mainly interested in using MIG instances in an OpenStack
>>> cloud, and I could not really find much information by googling.
>>> If you could share your experiences, that would be great.
>>
>> Due to scheduling conflicts I wasn't able to attend Ulrich's
>> session, but I will gladly listen to his feedback.
>>
>> FWIW, there was also a short session on the OpenInfra stage about
>> how to enable MIG and play with Nova (that one I was able to
>> attend), and it was quite seamless. What exact information are you
>> looking for?
>> The idea with MIG is that you need to create SR-IOV VFs on top of
>> the MIG instances, using the sriov-manage script provided by
>> NVIDIA, so that the mediated devices use those VFs as the base PCI
>> devices for Nova.
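>>
>> For instance, a rough sketch (the GPU address below is a
>> hypothetical example):
>>
>>     # enable the VFs for the physical GPU...
>>     /usr/lib/nvidia/sriov-manage -e 0000:41:00.0
>>
>>     # ...and check that the virtual functions showed up
>>     ls /sys/bus/pci/devices/0000:41:00.0/virtfn*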
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div>
              <div id="m_7191695452821608857m_2020284182605405898divtagdefaultwrapper" style="font-size:12pt;color:rgb(0,0,0);font-family:Calibri,Helvetica,sans-serif" dir="ltr">
                <p>
                </p>
                <p>Cheers.</p>
                <p><br>
                </p>
                <p>Regards</p>
                <p>Mahendra</p>
              </div>
              <hr style="display:inline-block;width:98%">
              <div id="m_7191695452821608857m_2020284182605405898divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>De :</b> Ulrich Schwickerath <<a href="mailto:Ulrich.Schwickerath@cern.ch" target="_blank">Ulrich.Schwickerath@cern.ch</a>><br>
                  <b>Envoyé :</b> lundi 16 janvier 2023 11:38:08<br>
                  <b>À :</b> <a href="mailto:openstack-discuss@lists.openstack.org" target="_blank">openstack-discuss@lists.openstack.org</a><br>
                  <b>Objet :</b> Re: 答复: Experience with VGPUs</font>
                <div> </div>
              </div>
              <div>
>>>
>>> Hi all,
>>>
>>> Just to add to the discussion: at CERN we have recently deployed a
>>> bunch of A100 GPUs in PCI passthrough mode, and we are now looking
>>> into improving their usage by using MIG. From the Nova point of
>>> view things seem to work OK: we can schedule VMs requesting a
>>> VGPU, and the client starts up and gets a license token from our
>>> NVIDIA license server (distributing license keys in our private
>>> cloud is relatively easy in our case). It's a PoC only for the
>>> time being, and we're not ready to put it forward, as we're facing
>>> issues with CUDA on the client (it fails immediately on memory
>>> operations with 'not supported'; we're still investigating why
>>> this happens).
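>>>
>>> The VM-side request itself is simple; a minimal sketch (the flavor
>>> name is a hypothetical example):
>>>
>>>     openstack flavor set gpu.a100 --property resources:VGPU=1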
>>>
>>> Once we get that working, it would be nice to have more
>>> fine-grained scheduling, so that people can ask for MIG devices of
>>> different sizes. The other challenge is how to set limits on GPU
>>> resources. Once the above issues have been sorted out we may want
>>> to look into Cyborg as well, so we are quite interested in first
>>> experiences with this.
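>>>
>>> For the different sizes, Nova's per-mdev-type options look like
>>> they could cover this by giving each MIG-backed mdev type its own
>>> resource class; a hedged sketch of nova.conf (the mdev types,
>>> addresses and class names are hypothetical):
>>>
>>>     [devices]
>>>     enabled_mdev_types = nvidia-700,nvidia-701
>>>
>>>     [mdev_nvidia-700]
>>>     device_addresses = 0000:41:00.4
>>>     mdev_class = CUSTOM_MIG_1G_5GB
>>>
>>>     [mdev_nvidia-701]
>>>     device_addresses = 0000:41:00.5
>>>     mdev_class = CUSTOM_MIG_2G_10GB
>>>
>>> with flavors then requesting resources:CUSTOM_MIG_1G_5GB=1 and so
>>> on.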
>>>
>>> Kind regards,
>>>
>>> Ulrich
>>>
>>> On 13.01.23 21:06, Dmitriy Rabotyagov wrote:
                <blockquote type="cite">
                  <div dir="auto">
                    <div>To have that said, deb/rpm packages they are
                      providing doesn't help much, as:
                      <div dir="auto">* There is no repo for them, so
                        you need to download them manually from
                        enterprise portal</div>
                      <div dir="auto">* They can't be upgraded anyway,
                        as driver version is part of the package name.
                        And each package conflicts with any another one.
                        So you need to explicitly remove old package and
                        only then install new one. And yes, you must
                        stop all VMs before upgrading driver and no, you
                        can't live migrate GPU mdev devices due to that
                        now being implemented in qemu. So
                        deb/rpm/generic driver doesn't matter at the end
                        tbh.</div>
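>>>>
>>>> In practice the upgrade ends up as a manual dance, roughly like
>>>> this (the package names and versions are hypothetical):
>>>>
>>>>     # stop or migrate away all VMs that use the GPU first, then:
>>>>     apt-get remove -y nvidia-vgpu-kvm-525
>>>>     apt-get install -y ./nvidia-vgpu-kvm-535_535.104.06_amd64.deb
>>>>     reboot    # to reload the new kernel modules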
>>>>
>>>> On Fri, 13 Jan 2023 at 20:56, Cedric <yipikai7@gmail.com> wrote:
                          <div dir="auto"><br>
                            Ended up with the very same conclusions than
                            Dimitry regarding the use of Nvidia Vgrid
                            for the VGPU use case with Nova, it works
                            pretty well but:<br>
                            <br>
                            - respecting the licensing model as
                            operationnal constraints, note that guests
                            need to reach a license server in order to
                            get a token (could be via the Nvidia SaaS
                            service or on-prem)<br>
                            - drivers for both guest and hypervisor are
                            not easy to implement and maintain on large
                            scale. A year ago, hypervisors drivers were
                            not packaged to Debian/Ubuntu, but builded
                            though a bash script, thus requiering
                            additional automatisation work and careful
                            attention regarding kernel update/reboot of
                            Nova hypervisors.<br>
                            <br>
                            Cheers</div>
>>>>>
>>>>> On Fri, 13 Jan 2023 at 16:21, Dmitriy Rabotyagov
>>>>> <noonedeadpunk@gmail.com> wrote:
>>>>>>
>>>>>> You are saying that as if the NVIDIA GRID drivers were
>>>>>> open-sourced, while in fact they're super far from being that.
>>>>>> In order to download drivers, not only for hypervisors but also
>>>>>> for guest VMs, you need an account on their Enterprise Portal.
>>>>>> It took me roughly 6 weeks of discussions with hardware vendors
>>>>>> and NVIDIA support to get a proper account there, and that
>>>>>> happened only after applying for their Partner Network (NPN).
>>>>>> That still doesn't solve the issue of how to provide drivers to
>>>>>> guests, except by pre-building a series of images with these
>>>>>> drivers pre-installed (we ended up making a DIB element for
>>>>>> that [1]; a build sketch follows the link below). Not to
>>>>>> mention the need to distribute license tokens to guests, and
>>>>>> the whole mess with compatibility between hypervisor and guest
>>>>>> drivers (the guest driver can't be newer than the host one, and
>>>>>> the hypervisor's can't be too new either).
>>>>>>
>>>>>> It's not that I'm defending AMD; I'm just saying that NVIDIA is
>>>>>> not that straightforward either, and at least on paper AMD
>>>>>> vGPUs look easier both for operators and end users.
>>>>>>
>>>>>> [1] https://github.com/citynetwork/dib-elements/tree/main/nvgrid
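>>>>>>
>>>>>> A rough sketch of building a guest image with that element via
>>>>>> diskimage-builder (names and paths are examples, and the
>>>>>> element presumably also needs the GRID guest driver supplied,
>>>>>> so check its docs):
>>>>>>
>>>>>>     git clone https://github.com/citynetwork/dib-elements
>>>>>>     export ELEMENTS_PATH=$PWD/dib-elements
>>>>>>     disk-image-create ubuntu vm nvgrid -o ubuntu-grid-guest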
>>>>>>
>>>>>>> As for AMD cards, AMD stated that some of their MI-series
>>>>>>> cards support SR-IOV for vGPUs. However, those drivers are
>>>>>>> never open source, nor provided as closed source to the
>>>>>>> public; only large cloud providers are able to get them. So I
>>>>>>> don't really recommend getting AMD cards for vGPU unless you
>>>>>>> are able to get support from them.