[nova][CI] GPUs in the gate
smooney at redhat.com
Mon May 13 18:09:50 UTC 2019
On Tue, 2019-05-07 at 19:56 -0400, Clark Boylan wrote:
> On Tue, May 7, 2019, at 10:48 AM, Artom Lifshitz wrote:
> > Hey all,
> > Following up on the CI session during the PTG [1], I wanted to get the
> > ball rolling on getting GPU hardware into the gate somehow. Initially
> > the plan was to do it through OpenLab and by convincing NVIDIA to
> > donate the cards, but after a conversation with Sean McGinnis it
> > appears Infra have access to machines with GPUs.
> > From Nova's POV, the requirements are:
> > * The machines with GPUs should probably be Ironic baremetal nodes and
> > not VMs [*].
> > * The GPUs need to support virtualization. It's hard to get a
> > comprehensive list of GPUs that do, but Nova's own docs [2] mention
> > two: Intel cards with GVT [3] and NVIDIA GRID [4].
> > So I think at this point the question is whether Infra can support
> > those reqs. If yes, we can start concrete steps towards getting those
> > machines used by a CI job. If not, we'll fall back to OpenLab and try
> > to get them hardware.
> What we currently have access to is a small amount of Vexxhost's GPU instances (so mnaser can further clarify my
> comments here). I believe these are VMs with dedicated nvidia gpus that are passed through. I don't think they support
> the vgpu feature.
This is correct. I asked mnaser about this in the past, which is why he made the GPU nodeset available initially, but
after checking with Sylvain and confirming the GPU model available via Vexxhost, we determined they could not be used
to test vGPU support.
> It might help to describe the use case you are trying to meet rather than jumping ahead to requirements/solutions.
> That way maybe we can work with Vexxhost to better support what you need (or come up with some other solutions). For
> those of us that don't know all of the particulars it really does help if you can go from use case to requirements.
Effectively, we just want to test the mdev-based vGPU support in the libvirt driver.
NVIDIA locks down vGPU support to their Tesla and Quadro cards and requires a license server to be running to
enable the use of the GRID driver.
As a result, to be able to test this feature in the upstream gate we would need a GPU that is on the supported list of
the NVIDIA GRID driver and a license server (could just use the trial licenses) so that we can use the vGPU feature.
As VFIO mediated devices are an extension of the SR-IOV framework built on top of the VFIO stack, the only simple
way to test this would be via a baremetal host, as we do not have a way to do a double passthrough in a way that
preserves SR-IOV functionality. (The way I described in my last email is just a theory, and OpenStack is missing
support for vIOMMU in any case, even if it did work.)
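
For anyone evaluating a candidate host, here is a minimal sketch of how one might check whether any device on it
advertises mediated device types at all. It is not part of Nova; the function name list_mdev_types is made up for
illustration, and it assumes the sysfs layout described in the kernel's VFIO mediated device documentation
(/sys/class/mdev_bus/<device>/mdev_supported_types/<type>/available_instances). An empty result would suggest the
host cannot be used to test mdev-based vGPU support as it stands.

```python
import os


def list_mdev_types(sysfs_root="/sys/class/mdev_bus"):
    """Return {device: {mdev_type: available_instances}} read from sysfs.

    Hypothetical helper for illustration only. The default path assumes the
    standard kernel mdev sysfs layout; pass another root to test it offline.
    """
    result = {}
    if not os.path.isdir(sysfs_root):
        # No mdev-capable parent device is registered on this host.
        return result

    for device in sorted(os.listdir(sysfs_root)):
        types_dir = os.path.join(sysfs_root, device, "mdev_supported_types")
        if not os.path.isdir(types_dir):
            continue

        types = {}
        for mdev_type in sorted(os.listdir(types_dir)):
            path = os.path.join(types_dir, mdev_type, "available_instances")
            try:
                with open(path) as handle:
                    types[mdev_type] = int(handle.read().strip())
            except (OSError, ValueError):
                # Attribute missing or unreadable: record the type anyway.
                types[mdev_type] = None
        result[device] = types
    return result


if __name__ == "__main__":
    found = list_mdev_types()
    if not found:
        print("no mdev-capable devices found")
    for device, types in found.items():
        for mdev_type, available in types.items():
            print(device, mdev_type, available)
```

On a host with a supported card and the vendor driver loaded, each printed line would show a PCI address, a type
name, and how many more instances of that type can be created.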
> > [*] Could we do double-passthrough? Could the card be passed through
> > to the L1 guest via the PCI passthrough mechanism, and then into the
> > L2 guest via the mdev mechanism?
> > [1] https://etherpad.openstack.org/p/nova-ptg-train-ci
> > [2] https://docs.openstack.org/nova/rocky/admin/virtual-gpu.html
> > [3] https://01.org/igvt-g
> > [4] https://docs.nvidia.com/grid/5.0/pdf/grid-vgpu-user-guide.pdf