[nova][CI] GPUs in the gate
Artom Lifshitz
alifshit at redhat.com
Wed May 8 12:46:56 UTC 2019
On Tue, May 7, 2019 at 8:00 PM Clark Boylan <cboylan at sapwetik.org> wrote:
>
> On Tue, May 7, 2019, at 10:48 AM, Artom Lifshitz wrote:
> > Hey all,
> >
> > Following up on the CI session during the PTG [1], I wanted to get the
> > ball rolling on getting GPU hardware into the gate somehow. Initially
> > the plan was to go through OpenLab and convince NVIDIA to donate
> > the cards, but after a conversation with Sean McGinnis it appears
> > Infra has access to machines with GPUs.
> >
> > From Nova's POV, the requirements are:
> > * The machines with GPUs should probably be Ironic baremetal nodes and
> > not VMs [*].
> > * The GPUs need to support virtualization. It's hard to get a
> > comprehensive list of GPUs that do, but Nova's own docs [2] mention
> > two: Intel cards with GVT [3] and NVIDIA GRID [4].
> >
> > So I think at this point the question is whether Infra can support
> > those reqs. If yes, we can start concrete steps towards getting those
> > machines used by a CI job. If not, we'll fall back to OpenLab and
> > try to get hardware donated to them.
>
> What we currently have access to is a small number of Vexxhost's GPU instances (mnaser can further clarify my comments here). I believe these are VMs with dedicated NVIDIA GPUs that are passed through. I don't think they support the vGPU feature.
>
> It might help to describe the use case you are trying to meet rather than jumping ahead to requirements/solutions. That way maybe we can work with Vexxhost to better support what you need (or come up with some other solutions). For those of us who don't know all of the particulars, it really does help if you can go from use case to requirements.
Right, apologies, I got ahead of myself.
The use case is CI coverage for Nova's vGPU feature. This feature can
be summarized (and oversimplified) as "SR-IOV for GPUs": a single
physical GPU can be split into multiple virtual GPUs (via libvirt's
mdev support [5]), each one assigned to a different guest. We have
functional tests in-tree, but no tests against real hardware, so
we're looking for a way to get real hardware into the gate.
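To make that concrete, here is roughly what a job would exercise, going
by the admin guide linked at [2] (the mdev type name is card- and
driver-specific; nvidia-35 is just the example the docs use):

    # nova.conf on the compute node
    [devices]
    enabled_vgpu_types = nvidia-35

    # flavor that requests one vGPU for the guest
    $ openstack flavor set vgpu_1 --property "resources:VGPU=1"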
I hope that clarifies things. Let me know if there are further questions.
[5] https://libvirt.org/drvnodedev.html#MDEVCap
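For anyone not familiar with mdevs, a quick (untested) sketch of what
"the card supports virtualization" looks like from the host's point of
view: with the vendor bits (GRID or GVT-g) in place, the device exposes
mdev types under sysfs, and creating a vGPU is just writing a UUID to
the type's create node. A CI job could use something like this as a
sanity check before running the real tests (paths per the kernel's
vfio-mediated-device interface):

import os
import uuid

MDEV_BUS = "/sys/class/mdev_bus"  # parent devices registered with the mdev core

def list_mdev_types():
    """Map each mdev-capable parent device to the mdev types it offers."""
    types = {}
    if not os.path.isdir(MDEV_BUS):
        return types  # no mdev-capable device/driver on this host
    for parent in os.listdir(MDEV_BUS):
        tdir = os.path.join(MDEV_BUS, parent, "mdev_supported_types")
        if os.path.isdir(tdir):
            types[parent] = sorted(os.listdir(tdir))
    return types

def create_mdev(parent, mdev_type):
    """Create one mediated device (a vGPU) on the parent and return its UUID."""
    dev_uuid = str(uuid.uuid4())
    create_node = os.path.join(MDEV_BUS, parent, "mdev_supported_types",
                               mdev_type, "create")
    with open(create_node, "w") as f:
        f.write(dev_uuid)
    return dev_uuid

if __name__ == "__main__":
    # On a vGPU-capable node this prints something like
    # {'0000:84:00.0': ['nvidia-35', 'nvidia-36', ...]}; empty otherwise.
    print(list_mdev_types())

Nova's libvirt driver does the equivalent internally and hands the
resulting mdev to the guest, so the CI hardware mainly needs to make
those types show up.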
>
> >
> > [*] Could we do double-passthrough? Could the card be passed through
> > to the L1 guest via the PCI passthrough mechanism, and then into the
> > L2 guest via the mdev mechanism?
> >
> > [1] https://etherpad.openstack.org/p/nova-ptg-train-ci
> > [2] https://docs.openstack.org/nova/rocky/admin/virtual-gpu.html
> > [3] https://01.org/igvt-g
> > [4] https://docs.nvidia.com/grid/5.0/pdf/grid-vgpu-user-guide.pdf