[Openstack-operators] PCI Passthrough issues

Blair Bethwaite blair.bethwaite at gmail.com
Tue Jul 26 07:09:21 UTC 2016


Hi Joe, Jon -

We seem to be good now on both qemu 2.3 and 2.5 with kernel 3.19
(lowest we've tried). Also thanks to Jon we had an easy fix for the
snapshot issues!

Next question - has anyone figured out how to make GPU P2P work? We
haven't tried very hard yet, but with our current setup we're telling
Nova to pass through the GK210GL "3D controller" and that results in
the guest seeing individual GPUs attached to a virtualised PCI bus,
even when e.g. passing through two K80s on the same board. Next
obvious step is to try passing through the on-board PLX PCI bridge,
but wondering whether anyone else has been down this path yet?
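
Before passing through the PLX bridge, it may help to confirm on the host which upstream port the two GPUs actually share. The following is a minimal sketch, not something we have run in production: it assumes the usual Linux sysfs layout, where resolving /sys/bus/pci/devices/<address> gives a path listing every bridge above the device. The two paths and their addresses are hypothetical examples, and the function names are made up for illustration.

```python
import re
from typing import List, Optional

# Matches a PCI address such as 0000:05:00.0 (domain:bus:device.function).
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def upstream_chain(sysfs_path: str) -> List[str]:
    """Return the PCI addresses along a resolved sysfs path, root side first."""
    return [part for part in sysfs_path.split("/") if PCI_ADDR.match(part)]


def common_upstream(path_a: str, path_b: str) -> Optional[str]:
    """Return the deepest bridge or port above both devices, or None."""
    shared = None
    # Drop the last element of each chain: that is the device itself.
    for a, b in zip(upstream_chain(path_a)[:-1], upstream_chain(path_b)[:-1]):
        if a != b:
            break
        shared = a
    return shared


# Hypothetical resolved paths, e.g. obtained on a real host with
# os.path.realpath("/sys/bus/pci/devices/0000:05:00.0").
gpu0 = "/sys/devices/pci0000:00/0000:00:03.0/0000:03:00.0/0000:04:08.0/0000:05:00.0"
gpu1 = "/sys/devices/pci0000:00/0000:00:03.0/0000:03:00.0/0000:04:10.0/0000:06:00.0"

print(common_upstream(gpu0, gpu1))
```

If the two GPUs only share a root port rather than a switch port, that would suggest they are not on the same board-level bridge.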

Cheers,

On 20 July 2016 at 12:57, Blair Bethwaite <blair.bethwaite at gmail.com> wrote:
> Thanks for the confirmation Joe!
>
> On 20 July 2016 at 12:19, Joe Topjian <joe at topjian.net> wrote:
>> Hi Blair,
>>
>> We only updated qemu. We're running the version of libvirt from the Kilo
>> cloudarchive.
>>
>> We've been in production with our K80s for around two weeks now and have had
>> several users report success.
>>
>> Thanks,
>> Joe
>>
>> On Tue, Jul 19, 2016 at 5:06 PM, Blair Bethwaite <blair.bethwaite at gmail.com>
>> wrote:
>>>
>>> Hilariously (or not!) we finally hit the same issue last week once
>>> folks actually started trying to do something (other than build and
>>> load drivers) with the K80s we're passing through. This
>>> https://devtalk.nvidia.com/default/topic/850833/pci-passthrough-kvm-for-cuda-usage/
>>> is the best discussion of the issue I've found so far, haven't tracked
>>> down an actual bug yet though. I wonder whether it has something to do
>>> with the memory size of the device, as we've been happy for a long
>>> time with other NVIDIA GPUs (GRID K1, K2, M2070, ...).
>>>
>>> Jon, when you grabbed Mitaka Qemu, did you also update libvirt? We're
>>> just working through this and have tried upgrading both but are
>>> hitting some issues with Nova and Neutron on the compute nodes,
>>> thinking it may be libvirt related but debug isn't helping much yet.
>>>
>>> Cheers,
>>>
>>> On 8 July 2016 at 00:54, Jonathan Proulx <jon at csail.mit.edu> wrote:
>>> > On Thu, Jul 07, 2016 at 11:13:29AM +1000, Blair Bethwaite wrote:
>>> > :Jon,
>>> > :
>>> > :Awesome, thanks for sharing. We've just run into an issue with SRIOV
>>> > :VF passthrough that sounds like it might be the same problem (device
>>> > :disappearing after a reboot), but haven't yet investigated deeply -
>>> > :this will help with somewhere to start!
>>> >
>>> > :By the way, the nouveau mention was because we had missed it on some
>>> > :K80 hypervisors recently and seen passthrough apparently work, but
>>> > :then the NVIDIA drivers would not build in the guest as they claimed
>>> > :they could not find a supported device (despite the GPU being visible
>>> > :on the PCI bus).
>>> >
>>> > Definitely sage advice!
>>> >
>>> > :I have also heard passing mention of requiring qemu
>>> > :2.3+ but don't have any specific details of the related issue.
>>> >
>>> > I didn't do a bisection but with qemu 2.2 (from ubuntu cloudarchive
>>> > kilo) I was sad and with 2.5 (from ubuntu cloudarchive mitaka but
>>> > installed on a kilo hypervisor) I am working.
>>> >
>>> > Thanks,
>>> > -Jon
>>> >
>>> >
>>> > :Cheers,
>>> > :
>>> > :On 7 July 2016 at 08:13, Jonathan Proulx <jon at csail.mit.edu> wrote:
>>> > :> On Wed, Jul 06, 2016 at 12:32:26PM -0400, Jonathan D. Proulx wrote:
>>> > :> :
>>> > :> :I do have an odd remaining issue where I can run cuda jobs in the vm
>>> > :> :but snapshots fail and after pause (for snapshotting) the pci device
>>> > :> :can't be reattached (which is where i think it deletes the snapshot
>>> > :> :it took).  Got same issue with 3.16 and 4.4 kernels.
>>> > :> :
>>> > :> :Not very well categorized yet, but I'm hoping it's because the VM I
>>> > :> :was hacking on had its libvirt.xml written out with the older qemu
>>> > :> :maybe?  It had been through a couple reboots of the physical system
>>> > :> :though.
>>> > :> :
>>> > :> :Currently building a fresh instance and bashing more keys...
>>> > :>
>>> > :> After an ugly bout of bashing I've solved my failing snapshot issue,
>>> > :> which I'll post here in hopes of saving someone else.
>>> > :>
>>> > :> Short version:
>>> > :>
>>> > :> add "/dev/vfio/vfio rw," to /etc/apparmor.d/abstractions/libvirt-qemu
>>> > :> add "ulimit -l unlimited" to /etc/init/libvirt-bin.conf
>>> > :>
>>> > :> Longer version:
>>> > :>
>>> > :> What was happening.
>>> > :>
>>> > :> * send snapshot request
>>> > :> * instance pauses while snapshot is pending
>>> > :> * instance attempts to resume
>>> > :> * fails to reattach pci device
>>> > :>   * nova-compute.log
>>> > :>     Exception during message handling: internal error: unable to
>>> > :>     execute QEMU command 'device_add': Device initialization failed
>>> > :>
>>> > :>   * qemu/<id>.log
>>> > :>     vfio: failed to open /dev/vfio/vfio: Permission denied
>>> > :>     vfio: failed to setup container for group 48
>>> > :>     vfio: failed to get group 48
>>> > :> * snapshot disappears
>>> > :> * instance resumes but without passed through device (hard reboot
>>> > :>     reattaches)
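
For anyone triaging the same symptoms, here is a minimal sketch of matching log lines against the two causes found in this thread. The substrings are copied from the log lines quoted in this thread; the hint wording and the names HINTS and classify are just illustrative, and other VFIO failures will not be recognised.

```python
from typing import Optional

# (substring to look for, suggested next step). The substrings come from the
# qemu and kernel log lines quoted in this thread; the hints summarise the
# two fixes reported here and are not an exhaustive diagnosis.
HINTS = [
    ("failed to open /dev/vfio/vfio: Permission denied",
     "check the AppArmor profile for /dev/vfio/vfio"),
    ("RLIMIT_MEMLOCK",
     "raise the locked-memory limit for libvirtd"),
    ("VFIO_MAP_DMA: -12",
     "raise the locked-memory limit for libvirtd"),
]


def classify(line: str) -> Optional[str]:
    """Return a hint for a known VFIO failure line, or None if unrecognised."""
    for needle, hint in HINTS:
        if needle in line:
            return hint
    return None


for sample in (
    "vfio: failed to open /dev/vfio/vfio: Permission denied",
    "vfio_pin_pages: RLIMIT_MEMLOCK (65536) exceeded",
    "vfio: failed to get group 48",
):
    print(sample, "->", classify(sample))
```

The "failed to get group" line is deliberately left unmatched, since in the log above it is a follow-on of the first failure rather than a cause.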
>>> > :>
>>> > :> Seeing permission denied, I thought it would be an easy fix, but:
>>> > :>
>>> > :> # ls -l /dev/vfio/vfio
>>> > :> crw-rw-rw- 1 root root 10, 196 Jul  6 14:05 /dev/vfio/vfio
>>> > :>
>>> > :> So I guessed I was in apparmor hell. I tried adding "/dev/vfio/vfio
>>> > :> rw," to /etc/apparmor.d/abstractions/libvirt-qemu, rebooting the
>>> > :> hypervisor and trying again, which got me a different set of
>>> > :> libvirt errors:
>>> > :>
>>> > :> VFIO_MAP_DMA: -12
>>> > :> vfio_dma_map(0x5633a5fa69b0, 0x0, 0xa0000, 0x7f4e7be00000) = -12 (Cannot allocate memory)
>>> > :>
>>> > :> kern.log (and thus dmesg) showing:
>>> > :> vfio_pin_pages: RLIMIT_MEMLOCK (65536) exceeded
>>> > :>
>>> > :> Getting rid of this one required inserting 'ulimit -l unlimited' into
>>> > :> /etc/init/libvirt-bin.conf in the 'script' section:
>>> > :>
>>> > :> <previous bits excluded>
>>> > :> script
>>> > :>         [ -r /etc/default/libvirt-bin ] && . /etc/default/libvirt-bin
>>> > :>         ulimit -l unlimited
>>> > :>         exec /usr/sbin/libvirtd $libvirtd_opts
>>> > :> end script
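
To see why the default limit trips, here is a rough sketch comparing a locked-memory limit against a mapping size. It assumes the third argument in the vfio_dma_map line above (0xa0000) is the size of the mapping in bytes, and it treats 65536 from the kern.log line as the limit in bytes. Note that resource.getrlimit reports the limit of the Python process itself, not of libvirtd or qemu; for a running process, /proc/<pid>/limits is the place to look. The function name is made up for illustration.

```python
import resource


def memlock_sufficient(soft_limit: int, needed_bytes: int) -> bool:
    """Return True if a locked-memory soft limit covers needed_bytes."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= needed_bytes


# Limit of the current process (inherits from whatever started it).
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print("soft limit:", "unlimited" if soft == resource.RLIM_INFINITY else soft)

# Values from the logs above: a 64 KiB limit against a 0xa0000-byte mapping
# (size interpretation is an assumption, see the note above).
print(memlock_sufficient(65536, 0xa0000))
```

Since VFIO pins guest memory for DMA, the limit generally has to cover the guest's RAM, which is why 'ulimit -l unlimited' is the blunt but effective fix here.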
>>> > :>
>>> > :>
>>> > :> -Jon
>>> > :>
>>> > :> _______________________________________________
>>> > :> OpenStack-operators mailing list
>>> > :> OpenStack-operators at lists.openstack.org
>>> > :> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>> > :
>>> > :
>>> > :
>>> > :--
>>> > :Cheers,
>>> > :~Blairo
>>> >
>>> > --
>>>
>>>
>>>
>>> --
>>> Cheers,
>>> ~Blairo
>>>
>>
>>
>
>
>
> --
> Cheers,
> ~Blairo



-- 
Cheers,
~Blairo


