[Openstack-operators] PCI Passthrough issues
Blair Bethwaite
blair.bethwaite at gmail.com
Thu Jul 7 01:13:29 UTC 2016
Jon,
Awesome, thanks for sharing. We've just run into an issue with SRIOV
VF passthrough that sounds like it might be the same problem (device
disappearing after a reboot), but haven't yet investigated deeply -
this will help with somewhere to start!
By the way, the nouveau mention was because we had missed it on some
K80 hypervisors recently and seen passthrough apparently work, but
then the NVIDIA drivers would not build in the guest as they claimed
they could not find a supported device (despite the GPU being visible
on the PCI bus). I have also heard passing mention of requiring qemu
2.3+ but don't have any specific details of the related issue.
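For anyone wanting to check the nouveau point on a hypervisor, something like the following might help. It is only a sketch: it assumes the usual "Kernel driver in use:" line printed by `lspci -k`, and the helper name `driver_in_use` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative, not from any tool):
# read `lspci -k` style text on stdin and print only the driver named
# on each "Kernel driver in use:" line.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# Usage on a hypervisor (10de is the NVIDIA PCI vendor ID):
#   lspci -nnk -d 10de: | driver_in_use
# "nouveau" in the output would suggest the host driver has claimed the
# card; for a device currently attached to a guest through VFIO one
# would expect "vfio-pci" instead.
```

This only reports what the hypervisor has bound to the card. It says nothing about the guest-side driver build failure described above.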
Cheers,
On 7 July 2016 at 08:13, Jonathan Proulx <jon at csail.mit.edu> wrote:
> On Wed, Jul 06, 2016 at 12:32:26PM -0400, Jonathan D. Proulx wrote:
> :
> :I do have an odd remaining issue where I can run cuda jobs in the vm
> :but snapshots fail and after pause (for snapshotting) the pci device
> :can't be reattached (which is where i think it deletes the snapshot
> :it took). Got same issue with 3.16 and 4.4 kernels.
> :
> :Not very well categorized yet, but I'm hoping it's because the VM I
> :was hacking on had its libvirt.xml written out with the older qemu
> :maybe? It had been through a couple reboots of the physical system
> :though.
> :
> :Currently building a fresh instance and bashing more keys...
>
> After an ugly bout of bashing I've solved my failing snapshot issue,
> which I'll post here in hopes of saving someone else the trouble.
>
> Short version:
>
> add "/dev/vfio/vfio rw," to /etc/apparmor.d/abstractions/libvirt-qemu
> add "ulimit -l unlimited" to /etc/init/libvirt-bin.conf
>
> Longer version:
>
> What was happening.
>
> * send snapshot request
> * instance pauses while snapshot is pending
> * instance attempts to resume
> * fails to reattach pci device
> * nova-compute.log
> Exception during message handling: internal error: unable to execute QEMU command 'device_add': Device initialization failed
>
> * qemu/<id>.log
> vfio: failed to open /dev/vfio/vfio: Permission denied
> vfio: failed to setup container for group 48
> vfio: failed to get group 48
> * snapshot disappears
> * instance resumes but without passed through device (hard reboot
> reattaches)
>
> Seeing permission denied I thought this would be an easy fix, but:
>
> # ls -l /dev/vfio/vfio
> crw-rw-rw- 1 root root 10, 196 Jul 6 14:05 /dev/vfio/vfio
>
> so I'm guessing I'm in apparmor hell. I tried adding "/dev/vfio/vfio
> rw," to /etc/apparmor.d/abstractions/libvirt-qemu, rebooting the
> hypervisor, and trying again, which got me a different libvirt error
> set:
>
> VFIO_MAP_DMA: -12
> vfio_dma_map(0x5633a5fa69b0, 0x0, 0xa0000, 0x7f4e7be00000) = -12 (Cannot allocate memory)
>
> kern.log (and thus dmesg) showing:
> vfio_pin_pages: RLIMIT_MEMLOCK (65536) exceeded
>
> Getting rid of this one required inserting 'ulimit -l unlimited' into
> /etc/init/libvirt-bin.conf in the 'script' section:
>
> <previous bits excluded>
> script
> [ -r /etc/default/libvirt-bin ] && . /etc/default/libvirt-bin
> ulimit -l unlimited
> exec /usr/sbin/libvirtd $libvirtd_opts
> end script
>
>
> -Jon
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
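The two checks in Jon's write-up above (an AppArmor denial on /dev/vfio/vfio, then the locked-memory limit) could be scripted roughly as follows. This is a sketch under assumptions: the function names are made up for illustration, the log path in the usage comments is a guess at a typical Ubuntu layout, and the first function relies on AppArmor's usual apparmor="DENIED" audit wording in the kernel log.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Print AppArmor denial lines that mention /dev/vfio from a kernel log file.
vfio_apparmor_denials() {
    grep 'apparmor="DENIED"' "$1" | grep '/dev/vfio'
}

# Print the soft "Max locked memory" limit from a /proc/<pid>/limits
# style file. The fourth whitespace-separated field on that row is the
# soft limit (a byte count, or the word "unlimited").
memlock_soft_limit() {
    awk '/^Max locked memory/ { print $4 }' "$1"
}

# Usage on a hypervisor (paths are assumptions):
#   vfio_apparmor_denials /var/log/kern.log
#   memlock_soft_limit "/proc/$(pgrep -o libvirtd)/limits"
```

If the second command still prints 65536 after libvirtd has been restarted, the ulimit change has probably not taken effect. The same check can be pointed at a qemu process's pid, since that is the process doing the VFIO DMA mapping.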
--
Cheers,
~Blairo