[Openstack-operators] PCI Passthrough issues
Jonathan Proulx
jon at csail.mit.edu
Thu Jul 7 14:54:00 UTC 2016
On Thu, Jul 07, 2016 at 11:13:29AM +1000, Blair Bethwaite wrote:
:Jon,
:
:Awesome, thanks for sharing. We've just run into an issue with SRIOV
:VF passthrough that sounds like it might be the same problem (device
:disappearing after a reboot), but haven't yet investigated deeply -
:this will help with somewhere to start!
:By the way, the nouveau mention was because we had missed it on some
:K80 hypervisors recently and seen passthrough apparently work, but
:then the NVIDIA drivers would not build in the guest as they claimed
:they could not find a supported device (despite the GPU being visible
:on the PCI bus).
Definitely sage advice!
:I have also heard passing mention of requiring qemu
:2.3+ but don't have any specific details of the related issue.
I didn't do a bisection, but with qemu 2.2 (from the Ubuntu Cloud
Archive for Kilo) I was sad, and with 2.5 (from the Ubuntu Cloud
Archive for Mitaka, but installed on a Kilo hypervisor) I am working.
Thanks,
-Jon
:Cheers,
:
:On 7 July 2016 at 08:13, Jonathan Proulx <jon at csail.mit.edu> wrote:
:> On Wed, Jul 06, 2016 at 12:32:26PM -0400, Jonathan D. Proulx wrote:
:> :
:> :I do have an odd remaining issue where I can run CUDA jobs in the VM
:> :but snapshots fail, and after the pause (for snapshotting) the PCI device
:> :can't be reattached (which is where I think it deletes the snapshot
:> :it took). Got the same issue with 3.16 and 4.4 kernels.
:> :
:> :Not very well categorized yet, but I'm hoping it's because the VM I
:> :was hacking on had its libvirt.xml written out with the older qemu
:> :maybe? It had been through a couple reboots of the physical system
:> :though.
:> :
:> :Currently building a fresh instance and bashing more keys...
:>
:> After an ugly bout of bashing I've solved my failing snapshot issue,
:> which I'll post here in hopes of saving someone else the trouble.
:>
:> Short version:
:>
:> add "/dev/vfio/vfio rw," to /etc/apparmor.d/abstractions/libvirt-qemu
:> add "ulimit -l unlimited" to /etc/init/libvirt-bin.conf
:>
:> Longer version:
:>
:> What was happening.
:>
:> * send snapshot request
:> * instance pauses while snapshot is pending
:> * instance attempts to resume
:> * fails to reattach pci device
:> * nova-compute.log
:> Exception during message handling: internal error: unable to execute QEMU command 'device_add': Device initialization failed
:>
:> * qemu/<id>.log
:> vfio: failed to open /dev/vfio/vfio: Permission denied
:> vfio: failed to setup container for group 48
:> vfio: failed to get group 48
:> * snapshot disappears
:> * instance resumes but without passed through device (hard reboot
:> reattaches)
:>
:> Seeing "Permission denied", I thought this would be an easy fix, but:
:>
:> # ls -l /dev/vfio/vfio
:> crw-rw-rw- 1 root root 10, 196 Jul 6 14:05 /dev/vfio/vfio
:>
:> So I was guessing I was in apparmor hell. I tried adding
:> "/dev/vfio/vfio rw," to /etc/apparmor.d/abstractions/libvirt-qemu,
:> rebooted the hypervisor, and tried again, which got me a different
:> set of libvirt errors:
:>
:> VFIO_MAP_DMA: -12
:> vfio_dma_map(0x5633a5fa69b0, 0x0, 0xa0000, 0x7f4e7be00000) = -12 (Cannot allocate memory)
:>
:> kern.log (and thus dmesg) showing:
:> vfio_pin_pages: RLIMIT_MEMLOCK (65536) exceeded
:>
:> Getting rid of this one required inserting 'ulimit -l unlimited' into
:> /etc/init/libvirt-bin.conf in the 'script' section:
:>
:> <previous bits excluded>
:> script
:> [ -r /etc/default/libvirt-bin ] && . /etc/default/libvirt-bin
:> ulimit -l unlimited
:> exec /usr/sbin/libvirtd $libvirtd_opts
:> end script
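
A rough way to check that both changes took effect, sketched as two small
shell helpers. The function names memlock_soft_limit and has_vfio_rule are
made up for illustration; the paths are the ones from this thread. This
assumes a Linux /proc/<pid>/limits layout where the soft limit is the
fourth whitespace-separated field of the "Max locked memory" line, and that
the qemu processes inherit their limits from libvirtd.

```shell
#!/bin/sh
# Hypothetical helpers, not part of libvirt or nova.

# Print the soft "Max locked memory" limit from a /proc/<pid>/limits-style
# file. Expect "unlimited" once the ulimit line is in effect, rather than
# the 65536 reported in the RLIMIT_MEMLOCK kernel message.
memlock_soft_limit() {
    awk '/^Max locked memory/ { print $4 }' "$1"
}

# Succeed if the given AppArmor file contains a "/dev/vfio/vfio rw," rule.
has_vfio_rule() {
    grep -Eq '^[[:space:]]*/dev/vfio/vfio[[:space:]]+rw,' "$1"
}

# Example use on a hypervisor (run as root; pidof may print several pids):
#   pid=$(pidof libvirtd | awk '{ print $1 }')
#   memlock_soft_limit "/proc/$pid/limits"
#   has_vfio_rule /etc/apparmor.d/abstractions/libvirt-qemu && echo "rule present"
```

The same memlock check can be pointed at a running qemu process's limits
file, since that is the process that actually pins guest memory for VFIO.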
:>
:>
:> -Jon
:>
:> _______________________________________________
:> OpenStack-operators mailing list
:> OpenStack-operators at lists.openstack.org
:> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
:
:
:
:--
:Cheers,
:~Blairo