[Openstack-operators] PCI Passthrough issues

Jonathan Proulx jon at csail.mit.edu
Thu Jul 7 14:54:00 UTC 2016


On Thu, Jul 07, 2016 at 11:13:29AM +1000, Blair Bethwaite wrote:
:Jon,
:
:Awesome, thanks for sharing. We've just run into an issue with SRIOV
:VF passthrough that sounds like it might be the same problem (device
:disappearing after a reboot), but haven't yet investigated deeply -
:this will help with somewhere to start!

:By the way, the nouveau mention was because we had missed it on some
:K80 hypervisors recently and seen passthrough apparently work, but
:then the NVIDIA drivers would not build in the guest as they claimed
:they could not find a supported device (despite the GPU being visible
:on the PCI bus). 

Definitely sage advice! 

:I have also heard passing mention of requiring qemu
:2.3+ but don't have any specific details of the related issue.

I didn't do a bisection but with qemu 2.2 (from ubuntu cloudarchive
kilo) I was sad and with 2.5 (from ubuntu cloudarchive mitaka but
installed on a kilo hypervisor) I am working.

Thanks,
-Jon


:Cheers,
:
:On 7 July 2016 at 08:13, Jonathan Proulx <jon at csail.mit.edu> wrote:
:> On Wed, Jul 06, 2016 at 12:32:26PM -0400, Jonathan D. Proulx wrote:
:> :
:> :I do have an odd remaining issue where I can run CUDA jobs in the VM
:> :but snapshots fail, and after the pause (for snapshotting) the PCI device
:> :can't be reattached (which is where I think it deletes the snapshot
:> :it took).  Got the same issue with 3.16 and 4.4 kernels.
:> :
:> :Not very well categorized yet, but I'm hoping it's because the VM I
:> :was hacking on had its libvirt.xml written out with the older qemu
:> :maybe?  It had been through a couple reboots of the physical system
:> :though.
:> :
:> :Currently building a fresh instance and bashing more keys...
:>
:> After an ugly bout of bashing I've solved my failing snapshot issue,
:> which I'll post here in hopes of saving someone else the trouble.
:>
:> Short version:
:>
:> add "/dev/vfio/vfio rw," to /etc/apparmor.d/abstractions/libvirt-qemu
:> add "ulimit -l unlimited" to /etc/init/libvirt-bin.conf
:>
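
The AppArmor half of the short version can be sketched as a small shell helper. This is only an illustration, not something from the original post: the function names add_vfio_rule and show_vfio_denials are hypothetical, and the sketch assumes that appending the rule at the end of the abstraction file is acceptable and that denials are logged with the usual apparmor="DENIED" marker.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): append the VFIO rule
# to an AppArmor abstraction file only if it is not already present.
add_vfio_rule() {
    file="$1"
    rule='/dev/vfio/vfio rw,'
    if grep -qF -- "$rule" "$file"; then
        echo "rule already present in $file"
    else
        # Indent to match the usual layout of abstraction files.
        printf '  %s\n' "$rule" >> "$file"
        echo "rule added to $file"
    fi
}

# Hypothetical helper: list AppArmor denials that mention vfio in a log file.
show_vfio_denials() {
    grep 'apparmor="DENIED"' "$1" | grep vfio
}

# Example, as root on the hypervisor (paths as given in this thread):
#   add_vfio_rule /etc/apparmor.d/abstractions/libvirt-qemu
#   show_vfio_denials /var/log/kern.log
```

Running add_vfio_rule twice leaves a single copy of the rule in the file. The new rule only applies once the relevant profiles are reloaded; in this thread that was done by rebooting the hypervisor.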
:> Longer version:
:>
:> What was happening:
:>
:> * send snapshot request
:> * instance pauses while snapshot is pending
:> * instance attempts to resume
:> * fails to reattach pci device
:>   * nova-compute.log
:>     Exception during message handling: internal error: unable to execute QEMU command 'device_add': Device initialization failed
:>
:>   * qemu/<id>.log
:>     vfio: failed to open /dev/vfio/vfio: Permission denied
:>     vfio: failed to setup container for group 48
:>     vfio: failed to get group 48
:> * snapshot disappears
:> * instance resumes but without passed through device (hard reboot
:>     reattaches)
:>
:> Seeing "Permission denied" I thought this would be an easy fix, but:
:>
:> # ls -l /dev/vfio/vfio
:> crw-rw-rw- 1 root root 10, 196 Jul  6 14:05 /dev/vfio/vfio
:>
:> so I'm guessing I'm in AppArmor hell. I tried adding "/dev/vfio/vfio
:> rw," to /etc/apparmor.d/abstractions/libvirt-qemu, rebooting the
:> hypervisor, and trying again, which got me a different set of libvirt
:> errors:
:>
:> VFIO_MAP_DMA: -12
:> vfio_dma_map(0x5633a5fa69b0, 0x0, 0xa0000, 0x7f4e7be00000) = -12 (Cannot allocate memory)
:>
:> kern.log (and thus dmesg) showing:
:> vfio_pin_pages: RLIMIT_MEMLOCK (65536) exceeded
:>
:> Getting rid of this one required inserting 'ulimit -l unlimited' into
:> /etc/init/libvirt-bin.conf in the 'script' section:
:>
:> <previous bits excluded>
:> script
:>         [ -r /etc/default/libvirt-bin ] && . /etc/default/libvirt-bin
:>         ulimit -l unlimited
:>         exec /usr/sbin/libvirtd $libvirtd_opts
:> end script
:>
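
To check whether a raised limit actually took effect, the locked-memory limit of a running process can be read from its /proc/<pid>/limits file. The following is a minimal sketch; the function name memlock_soft_limit is hypothetical and not part of the original post. It assumes the standard Linux layout of that file, where the soft limit is the fourth whitespace-separated field on the "Max locked memory" line.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print the soft
# "Max locked memory" limit from a file in /proc/<pid>/limits format.
memlock_soft_limit() {
    awk '/^Max locked memory/ { print $4 }' "$1"
}

# Example, on the hypervisor after restarting libvirt-bin
# (assumes a single libvirtd process):
#   memlock_soft_limit "/proc/$(pidof libvirtd)/limits"
```

A value of 65536 would match the RLIMIT_MEMLOCK figure in the kern.log message quoted above, while "unlimited" is what the ulimit line in the upstart script is meant to produce. Child processes normally inherit resource limits from their parent, so the same check can presumably be run against a qemu process's pid as well.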
:>
:> -Jon
:>
:
:
:
:-- 
:Cheers,
:~Blairo
