[Openstack-operators] PCI Passthrough issues

Jonathan Proulx jon at csail.mit.edu
Wed Jul 6 22:13:50 UTC 2016


On Wed, Jul 06, 2016 at 12:32:26PM -0400, Jonathan D. Proulx wrote:
:
:I do have an odd remaining issue where I can run cuda jobs in the vm
:but snapshots fail and after pause (for snapshotting) the pci device
:can't be reattached (which is where i think it deletes the snapshot
:it took).  Got same issue with 3.16 and 4.4 kernels.
:
:Not very well categorized yet, but I'm hoping it's because the VM I
:was hacking on had its libvirt.xml written out with the older qemu
:maybe?  It had been through a couple reboots of the physical system
:though.
:
:Currently building a fresh instance and bashing more keys...

After an ugly bout of bashing I've solved my failing snapshot issue,
which I'll post here in hopes of saving someone else the trouble.

Short version:

add "/dev/vfio/vfio rw," to  /etc/apparmor.d/abstractions/libvirt-qemu
add "ulimit -l unlimited" to /etc/init/libvirt-bin.conf
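
A quick way to confirm both edits are in place on a hypervisor is to
grep for them. This is only a minimal sketch: the function names are
made up for illustration, the two default paths are the ones named
above, and the patterns assume each line is written the way it is
quoted here.

```shell
#!/bin/sh
# Sketch only: these helpers are illustrative, not part of any package.

# has_vfio_rule FILE: succeeds if FILE has an uncommented rule
# granting rw on /dev/vfio/vfio.
has_vfio_rule() {
    grep -Eq '^[[:space:]]*/dev/vfio/vfio[[:space:]]+rw,' "$1"
}

# has_memlock_ulimit FILE: succeeds if FILE has an uncommented
# "ulimit -l unlimited" line.
has_memlock_ulimit() {
    grep -Eq '^[[:space:]]*ulimit[[:space:]]+-l[[:space:]]+unlimited' "$1"
}

# check_fixes [APPARMOR_FILE] [UPSTART_FILE]: report which edit is missing.
check_fixes() {
    apparmor_file=${1:-/etc/apparmor.d/abstractions/libvirt-qemu}
    upstart_file=${2:-/etc/init/libvirt-bin.conf}
    rc=0
    has_vfio_rule "$apparmor_file" \
        || { echo "missing vfio rule in $apparmor_file"; rc=1; }
    has_memlock_ulimit "$upstart_file" \
        || { echo "missing ulimit -l in $upstart_file"; rc=1; }
    return $rc
}
```

Running "check_fixes && echo ok" only shows that the lines exist in
the files. It says nothing about whether the running libvirtd has
picked them up.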

Longer version:

What was happening.

* send snapshot request
* instance pauses while snapshot is pending
* instance attempts to resume
* fails to reattach pci device
  * nova-compute.log
    Exception during message handling: internal error: unable to execute QEMU command 'device_add': Device initialization failed

  * qemu/<id>.log
    vfio: failed to open /dev/vfio/vfio: Permission denied
    vfio: failed to setup container for group 48
    vfio: failed to get group 48
* snapshot disappears
* instance resumes but without passed through device (hard reboot
    reattaches)
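
When the qemu log says "Permission denied" for a device node, the
kernel log is one place to check whether AppArmor did the refusing,
since its denials are logged with apparmor="DENIED". A minimal sketch,
assuming the kernel log lives at /var/log/kern.log (the function name
and both defaults are illustrative):

```shell
#!/bin/sh
# apparmor_denials [LOGFILE] [PATTERN]
# Print AppArmor DENIED audit lines from LOGFILE that mention PATTERN.
# The defaults are assumptions: /var/log/kern.log and /dev/vfio.
apparmor_denials() {
    logfile=${1:-/var/log/kern.log}
    pattern=${2:-/dev/vfio}
    # -F treats both strings literally, so the quotes and slashes
    # need no escaping.
    grep -F 'apparmor="DENIED"' "$logfile" | grep -F -- "$pattern"
}
```

No output means no matching denial was logged in that file. That does
not rule AppArmor out if the log has been rotated or auditing goes
somewhere else.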

Seeing "Permission denied", I thought this would be an easy fix, but:

# ls -l /dev/vfio/vfio
crw-rw-rw- 1 root root 10, 196 Jul  6 14:05 /dev/vfio/vfio

So I'm guessing I'm in AppArmor hell. I tried adding "/dev/vfio/vfio
rw," to /etc/apparmor.d/abstractions/libvirt-qemu, rebooting the
hypervisor, and trying again, which got me a different set of libvirt
errors:

VFIO_MAP_DMA: -12
vfio_dma_map(0x5633a5fa69b0, 0x0, 0xa0000, 0x7f4e7be00000) = -12 (Cannot allocate memory)

with kern.log (and thus dmesg) showing:
vfio_pin_pages: RLIMIT_MEMLOCK (65536) exceeded

Getting rid of this one required inserting 'ulimit -l unlimited' into
/etc/init/libvirt-bin.conf in the 'script' section:

<previous bits excluded>
script
        [ -r /etc/default/libvirt-bin ] && . /etc/default/libvirt-bin
        ulimit -l unlimited
        exec /usr/sbin/libvirtd $libvirtd_opts
end script
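
After restarting libvirtd, the limit the process actually ended up
with can be read from /proc/<pid>/limits. A minimal sketch (the
function name is made up for illustration):

```shell
#!/bin/sh
# memlock_limit PID
# Print the soft and hard "Max locked memory" limits of a running
# process, as listed in /proc/PID/limits.
memlock_limit() {
    # Row layout: Max locked memory <soft> <hard> <units>
    awk '/^Max locked memory/ { print $4, $5 }' "/proc/$1/limits"
}
```

For example, "memlock_limit $(pgrep -o -x libvirtd)" should no longer
report 65536 if the ulimit line took effect. The same check can be
pointed at the pid of a qemu process, since that is what hits
RLIMIT_MEMLOCK.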


-Jon


