[Openstack-operators] PCI Passthrough issues

Stig Telfer stig.openstack at telfer.org
Thu Jul 28 22:32:35 UTC 2016


Just out of interest, I saw this talk from DK Panda a few months ago which covers MPI developments, including GPU-Direct support and running in virtualised environments:

https://youtu.be/AsFakPJSplo

Do you know if this means there is a version of MVAPICH2 that supports GPU-Direct optimised for a virtualised environment, or are they entirely disjoint efforts?

Might be tricky - I am not sure how virtual PCI BARs map to the hypervisor’s physical PCI BARs.  If the physical PCI ranges are hidden from the VM it may not be possible to initiate a peer-to-peer transfer.  Does anyone know if it can be done?
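
A rough way to poke at this (the device addresses below are made up
for illustration) would be to compare the BAR ranges the guest sees
with what the hypervisor assigned to the physical device:

  # On the hypervisor, for the physical device:
  lspci -vv -s 0000:05:00.0 | grep Region
  cat /sys/bus/pci/devices/0000:05:00.0/resource

  # In the guest, for the same device on the virtual bus:
  lspci -vv -s 00:05.0 | grep Region
  cat /sys/bus/pci/devices/0000:00:05.0/resource

If the guest's BARs bear no resemblance to the host's physical ranges
then the remapping I'm worried about is happening, and peer-to-peer
would presumably need the IOMMU to translate those guest addresses
back to the real BARs.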

Best wishes,
Stig


> On 26 Jul 2016, at 08:09, Blair Bethwaite <blair.bethwaite at gmail.com> wrote:
> 
> Hi Joe, Jon -
> 
> We seem to be good now on both qemu 2.3 and 2.5 with kernel 3.19
> (lowest we've tried). Also thanks to Jon we had an easy fix for the
> snapshot issues!
> 
> Next question - has anyone figured out how to make GPU P2P work? We
> haven't tried very hard yet, but with our current setup we're telling
> Nova to pass through the GK210GL "3D controller" and that results in
> the guest seeing individual GPUs attached to a virtualised PCI bus,
> even when e.g. passing through two K80s on the same board. The next
> obvious step is to try passing through the on-board PLX PCI bridge,
> but we're wondering whether anyone else has been down this path yet?
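> 
> (A quick way to see the topology the guest thinks it has - hedging
> here, we haven't verified this inside a VM yet - is:
> 
>   nvidia-smi topo -m
> 
> which prints the link type between each GPU pair. For P2P you'd want
> to see PIX, i.e. a shared PCIe switch; with the virtualised bus we'd
> expect everything to show as crossing the host bridge instead.)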
> 
> Cheers,
> 
> On 20 July 2016 at 12:57, Blair Bethwaite <blair.bethwaite at gmail.com> wrote:
>> Thanks for the confirmation Joe!
>> 
>> On 20 July 2016 at 12:19, Joe Topjian <joe at topjian.net> wrote:
>>> Hi Blair,
>>> 
>>> We only updated qemu. We're running the version of libvirt from the Kilo
>>> cloudarchive.
>>> 
>>> We've been in production with our K80s for around two weeks now and have had
>>> several users report success.
>>> 
>>> Thanks,
>>> Joe
>>> 
>>> On Tue, Jul 19, 2016 at 5:06 PM, Blair Bethwaite <blair.bethwaite at gmail.com>
>>> wrote:
>>>> 
>>>> Hilariously (or not!) we finally hit the same issue last week once
>>>> folks actually started trying to do something (other than build and
>>>> load drivers) with the K80s we're passing through. This:
>>>> https://devtalk.nvidia.com/default/topic/850833/pci-passthrough-kvm-for-cuda-usage/
>>>> is the best discussion of the issue I've found so far; I haven't
>>>> tracked down an actual bug yet though. I wonder whether it has something to do
>>>> with the memory size of the device, as we've been happy for a long
>>>> time with other NVIDIA GPUs (GRID K1, K2, M2070, ...).
>>>> 
>>>> Jon, when you grabbed Mitaka Qemu, did you also update libvirt? We're
>>>> just working through this and have tried upgrading both, but we're
>>>> hitting some issues with Nova and Neutron on the compute nodes -
>>>> we think it may be libvirt related, but debugging isn't helping much yet.
>>>> 
>>>> Cheers,
>>>> 
>>>> On 8 July 2016 at 00:54, Jonathan Proulx <jon at csail.mit.edu> wrote:
>>>>> On Thu, Jul 07, 2016 at 11:13:29AM +1000, Blair Bethwaite wrote:
>>>>> :Jon,
>>>>> :
>>>>> :Awesome, thanks for sharing. We've just run into an issue with SRIOV
>>>>> :VF passthrough that sounds like it might be the same problem (device
>>>>> :disappearing after a reboot), but haven't yet investigated deeply -
>>>>> :this will help with somewhere to start!
>>>>> 
>>>>> :By the way, the nouveau mention was because we had missed it on some
>>>>> :K80 hypervisors recently and seen passthrough apparently work, but
>>>>> :then the NVIDIA drivers would not build in the guest as they claimed
>>>>> :they could not find a supported device (despite the GPU being visible
>>>>> :on the PCI bus).
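>>>>> :
>>>>> :(For anyone who hits the same thing: the fix is the usual nouveau
>>>>> :blacklist on the hypervisor, roughly
>>>>> :
>>>>> :  # /etc/modprobe.d/blacklist-nouveau.conf
>>>>> :  blacklist nouveau
>>>>> :  options nouveau modeset=0
>>>>> :
>>>>> :followed by an initramfs rebuild and a reboot.)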
>>>>> 
>>>>> Definitely sage advice!
>>>>> 
>>>>> :I have also heard passing mention of requiring qemu
>>>>> :2.3+ but don't have any specific details of the related issue.
>>>>> 
>>>>> I didn't do a bisection, but with qemu 2.2 (from ubuntu cloudarchive
>>>>> kilo) I was sad, and with 2.5 (from ubuntu cloudarchive mitaka but
>>>>> installed on a kilo hypervisor) things are working.
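>>>>> 
>>>>> (If anyone wants to double-check what a hypervisor is actually
>>>>> running, something like
>>>>> 
>>>>>   qemu-system-x86_64 --version
>>>>> 
>>>>> is enough - just bear in mind that instances started before the
>>>>> upgrade keep running the old binary until stopped and started.)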
>>>>> 
>>>>> Thanks,
>>>>> -Jon
>>>>> 
>>>>> 
>>>>> :Cheers,
>>>>> :
>>>>> :On 7 July 2016 at 08:13, Jonathan Proulx <jon at csail.mit.edu> wrote:
>>>>> :> On Wed, Jul 06, 2016 at 12:32:26PM -0400, Jonathan D. Proulx wrote:
>>>>> :> :
>>>>> :> :I do have an odd remaining issue where I can run cuda jobs in the vm
>>>>> :> :but snapshots fail, and after the pause (for snapshotting) the pci
>>>>> :> :device can't be reattached (which is where I think it deletes the
>>>>> :> :snapshot it took).  Got the same issue with 3.16 and 4.4 kernels.
>>>>> :> :
>>>>> :> :Not very well categorized yet, but I'm hoping it's because the VM I
>>>>> :> :was hacking on had its libvirt.xml written out with the older qemu,
>>>>> :> :maybe?  It had been through a couple of reboots of the physical
>>>>> :> :system though.
>>>>> :> :
>>>>> :> :Currently building a fresh instance and bashing more keys...
>>>>> :>
>>>>> :> After an ugly bout of bashing I've solved my failing snapshot issue,
>>>>> :> which I'll post here in hopes of saving someone else the trouble.
>>>>> :>
>>>>> :> Short version:
>>>>> :>
>>>>> :> add "/dev/vfio/vfio rw," to
>>>>> /etc/apparmor.d/abstractions/libvirt-qemu
>>>>> :> add "ulimit -l unlimited" to /etc/init/libvirt-bin.conf
>>>>> :>
>>>>> :> Longer version:
>>>>> :>
>>>>> :> What was happening.
>>>>> :>
>>>>> :> * send snapshot request
>>>>> :> * instance pauses while snapshot is pending
>>>>> :> * instance attempts to resume
>>>>> :> * fails to reattach the pci device
>>>>> :>   * nova-compute.log
>>>>> :>     Exception during message handling: internal error: unable to
>>>>> :>     execute QEMU command 'device_add': Device initialization failed
>>>>> :>
>>>>> :>   * qemu/<id>.log
>>>>> :>     vfio: failed to open /dev/vfio/vfio: Permission denied
>>>>> :>     vfio: failed to setup container for group 48
>>>>> :>     vfio: failed to get group 48
>>>>> :> * snapshot disappears
>>>>> :> * instance resumes but without the passed-through device (a hard
>>>>> :>     reboot reattaches it)
>>>>> :>
>>>>> :> Seeing "permission denied" I thought this would be an easy fix, but:
>>>>> :>
>>>>> :> # ls -l /dev/vfio/vfio
>>>>> :> crw-rw-rw- 1 root root 10, 196 Jul  6 14:05 /dev/vfio/vfio
>>>>> :>
>>>>> :> so I'm guessing I'm in apparmor hell. I tried adding "/dev/vfio/vfio
>>>>> :> rw," to /etc/apparmor.d/abstractions/libvirt-qemu, rebooting the
>>>>> :> hypervisor, and trying again, which got me a different set of libvirt
>>>>> :> errors:
>>>>> :>
>>>>> :> VFIO_MAP_DMA: -12
>>>>> :> vfio_dma_map(0x5633a5fa69b0, 0x0, 0xa0000, 0x7f4e7be00000) = -12
>>>>> :> (Cannot allocate memory)
>>>>> :>
>>>>> :> kern.log (and thus dmesg) showing:
>>>>> :> vfio_pin_pages: RLIMIT_MEMLOCK (65536) exceeded
>>>>> :>
>>>>> :> Getting rid of this one required inserting 'ulimit -l unlimited' into
>>>>> :> /etc/init/libvirt-bin.conf in the 'script' section:
>>>>> :>
>>>>> :> <previous bits excluded>
>>>>> :> script
>>>>> :>         [ -r /etc/default/libvirt-bin ] && . /etc/default/libvirt-bin
>>>>> :>         ulimit -l unlimited
>>>>> :>         exec /usr/sbin/libvirtd $libvirtd_opts
>>>>> :> end script
>>>>> :>
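>>>>> :> To confirm the new limit actually took effect, check the running
>>>>> :> daemon (the qemu processes libvirt spawns inherit it):
>>>>> :>
>>>>> :> # grep locked /proc/$(pidof libvirtd)/limits
>>>>> :>
>>>>> :> "Max locked memory" should now read unlimited.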
>>>>> :>
>>>>> :> -Jon
>>>>> :>
>>>>> :
>>>>> :
>>>>> :
>>>>> :--
>>>>> :Cheers,
>>>>> :~Blairo
>>>>> 
>>>>> --
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Cheers,
>>>> ~Blairo
>>>> 
>>> 
>>> 
>> 
>> 
>> 
>> --
>> Cheers,
>> ~Blairo
> 
> 
> 
> -- 
> Cheers,
> ~Blairo
> 