[nova] NVLink passthrough

Cory Hawkvelt cory at hawkvelt.id.au
Sat Jun 25 13:35:46 UTC 2022

Hey team,

Is anyone working with NVLink in their clouds? How are you handling passing
through the right set of GPUs per NVLink 'group'?

I have servers with 4 sets of 2-way NVLinks (8 cards in pairs of 2) and I'm
able to pass the PCI devices through to the VM no problem, but there is no
guarantee that both GPUs of an NVLink pair get passed through together. If
we end up with 1 GPU from one pair and 1 GPU from another pair, then we run
into all sorts of issues (as you'd expect).
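
To make the constraint concrete, here is a minimal Python sketch of the check
I'd want something to enforce. It is not Nova code; the PCI addresses and
function names are made up for illustration, and the pair layout is just an
assumption matching the 4 x 2 topology described above.

```python
from itertools import combinations

# Hypothetical PCI addresses: 8 GPUs wired as 4 NVLink pairs.
NVLINK_PAIRS = [
    ("0000:1a:00.0", "0000:1b:00.0"),
    ("0000:3d:00.0", "0000:3e:00.0"),
    ("0000:88:00.0", "0000:89:00.0"),
    ("0000:b1:00.0", "0000:b2:00.0"),
]


def is_nvlink_complete(assigned, pairs=NVLINK_PAIRS):
    """Return True if every NVLink pair is either fully assigned or untouched."""
    assigned = set(assigned)
    for pair in pairs:
        members = set(pair)
        overlap = members & assigned
        # A pair that is only partially assigned has been split across VMs.
        if overlap and overlap != members:
            return False
    return True


def count_valid_two_gpu_assignments(pairs=NVLINK_PAIRS):
    """Count how many 2-GPU selections keep an NVLink pair together."""
    all_gpus = [addr for pair in pairs for addr in pair]
    combos = list(combinations(all_gpus, 2))
    valid = [c for c in combos if is_nvlink_complete(c, pairs)]
    return len(valid), len(combos)


if __name__ == "__main__":
    print(is_nvlink_complete(["0000:1a:00.0", "0000:1b:00.0"]))  # matched pair
    print(is_nvlink_complete(["0000:1a:00.0", "0000:3d:00.0"]))  # split pairs
    print(count_valid_two_gpu_assignments())
```

With this layout only 4 of the 28 possible 2-GPU selections keep a pair
together, which is why leaving the choice to chance goes wrong so often.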

So I guess I'm looking to understand how Nova can be made NVLink-aware. I'm
struggling to find any conversation or material on the topic, but I assume
it's been done before?

I did find this talk [1] which mentioned this problem, but they wrote some
sort of hack to sit in between nova and qemu. While that is quite a clever
solution, it seems like there must be a better way to do this in 2022?

[1] -

