OpenStack 14 CentOS and Nvidia driver for vgpu?
Hello list,

I'm struggling to deploy Rocky with vGPU using the Nvidia drivers. Has anyone experienced issues loading the Nvidia kernel modules? I'm talking about the hypervisor part of the setup.

Nvidia provides two modules. One of them, nvidia.ko, loads correctly. The other, nvidia-vgpu-vfio.ko, does not. When I try to load it, it seems that the 7.6 kernel is no longer compatible with it:

    modprobe nvidia-vgpu-vfio
    modprobe: ERROR: could not insert 'nvidia_vgpu_vfio': Invalid argument

dmesg shows this:

    nvidia_vgpu_vfio: disagrees about version of symbol vfio_pin_pages
    nvidia_vgpu_vfio: Unknown symbol vfio_pin_pages (err -22)
    nvidia_vgpu_vfio: disagrees about version of symbol vfio_unpin_pages
    nvidia_vgpu_vfio: Unknown symbol vfio_unpin_pages (err -22)
    nvidia_vgpu_vfio: disagrees about version of symbol vfio_register_notifier
    nvidia_vgpu_vfio: Unknown symbol vfio_register_notifier (err -22)
    nvidia_vgpu_vfio: disagrees about version of symbol vfio_unregister_notifier
    nvidia_vgpu_vfio: Unknown symbol vfio_unregister_notifier (err -22)

modinfo nvidia-vgpu-vfio

    filename:    /lib/modules/3.10.0-957.27.2.el7.x86_64/weak-updates/nvidia-vgpu-vfio.ko
    version:     430.27
    supported:   external
    license:     MIT
    rhelversion: 7.6
    srcversion:  0A179A61A02AD500D05FB1A
    alias:       pci:v000010DEd00000E00sv*sd*bc04sc80i00*
    alias:       pci:v000010DEd*sv*sd*bc03sc02i00*
    alias:       pci:v000010DEd*sv*sd*bc03sc00i00*
    depends:     nvidia,mdev,vfio
    vermagic:    3.10.0-940.el7.x86_64 SMP mod_unload modversions

Note that the module sits under weak-updates for kernel 3.10.0-957.27.2.el7, while its vermagic says it was built for 3.10.0-940.el7.

My guess is that somewhere along the RHEL/CentOS 7.6 lifecycle the vfio module changed and broke compatibility. Nvidia provides those modules built against the 7.6 BETA release and assumes weak-modules will make them work. Somehow it does not.

Does anybody have suggestions on how to handle this? I'm working on it with Nvidia enterprise support, but maybe one of you got there first?

Best regards
--
Piotr Baranowski
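To make the diagnosis above easier to repeat, here is a minimal shell sketch that pulls the two relevant facts out of the command output: the kernel release the module was built for (the first field of vermagic in modinfo) and the symbols that dmesg reports as disagreeing. The helper names vermagic_release and mismatched_symbols are hypothetical, made up for this illustration, and the sketch only parses text; it does not fix the mismatch.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any Nvidia or kmod tool.

# Read modinfo-style text on stdin and print the kernel release from the
# vermagic line (the first word after "vermagic:").
vermagic_release() {
    awk '$1 == "vermagic:" { print $2; exit }'
}

# Read dmesg-style text on stdin and print each symbol reported with
# "disagrees about version of symbol", one per line, without duplicates.
mismatched_symbols() {
    sed -n 's/.*disagrees about version of symbol \([A-Za-z0-9_]*\).*/\1/p' \
        | LC_ALL=C sort -u
}

# Possible use on the hypervisor (not executed here):
#   built=$(modinfo nvidia-vgpu-vfio | vermagic_release)
#   echo "module built for $built, running kernel is $(uname -r)"
#   dmesg | mismatched_symbols
```

With the output quoted in this message, the first helper would report 3.10.0-940.el7.x86_64 and the second would list the four vfio_* symbols. A differing vermagic is expected for a module installed under weak-updates, so the list of disagreeing symbols is the more telling of the two results.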