Hello,

I have worked with IB before, although not on top of OpenStack. Since the primary purpose of IB is RDMA, you should check whether RDMA is actually working on your instances. I am not sure a simple hello-world program is sufficient to test RDMA functionality, because if there is any issue with the IB stack, MPI implementations tend to fall back to TCP for communication.

One thing you can do is install linux rdma-core [1] (if you have not already done so) and use the `ibstat` command to check that your IB ports are up and running. Then build OpenMPI with UCX [2] and run a PingPong test [3] to see whether you are getting the bandwidth expected for your ConnectX card type.

If you are planning to do more HPC tests on an OpenStack cloud, I suggest you look into HPC package managers like Spack [4] or EasyBuild [5] to build an HPC-related stack easily. StackHPC has been working on HPC over OpenStack clouds, and they have developed some tools [6] which might be of interest to you.

I hope that helps!

- Mahendra

[1] https://github.com/linux-rdma/rdma-core
[2] https://openucx.readthedocs.io/en/master/running.html#openmpi-with-ucx
[3] https://www.intel.com/content/www/us/en/develop/documentation/imb-user-guide...
[4] https://spack-tutorial.readthedocs.io/en/latest/
[5] https://docs.easybuild.io/
[6] https://github.com/stackhpc

On 16/02/2022 05:31, Satish Patel wrote:
Hi all,
I am playing with an HPC deployment on an OpenStack cloud, and I have a Mellanox InfiniBand NIC. I have a couple of deployment questions regarding the InfiniBand network. I am new to IB, so excuse me if I ask noob questions.
I have configured the Mellanox card for SR-IOV and created a flavor with the property pci_passthrough:alias='mlx5-sriov-ib:1' to expose a VF to my instance. So far so good: I am able to see the IB interface inside my VM and it is active. (I am running the SM inside the InfiniBand hardware switch.)
root@ib-vm:~# ethtool -i ibs5
driver: mlx5_core[ib_ipoib]
version: 5.5-1.0.3
firmware-version: 20.28.1002 (MT_0000000222)
expansion-rom-version:
bus-info: 0000:00:05.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
I didn't configure any IP address on the ibs5 interface. For testing purposes I compiled an mpirun hello-world program as a POC of my InfiniBand network between two instances, and I was able to run the MPI sample program successfully.
Somewhere I read about the neutron-mellanox agent for setting up IPoIB with segmentation etc., but I am not sure how complex that is, or what the advantages are over a simple SR-IOV passthrough.
Is this the correct way to set up an HPC cluster on OpenStack, or is there a better way to design HPC on OpenStack?
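
For reference, the verification steps suggested above might look roughly like the following on the instances. This is only a sketch: the install prefixes, the hostnames vm1/vm2, and the mlx5_0:1 device/port name are assumptions you would adapt to your environment.

```shell
# 1. Check that the IB port is up (ibstat ships with the
#    rdma-core / infiniband-diags packages).
#    Look for "State: Active" and "Physical state: LinkUp".
ibstat

# 2. Build OpenMPI against UCX so RDMA is used instead of TCP
#    (assumes UCX is already installed under $HOME/ucx).
./configure --prefix="$HOME/ompi" --with-ucx="$HOME/ucx"
make -j"$(nproc)" install

# 3. Run the Intel MPI Benchmarks PingPong test between two
#    instances, explicitly selecting the UCX PML and pinning
#    UCX to the IB device and port.
mpirun -np 2 -H vm1,vm2 \
       --mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 \
       ./IMB-MPI1 PingPong
```

If the reported bandwidth is close to line rate for the ConnectX card rather than a few gigabits per second, traffic is going over IB and not silently falling back to TCP/IPoIB.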