OpenStack HPC InfiniBand question
Hi all,

I am experimenting with HPC on an OpenStack cloud deployment and I have a Mellanox InfiniBand NIC. I have a couple of deployment questions regarding the InfiniBand network. I am new to IB, so excuse me if I ask noob questions.

I have configured the Mellanox card for SR-IOV and created a flavor with the property pci_passthrough:alias='mlx5-sriov-ib:1' to expose a VF to my instance. So far so good: I can see the IB interface inside my VM and it is active. (I am running the subnet manager on the InfiniBand hardware switch.) A rough sketch of the Nova side of this setup is at the end of this mail.

root@ib-vm:~# ethtool -i ibs5
driver: mlx5_core[ib_ipoib]
version: 5.5-1.0.3
firmware-version: 20.28.1002 (MT_0000000222)
expansion-rom-version:
bus-info: 0000:00:05.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

I have not configured any IP address on the ibs5 interface. As a proof of concept of the InfiniBand network, I compiled an MPI hello-world program and ran it with mpirun between two instances, and it completed successfully.

Somewhere I read about the Neutron Mellanox agent for setting up IPoIB with network segmentation, but I am not sure how complex that is or what its advantages are over simply passing through SR-IOV VFs.

Is this the correct way to set up an HPC cluster on OpenStack, or is there a better way to design HPC on OpenStack?
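For reference, the Nova configuration is roughly along these lines. Treat it as a sketch only: the product_id shown is an illustrative ConnectX VF value (check `lspci -nn` on the compute node for the real IDs), and the alias has to be defined for nova-api as well as nova-compute.

# /etc/nova/nova.conf on the compute node (alias also needed on the API node)
[pci]
# 15b3 is the Mellanox vendor ID; the product ID below is an example VF ID
passthrough_whitelist = { "vendor_id": "15b3", "product_id": "101c" }
alias = { "vendor_id": "15b3", "product_id": "101c", "device_type": "type-VF", "name": "mlx5-sriov-ib" }

# Flavor that requests one VF per instance
openstack flavor create --vcpus 8 --ram 16384 --disk 40 hpc.ib
openstack flavor set hpc.ib --property pci_passthrough:alias='mlx5-sriov-ib:1'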
Hello,

I have worked on IB before, although not on top of OpenStack. Since the primary purpose of IB is RDMA, you should check whether RDMA is actually being used on your instances. I am not sure a simple hello-world program is sufficient to test RDMA functionality, because if there is any issue with the IB stack, MPI implementations tend to fall back to TCP for communication.

One thing you can do is install linux rdma-core [1] (if you have not already) and use `ibstat` to check that your IB ports are up and running. Then build OpenMPI with UCX [2] and run a PingPong test [3] to see whether you get the bandwidth expected for your ConnectX card type. A rough sketch of these steps is included after the links below.

If you are planning to do more HPC tests on an OpenStack cloud, I suggest you look into HPC package managers such as Spack [4] or EasyBuild [5] to build the HPC software stack easily. StackHPC has been working on HPC over OpenStack clouds and has developed some tools [6] which might be of interest to you.

I hope that helps!

-
Mahendra

[1] https://github.com/linux-rdma/rdma-core
[2] https://openucx.readthedocs.io/en/master/running.html#openmpi-with-ucx
[3] https://www.intel.com/content/www/us/en/develop/documentation/imb-user-guide...
[4] https://spack-tutorial.readthedocs.io/en/latest/
[5] https://docs.easybuild.io/
[6] https://github.com/stackhpc
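The sketch mentioned above: hostnames, install prefixes and the device name are placeholders, and IMB-MPI1 is the binary built from the Intel MPI Benchmarks referenced in [3].

# Inside each VM: confirm the IB port is Active and at the expected rate
ibstat

# Build OpenMPI against UCX (adjust prefix/paths for your system)
./configure --prefix=/opt/openmpi --with-ucx=/usr && make -j && make install

# Run the IMB PingPong benchmark, forcing the UCX PML so the job fails
# instead of silently falling back to TCP if IB is not usable
mpirun -np 2 -H node1,node2 --mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 ./IMB-MPI1 PingPong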
Thank you, Mahendra. I compiled the hello-world program with OpenMPI built against UCX, turned off the TCP fallback while running the MPI jobs, and it still works. I also tested the IB interface with the perftest tools (ib_read_bw and ib_write_bw) between two nodes and hit almost 97 Gbps of bandwidth; the ib_write_bw run is below. ib-1 and ib-2 are my two VM instances running on two different hypervisors.

root@ib-1:~# ib_write_bw -F --report_gbits ib-2
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x26 QPN 0x03f0 PSN 0xae88e6 RKey 0x020464 VAddr 0x007fba82a4b000
 remote address: LID 0x2a QPN 0x03f2 PSN 0x5ac9fa RKey 0x020466 VAddr 0x007fe68a5cf000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]    MsgRate[Mpps]
 65536      5000           97.08              97.02                 0.185048
---------------------------------------------------------------------------------------

Based on all these validation tests, my InfiniBand network appears to be working. I am just curious how folks run HPC on OpenStack: with IPoIB, plain RDMA, or some other, simpler design? A rough sketch of how I pinned the MPI run to the IB fabric is below.
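For anyone following along, the MPI run was pinned to the fabric roughly as follows (flag names are for OpenMPI 4.x with UCX; the hostnames and binary name are placeholders):

# Force the UCX PML and exclude the TCP BTL so the job aborts rather than
# falling back to TCP; restrict UCX to the IB RC transport plus intra-node
# shared memory
mpirun -np 2 -H ib-1,ib-2 \
    --mca pml ucx --mca btl ^tcp \
    -x UCX_TLS=rc,sm,self -x UCX_NET_DEVICES=mlx5_0:1 \
    ./mpi_hello_world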