Using RDMA in OpenStack-managed virtual machines
Hello community,

I would like to ask about best practices for using InfiniBand and RoCEv2 with OpenStack-managed virtual machines. I am new to this area, so any information is appreciated.

Our current use case is deep learning training and inference, for example connecting to a parallel filesystem through RDMA (IB/RoCEv2).

From a basic search online, I found that the mainline code provides the sriov-agent, which can do basic VF passthrough. There is also a project named mellanox-networking that appears to handle IB, but it does not seem to have been updated since the Train release. None of the code mentioned above seems to handle switch or router configuration, which in my opinion leaves these solutions incomplete. At least for RoCEv2, it seems that features such as PFC and ECN also need to be configured on the switches.

Is there any available implementation for using IB/RoCEv2 in production?

Thank you very much for sharing your insights.

--
Best Regards,
Jiatong Shen