Hi,

I guess it really depends on what HPC means to you ;-) Do your users schedule nova instances? Can you actually do InfiniBand RDMA between nova VMs? Or are your users scheduling ironic instances via nova?

We have an OpenStack setup based on kayobe/kolla where we:

- Deploy 8 hyperconverged servers running OpenStack nova libvirt compute together with Ceph storage (mostly for block storage; CephFS also comes in handy for shared file systems, e.g., for shared state in Slurm).
- Deploy 300+ bare metal compute nodes via ironic.
- Use Slurm as the job scheduler.
- Set up everything after the bare OS with Ansible.
- Run the login nodes, the Slurm scheduler, etc. in nova VMs.
- Only have an Ethernet interconnect (100/25GbE for central switches and recent servers, 40/10GbE for older servers).

So we use OpenStack nova (libvirt for VMs, ironic for bare metal) as a unified way of deploying virtual and physical machines and then install them as we would any normal server. That's a bit non-cloudy (as far as I have learned) but works really well for us.

Now to your HPC storage question... which I hope to answer a bit indirectly. The main advantage of using manila with CephFS (that I can see) is that you get the OpenStack goodies of API and Horizon clients for managing shares. I guess this is mostly useful if you want cloud features for your HPC, such as users allocating storage in self-service for their nova/ironic machines.

We come from a classic HPC setting where the partitioning of the system is not done by users creating multiple nodes/clusters in self-service; rather, administration provides a bare metal cluster with a Slurm scheduler. Users log in to head nodes and submit jobs to the compute nodes via Slurm. Thus Slurm does the managing of resources, and users can allocate anything from a single core up to the whole cluster. So in our use case there would not be a major advantage in using manila for our storage, as we would primarily have one export that gives access to the whole storage ;-)

We currently have an old GPFS storage system that we mount on all nodes via Ansible. We are migrating to an additional, dedicated NVMe-based Ceph cluster (not hyperconverged with our compute) that we will also mount via Ansible (a rough sketch of what that mount step boils down to is appended below my signature). As we essentially have only a single share on this storage, managing it with manila would be more trouble than it is worth. The much more important part will be setting up Ceph appropriately and tuning it to perform well (e.g., using the IO500 benchmark, as the croit people demonstrate here [1]).

I guess that does not really answer your question, but I hope it gives a useful perspective to you and maybe others. You can look at the work of the wonderful people at StackHPC, who provide commercial services around OpenStack/Kayobe/Kolla/Ironic for HPC setups. There are a couple of interesting videos involving their staff, and they have very useful information on their website as well. And as Mark Goddard just wrote, you can have a look at the Scientific SIG, where they are involved as well. (I'm not in any way related to StackHPC, I just really like their work.)

Best wishes,
Manuel

[1] https://croit.io/blog/ceph-performance-test-and-optimization
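P.S. For illustration only: a minimal sketch of what our "mount the single CephFS share on every node" Ansible step boils down to, written as a standalone Python script. The monitor addresses, cephx client name, and paths below are made-up placeholders, not our real configuration; in practice we use Ansible's mount module rather than a script like this.

#!/usr/bin/env python3
"""Illustrative sketch: ensure the cluster-wide CephFS share is mounted on a node."""
import subprocess
from pathlib import Path

# Hypothetical values; replace with your own monitors, cephx client, and paths.
CEPH_MONS = "10.0.0.1:6789,10.0.0.2:6789,10.0.0.3:6789"
CEPHFS_ROOT = "/"                    # the single export shared cluster-wide
MOUNT_POINT = Path("/mnt/cephfs")
MOUNT_OPTS = "name=hpc,secretfile=/etc/ceph/client.hpc.secret"


def already_mounted(mount_point: Path) -> bool:
    """Check /proc/mounts so the operation stays idempotent (like Ansible's mount module)."""
    with open("/proc/mounts") as mounts:
        return any(line.split()[1] == str(mount_point) for line in mounts)


def mount_cephfs() -> None:
    """Kernel CephFS mount: mount -t ceph <mons>:<path> <mountpoint> -o <options>."""
    MOUNT_POINT.mkdir(parents=True, exist_ok=True)
    if already_mounted(MOUNT_POINT):
        return
    subprocess.run(
        ["mount", "-t", "ceph",
         f"{CEPH_MONS}:{CEPHFS_ROOT}", str(MOUNT_POINT),
         "-o", MOUNT_OPTS],
        check=True,
    )


if __name__ == "__main__":
    mount_cephfs()

On Thu, Mar 3, 2022 at 9:25 AM Mahendra Paipuri <mahendra.paipuri@cnrs.fr> wrote: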
Hello,
We are quite interested in this too. When we were looking into existing solutions, we found there has been some work on integrating Lustre into OpenStack [1], [2]. I remember coming across some OpenInfra talks on developing a manila backend driver for Lustre. I am not quite sure if that project is still ongoing. Manila already provides a backend driver for GPFS [3] (unfortunately GPFS is not open source) to readily integrate it into OpenStack. Manila supports GlusterFS and CephFS as well, but they do not have RDMA support (if I am not wrong).
This is pretty much what we found. Would be glad to explore more solutions if someone knows of any.
Cheers
-
Mahendra
[1] https://docs.google.com/presentation/d/1kGRzcdVQX95abei1bDVoRzxyC02i89_m5_sO...
[2] https://www.openstack.org/videos/summits/barcelona-2016/lustre-integration-f...
[3] https://docs.openstack.org/manila/latest/admin/gpfs_driver.html
On 03/03/2022 04:47, Satish Patel wrote:
Folks,
I built a 30-node HPC environment on OpenStack using Mellanox InfiniBand NICs for high-speed MPI messaging. So far everything works. Now I am looking for an HPC PFS (parallel file system) similar to Lustre that I can mount on all HPC VMs to run MPI jobs.
I was reading on Google and saw some CERN videos where they are using Ceph (CephFS) as the PFS. I also read that CephFS + Manila is a good choice for an HPC-on-OpenStack design.
Does anyone have any experience with HPC storage for OpenStack? Please advise or share your experience :) Thanks in advance.
~S