Openstack HPC storage suggestion

Manuel Holtgrewe zyklenfrei at gmail.com
Thu Mar 3 09:30:52 UTC 2022


Hi,

I guess it really depends on what HPC means to you ;-)

Do your users schedule nova instances? Can you actually do InfiniBand
RDMA between nova VMs? Or are your users scheduling ironic instances
via nova?

We have an openstack setup based on kayobe/kolla where we:

- Deploy 8 hyperconverged servers running OpenStack nova/libvirt compute
alongside Ceph storage (mostly for block storage; CephFS also comes in
handy for shared file systems, e.g., shared state in Slurm, as sketched
after this list).
- Deploy 300+ bare metal compute nodes via ironic.
- Use Slurm as the job scheduler.
- Set up everything after the bare OS install with Ansible.
- Run the login nodes, Slurm scheduler, etc. in nova VMs.
- Only have an Ethernet interconnect (100/25 GbE for the central
switches and recent servers, 40/10 GbE for older servers).
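
For illustration, the Slurm shared-state bit from the first item boils
down to pointing the controller at a directory on the CephFS mount so a
backup slurmctld can take over. A hypothetical excerpt from
/etc/slurm/slurm.conf (host names and paths are made up):

  # keep controller state on the shared CephFS mount for failover
  StateSaveLocation=/data/cephfs/slurm/state
  SlurmctldHost=slurmctl-1
  SlurmctldHost=slurmctl-2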

So we are using OpenStack nova (with the libvirt and ironic drivers) as
a unified way of deploying virtual and physical machines, which we then
install as we would any normal server. That's a bit non-cloudy (as far
as I have learned) but works really well for us.
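
To make the "unified" part concrete, the same openstack server create
call provisions either a VM or a physical node, depending only on the
flavor. A minimal sketch (the flavor, image, and network names below
are made up for illustration):

  # boots a KVM VM on one of the hyperconverged compute nodes
  openstack server create --flavor m4.xlarge --image rocky-9 \
      --network cluster-net --key-name admin login-1

  # same call, but the bare metal flavor makes nova hand the request
  # to ironic, which provisions a physical compute node
  openstack server create --flavor bm.compute --image rocky-9 \
      --network cluster-net --key-name admin hpc-cn-001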

Now to your HPC storage question, which I hope to answer a bit indirectly.

The main advantage of using manila with CephFS (that I can see) is that
you get the OpenStack goodies of an API and Horizon clients for
managing shares. I guess this is mostly useful if you want cloud
features for your HPC, such as users allocating storage in self-service
for their nova/ironic machines. We come from a classic HPC setting
where the partitioning of the system is not done by users creating
multiple nodes/clusters in self-service; rather, administration
provides a bare metal cluster with a Slurm scheduler. Users log in to
head nodes and submit jobs to the compute nodes via Slurm. Thus Slurm
does the managing of resources, and users can allocate anything from a
single core up to the whole cluster. So in our use case there would not
be a major advantage in using manila for our storage, as we would
primarily have one export that gives access to the whole storage ;-).
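
For context, the self-service workflow that manila would enable looks
roughly like the following; the share type, share name, and cephx user
are hypothetical, and a configured CephFS native backend is assumed:

  # user creates a 1 TiB CephFS-backed share in self-service
  manila create CEPHFS 1024 --name scratch-alice --share-type cephfs

  # grant a cephx identity access, then look up the path to mount
  manila access-allow scratch-alice cephx alice
  manila share-export-location-list scratch-alice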

We currently have an old GPFS system that we mount on all nodes via
Ansible. We are migrating to an additional, dedicated NVMe-based Ceph
cluster (which is not hyperconverged with our compute) that we will
also mount via Ansible. As we essentially only have a single share on
this storage, managing it with manila would be more trouble than it is
worth. The much more important part will be setting up Ceph
appropriately and tuning it to perform well (e.g., using the IO500
benchmark, as the croit people demonstrate here [1]).
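
In practice that Ansible role boils down to a plain CephFS kernel mount
on every node, roughly like the sketch below (monitor host names, the
cephx user, and paths are made up for illustration):

  # mount the single CephFS file system via the kernel client
  mount -t ceph mon1,mon2,mon3:/ /data/cephfs \
      -o name=hpc,secretfile=/etc/ceph/client.hpc.secret

  # plus a matching /etc/fstab (or systemd mount unit) entry so the
  # mount comes back after a reboot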

I guess that does not really answer your question, but I hope it gives
a useful perspective to you and maybe others.

You can also look at the work of the wonderful people at StackHPC, who
provide commercial services around OpenStack/Kayobe/Kolla/Ironic for
HPC setups. There are a couple of interesting videos involving their
staff, and they have very useful information on their website as well.
And, as Mark Goddard just wrote, you can have a look at the Scientific
SIG, where they are involved too. (I'm not in any way affiliated with
StackHPC, I just really like their work.)

Best wishes,
Manuel

[1] https://croit.io/blog/ceph-performance-test-and-optimization

On Thu, Mar 3, 2022 at 9:25 AM Mahendra Paipuri
<mahendra.paipuri at cnrs.fr> wrote:
>
> Hello,
>
> We are quite interested in this too. When we were looking into existing
> solutions, we found there has been some work on integrating Lustre into
> OpenStack [1], [2]. I remember coming across some OpenInfra talks on
> developing a manila backend driver for Lustre. I am not quite sure if
> this project is still ongoing. Manila already provides a backend driver
> for GPFS [3] (unfortunately GPFS is not open source) to readily integrate
> it into OpenStack. Manila supports GlusterFS and CephFS as well, but
> they do not have RDMA support (if I am not wrong).
>
> This is pretty much what we found. We would be glad to explore more
> solutions if someone knows any.
>
> Cheers
>
> -
>
> Mahendra
>
> [1]
> https://docs.google.com/presentation/d/1kGRzcdVQX95abei1bDVoRzxyC02i89_m5_sOfp8Aq6o/htmlpresent
>
> [2]
> https://www.openstack.org/videos/summits/barcelona-2016/lustre-integration-for-hpc-on-openstack-at-cambridge-and-monash
>
> [3] https://docs.openstack.org/manila/latest/admin/gpfs_driver.html
>
> On 03/03/2022 04:47, Satish Patel wrote:
> > Folks,
> >
> > I built a 30-node HPC environment on OpenStack using Mellanox
> > InfiniBand NICs for high-speed MPI messaging. So far everything works.
> > Now I am looking for an HPC PFS (parallel file system) similar to
> > Lustre, which I can mount on all HPC VMs to run MPI jobs.
> >
> > I was reading on Google and saw some CERN videos where they are using
> > Ceph (CephFS) as the PFS. I also read that CephFS + Manila is a good
> > choice for an HPC-on-OpenStack design.
> >
> > Does anyone have any experience with HPC storage for OpenStack? Please
> > advise or share your experience :)  Thanks in advance.
> >
> > ~S
>


