We run a handful of Ceph clusters, mostly used to provide volumes to OpenStack, and I second everything Eugen is saying. My tips would be:

1. More hosts/nodes is better; as Eugen mentioned, a host/node failure reduces your available capacity, and the last thing you ever want is for Ceph to run out of space. We run 9 nodes with fewer drives each; we can easily expand the cluster by simply adding new drives as capacity OSDs. Our risk profile might be different from yours, but more nodes and OSDs give you a little more redundancy; a single node failure is an 11% hit to capacity, as opposed to a 33% hit, for example.

2. Try to avoid erasure coding unless you're 100% certain it works for your use case. The performance overhead is just so costly.

3. Having multiple Ceph storage pools can be a good idea; as Eugen mentioned, large-capacity HDDs with RocksDB/WAL on SSD are good for file systems/glacial file storage, but you just won't have very much IO to run VMs on it. It can be great as a second volume type, though, for large-volume storage attached to otherwise fast instances (see the sketch at the very end of this thread for one way to split pools by device class).

4. KISS! (Keep It Simple, Silly!) Trying to make things complicated with one-way asynchronous replication and trying to be fancy takes away from one of Ceph's main features - it's rock solid and very reliable if you let it work the way it's meant to work.

5. There are also a lot of very smart industry professionals out there whom you can consult for support and advice; I'd be very quick to recommend the gang at Clyso and 45Drives.

Ceph was very overwhelming to start with (kind of like OpenStack at the start), but once you get it set up, it makes a lot of sense, and it is fairly easy to use and well documented. I'd recommend just throwing together a cephadm deployment, having a play, and blowing it away and doing it again. Setting up Proxmox VE as a consumer can be a pretty quick way to test it as well, if you don't want to worry too much about reconfiguring your OpenStack to work with Cinder and your new Ceph back end (assuming it doesn't already).

Kind Regards,
Joel McLean – Micron21 Pty Ltd

-----Original Message-----
From: Eugen Block <eblock@nde.ag>
Sent: Wednesday, 17 September 2025 3:30 PM
To: openstack-discuss@lists.openstack.org
Subject: Re: Putting DB/WAL on SSD on

And one more comment from our own experience: we tried using VMs based on HDD + DB on SSD, but it was too slow. We decided to configure a cache tier (SSD-only pool) in front of our main pool, and that worked quite well for years. But cache tiering is deprecated, so we had to get rid of it, which we did a couple of months back, before we upgraded to Ceph Reef. We moved almost all our data pools to SSDs during that process.

So the main question is: can you be sure that HDD + SSD for the DB will satisfy your performance requirements? One of the best things about Ceph is how flexible it is; you can reshape it anytime. So if you start with this HDD + SSD mix and it works for you, great! If it's too slow, you can reconfigure it according to your needs.

Zitat von Eugen Block <eblock@nde.ag>:
Hi,
I'd say this question is more suitable for the ceph-users mailing list, but since Ceph is quite popular as an OpenStack storage back end, you'll probably get helpful responses here as well. ;-)
More responses inline...
Zitat von William Muriithi <wmuriithi@perasoinc.com>:
Hello,
We want to use Ceph as the OpenStack storage system, and we cannot afford a purely SSD-based storage system. So we are planning to just set up the metadata on SSD and leave the data on HDD.
The documentation around this isn't very clear, and I wonder if someone can explain it a bit.
Here is what the documentation says:
https://docs.ceph.com/en/reef/start/hardware-recommendations
DB/WAL (optional):
- 1x SSD partition per HDD OSD
- 4-5x HDD OSDs per DB/WAL SATA SSD
- <= 10 HDD OSDs per DB/WAL NVMe SSD

What does this mean? I am sorry to say, but it looks a tad ambiguous to me, though I suspect it's obvious once one has experience.
You should not put the DBs of more than 10 HDD OSDs on one SSD; fewer is better, but it really depends on the actual workload etc. In your case, with 6 OSDs per node, you can safely put all 6 DB devices on one SSD.
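For illustration, a minimal cephadm OSD service spec along these lines would express that layout. The service_id, host_pattern and the assumption that your only non-rotational devices are the DB SSDs are mine, so treat it as a sketch rather than a drop-in config:

    service_type: osd
    service_id: hdd-osd-ssd-db        # arbitrary name, pick your own
    placement:
      host_pattern: '*'               # or list your three hosts explicitly
    spec:
      data_devices:
        rotational: 1                 # the HDDs become data OSDs
      db_devices:
        rotational: 0                 # the SSD gets carved up for the DBs

Applied with "ceph orch apply -i osd-spec.yaml", ceph-volume slices each matching SSD into one DB LV per HDD OSD; the WAL lives on the same fast device unless you configure wal_devices separately.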
I have 6 8TB disks per system, and I want to use replication, so I will end up with 18 OSDs.
This sounds like you're planning to have 3 nodes in total (replication size 3, good). But note that if a node is in maintenance and one more node goes down, you'll have a service interruption, since monitor quorum won't be possible until at least one more node comes back. Or consider this: if you lose one entire node (hardware failure), your PGs can't recover anywhere and stay degraded until a third node is available again.
We've been running our own Ceph cluster for many years with this exact setup, three nodes in total, and we never had any issues. But I just want to raise awareness because many (new to Ceph) operators aren't really considering these possibilities.
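To make the size-3 arithmetic concrete, here is a minimal sketch ("volumes" is just a placeholder pool name I'm assuming for the example): with these settings, I/O keeps flowing with one node down while the PGs report degraded, and writes block once you drop below min_size.

    ceph osd pool set volumes size 3        # one copy per node on a 3-node cluster
    ceph osd pool set volumes min_size 2    # I/O continues with one node down...
    ceph status                             # ...but PGs stay degraded/undersized
                                            # until the third node returns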
I am hoping I don't need 18 SSDs, as we don't even have enough bays.
No, you definitely don't need that many SSDs.
If we can add two 800GB SSDs per server, how do we optimally map those 18 DB/WALs to a total of 6 SSD disks?
As I wrote above, one SSD per node should be sufficient to hold all 6 DBs.
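If you provision the OSDs manually rather than through cephadm, the equivalent would be something like the following (device names are placeholders; --report does a dry run first):

    # preview how ceph-volume would carve things up
    ceph-volume lvm batch --report /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg --db-devices /dev/nvme0n1
    # rerun without --report to actually create the 6 OSDs, each with its
    # DB/WAL LV on the shared SSD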
Regards, William
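Following up on the "multiple pools" tip above: if you do end up with both HDD-backed and SSD-backed OSDs, device-class CRUSH rules let you expose them as separate Cinder volume types. A rough sketch, with pool names and PG counts made up for the example:

    # replicated rules pinned to a device class
    ceph osd crush rule create-replicated rule-hdd default host hdd
    ceph osd crush rule create-replicated rule-ssd default host ssd

    # one pool per class, tagged for RBD use
    ceph osd pool create volumes-hdd 128 128 replicated rule-hdd
    ceph osd pool create volumes-ssd 128 128 replicated rule-ssd
    ceph osd pool application enable volumes-hdd rbd
    ceph osd pool application enable volumes-ssd rbd

Each pool can then be configured as its own RBD backend in cinder.conf (rbd_pool=volumes-hdd / volumes-ssd) and mapped to its own volume type, so fast instances can still attach large, cheap HDD-backed volumes.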