We run a handful of Ceph clusters, mostly used to provide volumes to OpenStack, and I second everything Eugen is saying. My tips would be:

1. More hosts/nodes is better; as Eugen mentioned, a host/node failure reduces your available capacity, and the last thing you ever want is for Ceph to run out of space. We run 9 nodes with fewer drives each; we can easily expand the cluster by simply adding new drives as capacity OSDs. Our risk profile might be different from yours, but more nodes and OSDs give you a little more redundancy; a single node failure is an 11% hit to capacity, as opposed to a 33% hit, for example.

2. Try to avoid erasure coding unless you're 100% certain it works for your use case. The performance overhead is just so costly.

3. Having multiple Ceph storage pools can be a good idea; as Eugen mentioned, large-capacity HDDs with RocksDB/WAL on SSD are good for file systems/glacial file storage, but you just won't have very much IO to run VMs on it. It can be great as a second volume type, though, for large-volume storage attached to otherwise fast instances (see the sketch at the very end of this thread for one way to split pools by device class).

4. KISS! (Keep It Simple, Silly!) Trying to make things complicated with one-way asynchronous replication and trying to be fancy takes away from one of Ceph's main features - it's rock solid and very reliable if you let it work the way it's meant to work.

5. There are also a lot of very smart industry professionals out there whom you can consult for support and advice; I'd be very quick to recommend the gang at Clyso and 45Drives.

Ceph was very overwhelming to start with (kind of like OpenStack at the start), but once you get it set up, it makes a lot of sense, and it is fairly easy to use and well documented. I'd recommend just throwing together a cephadm deployment, having a play, and blowing it away and doing it again. Setting up Proxmox VE as a consumer can be a pretty quick way to test it as well, if you don't want to worry too much about reconfiguring your OpenStack to work with Cinder and your new Ceph back end (assuming it doesn't already).

Kind Regards,
Joel McLean – Micron21 Pty Ltd

-----Original Message-----
From: Eugen Block <eblock@nde.ag>
Sent: Wednesday, 17 September 2025 3:30 PM
To: openstack-discuss@lists.openstack.org
Subject: Re: Putting DB/WAL on SSD on

And one more comment from our own experience: we tried using VMs based on HDD + DB on SSD, but it was too slow. We decided to configure a cache tier (SSD-only pool) in front of our main pool, and that worked quite well for years. But cache tiering is deprecated, so we had to get rid of it, which we did a couple of months back, before we upgraded to Ceph Reef. We moved almost all our data pools to SSDs during that process.

So the main question is: can you be sure that HDD + SSD for the DB will satisfy your performance requirements? One of the best things about Ceph is how flexible it is; you can reshape it anytime. So if you start with this HDD + SSD mix and it works for you, great! If it's too slow, you can reconfigure it according to your needs.

Zitat von Eugen Block <eblock@nde.ag>:
Hi,
I'd say this question is more suitable for the ceph-users mailing list, but since Ceph is quite popular as an OpenStack storage back end, you'll probably get helpful responses here as well. ;-)
More responses inline...
Zitat von William Muriithi <wmuriithi@perasoinc.com>:
Hello,
We want to use Ceph as the OpenStack storage system, and we cannot afford a purely SSD-based storage system. So we are planning to just set up the metadata on SSD and leave the data on HDD.
The documentation around this isn't very clear, and I wonder if someone can explain it a bit.
Here is what the documentation says:
https://docs.ceph.com/en/reef/start/hardware-recommendations
DB/WAL (optional):
- 1x SSD partition per HDD OSD
- 4-5x HDD OSDs per DB/WAL SATA SSD
- <= 10 HDD OSDs per DB/WAL NVMe SSD

What does this mean? I am sorry to say, but it looks a tad ambiguous to me, though I suspect it's obvious once one has experience.
You should not put the DBs of more than 10 HDD OSDs on one SSD; fewer is better, but it really depends on the actual workload etc. In your case, with 6 OSDs per node, you can safely put all 6 DB devices on one SSD.
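For illustration, a minimal cephadm OSD service spec along these lines would express that layout. The service_id, host_pattern and the assumption that your only non-rotational devices are the DB SSDs are mine, so treat it as a sketch rather than a drop-in config:

    service_type: osd
    service_id: hdd-osd-ssd-db        # arbitrary name, pick your own
    placement:
      host_pattern: '*'               # or list your three hosts explicitly
    spec:
      data_devices:
        rotational: 1                 # the HDDs become data OSDs
      db_devices:
        rotational: 0                 # the SSD gets carved up for the DBs

Applied with "ceph orch apply -i osd-spec.yaml", ceph-volume slices each matching SSD into one DB LV per HDD OSD; the WAL lives on the same fast device unless you configure wal_devices separately.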
I have 6 8TB disks per system, and I want to use replication, so I will end up with 18 OSDs.
This sounds like you're planning to have 3 nodes in total (replication size 3, good). But note that if a node is in maintenance and one more node goes down, you'll have a service interruption, since monitor quorum won't be possible until at least one more node comes back. Or consider this: if you lose one entire node (hardware failure), your PGs can't recover anywhere and stay degraded until a third node is available again.
We've been running our own Ceph cluster for many years with this exact setup, three nodes in total, and we never had any issues. But I just want to raise awareness because many (new to Ceph) operators aren't really considering these possibilities.
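To make the size-3 arithmetic concrete, here is a minimal sketch ("volumes" is just a placeholder pool name I'm assuming for the example): with these settings, I/O keeps flowing with one node down while the PGs report degraded, and writes block once you drop below min_size.

    ceph osd pool set volumes size 3        # one copy per node on a 3-node cluster
    ceph osd pool set volumes min_size 2    # I/O continues with one node down...
    ceph status                             # ...but PGs stay degraded/undersized
                                            # until the third node returns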
I am hoping I don't need 18 SSDs, as we don't even have enough bays.
No, you definitely don't need that many SSDs.
If we can add two 800GB SSDs per server, how do we optimally map those 18 DB/WALs to a total of 6 SSD disks?
As I wrote above, one SSD per node should be sufficient to hold all 6 DBs.
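If you provision the OSDs manually rather than through cephadm, the equivalent would be something like the following (device names are placeholders; --report does a dry run first):

    # preview how ceph-volume would carve things up
    ceph-volume lvm batch --report /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg --db-devices /dev/nvme0n1
    # rerun without --report to actually create the 6 OSDs, each with its
    # DB/WAL LV on the shared SSD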
Regards, William
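Following up on the "multiple pools" tip above: if you do end up with both HDD-backed and SSD-backed OSDs, device-class CRUSH rules let you expose them as separate Cinder volume types. A rough sketch, with pool names and PG counts made up for the example:

    # replicated rules pinned to a device class
    ceph osd crush rule create-replicated rule-hdd default host hdd
    ceph osd crush rule create-replicated rule-ssd default host ssd

    # one pool per class, tagged for RBD use
    ceph osd pool create volumes-hdd 128 128 replicated rule-hdd
    ceph osd pool create volumes-ssd 128 128 replicated rule-ssd
    ceph osd pool application enable volumes-hdd rbd
    ceph osd pool application enable volumes-ssd rbd

Each pool can then be configured as its own RBD backend in cinder.conf (rbd_pool=volumes-hdd / volumes-ssd) and mapped to its own volume type, so fast instances can still attach large, cheap HDD-backed volumes.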