Hello Thamanna,

The scenario you describe implies your Ceph cluster is so broken that it simply cannot serve any I/O any more. Your virtual machine workload literally experiences the storage being taken away; there is no remedy for that at the hypervisor level.

So, if you're asking about best practice for recovering from such issues: make backups (not snapshots; snapshots are not backups) and periodically test that you can actually restore them. Meanwhile, as mentioned before, I'd suggest finding out why the Ceph nodes got stuck in the first place.

Cheers,
Kees

__
Kees Meijs BICT
Nefos Cloud & IT <https://nefos.com/contact>
Nefos IT bv
Burgemeester Mollaan 34a
5582 CK Waalre - NL
kvk 66494931
+31 (0)88 2088 188 <tel:+31882088188>
nefos.com <https://nefos.com/contact>

On 03/03/2026 04:57, Thamanna Farhath wrote:
Thank you for your clarification. We understand that this behavior is by design in Ceph and that OpenStack Nova will not automatically take action when storage becomes unavailable.
However, in our case, simply rebooting the affected VMs is not always sufficient. If a crash occurs and persistent I/O errors are seen inside the guest, we would like to understand the recommended recovery procedure.
In such scenarios, how can we safely retrieve and restore the instance once Ceph regains quorum? What is the best practice to recover RBD-backed instances after write failures to avoid permanent corruption?
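P.S. As a footnote to the backup advice above, here is a minimal sketch of what an RBD-level backup/restore cycle could look like. The pool name `volumes` and the image name `instance-disk` are placeholders for illustration; substitute the names from your own environment, and note that a consistent backup really wants the guest quiesced (or at least a snapshot as the export source):

```shell
# First confirm the cluster is healthy and serving I/O again
ceph status
ceph health detail

# Take a snapshot so the export has a stable, point-in-time source
# (pool "volumes" and image "instance-disk" are example names)
rbd snap create volumes/instance-disk@backup1

# Export the snapshot to a file outside the cluster
rbd export volumes/instance-disk@backup1 /backup/instance-disk.img

# Clean up the snapshot once the export is done
rbd snap rm volumes/instance-disk@backup1

# Periodically test the restore path, e.g. into a scratch image,
# then boot a throwaway VM from it to verify the data is usable
rbd import /backup/instance-disk.img volumes/instance-disk-restored
```

This is deliberately simple; tooling such as `rbd export-diff`/`rbd import-diff` for incremental backups, or a proper backup product, is usually preferable in production, but the principle is the same: the copy must live outside the cluster, and the restore must be exercised regularly.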