Dear Community,
We are running OpenStack 2023.1 with Ceph as the backend storage on a 3-node deployment.
Recently, we faced a scenario where two of our servers became unresponsive (hung state), and we had to reboot them. During this time, VMs running on the affected compute nodes started reporting I/O errors inside the guest OS, such as:
[   33.911093] blk_update_request: I/O error, dev vda, sector 229880 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[   33.914953] Buffer I/O error on dev vda1, logical block 319, lost async page write
[   33.914953] Buffer I/O error on dev vda1, logical block 320, lost async page write
[   33.927594] blk_update_request: I/O error, dev vda, sector 229904 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
It appears that when Ceph becomes unavailable (or quorum is lost), the VMs continue attempting writes, which results in I/O errors at the guest OS level.
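One partial mitigation we are considering inside the guest itself is to have the filesystem drop to read-only as soon as it sees errors, rather than continuing to queue buffered writes. A minimal sketch, assuming an ext4 root on /dev/vda1 (the device names here are from our log above, but treat the exact mount line as an assumption for your own images):

```
# /etc/fstab inside the guest (sketch)
# errors=remount-ro makes ext4 remount the filesystem read-only when it
# detects an error, limiting further damage while the backend is down.
/dev/vda1  /  ext4  defaults,errors=remount-ro  0  1
```

This only reacts after errors have already reached the guest, so it does not replace pausing the VM at the hypervisor level, but it narrows the window for corruption.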
We are looking for guidance on how to:
- Pause or block writes from active VMs when Ceph storage is unavailable
- Avoid guest OS filesystem corruption
- Ensure safer recovery once Ceph services are restored
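For the hypervisor side, one direction we have been looking at is libvirt's per-disk error policy: with error_policy='stop' (and rerror_policy='stop' for reads), QEMU pauses the guest when an I/O request fails instead of reporting the error into the guest, and the VM can be resumed once the backend is healthy again. A sketch of the relevant disk element in the domain XML (the pool/image name and monitor host below are placeholders, and how best to set this through Nova rather than by hand-editing the domain is exactly what we are unsure about):

```xml
<!-- Sketch: RBD-backed disk with stop-on-error policy (virsh edit <domain>) -->
<disk type='network' device='disk'>
  <!-- error_policy/rerror_policy: pause the guest on write/read errors -->
  <driver name='qemu' type='raw' error_policy='stop' rerror_policy='stop'/>
  <source protocol='rbd' name='vms/instance-disk'>  <!-- placeholder image -->
    <host name='ceph-mon1' port='6789'/>            <!-- placeholder monitor -->
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
```

After Ceph recovers, the paused instance could then be resumed (e.g. virsh resume) so the guest retries the in-flight writes rather than having already failed them. Is this policy something that can be applied cleanly across an OpenStack 2023.1 deployment, or is there a better-supported mechanism?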