Hello,

I do not use CentOS and XFS but I had a simillar issue after an outrage. Ceph didnt release the lock on rados block device. You can check if you are facing the same issue than I did. You have to shutdown your instance then type this command:

rbd -p your-pool-name lock list instance-volume-id

The command should not return any output if your instance is shut. If you got an output about 1 exclusive lock just remove it:

rbd -p your-pool-name lock remove instance-volume-id

Best Regards,

Romain

On Tue, 2020-07-21 at 14:04 +0100, Grant Morley wrote:

Hi all,

We recently had an issue with our ceph cluster which ended up going into "Error" status after some drive failures. The system stopped allowing writes for a while whilst it recovered. The ceph cluster is healthy again but we seem to have a few instances that have corrupt filesystems on them. They are all CentOS 7 instances. We have got them into rescue mode to try and repair the FS with "xfs_repair -L" However when we do that we get this:

973.026283] XFS (vdb1): Mounting V5 Filesystem
[ 973.203261] blk_update_request: I/O error, dev vdb, sector 8389693
[ 973.204746] blk_update_request: I/O error, dev vdb, sector 8390717
[ 973.206136] blk_update_request: I/O error, dev vdb, sector 8391741
[ 973.207608] blk_update_request: I/O error, dev vdb, sector 8392765
[ 973.209544] XFS (vdb1): xfs_do_force_shutdown(0x1) called from line 1236 of file fs/xfs/xfs_buf.c. Return address = 0xffffffffc017a50c
[ 973.212137] XFS (vdb1): I/O Error Detected. Shutting down filesystem
[ 973.213429] XFS (vdb1): Please umount the filesystem and rectify the problem(s)
[ 973.215036] XFS (vdb1): metadata I/O error: block 0x7ffc3d ("xlog_bwrite") error 5 numblks 8192
[ 973.217201] XFS (vdb1): failed to locate log tail
[ 973.218239] XFS (vdb1): log mount/recovery failed: error -5
[ 973.219865] XFS (vdb1): log mount failed
[ 973.233792] blk_update_request: I/O error, dev vdb, sector 0

Interestingly any debian based instances we could recover. It just seems to be CentOS and having XFS on CentOS and ceph the instances don't seem happy. This seems more low level to me in ceph rather than a corrupt FS on a guest.

Does anyone know of any "ceph tricks" that we can use to try and at least get an "xfs_repair" running?

Many thanks,

--

Grant Morley

Cloud Lead, Civo Ltd

www.civo.com