Hi all,
We recently had an issue with our Ceph cluster, which ended up in "Error" status after some drive failures. The cluster stopped allowing writes for a while whilst it recovered. Ceph is healthy again, but we now have a few instances with corrupt filesystems, all of them CentOS 7. We have booted them into rescue mode to try to repair the filesystem with "xfs_repair -L" (exact commands are sketched after the log output below). However, when we do that we get this in the kernel log:
[ 973.026283] XFS (vdb1): Mounting V5 Filesystem
[ 973.203261] blk_update_request: I/O error, dev vdb, sector 8389693
[ 973.204746] blk_update_request: I/O error, dev vdb, sector 8390717
[ 973.206136] blk_update_request: I/O error, dev vdb, sector 8391741
[ 973.207608] blk_update_request: I/O error, dev vdb, sector 8392765
[ 973.209544] XFS (vdb1): xfs_do_force_shutdown(0x1) called from line 1236 of file fs/xfs/xfs_buf.c. Return address = 0xffffffffc017a50c
[ 973.212137] XFS (vdb1): I/O Error Detected. Shutting down filesystem
[ 973.213429] XFS (vdb1): Please umount the filesystem and rectify the problem(s)
[ 973.215036] XFS (vdb1): metadata I/O error: block 0x7ffc3d ("xlog_bwrite") error 5 numblks 8192
[ 973.217201] XFS (vdb1): failed to locate log tail
[ 973.218239] XFS (vdb1): log mount/recovery failed: error -5
[ 973.219865] XFS (vdb1): log mount failed
[ 973.233792] blk_update_request: I/O error, dev vdb, sector 0
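For reference, this is roughly what we are running from the rescue instance (assuming vdb1 is the affected data partition, as in the log above; adjust the device name to suit):

# read-only check first, to see how bad the damage is
xfs_repair -n /dev/vdb1

# then zero the log and repair, since normal log recovery fails as above
xfs_repair -L /dev/vdb1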
Interestingly, we were able to recover all of the Debian-based instances; it is only the CentOS instances, running XFS on top of Ceph, that are unhappy. This looks to me like something lower level on the Ceph side rather than just a corrupt filesystem on a guest.
Does anyone know of any "ceph tricks" that we can use to try and at least get an "xfs_repair" running?
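For example, would something along these lines be a sane approach, i.e. exporting the RBD image out of the cluster and repairing it offline? (The pool/image names below are placeholders, not our real ones.)

# export the guest's disk image to a scratch file (placeholder pool/image names)
rbd export volumes/volume-1234 /mnt/scratch/volume-1234.img

# attach the image as a loop device (with partition scanning) and repair the partition;
# losetup prints the loop device it picked, /dev/loop0 is just an example here
losetup -fP --show /mnt/scratch/volume-1234.img
xfs_repair -L /dev/loop0p1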
Many thanks,