<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Thank you, Jason. Not sure how I missed that step.<br>
</p>
<br>
<div class="moz-cite-prefix">On 2018-07-06 08:34 AM, Jason Dillaman
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CA+aFP1C2zA1UroaWMhThD1eFFurPU9DOSq1YgA2HEw0SLQ3yQg@mail.gmail.com">
<div dir="ltr">There have been several similar reports on the
mailing list about this [1][2][3][4] that are always a result of
skipping step 6 from the Luminous upgrade guide [5]. The new
(starting Luminous) 'profile rbd'-style caps are designed to try
to simplify caps going forward [6].
<div><br>
</div>
<div>TL;DR: your OpenStack CephX users need permission to
          blacklist dead clients that failed to properly release the
          exclusive lock; a sketch of the cap update follows the references below.
<div>
<div><br>
</div>
<div>[1] <a
href="http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-November/022278.html"
moz-do-not-send="true">http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-November/022278.html</a></div>
<div>[2] <a
href="http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-November/022694.html"
moz-do-not-send="true">http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-November/022694.html</a></div>
<div>[3] <a
href="http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026496.html"
moz-do-not-send="true">http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026496.html</a></div>
<div>[4] <a
href="https://www.spinics.net/lists/ceph-users/msg45665.html"
moz-do-not-send="true">https://www.spinics.net/lists/ceph-users/msg45665.html</a></div>
<div>[5] <a
href="http://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken"
moz-do-not-send="true">http://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken</a></div>
<div>[6] <a
href="http://docs.ceph.com/docs/luminous/rbd/rbd-openstack/#setup-ceph-client-authentication"
moz-do-not-send="true">http://docs.ceph.com/docs/luminous/rbd/rbd-openstack/#setup-ceph-client-authentication</a></div>
<div><br>
</div>
</div>
</div>
</div>
<br>
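    <div>For anyone hitting this later, a minimal sketch of what that
      step-6 cap update looks like, assuming the stock client names
      (client.cinder, client.glance) and pool names (volumes, vms, images)
      from the Ceph/OpenStack guide [6]; substitute your own names:</div>
    <pre>
# Cinder/Nova user: 'profile rbd' includes the blacklist permission
# needed to break the exclusive lock held by a dead client
ceph auth caps client.cinder mon 'profile rbd' \
    osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images'

# Glance user only needs access to the images pool
ceph auth caps client.glance mon 'profile rbd' osd 'profile rbd pool=images'

# Verify the resulting caps
ceph auth get client.cinder
    </pre>
    <div><br>
    </div>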
<div class="gmail_quote">
<div dir="ltr">On Fri, Jul 6, 2018 at 7:55 AM Gary Molenkamp
<<a href="mailto:molenkam@uwo.ca" target="_blank"
moz-do-not-send="true">molenkam@uwo.ca</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Good morning
all,<br>
<br>
      After our DC lost all power last night due to a storm, nearly all <br>
      of the volumes in our Pike cluster are unmountable. Of the 30 VMs in <br>
      use at the time, only one has been able to successfully mount and boot <br>
      from its rootfs. We are using Ceph as the backend storage for Cinder <br>
      and Glance. Any help or pointers to bring this back online would be <br>
      appreciated.<br>
<br>
      What most of the volumes are seeing is:<br>
<br>
[ 2.622252] SGI XFS with ACLs, security attributes, no
debug enabled<br>
[ 2.629285] XFS (sda1): Mounting V5 Filesystem<br>
[ 2.832223] sd 2:0:0:0: [sda] FAILED Result:
hostbyte=DID_OK <br>
driverbyte=DRIVER_SENSE<br>
[ 2.838412] sd 2:0:0:0: [sda] Sense Key : Aborted Command
[current]<br>
[ 2.842383] sd 2:0:0:0: [sda] Add. Sense: I/O process
terminated<br>
[ 2.846152] sd 2:0:0:0: [sda] CDB: Write(10) 2a 00 00 80 2c
19 00 04 <br>
00 00<br>
[ 2.850146] blk_update_request: I/O error, dev sda, sector
8399897<br>
<br>
or<br>
<br>
[ 2.590178] EXT4-fs (vda1): INFO: recovery required on
readonly <br>
filesystem<br>
[ 2.594319] EXT4-fs (vda1): write access will be enabled
during recovery<br>
[ 2.957742] print_req_error: I/O error, dev vda, sector
227328<br>
[ 2.962468] Buffer I/O error on dev vda1, logical block 0,
lost async <br>
page write<br>
[ 2.967933] Buffer I/O error on dev vda1, logical block 1,
lost async <br>
page write<br>
[ 2.973076] print_req_error: I/O error, dev vda, sector
229384<br>
<br>
      As a test, I deleted one of the less critical VMs and mounted <br>
      its volume on the one VM I managed to start. The results were not <br>
      promising:<br>
<br>
<br>
# dmesg |tail<br>
[ 5.136862] type=1305 audit(1530847244.811:4):
audit_pid=496 old=0 <br>
auid=4294967295 ses=4294967295
subj=system_u:system_r:auditd_t:s0 res=1<br>
[ 7.726331] nf_conntrack version 0.5.0 (65536 buckets,
262144 max)<br>
[29374.967315] scsi 2:0:0:1: Direct-Access QEMU QEMU
HARDDISK <br>
2.5+ PQ: 0 ANSI: 5<br>
[29374.988104] sd 2:0:0:1: [sdb] 83886080 512-byte logical
blocks: (42.9 <br>
GB/40.0 GiB)<br>
[29374.991126] sd 2:0:0:1: Attached scsi generic sg1 type 0<br>
[29374.995302] sd 2:0:0:1: [sdb] Write Protect is off<br>
[29374.997109] sd 2:0:0:1: [sdb] Mode Sense: 63 00 00 08<br>
[29374.997186] sd 2:0:0:1: [sdb] Write cache: enabled, read
cache: <br>
enabled, doesn't support DPO or FUA<br>
[29375.005968] sdb: sdb1<br>
[29375.007746] sd 2:0:0:1: [sdb] Attached SCSI disk<br>
<br>
# parted /dev/sdb<br>
GNU Parted 3.1<br>
Using /dev/sdb<br>
Welcome to GNU Parted! Type 'help' to view a list of commands.<br>
(parted) p<br>
Model: QEMU QEMU HARDDISK (scsi)<br>
Disk /dev/sdb: 42.9GB<br>
Sector size (logical/physical): 512B/512B<br>
Partition Table: msdos<br>
Disk Flags:<br>
<br>
Number Start End Size Type File system Flags<br>
1 1049kB 42.9GB 42.9GB primary xfs boot<br>
<br>
# mount -t xfs /dev/sdb temp<br>
mount: wrong fs type, bad option, bad superblock on /dev/sdb,<br>
missing codepage or helper program, or other error<br>
<br>
In some cases useful info is found in syslog - try<br>
dmesg | tail or so.<br>
<br>
# xfs_repair /dev/sdb<br>
Phase 1 - find and verify superblock...<br>
bad primary superblock - bad magic number !!!<br>
<br>
attempting to find secondary superblock...<br>
<br>
<br>
<br>
      The secondary superblock search eventually fails as well. The Ceph <br>
      cluster looks healthy, and I can export the volumes from rbd. I can <br>
      find no other errors in Ceph or OpenStack indicating a fault in <br>
      either system.<br>
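      <br>
      For reference, the kind of export that works, as a rough sketch; the <br>
      pool name "volumes" and the $VOLUME_ID placeholder are examples, not <br>
      my actual values:<br>
      <pre>
# list the RBD images backing the Cinder volumes (pool name is an example)
rbd ls volumes

# dump a full copy of one image to a file for safekeeping
rbd export volumes/volume-$VOLUME_ID /backup/volume-$VOLUME_ID.raw
      </pre>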
<br>
- Is this recoverable?<br>
<br>
      - What happened to all of these volumes, and can this be prevented <br>
      from occurring again? Note that any VM that was shut down at the <br>
      time of the outage appears to be fine.<br>
<br>
<br>
Relevant versions:<br>
<br>
Base OS: all Centos 7.5<br>
<br>
Ceph: Luminous 12.2.5-0<br>
<br>
Openstack: Latest Pike releases in
centos-release-openstack-pike-1-1<br>
<br>
nova 16.1.4-1<br>
<br>
cinder 11.1.1-1<br>
<br>
<br>
<br>
-- <br>
Gary Molenkamp Computer Science/Science
Technology Services<br>
Systems Administrator University of Western Ontario<br>
<a href="mailto:molenkam@uwo.ca" target="_blank"
moz-do-not-send="true">molenkam@uwo.ca</a> <a
href="http://www.csd.uwo.ca" rel="noreferrer"
target="_blank" moz-do-not-send="true">http://www.csd.uwo.ca</a><br>
(519) 661-2111 x86882 (519) 661-3566<br>
<br>
_______________________________________________<br>
ceph-users mailing list<br>
<a href="mailto:ceph-users@lists.ceph.com" target="_blank"
moz-do-not-send="true">ceph-users@lists.ceph.com</a><br>
<a
href="http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com</a><br>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr" class="m_-6553100614869360309gmail_signature"
data-smartmail="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div>Jason</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Gary Molenkamp Computer Science/Science Technology Services
Systems Administrator University of Western Ontario
<a class="moz-txt-link-abbreviated" href="mailto:molenkam@uwo.ca">molenkam@uwo.ca</a> <a class="moz-txt-link-freetext" href="http://www.csd.uwo.ca">http://www.csd.uwo.ca</a>
(519) 661-2111 x86882 (519) 661-3566
</pre>
</body>
</html>