<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Thank you, Jason. Not sure how I missed that step.<br>
</p>
<br>
<div class="moz-cite-prefix">On 2018-07-06 08:34 AM, Jason Dillaman
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CA+aFP1C2zA1UroaWMhThD1eFFurPU9DOSq1YgA2HEw0SLQ3yQg@mail.gmail.com">
<div dir="ltr">There have been several similar reports on the
mailing list about this [1][2][3][4] that are always a result of
skipping step 6 from the Luminous upgrade guide [5]. The new
(starting Luminous) 'profile rbd'-style caps are designed to try
to simplify caps going forward [6].
<div><br>
</div>
<div>TL;DR: your OpenStack CephX users need permission to
          blacklist dead clients that failed to properly release the
          exclusive lock; a sketch of the cap update follows the references below.
<div>
<div><br>
</div>
<div>[1] <a
href="http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-November/022278.html"
moz-do-not-send="true">http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-November/022278.html</a></div>
<div>[2] <a
href="http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-November/022694.html"
moz-do-not-send="true">http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-November/022694.html</a></div>
<div>[3] <a
href="http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026496.html"
moz-do-not-send="true">http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026496.html</a></div>
<div>[4] <a
href="https://www.spinics.net/lists/ceph-users/msg45665.html"
moz-do-not-send="true">https://www.spinics.net/lists/ceph-users/msg45665.html</a></div>
<div>[5] <a
href="http://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken"
moz-do-not-send="true">http://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken</a></div>
<div>[6] <a
href="http://docs.ceph.com/docs/luminous/rbd/rbd-openstack/#setup-ceph-client-authentication"
moz-do-not-send="true">http://docs.ceph.com/docs/luminous/rbd/rbd-openstack/#setup-ceph-client-authentication</a></div>
<div><br>
</div>
</div>
</div>
</div>
<br>
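    <div>For anyone hitting this later, a minimal sketch of what that
      step-6 cap update looks like, assuming the stock client names
      (client.cinder, client.glance) and pool names (volumes, vms, images)
      from the Ceph/OpenStack guide [6]; substitute your own names:</div>
    <pre>
# Cinder/Nova user: 'profile rbd' includes the blacklist permission
# needed to break the exclusive lock held by a dead client
ceph auth caps client.cinder mon 'profile rbd' \
    osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images'

# Glance user only needs access to the images pool
ceph auth caps client.glance mon 'profile rbd' osd 'profile rbd pool=images'

# Verify the resulting caps
ceph auth get client.cinder
    </pre>
    <div><br>
    </div>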
<div class="gmail_quote">
<div dir="ltr">On Fri, Jul 6, 2018 at 7:55 AM Gary Molenkamp
<<a href="mailto:molenkam@uwo.ca" target="_blank"
moz-do-not-send="true">molenkam@uwo.ca</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Good morning
all,<br>
<br>
      After our DC lost all power last night due to a storm, nearly all <br>
      of the volumes in our Pike cluster are unmountable. Of the 30 VMs in <br>
      use at the time, only one has been able to successfully mount and boot <br>
      from its rootfs. We are using Ceph as the backend storage for Cinder <br>
      and Glance. Any help or pointers to bring this back online would be <br>
      appreciated.<br>
<br>
      What most of the volumes are seeing is:<br>
<br>
[ 2.622252] SGI XFS with ACLs, security attributes, no
debug enabled<br>
[ 2.629285] XFS (sda1): Mounting V5 Filesystem<br>
[ 2.832223] sd 2:0:0:0: [sda] FAILED Result:
hostbyte=DID_OK <br>
driverbyte=DRIVER_SENSE<br>
[ 2.838412] sd 2:0:0:0: [sda] Sense Key : Aborted Command
[current]<br>
[ 2.842383] sd 2:0:0:0: [sda] Add. Sense: I/O process
terminated<br>
[ 2.846152] sd 2:0:0:0: [sda] CDB: Write(10) 2a 00 00 80 2c
19 00 04 <br>
00 00<br>
[ 2.850146] blk_update_request: I/O error, dev sda, sector
8399897<br>
<br>
or<br>
<br>
[ 2.590178] EXT4-fs (vda1): INFO: recovery required on
readonly <br>
filesystem<br>
[ 2.594319] EXT4-fs (vda1): write access will be enabled
during recovery<br>
[ 2.957742] print_req_error: I/O error, dev vda, sector
227328<br>
[ 2.962468] Buffer I/O error on dev vda1, logical block 0,
lost async <br>
page write<br>
[ 2.967933] Buffer I/O error on dev vda1, logical block 1,
lost async <br>
page write<br>
[ 2.973076] print_req_error: I/O error, dev vda, sector
229384<br>
<br>
      As a test, I deleted one of the less critical VMs and mounted <br>
      its volume on the one VM I managed to start. The results were not <br>
      promising:<br>
<br>
<br>
# dmesg |tail<br>
[ 5.136862] type=1305 audit(1530847244.811:4):
audit_pid=496 old=0 <br>
auid=4294967295 ses=4294967295
subj=system_u:system_r:auditd_t:s0 res=1<br>
[ 7.726331] nf_conntrack version 0.5.0 (65536 buckets,
262144 max)<br>
[29374.967315] scsi 2:0:0:1: Direct-Access QEMU QEMU
HARDDISK <br>
2.5+ PQ: 0 ANSI: 5<br>
[29374.988104] sd 2:0:0:1: [sdb] 83886080 512-byte logical
blocks: (42.9 <br>
GB/40.0 GiB)<br>
[29374.991126] sd 2:0:0:1: Attached scsi generic sg1 type 0<br>
[29374.995302] sd 2:0:0:1: [sdb] Write Protect is off<br>
[29374.997109] sd 2:0:0:1: [sdb] Mode Sense: 63 00 00 08<br>
[29374.997186] sd 2:0:0:1: [sdb] Write cache: enabled, read
cache: <br>
enabled, doesn't support DPO or FUA<br>
[29375.005968] sdb: sdb1<br>
[29375.007746] sd 2:0:0:1: [sdb] Attached SCSI disk<br>
<br>
# parted /dev/sdb<br>
GNU Parted 3.1<br>
Using /dev/sdb<br>
Welcome to GNU Parted! Type 'help' to view a list of commands.<br>
(parted) p<br>
Model: QEMU QEMU HARDDISK (scsi)<br>
Disk /dev/sdb: 42.9GB<br>
Sector size (logical/physical): 512B/512B<br>
Partition Table: msdos<br>
Disk Flags:<br>
<br>
Number Start End Size Type File system Flags<br>
1 1049kB 42.9GB 42.9GB primary xfs boot<br>
<br>
# mount -t xfs /dev/sdb temp<br>
mount: wrong fs type, bad option, bad superblock on /dev/sdb,<br>
missing codepage or helper program, or other error<br>
<br>
In some cases useful info is found in syslog - try<br>
dmesg | tail or so.<br>
<br>
# xfs_repair /dev/sdb<br>
Phase 1 - find and verify superblock...<br>
bad primary superblock - bad magic number !!!<br>
<br>
attempting to find secondary superblock...<br>
<br>
<br>
<br>
      The secondary superblock search eventually fails as well. The Ceph <br>
      cluster looks healthy, and I can export the volumes from rbd. I can <br>
      find no other errors in Ceph or OpenStack indicating a fault in <br>
      either system.<br>
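      <br>
      For reference, the kind of export that works, as a rough sketch; the <br>
      pool name "volumes" and the $VOLUME_ID placeholder are examples, not <br>
      my actual values:<br>
      <pre>
# list the RBD images backing the Cinder volumes (pool name is an example)
rbd ls volumes

# dump a full copy of one image to a file for safekeeping
rbd export volumes/volume-$VOLUME_ID /backup/volume-$VOLUME_ID.raw
      </pre>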
<br>
- Is this recoverable?<br>
<br>
      - What happened to all of these volumes, and can this be prevented <br>
      from occurring again? Note that any VM that was shut down at the <br>
      time of the outage appears to be fine.<br>
<br>
<br>
Relevant versions:<br>
<br>
Base OS: all Centos 7.5<br>
<br>
Ceph: Luminous 12.2.5-0<br>
<br>
Openstack: Latest Pike releases in
centos-release-openstack-pike-1-1<br>
<br>
nova 16.1.4-1<br>
<br>
cinder 11.1.1-1<br>
<br>
<br>
<br>
-- <br>
Gary Molenkamp Computer Science/Science
Technology Services<br>
Systems Administrator University of Western Ontario<br>
<a href="mailto:molenkam@uwo.ca" target="_blank"
moz-do-not-send="true">molenkam@uwo.ca</a> <a
href="http://www.csd.uwo.ca" rel="noreferrer"
target="_blank" moz-do-not-send="true">http://www.csd.uwo.ca</a><br>
(519) 661-2111 x86882 (519) 661-3566<br>
<br>
_______________________________________________<br>
ceph-users mailing list<br>
<a href="mailto:ceph-users@lists.ceph.com" target="_blank"
moz-do-not-send="true">ceph-users@lists.ceph.com</a><br>
<a
href="http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com</a><br>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr" class="m_-6553100614869360309gmail_signature"
data-smartmail="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div>Jason</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Gary Molenkamp Computer Science/Science Technology Services
Systems Administrator University of Western Ontario
<a class="moz-txt-link-abbreviated" href="mailto:molenkam@uwo.ca">molenkam@uwo.ca</a> <a class="moz-txt-link-freetext" href="http://www.csd.uwo.ca">http://www.csd.uwo.ca</a>
(519) 661-2111 x86882 (519) 661-3566
</pre>
</body>
</html>