[openstack-operators] RBD problems after data center power outage
Greetings,
Today I had a data center power outage, and the OpenStack cluster went down. After bringing the cluster back up, I cannot start some VMs because of the error below. I've tried "rbd object-map rebuild", but it didn't work. What's the proper way to re-create the missing "_disk.config" files?
Thanks.
[instance: c2b54eac-179b-4907-9d61-8e075edc21cf] Failed to start libvirt guest: libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: 2019-10-17T23:19:41.103720Z qemu-system-x86_64: -drive file=rbd:vms/c2b54eac-179b-4907-9d61-8e075edc21cf_disk.config:id=nova:auth_supported=cephx\;none:mon_host=10.250.129.10\:6789\;10.250.129.11\:6789\;10.250.129.12\:6789\;10.250.129.15\:6789,file.password-secret=ide0-0-0-secret0,format=raw,if=none,id=drive-ide0-0-0,readonly=on,cache=writeback,discard=unmap: error reading header from c2b54eac-179b-4907-9d61-8e075edc21cf_disk.config: No such file or directory
/* Please encrypt every message you can. Privacy is your right, don't let anyone take it from you. */
/* My fingerprint is: 5E50 ABB0 F108 24DA 10CC BD43 D2AE DD2A 7893 0EAA */
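A minimal sketch, in shell, of how one might confirm which RBD image qemu failed to open. The function names `rbd_spec_from_qemu_error` and `rbd_image_exists` are hypothetical helpers, not part of any tool. The sketch assumes the `rbd` CLI is installed and that the default client credentials can read the pool.

```shell
# Extract the "pool/image" spec from a qemu "-drive file=rbd:..." error line.
# Hypothetical helper: relies only on the "file=rbd:POOL/IMAGE:" layout seen in the log above.
rbd_spec_from_qemu_error() {
  printf '%s\n' "$1" | sed -n 's/.*file=rbd:\([^:]*\):.*/\1/p'
}

# Return success if `rbd info` can open the image, failure otherwise.
# Assumption: the default client credentials are allowed to read the pool.
rbd_image_exists() {
  rbd info "$1" >/dev/null 2>&1
}

# Example usage (not run here):
#   spec=$(rbd_spec_from_qemu_error "$error_line")
#   if rbd_image_exists "$spec"; then echo "$spec exists"; else echo "$spec is missing"; fi
```

If `rbd info` fails for the extracted spec, the image itself may be absent from the pool, which is a different situation from an image whose header object is damaged.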
Hi,
I've recently found this post [1] on recovering a failing header, but I haven't tried it myself. I'm curious whether it works, though.
Regards, Eugen
[1] https://fnordahl.com/2017/04/17/ceph-rbd-volume-header-recovery/
In reply to Dinçer Çelik <hello@dincercelik.com>
Very interesting post.
In reply to Eugen Block <eblock@nde.ag>, Oct 18, 2019, 2:44 AM
Hi Eugen,
I don't think this is the same situation I'm facing, because I can get the rbd headers.
Regards
In reply to Eugen Block <eblock@nde.ag>, 18 Oct 2019, 09:44
I assumed the header was missing because of this message:
error reading header from c2b54eac-179b-4907-9d61-8e075edc21cf_disk.config: No such file or directory
If you can stat the header object, can you share the output of
rados -p vms listomapvals rbd_header.<BLOCK_PREFIX>
Are there rbd_data objects left in the pool from that config drive?
rados -p images ls | grep <BLOCK_PREFIX>
rbd_object_map.1cbc666b8b4567
rbd_data.1cbc666b8b4567.0000000000000000
rbd_header.1cbc666b8b4567
If yes, maybe there's a way to put things back together, which I haven't done yet. Do all affected VMs refer to a config drive, and is it always the config drive object that's missing?
In reply to Dinçer Çelik <hello@dincercelik.com>
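The checks suggested above can be strung together roughly as follows. This is only a sketch: the helper names are hypothetical, it assumes a format 2 image (where `rbd info` reports a `block_name_prefix` of the form `rbd_data.<id>`), and it assumes the `rbd` and `rados` CLIs can reach the cluster.

```shell
# Derive the internal image id from the block_name_prefix reported by `rbd info`
# (for format 2 images it looks like "rbd_data.<id>").
image_id_from_prefix() {
  printf '%s\n' "${1#rbd_data.}"
}

# Name of the header object that belongs to a given block_name_prefix.
header_object_name() {
  printf 'rbd_header.%s\n' "$(image_id_from_prefix "$1")"
}

# Hypothetical wrapper: stat the header object, dump its omap values,
# and list the objects still present in the pool for one image.
# Usage: inspect_rbd_image POOL IMAGE
inspect_rbd_image() {
  prefix=$(rbd info "$1/$2" | awk '/block_name_prefix/ {print $2}')
  [ -n "$prefix" ] || { echo "no block_name_prefix found for $1/$2" >&2; return 1; }
  rados -p "$1" stat "$(header_object_name "$prefix")"
  rados -p "$1" listomapvals "$(header_object_name "$prefix")"
  rados -p "$1" ls | grep "$(image_id_from_prefix "$prefix")"
}
```

For the instance in this thread, the call would be something like `inspect_rbd_image vms c2b54eac-179b-4907-9d61-8e075edc21cf_disk.config`. If `rbd info` cannot open the image at all, the wrapper stops early, since there is no prefix to look up.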
Hi,
I've fixed the issue. First, I would like to thank all the Ceph developers for making it bulletproof.
The root cause was Nova's "force_config_drive" option [1], which I had enabled a few weeks ago. When this option is enabled, Nova creates a new disk with the same name, ending with ".config". I had enabled it because I sometimes face DHCP-related issues. Temporarily disabling the option fixed the issue.
Regards
[1] https://docs.openstack.org/nova/stein/configuration/config.html#DEFAULT.forc...
In reply to Eugen Block <eblock@nde.ag>, 18 Oct 2019, 14:29
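As a rough illustration of how one might check whether "force_config_drive" is set on a compute node, here is a small shell sketch. The function name `get_force_config_drive` is hypothetical, and the path `/etc/nova/nova.conf` in the usage comment is an assumption about the deployment. The sketch only reads the `[DEFAULT]` section of a single file and ignores any additional config directories Nova may load.

```shell
# Print the force_config_drive value found in the [DEFAULT] section of a
# nova.conf-style file, or "unset" if the option is absent or commented out.
get_force_config_drive() {
  awk -F= '
    /^\[/ { in_default = ($0 == "[DEFAULT]"); next }
    in_default && $1 ~ /^[ \t]*force_config_drive[ \t]*$/ {
      gsub(/[ \t]/, "", $2)   # strip whitespace around the value
      val = $2
    }
    END { print (val == "" ? "unset" : val) }
  ' "$1"
}

# Example usage (path is an assumption, not run here):
#   get_force_config_drive /etc/nova/nova.conf
```

After changing the option, the nova-compute service generally has to be restarted for the new value to take effect; the exact service name depends on the distribution.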
participants (3)
- Dinçer Çelik
- Eugen Block
- Satish Patel