Hi,

I've fixed the issue.

First, I would like to thank all the Ceph developers for making it bulletproof.

The root cause was Nova's "force_config_drive" option [1], which I had enabled a few weeks ago. When this option is enabled, Nova creates an additional disk with the same name as the instance disk plus a ".config" suffix. I had enabled it because I sometimes run into DHCP-related issues.

Temporarily disabling this option fixed the issue.
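
For anyone who wants to check whether the same option is set on their compute nodes, here is a minimal sketch. The function name and the default path /etc/nova/nova.conf are assumptions for illustration, and the sketch does not track INI sections; it simply reports the last uncommented "force_config_drive" line it finds.

```shell
#!/bin/sh
# Sketch: report the value of force_config_drive in a nova.conf-style file.
# The default path is an assumption; pass your own file as the first argument.
# Note: INI sections are ignored, although the option belongs to [DEFAULT].
get_force_config_drive() {
    conf="${1:-/etc/nova/nova.conf}"
    awk -F= '
        # Match only uncommented assignments of the option.
        /^[[:space:]]*force_config_drive[[:space:]]*=/ {
            v = $2
            gsub(/[[:space:]]/, "", v)   # trim surrounding whitespace
            val = tolower(v)             # normalise True/true/TRUE
        }
        # Print the last value seen, or "unset" if the option never appeared.
        END { print (val == "" ? "unset" : val) }
    ' "$conf"
}
```

Calling get_force_config_drive with no argument reads the assumed default path. According to the Stein documentation linked in [1], the option defaults to False, so "unset" and "false" should behave the same way.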

Regards

[1] https://docs.openstack.org/nova/stein/configuration/config.html#DEFAULT.force_config_drive

/* Please encrypt every message you can. Privacy is your right, don't let anyone take it from you. */

/* My fingerprint is: 5E50 ABB0 F108 24DA 10CC  BD43 D2AE DD2A 7893 0EAA */

On 18 Oct 2019, at 14:29, Eugen Block <eblock@nde.ag> wrote:

I assumed the header was missing because of this message:

error reading header from c2b54eac-179b-4907-9d61-8e075edc21cf_disk.config: No such file or directory

If you can stat the header object, can you share the output of the following command?

rados -p vms listomapvals rbd_header.<BLOCK_PREFIX>

Are there rbd_data objects left in the pool from that config drive?

rados -p images ls | grep <BLOCK_PREFIX>
rbd_object_map.1cbc666b8b4567
rbd_data.1cbc666b8b4567.0000000000000000
rbd_header.1cbc666b8b4567

If so, there may be a way to put things back together, although I haven't done that myself yet. Do all affected VMs refer to a config drive, and is it always the config drive object that's missing?
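
The <BLOCK_PREFIX> placeholder in the commands above can usually be read from the "block_name_prefix" line of "rbd info". The following is a minimal sketch of that lookup; the helper name rbd_image_id is hypothetical, and the commented usage assumes the "vms" pool and the image name from the error message in this thread.

```shell
#!/bin/sh
# Sketch: extract the image id from `rbd info <pool>/<image>` output on stdin.
# The helper name is hypothetical; it only parses text and never contacts
# the cluster itself.
rbd_image_id() {
    # Expects a line such as "block_name_prefix: rbd_data.1cbc666b8b4567"
    # and prints only the id part after "rbd_data.".
    sed -n 's/^[[:space:]]*block_name_prefix:[[:space:]]*rbd_data\.\([^[:space:]]*\).*$/\1/p'
}

# Example usage (needs a reachable cluster and keyring, so it stays commented):
#   id=$(rbd info vms/c2b54eac-179b-4907-9d61-8e075edc21cf_disk.config | rbd_image_id)
#   rados -p vms stat "rbd_header.$id"
#   rados -p vms listomapvals "rbd_header.$id"
#   rados -p vms ls | grep "$id"
```

If "rbd info" itself fails because the header cannot be read, this lookup will print nothing, and the id would have to come from somewhere else.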


Quoting Dinçer Çelik <hello@dincercelik.com>:

Hi Eugen,

I don't think this is the same situation as the one I'm facing, because I can read the RBD headers.

Regards


On 18 Oct 2019, at 09:44, Eugen Block <eblock@nde.ag> wrote:

Hi,

I've recently found this post [1] about recovering a failing header, but I haven't tried it myself. I'm curious whether it works, though.

Regards,
Eugen

[1] https://fnordahl.com/2017/04/17/ceph-rbd-volume-header-recovery/


Quoting Dinçer Çelik <hello@dincercelik.com>:

Greetings,

Today I had a data center power outage, and the OpenStack cluster went down. After bringing the cluster back up, I cannot start some VMs because of the error below. I've tried "rbd object-map rebuild", but it didn't work. What's the proper way to re-create the missing "_disk.config" images?

Thanks.

[instance: c2b54eac-179b-4907-9d61-8e075edc21cf] Failed to start libvirt guest: libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: 2019-10-17T23:19:41.103720Z qemu-system-x86_64: -drive file=rbd:vms/c2b54eac-179b-4907-9d61-8e075edc21cf_disk.config:id=nova:auth_supported=cephx\;none:mon_host=10.250.129.10\:6789\;10.250.129.11\:6789\;10.250.129.12\:6789\;10.250.129.15\:6789,file.password-secret=ide0-0-0-secret0,format=raw,if=none,id=drive-ide0-0-0,readonly=on,cache=writeback,discard=unmap: error reading header from c2b54eac-179b-4907-9d61-8e075edc21cf_disk.config: No such file or directory
