[openstack-operators] RBD problems after data center power outage

Dinçer Çelik hello at dincercelik.com
Fri Oct 18 19:22:12 UTC 2019


Hi,

I've fixed the issue.

First, I would like to thank all the Ceph developers for making it bulletproof.

The root cause was the "force_config_drive" [1] option of Nova, which I had enabled a few weeks ago. When this option is enabled, Nova creates an additional disk with the same name, ending in ".config". I had enabled it because I sometimes run into DHCP-related issues.

Temporarily disabling this option fixed the issue.
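
For anyone hitting the same error, here is a minimal sketch of the change. It assumes the option sits in the [DEFAULT] section of nova.conf on the compute nodes, as the reference in [1] indicates. The file location (commonly /etc/nova/nova.conf) and the need to restart the nova-compute service afterwards are assumptions about a typical deployment, not details stated in this thread.

```ini
[DEFAULT]
# Upstream default is False; setting it back turns off the forced config drive.
force_config_drive = False
```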

Regards

[1] https://docs.openstack.org/nova/stein/configuration/config.html#DEFAULT.force_config_drive

/* Please encrypt every message you can. Privacy is your right, don't let anyone take it from you. */

/* My fingerprint is: 5E50 ABB0 F108 24DA 10CC  BD43 D2AE DD2A 7893 0EAA */

> On 18 Oct 2019, at 14:29, Eugen Block <eblock at nde.ag> wrote:
> 
> I assumed the header was missing because of this message:
> 
>> error reading header from c2b54eac-179b-4907-9d61-8e075edc21cf_disk.config: No such file or directory
> 
> If you can stat the header object, can you share the output of
> 
> rados -p vms listomapvals rbd_header.<BLOCK_PREFIX>
> 
> Are there rbd_data objects left in the pool from that config drive?
> 
> rados -p images ls | grep <BLOCK_PREFIX>
> rbd_object_map.1cbc666b8b4567
> rbd_data.1cbc666b8b4567.0000000000000000
> rbd_header.1cbc666b8b4567
> 
> If yes, maybe there's a way to put things back together, though I haven't done that myself yet. Do all affected VMs refer to a config drive, and is it always the config drive object that's missing?
> 
> 
> Quoting Dinçer Çelik <hello at dincercelik.com>:
> 
>> Hi Eugen,
>> 
>> I don't think this is the same situation as the one I'm facing, because I can read the RBD headers.
>> 
>> Regards
>> 
>>> On 18 Oct 2019, at 09:44, Eugen Block <eblock at nde.ag> wrote:
>>> 
>>> Hi,
>>> 
>>> I recently found this post [1] on recovering a failing header, but I haven't tried it myself. I'm curious whether it works, though.
>>> 
>>> Regards,
>>> Eugen
>>> 
>>> [1] https://fnordahl.com/2017/04/17/ceph-rbd-volume-header-recovery/
>>> 
>>> 
>>> Quoting Dinçer Çelik <hello at dincercelik.com>:
>>> 
>>>> Greetings,
>>>> 
>>>> Today I had a data center power outage, and the OpenStack cluster went down. After bringing the cluster back up, I cannot start some VMs because of the error below. I've tried "rbd object-map rebuild", but it didn't work. What's the proper way to re-create the missing "_disk.config" files?
>>>> 
>>>> Thanks.
>>>> 
>>>> [instance: c2b54eac-179b-4907-9d61-8e075edc21cf] Failed to start libvirt guest: libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: 2019-10-17T23:19:41.103720Z qemu-system-x86_64: -drive file=rbd:vms/c2b54eac-179b-4907-9d61-8e075edc21cf_disk.config:id=nova:auth_supported=cephx\;none:mon_host=10.250.129.10\:6789\;10.250.129.11\:6789\;10.250.129.12\:6789\;10.250.129.15\:6789,file.password-secret=ide0-0-0-secret0,format=raw,if=none,id=drive-ide0-0-0,readonly=on,cache=writeback,discard=unmap: error reading header from c2b54eac-179b-4907-9d61-8e075edc21cf_disk.config: No such file or directory
>>>> 
