Hi Eugen,

A shelve and unshelve brought the VM back to life. This is very odd and I haven't seen this behavior before.
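
For anyone hitting the same thing, the workaround was essentially this
(instance ID taken from the nova logs below):

# openstack server shelve 2de0f880-77c7-4d2c-9e01-898c57ad3693
# openstack server unshelve 2de0f880-77c7-4d2c-9e01-898c57ad3693

My guess is that the unshelve rebuilds the libvirt domain from scratch,
which cleared whatever stale block-device state QEMU was tripping over,
but I haven't confirmed that.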

On Thu, Feb 1, 2024 at 11:24 AM Eugen Block <eblock@nde.ag> wrote:
I’m not sure I understand all of it, but is there currently only one 
cluster active? And is that where this output is from? What does 
‘rbd status’ tell you?
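For example, something like this should show whether any client still 
holds a watcher on the image:

rbd status volumes/volume-77b123ff-915f-4e0b-8d74-d34fde12528b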

Quoting Satish Patel <satish.txt@gmail.com>:

> The older ceph cluster is down: once everything came up on the new one we
> shut the old cluster down entirely, and then realized one VM was stuck in
> this error state. On the current cluster, this is what it shows:
>
> # rbd info -p volumes volume-77b123ff-915f-4e0b-8d74-d34fde12528b
> rbd image 'volume-77b123ff-915f-4e0b-8d74-d34fde12528b':
>         size 120 GiB in 30720 objects
>         order 22 (4 MiB objects)
>         snapshot_count: 0
>         id: 87bfb47beb93
>         block_name_prefix: rbd_data.87bfb47beb93
>         format: 2
>         features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
>         op_features:
>         flags:
>         create_timestamp: Sun Jan 28 05:28:30 2024
>         access_timestamp: Thu Feb  1 15:28:57 2024
>         modify_timestamp: Thu Feb  1 06:17:30 2024
>         journal: 87bfb47beb93
>         mirroring state: enabled
>         mirroring mode: journal
>         mirroring global id: 0d488c59-cd44-47a8-86b7-c24509f7771b
>         mirroring primary: true
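>
> If it helps, I can also check the per-image mirror state from here, e.g.:
>
> # rbd mirror image status -p volumes volume-77b123ff-915f-4e0b-8d74-d34fde12528b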
>
> On Thu, Feb 1, 2024 at 3:44 AM Eugen Block <eblock@nde.ag> wrote:
>
>> Hi,
>>
>> have you compared the affected rbd images with working images? Maybe
>> the mirroring failed for those images? Were they promoted correctly?
>> Which mirror mode are you using, journal or snapshot? I would check
>> the 'rbd info pool/image' output and compare to see if there's a
>> difference.
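>>
>> For example, dumping both and diffing them (the image names here are
>> placeholders):
>>
>> rbd info volumes/<affected-image> > affected.txt
>> rbd info volumes/<working-image> > working.txt
>> diff affected.txt working.txt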
>>
>>
>> Quoting Satish Patel <satish.txt@gmail.com>:
>>
>> > Folks,
>> >
>> > I have a ceph cluster and recently configured rbd-mirror to replicate
>> > all data to a remote ceph cluster for disaster recovery.
>> >
>> > Yesterday, as a POC, we did a hard cutover on ceph and pointed openstack
>> > to the new cluster. All other VMs came back up fine, but 2 VMs are stuck
>> > in this error state in the libvirt logs:
>> >
>> > 2024-01-31 22:44:37.591+0000: 474597: error :
>> > qemuMonitorJSONCheckErrorFull:399 : internal error: unable to execute QEMU
>> > command 'query-named-block-nodes': cannot read image start for probe:
>> > Permission denied
>> >
>> > If it were a wider issue it would impact all the VMs, so why are only
>> > these two stuck and refusing to start, with libvirt giving this error in
>> > the logs?
>> >
>> > The nova-compute logs also show the same error:
>> >
>> > 2024-01-31 22:44:40.925 7 INFO nova.compute.manager [None
>> > req-b33485b5-8740-48ae-8b5b-a440de3f11a4 c48fcfb9347f413f92fcece065644b00
>> > ca5c652478c7429e964257990800e9cb - - default default] [instance:
>> > 2de0f880-77c7-4d2c-9e01-898c57ad3693] Successfully reverted task state from
>> > powering-on on failure for instance.
>> > 2024-01-31 22:44:40.944 7 ERROR oslo_messaging.rpc.server [None
>> > req-b33485b5-8740-48ae-8b5b-a440de3f11a4 c48fcfb9347f413f92fcece065644b00
>> > ca5c652478c7429e964257990800e9cb - - default default] Exception during
>> > message handling: libvirt.libvirtError: internal error: unable to execute
>> > QEMU command 'query-named-block-nodes': cannot read image start for probe:
>> > Permission denied
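>> >
>> > Could this be a cephx permissions issue on the new cluster? I'm going to
>> > compare the caps of the openstack client keys on both sides with
>> > something like this (the client name depends on the deployment):
>> >
>> > # ceph auth get client.cinder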
>>