Hi Eugen,

I have noticed one more behavior. When I launch an instance from an image it is fast, but when I boot from a volume it takes a very long time. It feels as if it is converting or decompressing the image, even though it is a RAW image.

If I upload a fresh image, it works quickly for both image boot and volume boot. It seems rbd-mirror replication changed some image properties that OpenStack doesn't understand, causing the image to be downloaded and re-uploaded, just as happens with qcow2.
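
One way to check whether replication changed any image properties is the `rbd info` comparison suggested earlier in this thread. Below is a minimal sketch, not a tested tool: it parses the plain-text `rbd info` output and reports fields that differ between two images, skipping per-image identifiers and timestamps. The function names are made up for illustration, the `REPLICATED` sample is taken from the output quoted further down, and the `FRESH` sample is hypothetical (it only assumes a freshly uploaded image has no journaling or mirroring enabled).

```python
# Sketch: diff two plain-text `rbd info` outputs. All names are illustrative.

# Fields that always differ between two images and carry no signal here.
IGNORED_KEYS = {
    "id", "block_name_prefix", "journal", "mirroring global id",
    "create_timestamp", "access_timestamp", "modify_timestamp",
}


def parse_rbd_info(text):
    """Turn `rbd info` text output into a dict of field name -> value."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("rbd image"):
            continue
        if ":" in line:
            key, _, value = line.partition(":")   # e.g. "format: 2"
        else:
            key, _, value = line.partition(" ")   # e.g. "size 120 GiB in ..."
        info[key.strip()] = value.strip()
    return info


def feature_set(info):
    """Split the comma-separated features field into a set."""
    return {f.strip() for f in info.get("features", "").split(",") if f.strip()}


def diff_rbd_info(a_text, b_text):
    """Return {field: (value_in_a, value_in_b)} for fields that differ.

    For "features" the tuple holds (only_in_a, only_in_b) as sorted lists.
    """
    a, b = parse_rbd_info(a_text), parse_rbd_info(b_text)
    diff = {}
    for key in sorted(set(a) | set(b)):
        if key in IGNORED_KEYS:
            continue
        if key == "features":
            fa, fb = feature_set(a), feature_set(b)
            if fa != fb:
                diff[key] = (sorted(fa - fb), sorted(fb - fa))
        elif a.get(key) != b.get(key):
            diff[key] = (a.get(key), b.get(key))
    return diff


# Output of the replicated volume, as quoted later in this thread.
REPLICATED = """\
rbd image 'volume-77b123ff-915f-4e0b-8d74-d34fde12528b':
size 120 GiB in 30720 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 87bfb47beb93
block_name_prefix: rbd_data.87bfb47beb93
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
op_features:
flags:
create_timestamp: Sun Jan 28 05:28:30 2024
journal: 87bfb47beb93
mirroring state: enabled
mirroring mode: journal
mirroring global id: 0d488c59-cd44-47a8-86b7-c24509f7771b
mirroring primary: true
"""

# HYPOTHETICAL output for a freshly created image; not taken from a real cluster.
FRESH = """\
rbd image 'volume-hypothetical':
size 120 GiB in 30720 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: aaaaaaaaaaaa
block_name_prefix: rbd_data.aaaaaaaaaaaa
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
op_features:
flags:
create_timestamp: Fri Feb  2 10:00:00 2024
"""

if __name__ == "__main__":
    for field, (left, right) in diff_rbd_info(REPLICATED, FRESH).items():
        print(f"{field}: replicated={left!r} fresh={right!r}")
```

On a real cluster the two texts would come from running `rbd info` against a replicated image and a freshly uploaded one. `rbd info --format json` would avoid the text parsing entirely; the text parser is used here only so the sketch can run against output pasted into an email.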

I will look into it and update you on what is going on. If you have any clue please let me know. 


On Fri, Feb 2, 2024 at 2:44 AM Eugen Block <eblock@nde.ag> wrote:
Okay, glad you brought it back. I would be curious as well to 
understand what happened.

Quoting Satish Patel <satish.txt@gmail.com>:

> Hi Eugen,
>
> A shelve followed by an unshelve brought the VM back to life. This is very
> odd and I haven't seen this behavior before.
>
> On Thu, Feb 1, 2024 at 11:24 AM Eugen Block <eblock@nde.ag> wrote:
>
>> I’m not sure if I understand all of it, but there currently is only
>> one cluster active? And that’s where this output is from? What does
>> ‘rbd status’ tell you?
>>
>> Quoting Satish Patel <satish.txt@gmail.com>:
>>
>> > The older Ceph cluster is down. Everything came up, so we shut down the
>> > entire old cluster, and then realized one VM was stuck in this error
>> > state. On the current cluster this is what it shows:
>> >
>> > # rbd info -p volumes volume-77b123ff-915f-4e0b-8d74-d34fde12528b
>> > rbd image 'volume-77b123ff-915f-4e0b-8d74-d34fde12528b':
>> > size 120 GiB in 30720 objects
>> > order 22 (4 MiB objects)
>> > snapshot_count: 0
>> > id: 87bfb47beb93
>> > block_name_prefix: rbd_data.87bfb47beb93
>> > format: 2
>> > features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
>> > op_features:
>> > flags:
>> > create_timestamp: Sun Jan 28 05:28:30 2024
>> > access_timestamp: Thu Feb  1 15:28:57 2024
>> > modify_timestamp: Thu Feb  1 06:17:30 2024
>> > journal: 87bfb47beb93
>> > mirroring state: enabled
>> > mirroring mode: journal
>> > mirroring global id: 0d488c59-cd44-47a8-86b7-c24509f7771b
>> > mirroring primary: true
>> >
>> > On Thu, Feb 1, 2024 at 3:44 AM Eugen Block <eblock@nde.ag> wrote:
>> >
>> >> Hi,
>> >>
>> >> have you compared the affected rbd images with working images? Maybe
>> >> the mirroring failed for those images? Were they promoted correctly?
>> >> Which mirror mode are you using, journal or snapshot? I would check
>> >> the 'rbd info pool/image' output and compare to see if there's a
>> >> difference.
>> >>
>> >>
>> >> Quoting Satish Patel <satish.txt@gmail.com>:
>> >>
>> >> > Folks,
>> >> >
>> >> > I have a Ceph cluster and recently configured rbd-mirror to replicate
>> >> > all data to a remote Ceph cluster for disaster recovery.
>> >> >
>> >> > Yesterday, for a POC, we did a hard cutover on Ceph and pointed OpenStack
>> >> > to the new cluster. All other VMs came back up fine, but 2 VMs are stuck
>> >> > with this error in the libvirt logs:
>> >> >
>> >> > 2024-01-31 22:44:37.591+0000: 474597: error :
>> >> > qemuMonitorJSONCheckErrorFull:399 : internal error: unable to execute QEMU
>> >> > command 'query-named-block-nodes': cannot read image start for probe:
>> >> > Permission denied
>> >> >
>> >> > If it were a wider issue, it should affect all the VMs, so why are only
>> >> > two VMs stuck and not starting, with libvirt giving me this error in the
>> >> > logs?
>> >> >
>> >> > The nova-compute logs also show the same error:
>> >> >
>> >> > 2024-01-31 22:44:40.925 7 INFO nova.compute.manager [None
>> >> > req-b33485b5-8740-48ae-8b5b-a440de3f11a4 c48fcfb9347f413f92fcece065644b00
>> >> > ca5c652478c7429e964257990800e9cb - - default default] [instance:
>> >> > 2de0f880-77c7-4d2c-9e01-898c57ad3693] Successfully reverted task state from
>> >> > powering-on on failure for instance.
>> >> > 2024-01-31 22:44:40.944 7 ERROR oslo_messaging.rpc.server [None
>> >> > req-b33485b5-8740-48ae-8b5b-a440de3f11a4 c48fcfb9347f413f92fcece065644b00
>> >> > ca5c652478c7429e964257990800e9cb - - default default] Exception during
>> >> > message handling: libvirt.libvirtError: internal error: unable to execute
>> >> > QEMU command 'query-named-block-nodes': cannot read image start for probe:
>> >> > Permission denied