Hi,
sorry for the delay. I can't imagine rbd-mirror changing image properties, but who knows. If nova downloaded an image and re-uploaded it to rbd, you would see that in the nova-compute logs, and you would also see the downloaded file in /var/lib/nova/instances/_base on the compute node. Maybe enable debug logs to get more information about why it takes so long to launch an instance from that specific image. Maybe someone updated the image properties in glance or something?
Quoting Satish Patel <satish.txt@gmail.com>:
Hi Eugen,
I have noticed one more behavior. When I launch an instance from an image it is fast, but when I boot from a volume it takes a super long time. It feels like it is uncompressing or converting the image, even though it is a RAW image.
If I upload a fresh new image, it works quickly with either image or volume boot. It seems rbd-mirror replication changed some image properties that openstack doesn't understand, or something is causing the image to be downloaded and re-uploaded, just like with qcow2.
I will look into it and update you on what is going on. If you have any clue please let me know.
On Fri, Feb 2, 2024 at 2:44 AM Eugen Block <eblock@nde.ag> wrote:
Okay, glad you brought it back. I would be curious as well to understand what happened.
Quoting Satish Patel <satish.txt@gmail.com>:
Hi Eugen,
Shelving and unshelving brought the VM back to life. This is very odd and I haven't seen this behavior before.
On Thu, Feb 1, 2024 at 11:24 AM Eugen Block <eblock@nde.ag> wrote:
I’m not sure if I understand all of it, but there currently is only one cluster active? And that’s where this output is from? What does ‘rbd status’ tell you?
Quoting Satish Patel <satish.txt@gmail.com>:
The older ceph cluster is down. Because everything came up, we shut down the entire old cluster, and then realized one VM was stuck in this error state. In the current cluster this is what it's showing:
# rbd info -p volumes volume-77b123ff-915f-4e0b-8d74-d34fde12528b
rbd image 'volume-77b123ff-915f-4e0b-8d74-d34fde12528b':
    size 120 GiB in 30720 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: 87bfb47beb93
    block_name_prefix: rbd_data.87bfb47beb93
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
    op_features:
    flags:
    create_timestamp: Sun Jan 28 05:28:30 2024
    access_timestamp: Thu Feb 1 15:28:57 2024
    modify_timestamp: Thu Feb 1 06:17:30 2024
    journal: 87bfb47beb93
    mirroring state: enabled
    mirroring mode: journal
    mirroring global id: 0d488c59-cd44-47a8-86b7-c24509f7771b
    mirroring primary: true
On Thu, Feb 1, 2024 at 3:44 AM Eugen Block <eblock@nde.ag> wrote:
Hi,
have you compared the affected rbd images with working images? Maybe the mirroring failed for those images? Were they promoted correctly? Which mirror mode are you using, journal or snapshot? I would check the 'rbd info pool/image' output and compare to see if there's a difference.
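In case it helps with that comparison, here is a minimal Python sketch that parses two plain-text 'rbd info' outputs and reports only the fields that differ. The function names and the list of ignored per-image fields are my own assumptions, not part of any Ceph tooling, and the parser only handles the layout shown in this thread.

```python
# Sketch: compare two plain-text `rbd info` outputs field by field.
# All names here are illustrative assumptions, not a Ceph API.

# Fields that are expected to differ between any two images,
# so they are skipped by default when diffing.
PER_IMAGE_FIELDS = {
    "rbd image", "id", "block_name_prefix", "journal",
    "create_timestamp", "access_timestamp", "modify_timestamp",
    "mirroring global id",
}


def parse_rbd_info(text):
    """Turn the text output of `rbd info` into a dict of field -> value."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("rbd image"):
            # Header line, e.g.: rbd image 'volume-...':
            fields["rbd image"] = line[len("rbd image"):].strip(" :'")
            continue
        if ":" in line:
            key, _, value = line.partition(":")
        else:
            # Lines such as "size 120 GiB in 30720 objects" have no colon.
            key, _, value = line.partition(" ")
        key, value = key.strip(), value.strip()
        if key == "features":
            # Normalize to a sorted list so ordering does not matter.
            value = sorted(f.strip() for f in value.split(",") if f.strip())
        fields[key] = value
    return fields


def diff_rbd_info(a, b, ignore=PER_IMAGE_FIELDS):
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    keys = (set(a) | set(b)) - set(ignore)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}
```

One possible way to use it: save 'rbd info pool/image' for an affected and a working image into two text files, read both, and print diff_rbd_info(parse_rbd_info(affected), parse_rbd_info(working)). A field that is missing on one side shows up as None.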
Quoting Satish Patel <satish.txt@gmail.com>:
> Folks,
>
> I have a ceph cluster and recently I configured rbd-mirror to replicate all
> data to a remote ceph cluster for disaster recovery.
>
> Yesterday, for a POC, we did a hard cutover on ceph and pointed openstack to
> the new cluster. All other vms came back up fine, but 2 vms are stuck with
> this error in the libvirt logs:
>
> 2024-01-31 22:44:37.591+0000: 474597: error :
> qemuMonitorJSONCheckErrorFull:399 : internal error: unable to execute QEMU
> command 'query-named-block-nodes': cannot read image start for probe:
> Permission denied
>
> If it were a wider issue then it should impact all the vms, so why are only
> two vms stuck and not starting, with libvirt giving me this error in the logs?
>
> The nova-compute logs also show the same error:
>
> 2024-01-31 22:44:40.925 7 INFO nova.compute.manager [None
> req-b33485b5-8740-48ae-8b5b-a440de3f11a4 c48fcfb9347f413f92fcece065644b00
> ca5c652478c7429e964257990800e9cb - - default default] [instance:
> 2de0f880-77c7-4d2c-9e01-898c57ad3693] Successfully reverted task state from
> powering-on on failure for instance.
> 2024-01-31 22:44:40.944 7 ERROR oslo_messaging.rpc.server [None
> req-b33485b5-8740-48ae-8b5b-a440de3f11a4 c48fcfb9347f413f92fcece065644b00
> ca5c652478c7429e964257990800e9cb - - default default] Exception during
> message handling: libvirt.libvirtError: internal error: unable to execute
> QEMU command 'query-named-block-nodes': cannot read image start for probe:
> Permission denied