Okay, glad you brought it back. I would be curious as well to understand what happened.

Quoting Satish Patel <satish.txt@gmail.com>:
Hi Eugen,
A shelve and unshelve brought the VM back to life. This is very odd and I haven't seen this behavior before.
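For reference, the rough sequence that cleared it (standard OpenStack CLI; the instance ID below is just the one from the nova-compute log, substitute your own):

  openstack server shelve 2de0f880-77c7-4d2c-9e01-898c57ad3693
  openstack server unshelve 2de0f880-77c7-4d2c-9e01-898c57ad3693

My guess is that unshelve rebuilds the libvirt domain from scratch, so it picks up the current Ceph connection info instead of whatever was there from before the cutover.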
On Thu, Feb 1, 2024 at 11:24 AM Eugen Block <eblock@nde.ag> wrote:
I’m not sure if I understand all of it, but is there currently only one active cluster? And is that where this output is from? What does ‘rbd status’ tell you?
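For example (just a sketch, using the image from the output you pasted below):

  rbd status volumes/volume-77b123ff-915f-4e0b-8d74-d34fde12528b

The watchers listed there should tell you whether some client still has the image open.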
Quoting Satish Patel <satish.txt@gmail.com>:
The older Ceph cluster is down: once everything came up on the new one we shut down the entire old cluster, and then realized one VM is stuck in this error state. In the current cluster this is what it's showing:
# rbd info -p volumes volume-77b123ff-915f-4e0b-8d74-d34fde12528b
rbd image 'volume-77b123ff-915f-4e0b-8d74-d34fde12528b':
    size 120 GiB in 30720 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: 87bfb47beb93
    block_name_prefix: rbd_data.87bfb47beb93
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
    op_features:
    flags:
    create_timestamp: Sun Jan 28 05:28:30 2024
    access_timestamp: Thu Feb  1 15:28:57 2024
    modify_timestamp: Thu Feb  1 06:17:30 2024
    journal: 87bfb47beb93
    mirroring state: enabled
    mirroring mode: journal
    mirroring global id: 0d488c59-cd44-47a8-86b7-c24509f7771b
    mirroring primary: true
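In case it helps, I can also pull the per-image mirror state with something like

  rbd mirror image status volumes/volume-77b123ff-915f-4e0b-8d74-d34fde12528b

and compare it with one of the volumes whose VM booted fine.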
On Thu, Feb 1, 2024 at 3:44 AM Eugen Block <eblock@nde.ag> wrote:
Hi,
have you compared the affected rbd images with working images? Maybe the mirroring failed for those images? Were they promoted correctly? Which mirror mode are you using, journal or snapshot? I would check the 'rbd info pool/image' output and compare to see if there's a difference.
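If one of the affected images turns out not to be primary on the new cluster, promoting it would be roughly (sketch only, adjust pool/image names):

  rbd mirror image promote volumes/<image>

or with --force if the old cluster can't be reached to demote the image first.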
Quoting Satish Patel <satish.txt@gmail.com>:
Folks,
I have a Ceph cluster and recently configured rbd-mirror to replicate all data to a remote Ceph cluster for disaster recovery.
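(For context, the mirroring setup is essentially the standard journal-based one; roughly, with the site names and token path being placeholders:

  rbd mirror pool enable volumes journal
  rbd mirror pool peer bootstrap create --site-name site-a volumes > bootstrap_token
  rbd mirror pool peer bootstrap import --site-name site-b volumes bootstrap_token

the import being run on the remote cluster, which also runs the rbd-mirror daemon.)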
Yesterday, as a POC, we did a hard cutover on Ceph and pointed OpenStack to the new cluster. All other VMs came back up fine, but 2 VMs are stuck in an error state with this in the libvirt logs:
2024-01-31 22:44:37.591+0000: 474597: error : qemuMonitorJSONCheckErrorFull:399 : internal error: unable to execute QEMU command 'query-named-block-nodes': cannot read image start for probe: Permission denied
If it were a wider issue it should impact all the VMs, so why are only two VMs stuck and not starting, with libvirt giving me this error in the logs?
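One thing I still want to rule out on the new cluster is the cephx caps of the client that nova/libvirt uses (client.cinder is just the typical name, substitute whatever user your deployment uses):

  ceph auth get client.cinder

in case the 'Permission denied' from the probe is simply caps that don't cover the volumes pool.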
The nova-compute logs also show the same error:
2024-01-31 22:44:40.925 7 INFO nova.compute.manager [None req-b33485b5-8740-48ae-8b5b-a440de3f11a4 c48fcfb9347f413f92fcece065644b00 ca5c652478c7429e964257990800e9cb - - default default] [instance: 2de0f880-77c7-4d2c-9e01-898c57ad3693] Successfully reverted task state from powering-on on failure for instance.
2024-01-31 22:44:40.944 7 ERROR oslo_messaging.rpc.server [None req-b33485b5-8740-48ae-8b5b-a440de3f11a4 c48fcfb9347f413f92fcece065644b00 ca5c652478c7429e964257990800e9cb - - default default] Exception during message handling: libvirt.libvirtError: internal error: unable to execute QEMU command 'query-named-block-nodes': cannot read image start for probe: Permission denied
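I'm also going to check whether the affected images still carry a stale exclusive lock from before the cutover, roughly:

  rbd lock ls volumes/volume-77b123ff-915f-4e0b-8d74-d34fde12528b

and remove any leftover lock with 'rbd lock rm' if one shows up.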