Hi, I'm seeing a problem with the hard reboot operation when /var/lib/nova/instances for the relevant compute node is on an NFS share exported from another compute node.  I'm hoping someone here can point me at where to debug this next.

3 compute nodes: A, B and C.
A exports its /var/lib/nova/instances over NFS v4.
B and C mount that at /var/lib/nova/instances.
(This is the kind of setup that I understand to be required for live migration to be possible.)
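
For completeness, the export/mount arrangement is just the basic NFS v4 setup; roughly the following, though the exact options are from memory and may not match my hosts precisely:

```
# /etc/exports on A (options approximate):
/var/lib/nova/instances  *(rw,sync,no_subtree_check)

# /etc/fstab on B and C ("nodeA" is a placeholder for A's hostname):
nodeA:/var/lib/nova/instances  /var/lib/nova/instances  nfs4  defaults  0  0
```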

If I create a VM on A, wait for it to be ACTIVE, then `nova reboot --hard testvm1`, it's fine, i.e. it reboots and returns to ACTIVE state.

If I create a VM on B, wait for it to be ACTIVE, then `nova reboot --hard testvm2`, it goes permanently into ERROR state.  The nova-compute log shows:

Stderr: "qemu-img: Could not open '/var/lib/nova/instances/02bf258b-41bc-451a-9d3c-af6741c6060f/disk': Could not open '/var/lib/nova/instances/02bf258b-41bc-451a-9d3c-af6741c6060f/disk': Permission denied\n": nova.exception.InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/02bf258b-41bc-451a-9d3c-af6741c6060f/disk : Unexpected error while running command.

and indeed the ownership of that file is now:

-rw-r----- 1 root root 29491200 Apr  7 18:27 disk

Before the reboot operation it was:

-rw-r----- 1 libvirt-qemu kvm  29491200 Apr  7 18:25 disk

There is no similar ownership change when hard-rebooting a VM on node A, so presumably NFS is involved somehow.  But I believe I've followed the standard NFS setup advice, and the relevant user and group IDs are the same on all compute nodes:

$ getent group nova
nova:x:1003:libvirt-qemu
$ getent passwd nova
nova:x:1002:1003::/home/nova:/bin/bash
$ getent passwd libvirt-qemu
libvirt-qemu:x:64055:109:Libvirt Qemu,,,:/var/lib/libvirt:/usr/sbin/nologin
$ getent group kvm
kvm:x:109:nova
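
In case it's useful, these are the checks I know how to try next: catching the exact moment the ownership flips during the reboot, and confirming which export options are actually in effect (inotifywait is from inotify-tools; the UUID is the instance from the log above):

```
# On B, watch the disk file for attribute (ownership/permission) changes
# while re-running the hard reboot:
$ sudo inotifywait -m -e attrib \
    /var/lib/nova/instances/02bf258b-41bc-451a-9d3c-af6741c6060f/disk

# On A, list the export options actually applied (root_squash etc.):
$ sudo exportfs -v
```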

Any clues on how to debug or fix this next?

Many thanks - Nell