[nova][NFS] Changing disk permission on reboot, permission denied
Hi; I'm seeing a problem with a hard reboot operation when /var/lib/nova/instances for the relevant compute node is on an NFS share exported from another compute node. I'm hoping someone here might be able to point to where to debug this next.

3 compute nodes: A, B and C. A exports its /var/lib/nova/instances over NFS v4. B and C mount that at /var/lib/nova/instances. (This is the kind of setup that I understand to be required for live migration to be possible.)

If I create a VM on A, wait for it to be ACTIVE, then `nova reboot --hard testvm1`, it's fine, i.e. it reboots and returns to ACTIVE state.

If I create a VM on B, wait for it to be ACTIVE, then `nova reboot --hard testvm2`, it goes permanently into ERROR state. The Nova compute log has

Stderr: "qemu-img: Could not open '/var/lib/nova/instances/02bf258b-41bc-451a-9d3c-af6741c6060f/disk': Could not open '/var/lib/nova/instances/02bf258b-41bc-451a-9d3c-af6741c6060f/disk': Permission denied\n": nova.exception.InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/02bf258b-41bc-451a-9d3c-af6741c6060f/disk : Unexpected error while running command.

and the ownership on that file is in fact now:

-rw-r----- 1 root root 29491200 Apr 7 18:27 disk

Before the reboot operation it was:

-rw-r----- 1 libvirt-qemu kvm 29491200 Apr 7 18:25 disk

There isn't a similar ownership change when rebooting a VM on node A, so presumably this is something to do with NFS being involved. But I believe I've followed standard NFS setup advice, and my user and group IDs are the same on all compute nodes:

$ getent group nova
nova:x:1003:libvirt-qemu
$ getent passwd nova
nova:x:1002:1003::/home/nova:/bin/bash
$ getent passwd libvirt-qemu
libvirt-qemu:x:64055:109:Libvirt Qemu,,,:/var/lib/libvirt:/usr/sbin/nologin
$ getent group kvm
kvm:x:109:nova

Any clues on how to debug or fix this next?

Many thanks - Nell
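A quick way to check that the IDs really match across nodes is to compare the numeric fields of the getent records rather than the names. The sketch below is only illustrative: the helper names `passwd_ids` and `group_id` are made up for this example, and it relies on nothing beyond the standard passwd(5) and group(5) field layout.

```shell
#!/bin/sh
# Sketch: pull the numeric IDs out of getent-style records so that the
# values seen on different compute nodes can be compared directly.
# passwd_ids and group_id are hypothetical helper names.

# passwd_ids RECORD -> prints "UID:GID" from a passwd(5) record
passwd_ids() {
    printf '%s\n' "$1" | awk -F: '{ print $3 ":" $4 }'
}

# group_id RECORD -> prints the GID from a group(5) record
group_id() {
    printf '%s\n' "$1" | awk -F: '{ print $3 }'
}

# Example with records quoted in the message above:
passwd_ids 'nova:x:1002:1003::/home/nova:/bin/bash'   # prints 1002:1003
group_id 'kvm:x:109:nova'                             # prints 109
```

On a live node the same helpers could be fed from getent, for example `passwd_ids "$(getent passwd nova)"`, and the output compared between A, B and C.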
On 4/7/25 20:52, Nell Jerram wrote:
and my user and group IDs are the same on all compute nodes:
$ getent group nova
nova:x:1003:libvirt-qemu
$ getent passwd nova
nova:x:1002:1003::/home/nova:/bin/bash
$ getent passwd libvirt-qemu
libvirt-qemu:x:64055:109:Libvirt Qemu,,,:/var/lib/libvirt:/usr/sbin/nologin
$ getent group kvm
kvm:x:109:nova
Any clues on how to debug or fix this next?
Many thanks - Nell
Hi Nell,

I'm not really sure what your issue is (I've never used NFS for Nova), but you should know that if you're using Debian or Ubuntu, your UIDs aren't the standard ones. They should be:

nova: 64060
cinder: 64061
glance: 64062

These were reserved by James Page a long time ago, through the normal Debian procedure (James: what's the name of that ?!?), and cannot (should not?) be used by another application.

I hope this helps,

Cheers,

Thomas Goirand (zigo)
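As a rough illustration of the check suggested here, the following sketch compares an account's UID with the reserved values listed in the message. The function names `reserved_uid` and `uid_matches_reserved` are hypothetical, and the table contains only the three IDs quoted above.

```shell
#!/bin/sh
# Sketch: compare a UID against the reserved values quoted in the thread.
# reserved_uid and uid_matches_reserved are hypothetical helper names.

# reserved_uid NAME -> prints the reserved UID for NAME, fails if unknown
reserved_uid() {
    case "$1" in
        nova)   echo 64060 ;;
        cinder) echo 64061 ;;
        glance) echo 64062 ;;
        *)      return 1 ;;
    esac
}

# uid_matches_reserved NAME ACTUAL_UID -> exit 0 only if ACTUAL_UID is the reserved one
uid_matches_reserved() {
    expected=$(reserved_uid "$1") || return 2
    [ "$2" = "$expected" ]
}

# Example with the UID from the original report:
if uid_matches_reserved nova 1002; then
    echo "nova: UID matches the reserved value"
else
    echo "nova: UID 1002 differs from the reserved $(reserved_uid nova)"
fi
```

On a real node the second argument would typically come from `id -u nova`.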
On Tue, Apr 8, 2025 at 10:05 AM Thomas Goirand <zigo@debian.org> wrote:
Hi Nell,
I'm not really sure what your issue is (I've never used NFS for Nova), but you should know that if you're using Debian or Ubuntu, your UIDs aren't the standard ones. They should be:
nova: 64060
cinder: 64061
glance: 64062
These were reserved by James Page a long time ago, through the normal Debian procedure (James: what's the name of that ?!?), and cannot (should not?) be used by another application.
I hope this helps,
Cheers,
Thomas Goirand (zigo)
Many thanks for this Thomas. I am using OpenStack Caracal packages on Ubuntu Jammy (using cloud-archive:caracal), so I'm surprised I'm not seeing those numbers.

Aha, my setup script creates the nova user with

useradd -m -p password -s /bin/bash nova

_before_ installing the nova packages. I wonder if this is indeed the problem...

Best wishes - Nell
On 4/8/25 11:15, Nell Jerram wrote:
Many thanks for this Thomas. I am using OpenStack Caracal packages on Ubuntu Jammy (using cloud-archive:caracal), so I'm surprised I'm not seeing those numbers.
Aha, my setup script creates the nova user with
useradd -m -p password -s /bin/bash nova
_before_ installing the nova packages. I wonder if this is indeed the problem...
Best wishes - Nell
Well, I just had a quick look at the Ubuntu packages, and they are creating the UID / GID for you (see below). I'd suggest either doing what the package does in your script, or just let the package do it.

Cheers,

Thomas Goirand (zigo)

#!/bin/sh -e

NOVA_UID=64060
NOVA_GID=64060

if [ "$1" = "configure" ]; then
    if ! getent group nova > /dev/null 2>&1; then
        addgroup --quiet --system \
            --gid $NOVA_GID nova 2>/dev/null
    fi

    if ! getent passwd nova > /dev/null 2>&1; then
        adduser --quiet --system \
            --home /var/lib/nova \
            --no-create-home \
            --uid $NOVA_UID \
            --gid $NOVA_GID \
            --shell /usr/sbin/nologin nova 2>/dev/null
    fi

    if [ -z "$2" ]; then
        # New install - blanket permissions
        chown -R nova:nova /var/lib/nova/
    fi

    chown nova:adm /var/log/nova
    chmod 0750 /var/log/nova
We do something similar when we add new OpenStack nodes (or reinstall them). Our storage backend is Ceph, and we use a CephFS mount for /var/lib/nova/instances on all compute nodes for live migration.

Since we migrated last year from openSUSE to Ubuntu (package-based deployment), we needed to preserve the ownership of /var/lib/nova/instances. Now we create the user nova (and cinder for /var/lib/cinder/conversion, which is also a CephFS mount) with the UID/GID from before Ubuntu, before installing the nova (and cinder) packages.

I wonder if we could transition to the actual UID/GID from Ubuntu... I will have to think about that.

Quoting Thomas Goirand <zigo@debian.org>:
Well, I just had a quick look at the Ubuntu packages, and they are creating the UID / GID for you (see below). I'd suggest either doing what the package does in your script, or just let the package do it.
Cheers,
Thomas Goirand (zigo)
#!/bin/sh -e

NOVA_UID=64060
NOVA_GID=64060

if [ "$1" = "configure" ]; then
    if ! getent group nova > /dev/null 2>&1; then
        addgroup --quiet --system \
            --gid $NOVA_GID nova 2>/dev/null
    fi

    if ! getent passwd nova > /dev/null 2>&1; then
        adduser --quiet --system \
            --home /var/lib/nova \
            --no-create-home \
            --uid $NOVA_UID \
            --gid $NOVA_GID \
            --shell /usr/sbin/nologin nova 2>/dev/null
    fi

    if [ -z "$2" ]; then
        # New install - blanket permissions
        chown -R nova:nova /var/lib/nova/
    fi

    chown nova:adm /var/log/nova
    chmod 0750 /var/log/nova
On Tue, Apr 8, 2025 at 10:22 AM Thomas Goirand <zigo@debian.org> wrote:
Well, I just had a quick look at the Ubuntu packages, and they are creating the UID / GID for you (see below). I'd suggest either doing what the package does in your script, or just let the package do it.
Cheers,
Thomas Goirand (zigo)
#!/bin/sh -e

NOVA_UID=64060
NOVA_GID=64060

if [ "$1" = "configure" ]; then
    if ! getent group nova > /dev/null 2>&1; then
        addgroup --quiet --system \
            --gid $NOVA_GID nova 2>/dev/null
    fi

    if ! getent passwd nova > /dev/null 2>&1; then
        adduser --quiet --system \
            --home /var/lib/nova \
            --no-create-home \
            --uid $NOVA_UID \
            --gid $NOVA_GID \
            --shell /usr/sbin/nologin nova 2>/dev/null
    fi

    if [ -z "$2" ]; then
        # New install - blanket permissions
        chown -R nova:nova /var/lib/nova/
    fi

    chown nova:adm /var/log/nova
    chmod 0750 /var/log/nova
Unfortunately the problem is still happening in the same way with the Ubuntu packaging UIDs and GIDs.

I will keep digging and report back!

Best wishes - Nell
On Tue, Apr 8, 2025 at 5:41 PM Nell Jerram <nell@tigera.io> wrote:
Unfortunately the problem is still happening in the same way with the Ubuntu packaging UIDs and GIDs.
I will keep digging and report back!
Best wishes - Nell
auditctl is a nice tool! (Thank you https://serverfault.com/questions/619722/how-do-i-detect-what-is-changing-fi... )

This is the audit entry for the operation that converts it to root ownership:

time->Tue Apr 8 16:49:46 2025
type=PROCTITLE msg=audit(1744130986.261:218): proctitle="/usr/sbin/libvirtd"
type=PATH msg=audit(1744130986.261:218): item=0 name="/var/lib/nova/instances/c63af63f-a5b4-44b6-af9d-b26c85f091b6/disk" inode=802497 dev=00:30 mode=0100640 ouid=64055 ogid=109 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1744130986.261:218): cwd="/"
type=SYSCALL msg=audit(1744130986.261:218): arch=c000003e syscall=92 success=yes exit=0 a0=7556ac0bafb0 a1=0 a2=0 a3=0 items=1 ppid=46198 pid=49872 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="rpc-libvirtd" exe="/usr/sbin/libvirtd" subj=libvirtd key="njdisk"

syscall 92 is chown, so that's libvirtd running as root and chowning, presumably to itself.

That then led me to https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1784001 and https://netapp-openstack-dev.github.io/openstack-docs/ussuri/cinder/deployme..., and it appears that a solution is to add

dynamic_ownership = 0
user = "nova"

to /etc/libvirt/qemu.conf

I don't yet feel confident that that is _the_ right solution, or know if it might cause other regressions in my testing, but this feels like progress.

Best wishes - Nell
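For anyone wanting to reproduce this kind of trace, here is a minimal sketch. The exact audit rule used is not shown in the thread, so the `auditctl`/`ausearch` invocations in the comments are an assumption about how such a record could be obtained; only the key name `njdisk` comes from the log above. The helper `chown_target` is a hypothetical name. It decodes the owner and group arguments of a chown record, which the audit log prints in hexadecimal.

```shell
#!/bin/sh
# Assumed commands for producing such a record (need root; not taken from the thread):
#   auditctl -w "$DISK" -p a -k njdisk    # watch attribute changes on the disk file
#   ausearch -k njdisk                    # show the events recorded under that key

# chown_target RECORD -> prints "UID:GID" requested by a chown SYSCALL record.
# For chown(path, owner, group), a1 is the owner and a2 the group, both in hex.
# chown_target is a hypothetical helper name.
chown_target() {
    a1=$(printf '%s\n' "$1" | sed -n 's/.* a1=\([0-9a-fA-F]*\) .*/\1/p')
    a2=$(printf '%s\n' "$1" | sed -n 's/.* a2=\([0-9a-fA-F]*\) .*/\1/p')
    printf '%d:%d\n' "0x$a1" "0x$a2"
}

# Shortened copy of the SYSCALL record from the message above:
record='type=SYSCALL msg=audit(1744130986.261:218): arch=c000003e syscall=92 success=yes exit=0 a0=7556ac0bafb0 a1=0 a2=0 a3=0 items=1'
chown_target "$record"    # prints 0:0, i.e. the file is being chowned to root:root
```

Decoding a1 and a2 this way shows the target of the chown directly, rather than inferring it from the resulting file ownership.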
On Tue, Apr 8, 2025 at 6:28 PM Nell Jerram <nell@tigera.io> wrote:
auditctl is a nice tool! (Thank you https://serverfault.com/questions/619722/how-do-i-detect-what-is-changing-fi... )
This is the audit entry for the operation that converts it to root ownership:
time->Tue Apr 8 16:49:46 2025
type=PROCTITLE msg=audit(1744130986.261:218): proctitle="/usr/sbin/libvirtd"
type=PATH msg=audit(1744130986.261:218): item=0 name="/var/lib/nova/instances/c63af63f-a5b4-44b6-af9d-b26c85f091b6/disk" inode=802497 dev=00:30 mode=0100640 ouid=64055 ogid=109 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1744130986.261:218): cwd="/"
type=SYSCALL msg=audit(1744130986.261:218): arch=c000003e syscall=92 success=yes exit=0 a0=7556ac0bafb0 a1=0 a2=0 a3=0 items=1 ppid=46198 pid=49872 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="rpc-libvirtd" exe="/usr/sbin/libvirtd" subj=libvirtd key="njdisk"
syscall 92 is chown, so that's libvirtd running as root and chowning, presumably to itself.
That then led me to https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1784001 and https://netapp-openstack-dev.github.io/openstack-docs/ussuri/cinder/deployme..., and it appears that a solution is to add
dynamic_ownership = 0
user = "nova"
to /etc/libvirt/qemu.conf
I don't yet feel confident that that is _the_ right solution, or know if it might cause other regressions in my testing, but this feels like progress.
Best wishes - Nell
Just to confirm, adding

dynamic_ownership = 0
user = "nova"
group = "nova"

does fix the problem that I was seeing. (And I've also reorganised so as to use the standardized UIDs/GIDs.)

Best wishes - Nell
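To confirm that a given node actually carries these settings, something like the following could be used. It is a sketch only: `qemu_conf_value` is a hypothetical helper, the parsing assumes simple `key = value` lines, and the demo runs against a temporary file rather than the real /etc/libvirt/qemu.conf.

```shell
#!/bin/sh
# Sketch: read a setting from a qemu.conf-style file, ignoring commented-out lines.
# qemu_conf_value is a hypothetical helper name.

# qemu_conf_value FILE KEY -> prints the value of the last active "KEY = VALUE" line
qemu_conf_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Demo on a temporary file containing the settings from the message above:
conf=$(mktemp)
cat > "$conf" <<'EOF'
#dynamic_ownership = 1
dynamic_ownership = 0
user = "nova"
group = "nova"
EOF
qemu_conf_value "$conf" dynamic_ownership   # prints 0
qemu_conf_value "$conf" user                # prints "nova"
rm -f "$conf"
```

Against a real node the first argument would be /etc/libvirt/qemu.conf. Since libvirtd reads that file at startup, a restart of the daemon is normally needed before a change takes effect.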
On 4/9/25 16:32, Nell Jerram wrote:
Just to confirm, adding
dynamic_ownership = 0
user = "nova"
group = "nova"
does fix the problem that I was seeing. (And I've also reorganised so as to use the standardized UIDs/GIDs.)
Best wishes - Nell
You also need to make sure that libvirt is in the nova group.

Cheers,

Thomas Goirand (zigo)
On Wed, Apr 9, 2025 at 3:37 PM Thomas Goirand <zigo@debian.org> wrote:
You also need to make sure that libvirt is in the nova group.
Cheers,
Thomas Goirand (zigo)
Thanks Thomas. Do you mean this?

$ sudo getent group nova
nova:x:64060:libvirt-qemu
$ sudo groups libvirt-qemu
libvirt-qemu : kvm nova libvirt-qemu

Best wishes - Nell
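The membership check shown here can also be scripted, which may help when several compute nodes need verifying. This is a sketch under the assumption that the input is a group(5) record as printed by getent; `group_has_member` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: test whether a user appears in the member list of a group(5) record.
# group_has_member is a hypothetical helper name.

# group_has_member RECORD USER -> exit 0 if USER is listed in the fourth field
group_has_member() {
    members=$(printf '%s\n' "$1" | cut -d: -f4)
    case ",$members," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example with the record from the message above:
if group_has_member 'nova:x:64060:libvirt-qemu' libvirt-qemu; then
    echo "libvirt-qemu is listed as a member of nova"
fi
```

Note that the member list only covers supplementary memberships; a user whose primary group is nova would not appear there. On a live node the record could be supplied with `"$(getent group nova)"`.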
participants (3)
- Eugen Block
- Nell Jerram
- Thomas Goirand