Hi Christian,

I have a follow-up question. Did you also end up with unpurgeable snapshots in the trash namespace after trying to delete an image which had children, where the deletion failed due to missing glance permissions? This is on Victoria:

storage01:~ # rbd snap ls images/278ffe2b-67a7-40d0-87b7-903f2fc9c3b4 --all
SNAPID  NAME                                  SIZE    PROTECTED  TIMESTAMP                 NAMESPACE
   159  1a97db13-307e-4820-8dc2-8549e9ba1ad7  39 MiB             Thu Dec 14 08:29:56 2023  trash (snap)

Removing this trash snapshot is not possible because it is protected:

storage01:~ # rbd snap rm --snap-id 159 images/278ffe2b-67a7-40d0-87b7-903f2fc9c3b4
rbd: snapshot id 159 is protected from removal.

I found a thread [4] which suggested checking the omapvals for rbd_trash, but it's empty in my case:

storage01:~ # rados -p images listomapvals rbd_trash
storage01:~ #

Apparently this has changed in newer versions as well: in my one-node test cluster (Antelope), trying to remove such an image does not leave a snapshot in the trash namespace.

Thanks,
Eugen

[4] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/MINEIF5ZBTUO...

Quoting Christian Rohmann <christian.rohmann@inovex.de>:
Thanks for your responses!
On 13.12.23 18:40, Jonathan Rosser wrote:
Hi Christian,
If you dig through the various deployment tooling then you'll find things like https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/...
Yes indeed. Once you know which technical terms to search for, you'll see this kind of configuration "all over":
Charm * https://review.opendev.org/q/topic:%22bug/1696073%22 * https://bugs.launchpad.net/charm-glance/+bug/1696073
RedHat Ceph Config guide on OpenStack clients: * https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/ce...
....
On 14.12.23 13:16, Erno Kuvaja wrote:
On Thu, 14 Dec 2023 at 10:28, Eugen Block <eblock@nde.ag> wrote:
Interesting, I have a kolla-ansible one-node cluster with Antelope and there I see what you describe as well. So the behavior did indeed change. I guess the docs should be updated to include the read-only rbd profile for glance.
This sounds like a regression to me.
We debated about it a lot when Ceph broke their backwards compatibility on deletes and I'm pretty sure, if my memory serves me right, that we found a solution in the Ceph store driver to not need the permissions to other pools. There really is no excuse why Glance should have read access to volume data or Nova Ephemeral data.
See my references above. It's nice to have the various deployment tools and methods all "fix" this, but first and foremost this needs to be properly documented in the source documentation of Glance, Cinder and Nova.
I wonder why there are no unit tests that fail because of this? Looking at what devstack does at [1] it appears that
a) it actually applies "allow class-read object_prefix rbd_children", which is not what is currently documented in the setup guide(s) (see [2] and [3]),
b) it unnecessarily grants read permissions on NOVA_CEPH_POOL ("vms") and CINDER_CEPH_POOL ("volumes") to the Glance user as well,
c) it does NOT use the managed capabilities called "profiles", such as "rbd" or "rbd-read-only", but raw ACLs such as "rwx" instead, see [4]. This also differs in the Cinder / Glance documentation and makes a great difference, as "such privileges include the ability to blocklist other client users.", which is required for the locks of stale RBD clients to be removed from images, see https://docs.ceph.com/en/latest/rbd/rbd-exclusive-locks/#rbd-exclusive-locks.
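To make the difference concrete, here is a minimal dry-run sketch of the two profile-based capability sets in question. It only prints the `ceph auth caps` commands and does not run them. The user name `client.glance` and the pool name `images` are assumptions taken from the usual Ceph conventions; "volumes" and "vms" are the pool names mentioned above. The first function follows the style of the Ceph OpenStack guide [3]; the second adds the read-only profile on the other pools, which is the variant debated in this thread and not an agreed recommendation.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# Assumed names (not confirmed by this thread): client.glance, pool "images".
GLANCE_USER="client.glance"
IMAGES_POOL="images"
VOLUMES_POOL="volumes"
VMS_POOL="vms"

# Profile-based caps limited to Glance's own pool.
glance_caps_documented() {
    printf "ceph auth caps %s mon 'profile rbd' osd 'profile rbd pool=%s' mgr 'profile rbd pool=%s'\n" \
        "$GLANCE_USER" "$IMAGES_POOL" "$IMAGES_POOL"
}

# Debated variant: additionally grant the read-only profile on the pools
# that may hold clones (children) of Glance images.
glance_caps_with_readonly_children() {
    printf "ceph auth caps %s mon 'profile rbd' osd 'profile rbd pool=%s, profile rbd-read-only pool=%s, profile rbd-read-only pool=%s' mgr 'profile rbd pool=%s'\n" \
        "$GLANCE_USER" "$IMAGES_POOL" "$VOLUMES_POOL" "$VMS_POOL" "$IMAGES_POOL"
}

glance_caps_documented
glance_caps_with_readonly_children
```

Review the printed commands before running either of them against a real cluster, since `ceph auth caps` replaces the existing capabilities of the user rather than appending to them.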
I suggest updating the documentation and also the devstack plugin to properly use the agreed best practices. Maybe it makes sense to move the conversation and bugfixes to the Launchpad bug I raised about this issue [5]?
Regards
Christian
[1] https://opendev.org/openstack/devstack-plugin-ceph/src/commit/4c22c3d0905589...
[2] https://docs.openstack.org/glance/latest/configuration/configuring.html#conf...
[3] https://docs.ceph.com/en/latest/rbd/rbd-openstack/#setup-ceph-client-authent...
[4] https://docs.ceph.com/en/latest/rados/operations/user-management/#authorizat...
[5] https://bugs.launchpad.net/glance/+bug/2045158