What are the correct auth caps for Ceph RBD clients of Cinder / Glance / Nova
Hey openstack-discuss, I am a little confused about the correct and required Ceph auth (cephx) permissions for the RBD clients in Cinder, Glance and also Nova: When Glance is requested to delete an image it will check if this image has dependent children, see https://opendev.org/openstack/glance_store/src/commit/6f5011d1f05c99894fb8b9.... The children of Glance images usually are (Cinder) volumes, which live in a different RBD pool "volumes". But if such children do exist, a 500 error is thrown by the Glance API. There is also a bug about this issue on Launchpad [3]. Manually using the RBD client shows the same error:
# rbd -n client.glance -k /etc/ceph/ceph.client.glance.keyring -p images children $IMAGE_ID
2023-12-13T16:51:48.131+0000 7f198cf4e640 -1 librbd::image::OpenRequest: failed to retrieve name: (1) Operation not permitted 2023-12-13T16:51:48.131+0000 7f198d74f640 -1 librbd::ImageState: 0x5639fdd5af60 failed to open image: (1) Operation not permitted rbd: listing children failed: (1) Operation not permitted 2023-12-13T16:51:48.131+0000 7f1990c474c0 -1 librbd::api::Image: list_descendants: failed to open descendant b7078ed7ace50d from pool instances:(1) Operation not permitted
So it's a permission error. But neither the Glance documentation [1] nor the Ceph documentation [2] on configuring the ceph auth caps mentions granting Glance any access to the volumes pool. This is what I currently have configured:
client.cinder
    key: REDACTED
    caps: [mgr] profile rbd pool=volumes, profile rbd-read-only pool=images
    caps: [mon] profile rbd
    caps: [osd] profile rbd pool=volumes, profile rbd-read-only pool=images
client.glance
    key: REDACTED
    caps: [mgr] profile rbd pool=images
    caps: [mon] profile rbd
    caps: [osd] profile rbd pool=images
client.nova
    key: REDACTED
    caps: [mgr] profile rbd pool=instances, profile rbd pool=images
    caps: [mon] profile rbd
    caps: [osd] profile rbd pool=instances, profile rbd pool=images
When granting the glance client e.g. "rbd-read-only" to the volumes pool via:
# ceph auth caps client.glance mon 'profile rbd' osd 'profile rbd pool=images, profile rbd-read-only pool=volumes' mgr 'profile rbd pool=images, profile rbd-read-only pool=volumes'
the error is gone. I am wondering though if this is really just a documentation bug (at OpenStack AND Ceph equally) and if Glance really needs read-only on the whole volumes pool, or if there is some other capability that covers asking for child images. All in all I am simply wondering what the correct and least-privilege ceph auth caps for the RBD clients in Cinder, Glance and Nova would look like. Thanks Christian [1] https://docs.openstack.org/glance/latest/configuration/configuring.html#conf... [2] https://docs.ceph.com/en/latest/rbd/rbd-openstack/#setup-ceph-client-authent... [3] https://bugs.launchpad.net/glance/+bug/2045158
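For context on how these cross-pool children come about: a bootable volume created from a Glance image is a copy-on-write clone of the protected "snap" snapshot of that image, so the child RBD image lives in the Cinder pool while its parent stays in the Glance pool. A minimal illustration, assuming the usual pool names and with $IMAGE_ID / $VOLUME_ID as placeholders (output abridged):
--- cut ---
# openstack volume create --image $IMAGE_ID --size 10 test-volume
# rbd info volumes/volume-$VOLUME_ID | grep parent
        parent: images/$IMAGE_ID@snap
# rbd children images/$IMAGE_ID
volumes/volume-$VOLUME_ID
--- cut ---
Deleting the image then requires librbd to open that descendant in the "volumes" pool, which is exactly where the glance user's caps fall short.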
Hi Christian, If you dig through the various deployment tooling then you'll find things like https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/... Hope this is helpful, Jonathan. On 13/12/2023 17:00, Christian Rohmann wrote:
Hey openstack-discuss,
I am a little confused about the correct and required Ceph auth (cephx) permissions for the RBD clients in Cinder, Glance and also Nova:
When Glance is requested to delete an image it will check if this image has dependent children, see https://opendev.org/openstack/glance_store/src/commit/6f5011d1f05c99894fb8b9.... The children of Glance images usually are (Cinder) volumes, which live in a different RBD pool "volumes". But if such children do exist, a 500 error is thrown by the Glance API. There is also a bug about this issue on Launchpad [3].
Manually using the RBD client shows the same error:
# rbd -n client.glance -k /etc/ceph/ceph.client.glance.keyring -p images children $IMAGE_ID
2023-12-13T16:51:48.131+0000 7f198cf4e640 -1 librbd::image::OpenRequest: failed to retrieve name: (1) Operation not permitted 2023-12-13T16:51:48.131+0000 7f198d74f640 -1 librbd::ImageState: 0x5639fdd5af60 failed to open image: (1) Operation not permitted rbd: listing children failed: (1) Operation not permitted 2023-12-13T16:51:48.131+0000 7f1990c474c0 -1 librbd::api::Image: list_descendants: failed to open descendant b7078ed7ace50d from pool instances:(1) Operation not permitted
So it's a permission error. Following either the documentation of Glance [1] or Ceph [2] on configuring the ceph auth caps there is no mention of granting anything towards the volume pool to Glance. So this is what I currently have configured:
client.cinder key: REDACTED caps: [mgr] profile rbd pool=volumes, profile rbd-read-only pool=images caps: [mon] profile rbd caps: [osd] profile rbd pool=volumes, profile rbd-read-only pool=images
client.glance key: REDACTED caps: [mgr] profile rbd pool=images caps: [mon] profile rbd caps: [osd] profile rbd pool=images
client.nova key: REDACTED caps: [mgr] profile rbd pool=instances, profile rbd pool=images caps: [mon] profile rbd caps: [osd] profile rbd pool=instances, profile rbd pool=images
When granting the glance client e.g. "rbd-read-only" to the volumes pool via:
# ceph auth caps client.glance mon 'profile rbd' osd 'profile rbd pool=images, profile rbd-read-only pool=volumes' mgr 'profile rbd pool=images, profile rbd-read-only pool=volumes'
the error is gone.
I am wondering though if this is really just a documentation bug (at OpenStack AND Ceph equally) and if Glance really needs read-only on the whole volumes pool, or if there is some other capability that covers asking for child images.
All in all I am simply wondering what the correct and least-privilege ceph auth caps for the RBD clients in Cinder, Glance and Nova would look like.
Thanks
Christian
[1] https://docs.openstack.org/glance/latest/configuration/configuring.html#conf... [2] https://docs.ceph.com/en/latest/rbd/rbd-openstack/#setup-ceph-client-authent... [3] https://bugs.launchpad.net/glance/+bug/2045158
Thanks for your responses! On 13.12.23 18:40, Jonathan Rosser wrote:
Hi Christian,
If you dig through the various deployment tooling then you'll find things like https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/...
Yes indeed. Once you know which technical terms to search for you'll see these kinds of configurations "all over":
Charm: * https://review.opendev.org/q/topic:%22bug/1696073%22 * https://bugs.launchpad.net/charm-glance/+bug/1696073
RedHat Ceph Config guide on OpenStack clients: * https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/ce...
....
On 14.12.23 13:16, Erno Kuvaja wrote:
On Thu, 14 Dec 2023 at 10:28, Eugen Block <eblock@nde.ag> wrote:
Interesting, I have a kolla-ansible one-node cluster with Antelope and there I see what you describe as well. So the behavior did indeed change. I guess the docs should be updated and contain read-only rbd profile for glance.
This sounds like a regression to me.
We debated about it a lot when Ceph broke their backwards compatibility on deletes and I'm pretty sure, if my memory serves me right, that we found a solution in the Ceph store driver to not need the permissions to other pools. There really is no excuse why Glance should have read access to volume data or Nova Ephemeral data.
See my references above. It's nice to have the various deployment tools and methods all "fix" this, but first and foremost this needs to be properly documented in the source documentation of Glance, Cinder and Nova. I wonder why there are no unit tests that fail because of this? Looking at what devstack does at [1] it appears that
a) it actually applies "allow class-read object_prefix rbd_children", which is not what is currently documented in the setup guide(s) (see [2] and [3])
b) it unnecessarily grants read permissions to NOVA_CEPH_POOL ("vms") and CINDER_CEPH_POOL ("volumes") also for the Glance user
c) it does NOT use the managed capability "profiles" such as "rbd" or "rbd-read-only", but raw ACLs such as "rwx" instead, see [4]. This also differs from the Cinder / Glance documentation and makes a great difference, as "such privileges include the ability to blocklist other client users", which is required for locks of stale RBD clients to be removed from images, see https://docs.ceph.com/en/latest/rbd/rbd-exclusive-locks/#rbd-exclusive-locks.
I suggest updating the documentation and also the devstack plugin to properly use the agreed best practices. Maybe it makes sense to move the conversation and bugfixes to the Launchpad bug I raised about this issue [5]?
Regards
Christian
[1] https://opendev.org/openstack/devstack-plugin-ceph/src/commit/4c22c3d0905589... [2] https://docs.openstack.org/glance/latest/configuration/configuring.html#conf... [3] https://docs.ceph.com/en/latest/rbd/rbd-openstack/#setup-ceph-client-authent... [4] https://docs.ceph.com/en/latest/rados/operations/user-management/#authorizat... [5] https://bugs.launchpad.net/glance/+bug/2045158
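To make point c) concrete, here is a minimal sketch of the difference for the cinder client, assuming the pool names used in this thread; the exact caps are what this thread is trying to settle, so treat this as an illustration rather than a recommendation:
Raw ACLs as still shown in parts of the documentation (no blocklist permission on the mon):
--- cut ---
# ceph auth caps client.cinder mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images'
--- cut ---
Managed profiles, which additionally allow the client to blocklist other clients (needed to break stale exclusive locks):
--- cut ---
# ceph auth caps client.cinder mon 'profile rbd' mgr 'profile rbd pool=volumes' osd 'profile rbd pool=volumes, profile rbd-read-only pool=images'
--- cut ---
With only 'allow r' on the mon, a client that needs to take over a stale exclusive lock cannot issue the required blocklist command, so stale locks have to be cleaned up manually with an admin key.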
Hi Christian, I have a follow-up question. Do you also have unpurgeable snapshots in the trash namespace when you tried to delete an image which had children but failed due to missing glance permissions? This is on a Victoria cluster:
storage01:~ # rbd snap ls images/278ffe2b-67a7-40d0-87b7-903f2fc9c3b4 --all
SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE
159 1a97db13-307e-4820-8dc2-8549e9ba1ad7 39 MiB Thu Dec 14 08:29:56 2023 trash (snap)
Removing this trash snapshot is not possible because it is protected:
storage01:~ # rbd snap rm --snap-id 159 images/278ffe2b-67a7-40d0-87b7-903f2fc9c3b4
rbd: snapshot id 159 is protected from removal.
I found a thread [4] which suggested checking the omapvals for rbd_trash, but it's empty in my case:
storage01:~ # rados -p images listomapvals rbd_trash
storage01:~ #
Apparently, in newer versions this has changed as well, in my one-node test cluster (Antelope) trying to remove such an image does not leave a snapshot in the trash namespace. Thanks, Eugen
[4] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/MINEIF5ZBTUO... Zitat von Christian Rohmann <christian.rohmann@inovex.de>:
Thanks for your responses!
On 13.12.23 18:40, Jonathan Rosser wrote:
Hi Christian,
If you dig through the various deployment tooling then you'll find things like https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/...
Yes indeed. Once you know which technical terms to search for you'll see these kinds of configurations "all over":
Charm * https://review.opendev.org/q/topic:%22bug/1696073%22 * https://bugs.launchpad.net/charm-glance/+bug/1696073
RedHat Ceph Config guide on OpenStack clients: * https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/ce...
....
On 14.12.23 13:16, Erno Kuvaja wrote:
On Thu, 14 Dec 2023 at 10:28, Eugen Block <eblock@nde.ag> wrote:
Interesting, I have a kolla-ansible one-node cluster with Antelope and there I see what you describe as well. So the behavior did indeed change. I guess the docs should be updated and contain read-only rbd profile for glance.
This sounds like a regression to me.
We debated about it a lot when Ceph broke their backwards compatibility on deletes and I'm pretty sure, if my memory serves me right, that we found a solution in the Ceph store driver to not need the permissions to other pools. There really is no excuse why Glance should have read access to volume data or Nova Ephemeral data.
See my references above. It's nice to have the various deployment tools and methods all "fix" this, but first and foremost this needs to be properly documented in the source documentation of Glance, Cinder and Nova.
I wonder why there are no unit tests that fail because of this? Looking at what devstack does at [1] it appears that
a) it actually applies "allow class-read object_prefix rbd_children", which is not what is currently documented in the setup guide(s) (see [2] and [3]) b) it unnecessarily grants read permissions to NOVA_CEPH_POOL ("vms") and CINDER_CEPH_POOL ("volumes") also for the Glance user c) it does NOT use the managed capability "profiles" such as "rbd" or "rbd-read-only", but raw ACLs such as "rwx" instead, see [4]. This also differs from the Cinder / Glance documentation and makes a great difference, as "such privileges include the ability to blocklist other client users", which is required for locks of stale RBD clients to be removed from images, see https://docs.ceph.com/en/latest/rbd/rbd-exclusive-locks/#rbd-exclusive-locks.
I suggest updating the documentation and also the devstack plugin to properly use the agreed best practices. Maybe it makes sense to move the conversation and bugfixes to the Launchpad bug I raised about this issue [5]?
Regards
Christian
[1] https://opendev.org/openstack/devstack-plugin-ceph/src/commit/4c22c3d0905589... [2] https://docs.openstack.org/glance/latest/configuration/configuring.html#conf... [3] https://docs.ceph.com/en/latest/rbd/rbd-openstack/#setup-ceph-client-authent... [4] https://docs.ceph.com/en/latest/rados/operations/user-management/#authorizat... [5] https://bugs.launchpad.net/glance/+bug/2045158
Nevermind, there was another clone based on that snapshot. Removing the clone fixed the trash snapshot. Zitat von Eugen Block <eblock@nde.ag>:
Hi Christian,
I have a follow-up question. Do you also have unpurgeable snapshots in the trash namespace when you tried to delete an image which had children but failed due to missing glance permissions? This is in Victoria version:
storage01:~ # rbd snap ls images/278ffe2b-67a7-40d0-87b7-903f2fc9c3b4 --all
SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE
159 1a97db13-307e-4820-8dc2-8549e9ba1ad7 39 MiB Thu Dec 14 08:29:56 2023 trash (snap)
Removing this trash snapshot is not possible because it is protected:
storage01:~ # rbd snap rm --snap-id 159 images/278ffe2b-67a7-40d0-87b7-903f2fc9c3b4 rbd: snapshot id 159 is protected from removal.
I found a thread [4] which suggested to check the omapvals for rbd_trash but it's empty in my case:
storage01:~ # rados -p images listomapvals rbd_trash storage01:~ #
Apparently, in newer versions this has changed as well, in my one-node test cluster (Antelope) trying to remove such an image does not leave a snapshot in the trash namespace.
Thanks, Eugen
[4] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/MINEIF5ZBTUO...
Zitat von Christian Rohmann <christian.rohmann@inovex.de>:
Thanks for your responses!
On 13.12.23 18:40, Jonathan Rosser wrote:
Hi Christian,
If you dig through the various deployment tooling then you'll find things like https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/...
Yes indeed. Once you know which technical terms to search for you'll see these kinds of configurations "all over":
Charm * https://review.opendev.org/q/topic:%22bug/1696073%22 * https://bugs.launchpad.net/charm-glance/+bug/1696073
RedHat Ceph Config guide on OpenStack clients: * https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/ce...
....
On 14.12.23 13:16, Erno Kuvaja wrote:
On Thu, 14 Dec 2023 at 10:28, Eugen Block <eblock@nde.ag> wrote:
Interesting, I have a kolla-ansible one-node cluster with Antelope and there I see what you describe as well. So the behavior did indeed change. I guess the docs should be updated and contain read-only rbd profile for glance.
This sounds like a regression to me.
We debated about it a lot when Ceph broke their backwards compatibility on deletes and I'm pretty sure, if my memory serves me right, that we found a solution in the Ceph store driver to not need the permissions to other pools. There really is no excuse why Glance should have read access to volume data or Nova Ephemeral data.
See my references above. It's nice to have the various deployment tools and methods all "fix" this, but first and foremost this needs to be properly documented in the source documentation of Glance, Cinder and Nova.
I wonder why there are no unit tests that fail because of this? Looking at what devstack does at [1] it appears that
a) it actually applies "allow class-read object_prefix rbd_children", which is not what is currently documented in the setup guide(s) (see [2] and [3]) b) it unnecessarily grants read permissions to NOVA_CEPH_POOL ("vms") and CINDER_CEPH_POOL ("volumes") also for the Glance user c) it does NOT use the managed capability "profiles" such as "rbd" or "rbd-read-only", but raw ACLs such as "rwx" instead, see [4]. This also differs from the Cinder / Glance documentation and makes a great difference, as "such privileges include the ability to blocklist other client users", which is required for locks of stale RBD clients to be removed from images, see https://docs.ceph.com/en/latest/rbd/rbd-exclusive-locks/#rbd-exclusive-locks.
I suggest updating the documentation and also the devstack plugin to properly use the agreed best practices. Maybe it makes sense to move the conversation and bugfixes to the Launchpad bug I raised about this issue [5]?
Regards
Christian
[1] https://opendev.org/openstack/devstack-plugin-ceph/src/commit/4c22c3d0905589... [2] https://docs.openstack.org/glance/latest/configuration/configuring.html#conf... [3] https://docs.ceph.com/en/latest/rbd/rbd-openstack/#setup-ceph-client-authent... [4] https://docs.ceph.com/en/latest/rados/operations/user-management/#authorizat... [5] https://bugs.launchpad.net/glance/+bug/2045158
Hey Christian, The issue you're encountering with Glance and RBD permissions can indeed be tricky to resolve. Let's break it down together:
1. **Glance and RBD Permissions**: When Glance interacts with RBD (Ceph's block storage), it needs the appropriate permissions to perform operations like deleting images. The error you're seeing, "Operation not permitted", indicates a permission issue.
2. **Children of Glance Images**: Glance images can have dependent children, which are typically Cinder volumes. These volumes reside in a different RBD pool called "volumes". When Glance tries to delete an image, it checks if any dependent children exist. If they do, Glance should handle this gracefully.
3. **Your Current Configuration**: Let's review your current Ceph auth caps configuration for the relevant clients: client.cinder has read-only access to the "images" pool and read access to the "volumes" pool; client.glance has read access to the "images" pool; client.nova has read access to both the "instances" and "images" pools.
4. **Missing Permissions**: The issue lies in the Glance configuration. Glance needs read access to the "volumes" pool to handle dependent children correctly. Update the Glance configuration, for example: ceph auth caps client.glance mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=glance-images'
5. **Explanation**: The added permission allows Glance to read the children (dependent volumes) from the "volumes" pool. It's more restrictive than allowing full access (allow *), which aligns with your desire for tighter security.
Remember to apply these changes and restart the relevant services as needed. If you encounter any further issues, feel free to ask. Kerem ÇELİKER Head of Cloud Architecture linkedin.com/in/keremceliker/
Hello Kerem, thanks for your complete and well-structured reply. On 18.12.23 11:57, KEREM CELIKER wrote:
Hey Christian,
The issue you're encountering with Glance and RBD permissions can indeed be tricky to resolve. Let's break it down together:
1. **Glance and RBD Permissions**: - When Glance interacts with RBD (Ceph's block storage), it needs the appropriate permissions to perform operations like deleting images. - The error you're seeing, "Operation not permitted," indicates a permission issue.
Yes, but I was simply following the documentation, that's why I started the thread in the first place - to determine what needs to actually go into the documentation.
2. **Children of Glance Images**: - Glance images can have dependent children, which are typically Cinder volumes. These volumes reside in a different RBD pool called "volumes." - When Glance tries to delete an image, it checks if any dependent children exist. If they do, Glance should handle this gracefully.
Yes. The "funny" thing is, that Ceph (RBD) only replies with the permission error in case there actually are children (in pools the glance user does not have permission for). But it then does not gracefully return a message about existing children, but replies with a 500 (due to the failure querying for children). So there is no harm done, but the condition is not handled as intended: Notifying the user about existing children.
3. **Your Current Configuration**: - Let's review your current Ceph auth caps configuration for the relevant clients: - client.cinder: Has read-only access to the "images" pool and read access to the "volumes" pool. - client.glance: Has read access to the "images" pool. - client.nova: Has read access to both the "instances" and "images" pools.
My current config is simply "inspired" by the current documentation, with the change to the managed capability "profiles" such as "rbd" or "rbd-read-only" instead of raw ACLs such as "rwx". See https://docs.ceph.com/en/latest/rados/operations/user-management/#authorizat.... This also differs in the Cinder / Glance documentation. Without those profiles the blocklisting needed to remove stale RBD locks does not work, creating other side effects. See below for my motivation to discuss this topic and then push some changes towards the documentation.
4. **Missing Permissions**: - The issue lies in the Glance configuration. Glance needs read access to the "volumes" pool to handle dependent children correctly. - Update the Glance configuration as follows: - client.glance: - Add the necessary permission for the "volumes" pool: ceph auth caps client.glance mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=glance-images'
5. **Explanation**: - The added permission allows Glance to read the children (dependent volumes) from the "volumes" pool. - It's more restrictive than allowing full access (allow *), which aligns with your desire for tighter security.
Thanks again. My intention is not (just) to make my setup work, but to discuss and document the proper permissions and to then have them reflected correctly in the installation guides. It makes no sense for everyone having to figure this out individually. Regards Christian
Hi, you're right about the permission error when running 'rbd children' as the glance user. However, this didn't come up in our cloud environments yet because glance first checks for existing snapshots, which usually exist and are protected: 2023-12-14 07:54:59.493 2895 WARNING glance_store._drivers.rbd [req-f57f0688-aff9-46b8-90d6-f385438cfb8f aa7c830654a64ce2a0a511a5959c5ca1 17d254c1c283409c94559d588e17b703 - default default] Remove image 3b04672f-5e27-447c-965f-878a4c8b1aa9 failed. It has snapshot(s) left.: rbd.ImageHasSnapshots: [errno 39] RBD image has snapshots (error removing image) The image cannot be deleted because it has snapshot(s).: 409 Conflict Failed to delete 1 of 1 images. Has this behavior changed in newer releases? Because we still run on Victoria. But if it has changed it sounds reasonable to have readonly permissions for glance, I guess. Zitat von Christian Rohmann <christian.rohmann@inovex.de>:
Hey openstack-discuss,
I am a little confused about the correct and required Ceph auth (cephx) permissions for the RBD clients in Cinder, Glance and also Nova:
When Glance is requested to delete an image it will check if this image has dependent children, see https://opendev.org/openstack/glance_store/src/commit/6f5011d1f05c99894fb8b9.... The children of Glance images usually are (Cinder) volumes, which live in a different RBD pool "volumes". But if such children do exist, a 500 error is thrown by the Glance API. There is also a bug about this issue on Launchpad [3].
Manually using the RBD client shows the same error:
# rbd -n client.glance -k /etc/ceph/ceph.client.glance.keyring -p images children $IMAGE_ID
2023-12-13T16:51:48.131+0000 7f198cf4e640 -1 librbd::image::OpenRequest: failed to retrieve name: (1) Operation not permitted 2023-12-13T16:51:48.131+0000 7f198d74f640 -1 librbd::ImageState: 0x5639fdd5af60 failed to open image: (1) Operation not permitted rbd: listing children failed: (1) Operation not permitted 2023-12-13T16:51:48.131+0000 7f1990c474c0 -1 librbd::api::Image: list_descendants: failed to open descendant b7078ed7ace50d from pool instances:(1) Operation not permitted
So it's a permission error. Following either the documentation of Glance [1] or Ceph [2] on configuring the ceph auth caps there is no mention of granting anything towards the volume pool to Glance. So this is what I currently have configured:
client.cinder key: REDACTED caps: [mgr] profile rbd pool=volumes, profile rbd-read-only pool=images caps: [mon] profile rbd caps: [osd] profile rbd pool=volumes, profile rbd-read-only pool=images
client.glance key: REDACTED caps: [mgr] profile rbd pool=images caps: [mon] profile rbd caps: [osd] profile rbd pool=images
client.nova key: REDACTED caps: [mgr] profile rbd pool=instances, profile rbd pool=images caps: [mon] profile rbd caps: [osd] profile rbd pool=instances, profile rbd pool=images
When granting the glance client e.g. "rbd-read-only" to the volumes pool via:
# ceph auth caps client.glance mon 'profile rbd' osd 'profile rbd pool=images, profile rbd-read-only pool=volumes' mgr 'profile rbd pool=images, profile rbd-read-only pool=volumes'
the error is gone.
I am wondering though if this is really just a documentation bug (at OpenStack AND Ceph equally) and if Glance really needs read-only on the whole volumes pool, or if there is some other capability that covers asking for child images.
All in all I am simply wondering what the correct and least-privilege ceph auth caps for the RBD clients in Cinder, Glance and Nova would look like.
Thanks
Christian
[1] https://docs.openstack.org/glance/latest/configuration/configuring.html#conf... [2] https://docs.ceph.com/en/latest/rbd/rbd-openstack/#setup-ceph-client-authent... [3] https://bugs.launchpad.net/glance/+bug/2045158
Interesting, I have a kolla-ansible one-node cluster with Antelope and there I see what you describe as well. So the behavior did indeed change. I guess the docs should be updated and contain read-only rbd profile for glance. Zitat von Eugen Block <eblock@nde.ag>:
Hi,
you're right about the permission error when running 'rbd children' as the glance user. However, this didn't come up in our cloud environments yet because glance first checks for existing snapshots, which usually exist and are protected:
2023-12-14 07:54:59.493 2895 WARNING glance_store._drivers.rbd [req-f57f0688-aff9-46b8-90d6-f385438cfb8f aa7c830654a64ce2a0a511a5959c5ca1 17d254c1c283409c94559d588e17b703 - default default] Remove image 3b04672f-5e27-447c-965f-878a4c8b1aa9 failed. It has snapshot(s) left.: rbd.ImageHasSnapshots: [errno 39] RBD image has snapshots (error removing image) The image cannot be deleted because it has snapshot(s).: 409 Conflict Failed to delete 1 of 1 images.
Has this behavior changed in newer releases? Because we still run on Victoria. But if it has changed it sounds reasonable to have readonly permissions for glance, I guess.
Zitat von Christian Rohmann <christian.rohmann@inovex.de>:
Hey openstack-discuss,
I am a little confused about the correct and required Ceph auth (cephx) permissions for the RBD clients in Cinder, Glance and also Nova:
When Glance is requested to delete an image it will check if this image has dependent children, see https://opendev.org/openstack/glance_store/src/commit/6f5011d1f05c99894fb8b9.... The children of Glance images usually are (Cinder) volumes, which live in a different RBD pool "volumes". But if such children do exist, a 500 error is thrown by the Glance API. There is also a bug about this issue on Launchpad [3].
Manually using the RBD client shows the same error:
# rbd -n client.glance -k /etc/ceph/ceph.client.glance.keyring -p images children $IMAGE_ID
2023-12-13T16:51:48.131+0000 7f198cf4e640 -1 librbd::image::OpenRequest: failed to retrieve name: (1) Operation not permitted 2023-12-13T16:51:48.131+0000 7f198d74f640 -1 librbd::ImageState: 0x5639fdd5af60 failed to open image: (1) Operation not permitted rbd: listing children failed: (1) Operation not permitted 2023-12-13T16:51:48.131+0000 7f1990c474c0 -1 librbd::api::Image: list_descendants: failed to open descendant b7078ed7ace50d from pool instances:(1) Operation not permitted
So it's a permission error. Following either the documentation of Glance [1] or Ceph [2] on configuring the ceph auth caps there is no mention of granting anything towards the volume pool to Glance. So this is what I currently have configured:
client.cinder key: REDACTED caps: [mgr] profile rbd pool=volumes, profile rbd-read-only pool=images caps: [mon] profile rbd caps: [osd] profile rbd pool=volumes, profile rbd-read-only pool=images
client.glance key: REDACTED caps: [mgr] profile rbd pool=images caps: [mon] profile rbd caps: [osd] profile rbd pool=images
client.nova key: REDACTED caps: [mgr] profile rbd pool=instances, profile rbd pool=images caps: [mon] profile rbd caps: [osd] profile rbd pool=instances, profile rbd pool=images
When granting the glance client e.g. "rbd-read-only" to the volumes pool via:
# ceph auth caps client.glance mon 'profile rbd' osd 'profile rbd pool=images, profile rbd-read-only pool=volumes' mgr 'profile rbd pool=images, profile rbd-read-only pool=volumes'
the error is gone.
I am wondering though if this is really just a documentation bug (at OpenStack AND Ceph equally) and if Glance really needs read-only on the whole volumes pool, or if there is some other capability that covers asking for child images.
All in all I am simply wondering what the correct and least-privilege ceph auth caps for the RBD clients in Cinder, Glance and Nova would look like.
Thanks
Christian
[1] https://docs.openstack.org/glance/latest/configuration/configuring.html#conf... [2] https://docs.ceph.com/en/latest/rbd/rbd-openstack/#setup-ceph-client-authent... [3] https://bugs.launchpad.net/glance/+bug/2045158
On Thu, 14 Dec 2023 at 10:28, Eugen Block <eblock@nde.ag> wrote:
Interesting, I have a kolla-ansible one-node cluster with Antelope and there I see what you describe as well. So the behavior did indeed change. I guess the docs should be updated and contain read-only rbd profile for glance.
This sounds like a regression to me.
We debated about it a lot when Ceph broke their backwards compatibility on deletes and I'm pretty sure, if my memory serves me right, that we found a solution in the Ceph store driver to not need the permissions to other pools. There really is no excuse why Glance should have read access to volume data or Nova Ephemeral data. - jokke
Zitat von Eugen Block <eblock@nde.ag>:
Hi,
you're right about the permission error when running 'rbd children' as the glance user. However, this didn't come up in our cloud environments yet because glance first checks for existing snapshots, which usually exist and are protected:
2023-12-14 07:54:59.493 2895 WARNING glance_store._drivers.rbd [req-f57f0688-aff9-46b8-90d6-f385438cfb8f aa7c830654a64ce2a0a511a5959c5ca1 17d254c1c283409c94559d588e17b703 - default default] Remove image 3b04672f-5e27-447c-965f-878a4c8b1aa9 failed. It has snapshot(s) left.: rbd.ImageHasSnapshots: [errno 39] RBD image has snapshots (error removing image) The image cannot be deleted because it has snapshot(s).: 409 Conflict Failed to delete 1 of 1 images.
Has this behavior changed in newer releases? Because we still run on Victoria. But if it has changed it sounds reasonable to have readonly permissions for glance, I guess.
Zitat von Christian Rohmann <christian.rohmann@inovex.de>:
Hey openstack-discuss,
I am a little confused about the correct and required Ceph auth (cephx) permissions for the RBD clients in Cinder, Glance and also Nova:
When Glance is requested to delete an image it will check if this image has dependent children, see
https://opendev.org/openstack/glance_store/src/commit/6f5011d1f05c99894fb8b9... .
The children of Glance images usually are (Cinder) volumes, which live in a different RBD pool "volumes". But if such children do exist, a 500 error is thrown by the Glance API. There is also a bug about this issue on Launchpad [3].
Manually using the RBD client shows the same error:
# rbd -n client.glance -k /etc/ceph/ceph.client.glance.keyring -p images children $IMAGE_ID
2023-12-13T16:51:48.131+0000 7f198cf4e640 -1 librbd::image::OpenRequest: failed to retrieve name: (1) Operation not permitted 2023-12-13T16:51:48.131+0000 7f198d74f640 -1 librbd::ImageState: 0x5639fdd5af60 failed to open image: (1) Operation not permitted rbd: listing children failed: (1) Operation not permitted 2023-12-13T16:51:48.131+0000 7f1990c474c0 -1 librbd::api::Image: list_descendants: failed to open descendant b7078ed7ace50d from pool instances:(1) Operation not permitted
So it's a permission error. Following either the documentation of Glance [1] or Ceph [2] on configuring the ceph auth caps there is no mention of granting anything towards the volume pool to Glance. So this is what I currently have configured:
client.cinder key: REDACTED caps: [mgr] profile rbd pool=volumes, profile rbd-read-only pool=images caps: [mon] profile rbd caps: [osd] profile rbd pool=volumes, profile rbd-read-only pool=images
client.glance key: REDACTED caps: [mgr] profile rbd pool=images caps: [mon] profile rbd caps: [osd] profile rbd pool=images
client.nova key: REDACTED caps: [mgr] profile rbd pool=instances, profile rbd pool=images caps: [mon] profile rbd caps: [osd] profile rbd pool=instances, profile rbd pool=images
When granting the glance client e.g. "rbd-read-only" to the volumes pool via:
# ceph auth caps client.glance mon 'profile rbd' osd 'profile rbd pool=images, profile rbd-read-only pool=volumes' mgr 'profile rbd pool=images, profile rbd-read-only pool=volumes'
the error is gone.
I am wondering though if this is really just a documentation bug (at OpenStack AND Ceph equally) and if Glance really needs read-only on the whole volumes pool, or if there is some other capability that covers asking for child images.
All in all I am simply wondering what the correct and least-privilege ceph auth caps for the RBD clients in Cinder, Glance and Nova would look like.
Thanks
Christian
[1]
https://docs.openstack.org/glance/latest/configuration/configuring.html#conf...
[2]
https://docs.ceph.com/en/latest/rbd/rbd-openstack/#setup-ceph-client-authent...
Hello again. On 14.12.23 13:16, Erno Kuvaja wrote:
On Thu, 14 Dec 2023 at 10:28, Eugen Block <eblock@nde.ag> wrote:
Interesting, I have a kolla-ansible one-node cluster with Antelope and there I see what you describe as well. So the behavior did indeed change. I guess the docs should be updated and contain read-only rbd profile for glance.
This sounds like a regression to me.
Maybe, see below.
We debated about it a lot when Ceph broke their backwards compatibility on deletes and I'm pretty sure, if my memory serves me right, that we found a solution in the Ceph store driver to not need the permissions to other pools. There really is no excuse why Glance should have read access to volume data or Nova Ephemeral data.
In preparation to write up some PRs for the documentation, I dug a little deeper and made the following observations:
a) Updating the ceph auth caps of the Glance user to
ceph auth caps client.glance mon 'profile rbd' mgr 'profile rbd pool=images' osd 'allow class-read object_prefix rbd_children, profile rbd pool=images'
as is used by ceph-ansible [1] and other deployers does NOT fix the issue with listing children for images:
--- cut ---
# rbd -n client.glance -k /etc/ceph/ceph.client.glance.keyring -p images children 85ffc293-6f9f-4cba-b75a-38d9b26eb0e3
rbd: listing children failed: (1) Operation not permitted
2023-12-20T14:49:52.845+0000 7ff93ea544c0 -1 librbd::api::Image: list_images_v2: error listing image in directory: (1) Operation not permitted
2023-12-20T14:49:52.845+0000 7ff93ea544c0 -1 librbd::api::Image: list_descendants: error listing v2 images: (1) Operation not permitted
--- cut ---
b) While Ceph does indeed document rados objects with the prefix "rbd_children" at [2] in regards to parent-child relationships of images, this no longer seems to be enough to satisfy the rados operations the list_children method requires. Adding rbd_directory and rbd_trash via
--- cut ---
# ceph auth caps client.glance mon 'profile rbd' mgr 'profile rbd pool=images' osd 'allow class-read object_prefix rbd_directory, allow class-read object_prefix rbd_trash, profile rbd pool=images'
updated caps for client.glance
--- cut ---
does fix this though:
--- cut ---
# rbd -n client.glance -k /etc/ceph/ceph.client.glance.keyring -p images children 85ffc293-6f9f-4cba-b75a-38d9b26eb0e3
volumes/volume-46481ff8-1b9e-4215-8b63-62f8d996fecc
--- cut ---
c) As far as the regression goes, I believe this could be due to the list_children method being updated over the releases, now fetching and returning more info on the children? See [3] and [4] for those changes.
d) But any OpenStack deployment that still has read access on the volumes pool will not observe this issue. And since the Glance API responding to the image delete request with a 500 instead of a 400 error is not really a big issue for most users (the deletion was rejected either way), it's hard to say when this became a bug.
e) Instead of trial and error on which "rbd_*"-prefixed rados objects are required, maybe it makes sense to have someone from Ceph look into this and define which caps are actually required to allow list_children on RBD images with children in other pools?
Regards
Christian
[1] https://github.com/ceph/ceph-ansible/blob/b6102975549d8f870b0c20a01edda59d6c... [2] https://docs.ceph.com/en/latest/dev/rbd-layering/#parent-child-relationships [3] https://github.com/ceph/ceph/blame/main/src/librbd/librbd.cc#L2177 [4] https://github.com/ceph/ceph/commit/3d5f055a0796c4e059c22b46f6f1b840bb9d10ef
On 14.12.23 13:16, Erno Kuvaja wrote:
On Thu, 14 Dec 2023 at 10:28, Eugen Block <eblock@nde.ag> wrote:
Interesting, I have a kolla-ansible one-node cluster with Antelope and there I see what you describe as well. So the behavior did indeed change. I guess the docs should be updated and contain read-only rbd profile for glance.
This sounds like a regression to me.
Indeed this is a regression and it was a wild ride following the various threads along ... a) Commit https://github.com/openstack/glance_store/commit/3d221ec529862d43ab303644e74... introduced the method "_snapshot_has_external_reference" in the Yoga release to fix [1]. The commit message also briefly states:
NOTE: To check this dependency glance osd needs 'read' access to cinder and nova side RBD pool.
but there is zero mention of this requirement in the release notes for Yoga [2].
b) The mentioned method was removed again with [4] and this change was backported to the 2023.1 release. There again was no mention of the change in the release notes for operators, who could now remove the read access for volumes from the Glance user again.
c) For neither of the changes a and b was there any update to the actual documentation on how to configure the glance user's ceph caps.
d) Adding to c, devstack is very much out of sync with what would currently be considered "correct" in regards to caps [7]. Too liberal caps / ACLs are not helpful when testing for regressions.
e) The "_snapshot_has_external_reference" method is currently just dangling and unused [5].
f) @Jonathan Overriding some managed code should really just be a temporary fix (it was for Stein if I read this correctly). Could those openstack_keys in [6], once we have figured out what the caps really should be, be converted into a PR against upstream ceph-ansible [8] to fix things at the root?
g) I am still wondering what the cap to allow reading "rbd_children"-prefixed rados objects is or was used for? Especially with the managed profiles such as "rbd" or "rbd-read-only", things should be pretty well covered.
My proposal still is to
* determine the correct caps (least privileges, caps via profiles where possible, ...)
* fix the documentation and the devstack code as "upstreams" first
* write an upgrade bullet point in the release notes for Caracal for operators to check and align their caps from what they might have become over the various releases
* distribute this as a reference to the deployment tools and also the Ceph docs
Regards
Christian
[1] https://bugs.launchpad.net/glance-store/+bug/1954883 [2] https://docs.openstack.org/releasenotes/glance/yoga.html# [3] https://review.opendev.org/q/topic:%22bug/1954883%22 [4] https://review.opendev.org/q/I34dcd90a09d43127ff2e8b477750c70f3cc01113 [5] https://opendev.org/openstack/glance_store/src/commit/054bd5ddf5d4d255076bd5... [6] https://opendev.org/openstack/openstack-ansible/commit/0f92985608c0f6ff941ea... [7] https://opendev.org/openstack/devstack-plugin-ceph/src/commit/4c22c3d0905589... [8] https://github.com/ceph/ceph-ansible/blob/b6102975549d8f870b0c20a01edda59d6c...
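To summarize where this leaves the glance user, here are two possible shapes of the caps based on the observations above; this is a sketch for the discussion (the pool names "images", "volumes" and "vms" are assumptions), not an agreed outcome:
Option 1, read-only profiles on the pools that may hold children (works, but grants Glance read access to volume and ephemeral disk data):
--- cut ---
# ceph auth caps client.glance mon 'profile rbd' mgr 'profile rbd pool=images' osd 'profile rbd pool=images, profile rbd-read-only pool=volumes, profile rbd-read-only pool=vms'
--- cut ---
Option 2, only class-read on the rbd_directory and rbd_trash objects, as found by trial above (narrower, but not covered by an official Ceph profile):
--- cut ---
# ceph auth caps client.glance mon 'profile rbd' mgr 'profile rbd pool=images' osd 'allow class-read object_prefix rbd_directory, allow class-read object_prefix rbd_trash, profile rbd pool=images'
--- cut ---
Either variant would need to be confirmed against the Ceph documentation (point e above) before going into the install guides.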
Hey all, the issue of inconsistently documented caps was discussed at last week's Cinder weekly [1]. My proposal still stands. Let's ....
* determine the correct caps (least privileges, caps via profiles where possible, ...)
* fix the documentation and the devstack code as "upstreams" first
* write an upgrade bullet point in the release notes for Caracal for operators to check and align their caps from what they might have become over the various releases
* distribute this as a reference to the deployment tools and also the Ceph docs
@Brian @Rajat if you could kindly read through my bullet points (a through g in my last post) again and help me determine the correct caps for the cinder and glance users? Should we maybe raise a Launchpad bug to track this issue? Regarding points a and b: Yes, there was a note in the release notes, but for glance_store [2]! I'd like to add a note to glance itself to make operators aware that they might want to check / update the caps. Regards Christian
Hi Christian, Thanks for reporting the bug [1] and collecting all the relevant information related to this. Following is my reply to your queries. On Wed, Dec 20, 2023 at 11:11 PM Christian Rohmann <christian.rohmann@inovex.de> wrote:
On 14.12.23 13:16, Erno Kuvaja wrote:
On Thu, 14 Dec 2023 at 10:28, Eugen Block <eblock@nde.ag> wrote:
Interesting, I have a kolla-ansible one-node cluster with Antelope and there I see what you describe as well. So the behavior did indeed change. I guess the docs should be updated and contain read-only rbd profile for glance.
This sounds like a regression to me.
Indeed this is a regression and it was a wild ride following the various threads along ...
a) Commit https://github.com/openstack/glance_store/commit/3d221ec529862d43ab303644e74... introduced the method "_snapshot_has_external_reference" in the Yoga release to fix [1]. The commit message also briefly states:
NOTE: To check this dependency glance osd needs 'read' access to cinder and nova side RBD pool.
but there is zero mention of this requirement in the release notes for Yoga [2].
The mention is in the "glance store" release notes [2] and not glance, since the RBD store in Glance exists in the glance_store project. After this change, we do require read access to the Cinder "volumes" pool.
b) The mentioned method was removed again with [4] and this change was backported to the 2023.1 release. There again was no mention of the change in the release notes for operators, who could now remove the read access for volumes from the Glance user again.
The patch mentioned in (a) was a "workaround" to reject requests when we try to delete an image which has a dependency on a volume, since that might corrupt the volume. Not sure about the exact behavior, but it made sense to check the dependencies and reject the request. Later a "fix" was introduced to move the image into trash, where it will get deleted eventually when the dependencies get deleted, while the image delete operation itself succeeds, which is the goal here.
c) For none of the changes a and b there was any update to the actual documentation on how to configure the glance user ceph caps.
You are correct, we should be making appropriate changes to deployment documents/tools to reflect what is currently expected of deployers. However, these changes were made in the glance project so I will leave it up to the glance team to comment on it.
d) Adding to c, devstack very much is out of sync to what would currently be considered "correct" in regards to caps [7]. Too liberal caps / ACLs are not helpful when testing for regressions.
Correct again, devstack is giving out permissions too leniently, which might not be desirable for an actual deployment. However, devstack setups are used for development and not production environments, so I wouldn't be too inclined towards devstack making any changes.
e) The "_snapshot_has_external_reference" method is currently just dangling and unused [5].
Yes, I think we forgot to remove it in the patch that removes the "workaround" code and introduces the "fix" code. Looks like we can go ahead and remove that method.
f) @Jonathan Overriding some managed code should really just be a temporary fix (it was for Stein if I read this correctly). Could those openstack_keys in [6], once we figured out what the caps really should be, be converted into a PR against upstream of ceph-ansible [8] to fix things at the root?
g) I am still wondering what the caps to allow reading "rbd_children" prefixed rados objects is or was used for? Especially with the managed profiles such as "rbd" or "rbd-readonly", things should be pretty well covered.
From a cinder standpoint, I think the following permissions apply for the OSD caps (I'm not familiar with the permissions required for the monitor and manager):
cinder user -> for OSD: rwx in the "volumes" pool, r in the "images" pool (I don't think we need any permissions in the "vms" pool but somehow the deployment tools configure it that way, cinder/nova folks can correct me here)
cinder-backup user -> for OSD: rwx in the "backups" pool, r in the "volumes" pool
The reason for requiring access to other pools is:
1. the cinder user requires read access in the "images" pool since we perform COW cloning when we create a bootable volume from an image
2. the cinder-backup user requires read access in the "volumes" pool since creating a backup of a volume requires reading the volume from the "volumes" pool
If there are other permissions required or other cases where we need access to multiple pools, I'm happy to be corrected here.
[1] https://bugs.launchpad.net/nova/+bug/2051244 [2] https://docs.openstack.org/releasenotes/glance_store/yoga.html#upgrade-notes
Thanks
Rajat Dhasmana
My proposal still is .. to * determine the correct caps (least privileges, caps via profiles where possible, ...)
* fix the documentation and code devstack as "upstreams" first * write an upgrade bullet point to release notes for Caracal for operators to check and align their caps from what they might have become over the various releases * distribute this as a reference to the deployment tools and also the Ceph docs
Regards
Christian
[1] https://bugs.launchpad.net/glance-store/+bug/1954883 [2] https://docs.openstack.org/releasenotes/glance/yoga.html# [3] https://review.opendev.org/q/topic:%22bug/1954883%22 [4] https://review.opendev.org/q/I34dcd90a09d43127ff2e8b477750c70f3cc01113 [5] https://opendev.org/openstack/glance_store/src/commit/054bd5ddf5d4d255076bd5... [6] https://opendev.org/openstack/openstack-ansible/commit/0f92985608c0f6ff941ea... [7] https://opendev.org/openstack/devstack-plugin-ceph/src/commit/4c22c3d0905589... [8] https://github.com/ceph/ceph-ansible/blob/b6102975549d8f870b0c20a01edda59d6c...
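Translating Rajat's description into profile-based caps rather than raw ACLs, a possible starting point could look like the following; the pool names "volumes", "images" and "backups" are the usual defaults and, like the read-only level on "volumes" for cinder-backup, simply mirror the description above rather than a verified least-privilege set:
--- cut ---
# ceph auth caps client.cinder mon 'profile rbd' mgr 'profile rbd pool=volumes' osd 'profile rbd pool=volumes, profile rbd-read-only pool=images'
# ceph auth caps client.cinder-backup mon 'profile rbd' mgr 'profile rbd pool=backups' osd 'profile rbd pool=backups, profile rbd-read-only pool=volumes'
--- cut ---
Whether the cinder user additionally needs access to the "vms" pool depends on how Nova is set up, see Eugen's note further below.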
Later a "fix"**was introduced to move the image into trash where it will get deleted eventually when the dependencies get deleted but we will succeed with the image delete operation which is the goal here. Yes that fix is the trash feature of [cr1]. In the commit message of [cr1] it's written "This trash must be purged by a scheduled rbd trash purge operation outside of Glance.", but the release notes
Rajat, thanks for diving into my wall of text. Honestly I thought we agreed during cinder weekly that the bug [1] was the best place to work on correcting any caps for the (potentially) affected projects. But be it as it may, discussions on the different aspects I brought up might more likely happen here in the ML anyways. On 30.01.24 19:10, Rajat Dhasmana wrote: then say this happens automatically. Which is correct? [cr1] https://review.opendev.org/c/openstack/glance_store/+/884524
e) The "_snapshot_has_external_reference" method is currently just dangling and unused [5].
Yes, I think we forgot to remove it in the patch that removes the "workaround" code and introduces the "fix" code. Looks like we can go ahead and remove that method. I pushed a change to: https://review.opendev.org/c/openstack/glance_store/+/907317
d) Adding to c, devstack very much is out of sync to what would currently be considered "correct" in regards to caps [7]. Too liberal caps / ACLs are not helpful when testing for regressions.
Correct again, devstack is giving out permissions too leniently which might not be desirable for an actual deployment. However, devstack setups are used for development and not production environments so I wouldn't be too inclined on devstack making any changes.
I beg to disagree. Since devstack is used as the target for all sorts of (integration) tests, the alignment of the access permissions makes sense. It's kinda like running everything as "root" and saying "it's only for testing" ... how would you then notice any permission-related issues? Using the same permissions in devstack as (to be) noted in the documentation is crucial to them being "somewhat" correct.
From a cinder standpoint, I think the following permissions apply for OSD: (I'm not familiar with permissions required for monitor and manger)
cinder user -> for OSD: rwx in "volumes" pool, r in "images" pool, (I don't think we need any permissions in the "vms" pool but somehow the deployment tools configure it that way, cinder/nova folks can correct me here) cinder-backup user: for OSD: rwx in "backups" pool, r in "volumes" pool
This is one of my major drivers for starting all this fuss in the first place ... Apart from the access levels "read" or "read-write" on the different pools, let me note again that these plain caps are NOT recommended (anymore). Not using the managed "profiles" such as "rbd" or "rbd-read-only" instead of raw ACLs such as "rwx" does have side effects, see [cr3]. This makes a big difference as "such privileges include the ability to blocklist other client users.", required for locks of stale RBD clients to be removed from images, see [cr4]. We already ran into this with images having stale locks and that is also why deployment tools do use profiles.
The reason requiring access to other pools is: 1. cinder user requires read access in the "images" pool since we perform COW cloning when we create a bootable volume from image 2. cinder-backup user requires read access in the "volumes" pool since creating a backup of a volume requires reading the volume from the "volumes" pool
If there are other permissions required or other cases where we need access to multiple pools, I'm happy to be corrected here.
[1] https://bugs.launchpad.net/nova/+bug/2051244 [2] https://docs.openstack.org/releasenotes/glance_store/yoga.html#upgrade-notes
There is https://review.opendev.org/c/openstack/cinder/+/809523 which tries to improve things. This might also have an impact on required caps? Regards Christian
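To illustrate the stale-lock side effect referenced via [cr3]/[cr4]: if the client holding the exclusive lock on an RBD image dies uncleanly, another client can only take over that lock if its mon caps allow it to blocklist the dead peer, which "profile rbd" grants and a plain 'allow r' does not. A rough way to inspect such a situation (the image spec is just an example; on releases before Pacific the command is "osd blacklist"):
--- cut ---
# rbd lock ls volumes/volume-<uuid>
# ceph osd blocklist ls
--- cut ---
Without the profile, removing such a lock or blocklisting the stale client ends up being a manual step with an admin keyring.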
@Rajat: I'd just like to add why the cinder client could/should have access to the "vms" pool:
cinder user -> for OSD: rwx in "volumes" pool, r in "images" pool, (I don't think we need any permissions in the "vms" pool but somehow the deployment tools configure it that way, cinder/nova folks can correct me here)
When building VMs with ephemeral disks - which can reside in a different rbd pool - a nova client would require access to both "volumes" and "vms" pools. But since a cinder client is required anyway, nova often simply makes use of the existing cinder client. Zitat von Christian Rohmann <christian.rohmann@inovex.de>:
Rajat, thanks for diving into my wall of text.
Honestly I thought we agreed during cinder weekly that the bug [1] was the best place to work on correcting any caps for the (potentially) affected projects. But be it as it may, discussions on the different aspects I brought up might more likely happen here in the ML anyways.
On 30.01.24 19:10, Rajat Dhasmana wrote:
Later a "fix"**was introduced to move the image into trash where it will get deleted eventually when the dependencies get deleted but we will succeed with the image delete operation which is the goal here. Yes that fix is the trash feature of [cr1]. In the commit message of [cr1] it's written "This trash must be purged by a scheduled rbd trash purge operation outside of Glance.", but the release notes then say this happens automatically. Which is correct?
[cr1] https://review.opendev.org/c/openstack/glance_store/+/884524
e) The "_snapshot_has_external_reference" method is currently just dangling and unused [5].
Yes, I think we forgot to remove it in the patch that removes the "workaround" code and introduces the "fix" code. Looks like we can go ahead and remove that method. I pushed a change to: https://review.opendev.org/c/openstack/glance_store/+/907317
d) Adding to c, devstack very much is out of sync to what would currently be considered "correct" in regards to caps [7]. Too liberal caps / ACLs are not helpful when testing for regressions.
Correct again, devstack is giving out permissions too leniently which might not be desirable for an actual deployment. However, devstack setups are used for development and not production environments so I wouldn't be too inclined on devstack making any changes. I beg to disagree. Since devstack is used as target for all sorts of (integration) tests, the alignment of the access permissions makes sense. It's kinda like running everything as "root" and saying "it's only for testing"... how would you then notice any permission related issues? Using the same permissions in devstack as (to be) noted in the documentation is crucial to them being "somewhat" correct.
From a cinder standpoint, I think the following permissions apply for OSD: (I'm not familiar with permissions required for monitor and manger)
cinder user -> for OSD: rwx in "volumes" pool, r in "images" pool, (I don't think we need any permissions in the "vms" pool but somehow the deployment tools configure it that way, cinder/nova folks can correct me here) cinder-backup user: for OSD: rwx in "backups" pool, r in "volumes" pool
This is one of my major drivers for starting all this fuss in the first place ....
Apart from the access levels "read" or "read-write" on the different pools, let me note again, that these plain caps are NOT recommended (anymore). Not using the managed "profiles" such as "rbd" or "rbd-read-only" instead of raw ACLs such as "rwx", does have side effects, see [cr3]. This makes a big difference as "such privileges include the ability to blocklist other client users.", required for locks of stale RBD clients to be removed from images, see [cr4]. We already ran into this with images having stale locks and that is also why deployment tools do use profiles.
[cr3] https://docs.ceph.com/en/latest/rados/operations/user-management/#authorizat... [cr4] https://docs.ceph.com/en/latest/rbd/rbd-exclusive-locks/#rbd-exclusive-locks.
The reason requiring access to other pools is: 1. cinder user requires read access in the "images" pool since we perform COW cloning when we create a bootable volume from image 2. cinder-backup user requires read access in the "volumes" pool since creating a backup of a volume requires reading the volume from the "volumes" pool
If there are other permissions required or other cases where we need access to multiple pools, I'm happy to be corrected here.
[1] https://bugs.launchpad.net/nova/+bug/2051244 [2] https://docs.openstack.org/releasenotes/glance_store/yoga.html#upgrade-notes
There is https://review.opendev.org/c/openstack/cinder/+/809523 which tries to improve things. This might also have an impact on required caps?
Regards
Christian
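Picking up Eugen's point: if Nova uses its own client for ephemeral disks instead of reusing the cinder one, a profile-based sketch could look like the line below; the pool names ("vms", "volumes", "images") and whether Nova needs the "volumes" pool at all depend on the deployment and on which user is configured for volume attachments, so this is an assumption to be validated, not a recommendation:
--- cut ---
# ceph auth caps client.nova mon 'profile rbd' mgr 'profile rbd pool=vms' osd 'profile rbd pool=vms, profile rbd pool=volumes, profile rbd-read-only pool=images'
--- cut ---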
participants (6)
- Christian Rohmann
- Erno Kuvaja
- Eugen Block
- Jonathan Rosser
- KEREM CELIKER
- Rajat Dhasmana