Nova not propagating the new size of an extended in-use / attached Cinder volume (Ceph RBD) to the guest
Hello Openstack-Discuss,
after some digging I found out that cinder learned to resize attached (in-use) volumes quite a while ago with the introduction of the initial "extend 'in-use' volume" feature: https://review.opendev.org/c/openstack/cinder/+/454287/
The support was then extended to also cover Ceph RBD backed volumes with: https://review.opendev.org/c/openstack/nova/+/613039/
Since this is only about the cinder part, I was wondering if nova would ever find out and would actively rescan the device / volume and ultimately present the increased new size to the guest. Apparently this is where a certain volume-extended event comes into play: https://review.opendev.org/c/openstack/nova/+/454322/ which is to be emitted by cinder when a volume has been extended.
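For reference, a rough sketch of what that event looks like, if I read the os-server-external-events API correctly (variables are placeholders, illustration only, not something I actually ran):
$ curl -X POST "$NOVA_ENDPOINT/os-server-external-events" \
    -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" \
    -d '{"events": [{"name": "volume-extended", "server_uuid": "'"$INSTANCE_UUID"'", "tag": "'"$VOLUME_ID"'"}]}'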
I then went ahead and tried this with the openstack CLI (as Horizon does not seem to support / offer resizing in-use volumes). I run OpenStack Train with Ceph RBD as storage. First I ran into an issue with the openstack CLI (https://bugs.launchpad.net/cinder/+bug/1871759/comments/2), but using
cinder extend $volumeid $newsize
I was able to resize in-use volumes just fine.
The only thing missing was the propagation to the guest. I played around with SCSI rescans, but that did not work, and reboots also failed to do anything in this regard. Stopping and starting the VM did work, but why have the capability to online-resize an attached volume if the new size cannot be propagated to the guest as well?
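(The SCSI rescans in question are typically something along these lines from inside the guest; device and host names are only examples:
# echo 1 > /sys/class/block/sdb/device/rescan
# echo "- - -" > /sys/class/scsi_host/host0/scan
)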
There also seems to be an old and somewhat similar observation / bug: https://bugs.launchpad.net/nova/+bug/1369465
So I was simply wondering if this is expected to be working? Are there any special settings / options I need to set to enable this feature?
Thanks and with kind regards
Christian
On 15-02-21 00:52:52, Christian Rohmann wrote:
Hello Openstack-Discuss,
after some digging I found out that cinder learned to resize attached (in-use) volumes quite a while ago with the introduction of the initial "extend 'in-use' volume" feature: https://review.opendev.org/c/openstack/cinder/+/454287/
The support was then extended to also cover Ceph RBD backed volumes with: https://review.opendev.org/c/openstack/nova/+/613039/
Since this is only about the cinder part, I was wondering if nova would ever find out and would actively rescan the device / volume and ultimately present the increased new size to the guest. Apparently this is where a certain volume-extended event comes into play: https://review.opendev.org/c/openstack/nova/+/454322/ which is to be emitted by cinder when a volume has been extended.
I then went ahead and tried this with the openstack CLI (as Horizon does not seem to support / offer resizing in-use volumes). I run OpenStack Train with Ceph RBD as storage. First I ran into an issue with the openstack CLI (https://bugs.launchpad.net/cinder/+bug/1871759/comments/2), but using
cinder extend $volumeid $newsize
I was able to resize in-use volumes just fine.
The only thing missing was the propagation to the guest. I played around with SCSI rescans, but that did not work, and reboots also failed to do anything in this regard. Stopping and starting the VM did work, but why have the capability to online-resize an attached volume if the new size cannot be propagated to the guest as well?
Apparently there seems to be an old and somewhat similar observation / bug with https://bugs.launchpad.net/nova/+bug/1369465
That's an unrelated bug about ephemeral disk resize.
So I was simply wondering if this is expected to be working? Are there any special settings / options I need to set to enable this feature?
Yes, this should work without any additional changes. Can you write up a nova bug with the following output in addition to the bug template:
- Your versions of libvirt and QEMU.
- Output of the following command *after* requesting a resize:
$ virsh domblkinfo $instance_uuid $target_dev
- Output of the following commands once confirmed the resize didn't happen within the domain:
$ virsh blockresize $instance_uuid $rbd_path $new_size
$ virsh domblkinfo $instance_uuid $target_dev
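(For illustration, domblkinfo reports the size as seen by libvirt/QEMU, roughly like this with example values:
Capacity:       3221225472
Allocation:     0
Physical:       3221225472
so a Capacity that stays at the old value after the extend would point at libvirt/QEMU.)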
From what you've said above this smells like a libvirt/QEMU bug but I
don't have a rbd env to hand to confirm things at the moment.
Cheers,
Hello Lee,
thanks for quick response and sorry about the late reaction from my side.
On 15/02/2021 12:18, Lee Yarwood wrote:
On 15-02-21 00:52:52, Christian Rohmann wrote:
So I was simply wondering if this is expected to be working? Are there any special settings / options I need to set to enable this feature?
Yes, this should work without any additional changes. Can you write up a nova bug with the following output in addition to the bug template:
Your versions of libvirt and QEMU.
Output of the following command *after* requesting a resize:
$ virsh domblkinfo $instance_uuid $target_dev
- Output of the following commands once confirmed the resize didn't happen within the domain:
$ virsh blockresize $instance_uuid $rbd_path $new_size
$ virsh domblkinfo $instance_uuid $target_dev
From what you've said above this smells like a libvirt/QEMU bug but I don't have a rbd env to hand to confirm things at the moment.
Cheers,
I have just been trying to reproduce the issue, but in all my new attempts it just worked as expected:
[162262.926512] sd 0:0:0:1: Capacity data has changed
[162262.932868] sd 0:0:0:1: [sdb] 6291456 512-byte logical blocks: (3.22 GB/3.00 GiB)
[162262.933061] sdb: detected capacity change from 2147483648 to 3221225472
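(The new size can then also be double-checked from inside the guest with something like the following; the device name is again just an example:
$ lsblk /dev/sdb
# blockdev --getsize64 /dev/sdb
)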
Sorry about the noise then.
The only "bugs" to report are the usability issues of Horizon not offering "in-use" extension of volume as far as I can see and the pending support in the openstack cli (https://bugs.launchpad.net/cinder/+bug/1871759).
Thanks again, Regards
Christian
On Tue, 2021-02-16 at 22:32 +0100, Christian Rohmann wrote:
Hello Lee,
thanks for quick response and sorry about the late reaction from my side.
On 15/02/2021 12:18, Lee Yarwood wrote:
On 15-02-21 00:52:52, Christian Rohmann wrote:
So I was simply wondering if this is expected to be working? Are there any special settings / options I need to set to enable this feature?
Yes, this should work without any additional changes. Can you write up a nova bug with the following output in addition to the bug template:
Your versions of libvirt and QEMU.
Output of the following command *after* requesting a resize:
$ virsh domblkinfo $instance_uuid $target_dev
- Output of the following commands once confirmed the resize didn't
happen within the domain:
$ virsh blockresize $instance_uuid $rbd_path $new_size
$ virsh domblkinfo $instance_uuid $target_dev
From what you've said above this smells like a libvirt/QEMU bug but I don't have a rbd env to hand to confirm things at the moment.
Cheers,
I have just been trying to reproduce the issue, but in all my new attempts it just worked as expected:
[162262.926512] sd 0:0:0:1: Capacity data has changed
[162262.932868] sd 0:0:0:1: [sdb] 6291456 512-byte logical blocks: (3.22 GB/3.00 GiB)
[162262.933061] sdb: detected capacity change from 2147483648 to 3221225472
Sorry about the noise then.
The only "bugs" to report are the usability issues of Horizon not offering "in-use" extension of volume as far as I can see and the pending support in the openstack cli (https://bugs.launchpad.net/cinder/+bug/1871759).
I have done a live extend using the cinder client before.
sean@p50:~$ cinder --help extend
usage: cinder extend <volume> <new_size>

Attempts to extend size of an existing volume.

Positional Arguments:
  <volume>    Name or ID of volume to extend.
  <new_size>  New size of volume, in GiBs.
So it does work, provided you have not used the nova workaround config options for host-mounting the RBD volumes: https://docs.openstack.org/nova/latest/configuration/config.html#workarounds... That workaround will go away shortly, probably in the Xena release, but if you enable it you cannot resize volumes that are in use.
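(If I remember the option name correctly it is something like the following in nova.conf on the compute host; please double-check against the config reference linked above:
[workarounds]
rbd_volume_local_attach = True
i.e. with that enabled the volume is attached on the host via krbd and, as said, in-use extend is not expected to work.)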
So yeah, I think the only bugs are really not bugs but RFEs: the openstack client does not have parity with the cinder client, and neither does Horizon. A lack of a feature in the latter is not really a bug either, just no one has implemented it yet, so there is a gap.
From the API side and a nova perspective I think it should work; you just need to use a client that supports it.
Thanks again, Regards
Christian
Hello,
This adds support for it in Horizon: https://review.opendev.org/c/openstack/horizon/+/749013
Regarding the actual extending of in-use volumes, we had an issue where cinder could not talk to the os-server-external-events endpoint for nova because it used the wrong endpoint when looking it up in keystone. We saw the error in cinder-volume.log; other than that I can't remember that we did anything special.
Had to use newer microversion for cinder when using CLI.
cinder --os-volume-api-version 3.42 extend <volume ID or name> <new size in GB>
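(To confirm on the API side that the backend actually grew the volume, one can check e.g.:
openstack volume show <volume ID or name> -c size -c status
)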
Best regards
Hello all,
On 16/02/2021 22:32, Christian Rohmann wrote:
I have just been trying to reproduce the issue, but in all my new attempts it just worked as expected:
[162262.926512] sd 0:0:0:1: Capacity data has changed
[162262.932868] sd 0:0:0:1: [sdb] 6291456 512-byte logical blocks: (3.22 GB/3.00 GiB)
[162262.933061] sdb: detected capacity change from 2147483648 to 3221225472
On 17/02/2021 09:43, Tobias Urdin wrote:
Regarding the actual extending of in-use volumes, we had an issue where cinder could not talk to os-server-external-events endpoint for nova because it used the wrong endpoint when looking up in keystone. We saw the error in cinder-volume.log except for that I can't remember we did anything special.
Had to use newer microversion for cinder when using CLI.
cinder --os-volume-api-version 3.42 extend <volume ID or name> <new size in GB>
I am very sorry for the delay, but I was now finally able to reproduce the issue, with a VERY strange finding:
1)
* When using the cinder client with a Volume API version >= 3.42 like you suggested, Tobias, it works just fine using cloud admin credentials.
* The volume is attached / in-use, but it is resized just fine, including the notification of the kernel on the VM.
2)
* When attempting the same thing using the project user's credentials the resize also works just fine, volume still attached and in-use, but the VM is NOT notified.
* Also, this does not seem to be related to nova or QEMU; rather, it appears there is no extend_volume event triggered, or at least none logged:
--- cut ---
Request ID                                 -- Action        -- Start Time             -- User ID                -- Message
req-a4065b2d-1b77-4c5a-bf53-a2967b574fa0   -- extend_volume -- May 5, 2021, 1:22 p.m. -- 784bd2a5b82c3b31eb56ee -- -
req-5965910b-874f-4c7a-ab61-32d1a080d1b2   -- attach_volume -- May 5, 2021, 1:09 p.m. -- 4b2abc14e511a7c0b10c   -- -
req-75ef5bd3-75d3-4146-84eb-1809789b6586   -- Create        -- May 5, 2021, 1:09 p.m. -- 4b2abc14e511a7c0b10c   -- -
--- cut ---
UserID "784bd2a5b82c3b31eb56ee" is the regular user creating and attaching the volume. But that user also did an extend_volume, which is not logged as an event. There also was no API errors reported back to the client, the resize did happen - just not propagated to the VM - so a stop and restart was required.
But the admin user with id "4b2abc14e511a7c0b10c" doing a resize attempt caused an extend_volume and consequently did trigger a notification of the VM, just as expected and documented in regards to this feature.
Does anybody have any idea what could cause this or where to look for more details?
Regards
Christian
On 05/05/2021 17:34, Christian Rohmann wrote:
But the admin user with id "4b2abc14e511a7c0b10c" doing a resize attempt caused an extend_volume and consequently did trigger a notification of the VM, just as expected and documented in regards to this feature.
Does anybody have any idea what could cause this or where to look for more details?
Apparently this is a long-known issue (see e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1640443), which is caused by Cinder talking to Nova to have it create the volume-extended event, but doing so with the user's credentials, which is denied by the default policy:
--- cut ---
2021-05-06 15:13:12.214 4197 DEBUG nova.api.openstack.wsgi [req-4c291455-a21a-4314-8e57-173e66e6e60a f9c0b52ec43e423e9b5ea63d620f4e27 92a6c19e7482400385806266cdef149c - default default] Returning 403 to user: Policy doesn't allow os_compute_api:os-server-external-events:create to be performed. __call__ /usr/lib/python3/dist-packages/nova/api/openstack/wsgi.py:941
--- cut ---
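(The corresponding default in nova's policy is, roughly, admin-only; the exact default may differ per release:
"os_compute_api:os-server-external-events:create": "rule:admin_api"
)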
Unfortunately cinder does not report or log anything.
Is switching cinder to the "admin" interface the proper approach here or am I missing something else?
Regards
Christian
On Thu, 2021-05-06 at 18:08 +0200, Christian Rohmann wrote:
On 05/05/2021 17:34, Christian Rohmann wrote:
But the admin user with id "4b2abc14e511a7c0b10c" doing a resize attempt caused an extend_volume and consequently did trigger a notification of the VM, just as expected and documented in regards to this feature.
Does anybody have any idea what could cause this or where to look for more details?
Apparently this is a (long) known issue (i.e. https://bugzilla.redhat.com/show_bug.cgi?id=1640443) which is caused by
Cinder talking to Nova to have it create the volume-extended event but does so with user credentials and this is denied by the default policy:
--- cut ---
2021-05-06 15:13:12.214 4197 DEBUG nova.api.openstack.wsgi [req-4c291455-a21a-4314-8e57-173e66e6e60a f9c0b52ec43e423e9b5ea63d620f4e27 92a6c19e7482400385806266cdef149c - default default] Returning 403 to user: Policy doesn't allow os_compute_api:os-server-external-events:create to be performed. __call__ /usr/lib/python3/dist-packages/nova/api/openstack/wsgi.py:941
--- cut ---
Unfortunately cinder does not report or log anything.
That would make sense given the external event API is admin-only and only intended to be used by services, so the fix would be for cinder to use an admin credential, not the user one, to send the event to nova.
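A minimal sketch of what that looks like in cinder.conf, assuming the [nova] section accepts the usual keystoneauth options (check the config reference for your release; values are placeholders):
[nova]
interface = admin
region_name = RegionOne
auth_type = password
auth_url = https://keystone.example.com:5000/v3
username = cinder
password = <secret>
project_name = service
user_domain_name = Default
project_domain_name = Default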
Is switching cinder to the "admin" interface the proper approach here or am I missing something else?
Regards
Christian
Hey Sean,
On 06/05/2021 18:29, Sean Mooney wrote:
That would make sense given the external event API is admin-only and only intended to be used by services, so the fix would be for cinder to use an admin credential, not the user one, to send the event to nova.
Thanks, yes, and that can simply be achieved by configuring one which is then used for such calls.
But instead of a fully privileged "admin" user there should rather exist a proper RBAC role to only allow one service (cinder in this case) to do what it requires to function (e.g. send events to Nova) and not just "everything on every other service". This first of all violates the least-privilege principle, but in an ecosystem that is made up of individual projects of varying security quality, and which is highly distributed, it's just a bad idea to give every component and their dog the keys to the kingdom.
There was a forum session at the Summit on exactly that issue and how it is one aspect of the RBAC work; see the etherpad: https://etherpad.opendev.org/p/deprivilization-of-service-accounts
Regards
Christian
On Tue, 2022-06-28 at 09:48 +0200, Christian Rohmann wrote:
Hey Sean,
On 06/05/2021 18:29, Sean Mooney wrote:
That would make sense given the external event API is admin-only and only intended to be used by services, so the fix would be for cinder to use an admin credential, not the user one, to send the event to nova.
Thanks, yes and that can just be achieved by configuring one which is then used for such calls.
But instead of a fully privileged "admin" user there should rather exist a proper RBAC role to only allow one service (cinder in this case) to do what it requires to function (e.g. send events to Nova) and not just "everything on every other service". This first of all violates the least-privilege principle, but in an ecosystem that is made up of individual projects of varying security quality, and which is highly distributed, it's just a bad idea to give every component and their dog the keys to the kingdom.
That is what the service role is intended to be, as I mentioned before. The admin user was what was agreed to use before. The service user will not have admin permissions; it will only be used for service-to-service communication, for API operations that need higher permissions than a normal user but not full admin.
The external event API is one such interface that even normal operators are not intended to ever call; it's purely a service-to-service API.
Just being frank, we are entirely aware that this violates the principle of least privilege, but you are missing the context that individual services are not meant to create arbitrary roles. We are meant to use standard roles, and the service role, which will be introduced in phase 2 of the RBAC cross-project goal https://governance.openstack.org/tc/goals/selected/consistent-and-secure-rba... is the standard role we agreed to use for that use case.
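(Once that service role exists, granting it to e.g. the cinder service user would presumably just be the usual role assignment; a hypothetical example:
$ openstack role add --user cinder --project service service
)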
In prior iterations of the secure RBAC work, prior to the reset in direction in Yoga, we had the concept of system-scoped tokens being scoped to a specific system, so system=all vs system=compute vs system=networking. That was dropped, as far as I can tell, when we reset the direction.
Today we can create roles and assign them to groups or users directly.
Currently we have no way to model that a user should have this role only on a specific service endpoint. I think to address your use case we need the ability to say the neutron user has the service role on the nova endpoint but not on any other endpoint.
That will enable your reduction of privileges, but it is not currently planned in our 3-phased adoption of secure RBAC at this time.
There was a forum session at the Summit on exactly that issue and how it is one aspect of the RBAC work; see the etherpad: https://etherpad.opendev.org/p/deprivilization-of-service-accounts
Yes, I'm aware, but just to be frank: repeating this without taking on board the feedback I had previously given, namely 1) that this is a known gap that we are intentionally not addressing in the scope of the current project goal, and 2) that the service role is planned to be created to address marking an endpoint as admin when it does not need to be, is not helpful.
I was not in Berlin and don't want to discourage you from expressing your opinion on the list and engaging with the community to improve OpenStack security.
The way to do that productively is to get involved with the secure RBAC work and to either work with others to develop the keystone feature required to allow us to associate role grants with service endpoints, or to come up with another solution that can then be added to the cross-project goal.
Unless we do a hard pivot away from the standardised-roles approach and adopt the OAuth/API-key style permissions model used by GitHub, or by games like EVE Online for decades, where you create a key that has specific permissions on specific API endpoints and then use that API key to make requests, I don't see another way to address this cleanly via the API. https://docs.github.com/en/rest/overview/permissions-required-for-github-app...
To achieve something like ^ in an OpenStack context we would need to develop oslo middleware or similar that could read the policy.yaml file used by each service and expose the roles and access requirements for all API endpoints within the service at a well-known URL, e.g. nova.cloud.com/policy_rules. Then keystone would have to aggregate that and allow you to create an application credential or similar that was delegated a subset of permissions on the individual service endpoint. You would then configure the service to use the application credential instead. (Note: application credentials are the closest thing we have to an API key or GitHub's application-based permissions model, but they are meant to have roles delegated to them, not per-API-endpoint permissions, so it's not a direct 1:1 mapping.)
That would be a very different workflow from what we have today.
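(For what it's worth, application credentials can already be restricted to specific endpoints via access rules, which is probably the closest existing building block. A sketch, with the name, role and path purely as examples:
$ openstack application credential create cinder-to-nova \
    --role member \
    --access-rules '[{"service": "compute", "method": "POST", "path": "/v2.1/os-server-external-events"}]'
Access rules only restrict what the credential may call; they do not grant anything beyond the delegated roles.)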
If you are willing to sign up to do some of the work then I'm sure you can help drive the direction of this, since you seem interested.
In any case, without creating per-project roles today, which is not something we wanted to do previously, we need a feature in keystone to do better than just the service role. But with the service role no service will need admin if we implement things correctly.
Regards
Christian
participants (4)
- Christian Rohmann
- Lee Yarwood
- Sean Mooney
- Tobias Urdin