Hello everyone,
We came across this bug [1] in nova recently and wanted to know what people think is the best (relatively) way to fix this.
In the past, the project id validation was added as a best effort to prevent users from being able to enter random values into the database. When this validation is used from the os flavor set/unset admin apis [2], there are chances that keystone returns a 403 which gets silently ignored by nova [3] allowing the user to enter the provided project_id/name without validation or warning or remove an existing flavor-project mapping. There were a couple of options discussed on IRC [4] to fix this behaviour out of which the practically reasonable ones are:
1) close the bug as invalid - tweak your config (we could add docs, idk if that would be found or help) to do what you need to avoid the 403 from keystone 2) change the 403 case as an error and raise it back to the compute api caller - maybe enough time has passed to not worry about backward compat with the old non-validating behavior
Option 2 seems better than option 1 for most of us, however what we cannot agree upon is if this change should be accompanied by a microversion bump or not.
[1] https://bugs.launchpad.net/nova/+bug/1854053 [2] https://github.com/openstack/nova/blob/fd67f69cfdaf04620f2e8a5f1fbf573709696... [3] https://github.com/openstack/nova/blob/d621914442855ce67ce0b99003f7e69e8ee51... [4] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2...
Cheers, Surya.
I'd recommend adding a subject title in the future.
On 11/26/2019 12:08 PM, Surya Seetharaman wrote:
- change the 403 case as an error and raise it back to the compute api
caller - maybe enough time has passed to not worry about backward compat with the old non-validating behavior
Note that the APIs that would change are admin-only by default. So in this case nova is configured with a service user to check if the requested project_id exists on behalf of the (admin) user making the compute API request to add/remove flavor access (or update quota values for a project). The service user does not have enough permissions in keystone to check if the project exists. Option 1 is give that service user more authority. Option 2 is basically re-raise that error to the compute (admin) user to let them know they basically need to fix their deployment (option 1 again).
Option 2 seems better than option 1 for most of us, however what we cannot agree upon is if this change should be accompanied by a microversion bump or not.
As noted above, I don't think option 2 precludes option 1. The compute API (admin) user will just get a 403 rather than perhaps silently wrong 200 response. If they get a 403 they likely need to fix things which is option 1.
I don't think a microversion is necessary for this if we go with option 2 since the admin user shouldn't have to opt into non-broken behavior. Yes the project_id validation stuff was added awhile ago but it was added without a microversion in its own right as a bug fix - and we used to get a *lot* of duplicate bugs about being able to use these APIs with garbage project IDs since we previously didn't validate. Here are a couple of other examples of (non-admin) APIs which changed from a bogus success response to a failure without a microversion:
* Trying to attach an SR-IOV port to an instance. * Trying to rebuild a volume-backed server with a new image.
On Tue, Nov 26, 2019 at 9:26 PM Matt Riedemann mriedemos@gmail.com wrote:
Note that the APIs that would change are admin-only by default. So in this case nova is configured with a service user to check if the requested project_id exists on behalf of the (admin) user making the compute API request to add/remove flavor access (or update quota values for a project). The service user does not have enough permissions in keystone to check if the project exists. Option 1 is give that service user more authority. Option 2 is basically re-raise that error to the compute (admin) user to let them know they basically need to fix their deployment (option 1 again).
A combo of both solutions where we raise the error to the user and amend our docs to help them fix it seems good to me.
I don't think a microversion is necessary for this
++
----------
Cheers, Surya.
---- On Wed, 27 Nov 2019 03:35:43 -0600 Surya Seetharaman surya.seetharaman9@gmail.com wrote ----
On Tue, Nov 26, 2019 at 9:26 PM Matt Riedemann mriedemos@gmail.com wrote: Note that the APIs that would change are admin-only by default. So in this case nova is configured with a service user to check if the requested project_id exists on behalf of the (admin) user making the compute API request to add/remove flavor access (or update quota values for a project). The service user does not have enough permissions in keystone to check if the project exists. Option 1 is give that service user more authority. Option 2 is basically re-raise that error to the compute (admin) user to let them know they basically need to fix their deployment (option 1 again).
A combo of both solutions where we raise the error to the user and amend our docs to help them fix it seems good to me.
+1 on the solution. I like that code tells the error to users because people do not read the doc always.
I don't think a microversion is necessary for this ++
I disagree here. My main concern is that this is not the always-broken case. For the case where we have complete broken behaviour then we do not need microverison to fix that as mentioned by matt too.
In this case: Even 403 from keystone on GET /project, it may possible that the project exists and request Nova to add that projects in flavor access is right. This is a success case in the current situation which will be changed to 400 after the proposed solution(option2 or 2+1). This is behaviour change and should be done with microvesion.
In old change where we added the verify_project_id did not change the success case, that only handled the case where keystone returned 404 on GET /project means it is confirmed that the requested project does not exist so it will break later so nova started 400 instead of 200. which was clearly a broken case. Any other case where projects may exist was kept as it is so microversion was not needed there. But now we are changing the success cases also to return an error and ask the user to have the GET /project permission first otherwise nova cannot process the request. Your project might be valid but nova cannot conform that till you have permission to GET /project.
-gmann
Cheers,Surya.
Ghanshyam Mann gmann@ghanshyammann.com 于2019年11月28日周四 上午2:27写道:
---- On Wed, 27 Nov 2019 03:35:43 -0600 Surya Seetharaman < surya.seetharaman9@gmail.com> wrote ----
On Tue, Nov 26, 2019 at 9:26 PM Matt Riedemann mriedemos@gmail.com
wrote:
Note that the APIs that would change are admin-only by default. So in this case nova is configured with a service user to check if the requested project_id exists on behalf of the (admin) user making the compute API request to add/remove flavor access (or update quota values for a project). The service user does not have enough permissions in keystone to check if the project exists. Option 1 is give that service user more authority. Option 2 is basically re-raise that error to the compute (admin) user to let them know they basically need to fix their deployment (option 1 again).
A combo of both solutions where we raise the error to the user and
amend our docs to help them fix it seems good to me.
+1 on the solution. I like that code tells the error to users because people do not read the doc always.
I don't think a microversion is necessary for this ++
I disagree here. My main concern is that this is not the always-broken case. For the case where we have complete broken behaviour then we do not need microverison to fix that as mentioned by matt too.
In this case: Even 403 from keystone on GET /project, it may possible that the project exists and request Nova to add that projects in flavor access is right. This is a success case in the current situation which will be changed to 400 after the proposed solution(option2 or 2+1). This is behaviour change and should be done with microvesion.
That is the case we expected to be fixed by the operator, right? If the operator fix that, then there won't be any API behavior change. I guess that what Matt point.
In old change where we added the verify_project_id did not change the success case, that only handled the case where keystone returned 404 on GET /project means it is confirmed that the requested project does not exist so it will break later so nova started 400 instead of 200. which was clearly a broken case. Any other case where projects may exist was kept as it is so microversion was not needed there. But now we are changing the success cases also to return an error and ask the user to have the GET /project permission first otherwise nova cannot process the request. Your project might be valid but nova cannot conform that till you have permission to GET /project.
-gmann
Cheers,Surya.
---- On Thu, 28 Nov 2019 03:02:19 -0600 Alex Xu soulxu@gmail.com wrote ----
Ghanshyam Mann gmann@ghanshyammann.com 于2019年11月28日周四 上午2:27写道: ---- On Wed, 27 Nov 2019 03:35:43 -0600 Surya Seetharaman surya.seetharaman9@gmail.com wrote ----
On Tue, Nov 26, 2019 at 9:26 PM Matt Riedemann mriedemos@gmail.com wrote: Note that the APIs that would change are admin-only by default. So in this case nova is configured with a service user to check if the requested project_id exists on behalf of the (admin) user making the compute API request to add/remove flavor access (or update quota values for a project). The service user does not have enough permissions in keystone to check if the project exists. Option 1 is give that service user more authority. Option 2 is basically re-raise that error to the compute (admin) user to let them know they basically need to fix their deployment (option 1 again).
A combo of both solutions where we raise the error to the user and amend our docs to help them fix it seems good to me.
+1 on the solution. I like that code tells the error to users because people do not read the doc always.
I don't think a microversion is necessary for this ++
I disagree here. My main concern is that this is not the always-broken case. For the case where we have complete broken behaviour then we do not need microverison to fix that as mentioned by matt too.
In this case: Even 403 from keystone on GET /project, it may possible that the project exists and request Nova to add that projects in flavor access is right. This is a success case in the current situation which will be changed to 400 after the proposed solution(option2 or 2+1). This is behaviour change and should be done with microvesion.
That is the case we expected to be fixed by the operator, right? If the operator fix that, then there won't be any API behavior change. I guess that what Matt point.
yeah, they can fix it via configuration change only so we can consider this as policy default change or extra policy checks changes which do not require microversion.
-gmann
In old change where we added the verify_project_id did not change the success case, that only handled the case where keystone returned 404 on GET /project means it is confirmed that the requested project does not exist so it will break later so nova started 400 instead of 200. which was clearly a broken case. Any other case where projects may exist was kept as it is so microversion was not needed there. But now we are changing the success cases also to return an error and ask the user to have the GET /project permission first otherwise nova cannot process the request. Your project might be valid but nova cannot conform that till you have permission to GET /project.
-gmann
Cheers,Surya.
On 11/27/2019 12:26 PM, Ghanshyam Mann wrote:
In old change where we added the verify_project_id did not change the success case, that only handled the case where keystone returned 404 on GET /project means it is confirmed that the requested project does not exist so it will break later so nova started 400 instead of 200.
What "old change" are you referring to here? Before [1] there was *no* validation performed when adding/removing flavor access or updating quotas for a given project. Hence all of the duplicate bugs about being able to typo/fat-finger/pass garbage values to those APIs and then be confused later when flavor access and quotas aren't working as expected. There have been a few changes to the verify method since it was added, but the point is it was a non-microversion change to admin APIs which would turn previously invalid but passing requests to failures.
I'm assuming 403 was handled "gracefully" more for backward compatibility than anything else but as can be seen from Surya's bug is just masking the original issue that this validation code was trying to fix.
Nothing is changing in request or response schema and this should only be enforced (re-raise on 403 from keystone) in the APIs that are admin-only by default so it's not an interop concern.
I just don't see why we would expect someone to need to opt into this validation actually working - and if misconfigured actually failing to indicate to the admin using the API that their deployment needs to be fixed.
[1] https://review.opendev.org/#/c/435010/
---- On Thu, 28 Nov 2019 08:34:03 -0600 Matt Riedemann mriedemos@gmail.com wrote ----
On 11/27/2019 12:26 PM, Ghanshyam Mann wrote:
In old change where we added the verify_project_id did not change the success case, that only handled the case where keystone returned 404 on GET /project means it is confirmed that the requested project does not exist so it will break later so nova started 400 instead of 200.
What "old change" are you referring to here? Before [1] there was *no* validation performed when adding/removing flavor access or updating quotas for a given project. Hence all of the duplicate bugs about being able to typo/fat-finger/pass garbage values to those APIs and then be confused later when flavor access and quotas aren't working as expected. There have been a few changes to the verify method since it was added, but the point is it was a non-microversion change to admin APIs which would turn previously invalid but passing requests to failures.
+1. agree on that change which is making an invalid passed request to failure so does not require microversion.
I'm assuming 403 was handled "gracefully" more for backward compatibility than anything else but as can be seen from Surya's bug is just masking the original issue that this validation code was trying to fix.
Nothing is changing in request or response schema and this should only be enforced (re-raise on 403 from keystone) in the APIs that are admin-only by default so it's not an interop concern.
This is a good point, On rethinking on "how the operator can fix the current success case which going to be failed case after proposed solution" is via a configuration change only (change the policy permissions). Which is similar to changing the default roles for any policy. If we change the policy defaults then we follow the deprecation phase only and do not bump the microversion. If we can follow the deprecation warning (the same as a policy change case) in this proposed solution also then we can avoid the microversion bump.
I mean, in Ussuri release, we log the warning saying that "this API might be successful for you but it will fail if the user does not have GET /project" permissions from V release. and in V release we raise the error for keystone's 403 or non-2xx cases also.
-gmann
I just don't see why we would expect someone to need to opt into this validation actually working - and if misconfigured actually failing to indicate to the admin using the API that their deployment needs to be fixed.
[1] https://review.opendev.org/#/c/435010/
--
Thanks,
Matt
On Tue, Nov 26, 2019 at 19:08, Surya Seetharaman surya.seetharaman9@gmail.com wrote:
Hello everyone,
We came across this bug [1] in nova recently and wanted to know what people think is the best (relatively) way to fix this.
In the past, the project id validation was added as a best effort to prevent users from being able to enter random values into the database. When this validation is used from the os flavor set/unset admin apis [2], there are chances that keystone returns a 403 which gets silently ignored by nova [3] allowing the user to enter the provided project_id/name without validation or warning or remove an existing flavor-project mapping. There were a couple of options discussed on IRC [4] to fix this behaviour out of which the practically reasonable ones are:
- close the bug as invalid - tweak your config (we could add docs,
idk if that would be found or help) to do what you need to avoid the 403 from keystone 2) change the 403 case as an error and raise it back to the compute api caller - maybe enough time has passed to not worry about backward compat with the old non-validating behavior
Option 2 seems better than option 1 for most of us, however what we cannot agree upon is if this change should be accompanied by a microversion bump or not.
My 2 cents: Make the problem explicit by raising the error back to the caller (which is the admin by default), enhance our docs to help the admin to fix the nova service user's permissions to avoid the 403.
gibi
[1] https://bugs.launchpad.net/nova/+bug/1854053 [2] https://github.com/openstack/nova/blob/fd67f69cfdaf04620f2e8a5f1fbf573709696... [3] https://github.com/openstack/nova/blob/d621914442855ce67ce0b99003f7e69e8ee51... [4] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2...
Cheers, Surya.
Apologies, like Matt pointed out I sort of forgot to add the title in my original email.
On Tue, Nov 26, 2019 at 7:08 PM Surya Seetharaman < surya.seetharaman9@gmail.com> wrote:
Hello everyone,
We came across this bug [1] in nova recently and wanted to know what people think is the best (relatively) way to fix this.
In the past, the project id validation was added as a best effort to prevent users from being able to enter random values into the database. When this validation is used from the os flavor set/unset admin apis [2], there are chances that keystone returns a 403 which gets silently ignored by nova [3] allowing the user to enter the provided project_id/name without validation or warning or remove an existing flavor-project mapping. There were a couple of options discussed on IRC [4] to fix this behaviour out of which the practically reasonable ones are:
- close the bug as invalid - tweak your config (we could add docs, idk if
that would be found or help) to do what you need to avoid the 403 from keystone 2) change the 403 case as an error and raise it back to the compute api caller - maybe enough time has passed to not worry about backward compat with the old non-validating behavior
Option 2 seems better than option 1 for most of us, however what we cannot agree upon is if this change should be accompanied by a microversion bump or not.
[1] https://bugs.launchpad.net/nova/+bug/1854053 [2] https://github.com/openstack/nova/blob/fd67f69cfdaf04620f2e8a5f1fbf573709696... [3] https://github.com/openstack/nova/blob/d621914442855ce67ce0b99003f7e69e8ee51... [4] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2...
Cheers, Surya.
participants (5)
-
Alex Xu
-
Balázs Gibizer
-
Ghanshyam Mann
-
Matt Riedemann
-
Surya Seetharaman