I wanted to float something that we talked about in the public cloud SIG meeting today [1] which is the concept of making the lock API more granular to lock on a list of actions rather than globally locking all actions that can be performed on a server.
The primary use case we discussed was around a pre-paid pricing model for servers. A user can pre-pay resources at a discount if, say, they are going to use them for a month at a fixed rate. However, once they do, they can't resize those servers without going through some kind of approval (billing) process to resize up. With this, the provider could lock the user from performing the resize action on the server but the user could do other things like stop/start/reboot/snapshot/etc.
The pricing model sounds similar to pre-emptible instances for getting a discount but the scenario is different in that these servers couldn't be pre-empted (they are definitely more non-cloudy pets than cattle).
An alternative solution for that locked resize issue is using granular policy rules such that pre-paid servers have some other kind of role attached to them, so by policy you could restrict users from performing actions on those servers (but the admin could override). In reality I'm not sure how feasible that is in a public cloud with several thousand projects. The issue I see with policy controlling this is that the role is attached to the project, not the resource (the server), so if you did this, would users have to have separate projects for on-demand vs pre-paid resources? I believe that's what CERN and StackHPC are doing with pre-emptible instances (you have different projects with different quota models for pre-emptible resources).
I believe there are probably other use cases for granular locks on servers for things like service VMs (trove creates some service VMs to run a database cluster and puts locks on those servers). Again, definitely a pet scenario but it's one I've heard before.
Would people be generally in favor of this or opposed, or just meh?
[1] https://etherpad.openstack.org/p/publiccloud-wg
--
Thanks,
Matt
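To make that concrete, a request against such an API might look something like this sketch (purely hypothetical; the "locked_actions" key does not exist in the compute API today, the current lock action takes no arguments, and the endpoint/token/UUID below are placeholders):

    # Purely hypothetical sketch -- "locked_actions" is not a real field today.
    import requests

    COMPUTE = "https://cloud.example.com/compute/v2.1"
    SERVER = "47126a2a-0000-0000-0000-000000000000"
    HEADERS = {"X-Auth-Token": "<token>", "Content-Type": "application/json"}

    # Today's lock action locks everything on the server:
    requests.post("%s/servers/%s/action" % (COMPUTE, SERVER),
                  json={"lock": None}, headers=HEADERS)

    # The idea floated here: lock only specific actions, e.g. resize,
    # while stop/start/reboot/snapshot/etc. keep working:
    requests.post("%s/servers/%s/action" % (COMPUTE, SERVER),
                  json={"lock": {"locked_actions": ["resize"]}},
                  headers=HEADERS)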
On Thu, Dec 20, 2018 at 1:09 PM Matt Riedemann <mriedemos@gmail.com> wrote:
I wanted to float something that we talked about in the public cloud SIG meeting today [1] which is the concept of making the lock API more granular to lock on a list of actions rather than globally locking all actions that can be performed on a server.
The primary use case we discussed was around a pre-paid pricing model for servers. A user can pre-pay resources at a discount if, say, they are going to use them for a month at a fixed rate. However, once they do, they can't resize those servers without going through some kind of approval (billing) process to resize up. With this, the provider could lock the user from performing the resize action on the server but the user could do other things like stop/start/reboot/snapshot/etc.
The pricing model sounds similar to pre-emptible instances for getting a discount but the scenario is different in that these servers couldn't be pre-empted (they are definitely more non-cloudy pets than cattle).
An alternative solution for that locked resize issue is using granular policy rules such that pre-paid servers have some other kind of role attached to them, so by policy you could restrict users from performing actions on those servers (but the admin could override). In reality I'm not sure how feasible that is in a public cloud with several thousand projects. The issue I see with policy controlling this is that the role is attached to the project, not the resource (the server), so if you did this, would users have to have separate projects for on-demand vs pre-paid resources? I believe that's what CERN and StackHPC are doing with pre-emptible instances (you have different projects with different quota models for pre-emptible resources).
One way you might be able to do this is by shoveling off the policy check using oslo.policy's http_check functionality [0]. But, it still doesn't fix the problem that users have roles on projects, and that's the standard for relaying information from keystone to services today.
Hypothetically, the external policy system *could* be an API that allows operators to associate users to different policies that are more granular than what OpenStack offers today (I could POST to this policy system that a specific user can do everything but resize up this *specific* instance). When nova parses a policy check, it hands control to oslo.policy, which shuffles it off to this external system for enforcement. This external policy system evaluates the policies based on what information nova passes it, which would require the policy check string, context of the request like the user, and the resource they are trying to operate on (the instance in this case). The external policy system could query its own policy database for any policies matching that data, evaluate them, and return the enforcement decision per the oslo.policy API.
Conversely, you'll have a performance hit since the policy decision and policy enforcement points are no longer oslo.policy *within* nova, but some external system being called by oslo.policy...
Might not be the best idea, but food for thought based on the architecture we have today.
[0] https://docs.openstack.org/oslo.policy/latest/user/plugins.html
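As a very rough sketch of that wiring using the existing http check (the endpoint URL is made up, and the rule name here is just nova's resize policy):

    # Sketch only: the external endpoint is made up. oslo.policy's built-in
    # "http" check POSTs the serialized target and credentials to the URL
    # and treats a response body of "True" as an allow decision.
    from oslo_config import cfg
    from oslo_context import context
    from oslo_policy import policy

    enforcer = policy.Enforcer(cfg.CONF)

    # Equivalent to a policy file entry:
    #   "os_compute_api:servers:resize": "http://policy.example.com/check"
    enforcer.register_default(policy.RuleDefault(
        name="os_compute_api:servers:resize",
        check_str="http://policy.example.com/check"))

    ctx = context.RequestContext(user_id="some-user",
                                 project_id="some-project",
                                 roles=["member"])
    # The target should describe the resource being acted on (the server).
    target = {"project_id": "some-project", "uuid": "some-instance-uuid"}

    allowed = enforcer.enforce("os_compute_api:servers:resize",
                               target, ctx.to_policy_values())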
I believe there are probably other use cases for granular locks on servers for things like service VMs (trove creates some service VMs to run a database cluster and puts locks on those servers). Again, definitely a pet scenario but it's one I've heard before.
Would people be generally in favor of this or opposed, or just meh?
[1] https://etherpad.openstack.org/p/publiccloud-wg
--
Thanks,
Matt
On 12/20/2018 1:45 PM, Lance Bragstad wrote:
One way you might be able to do this is by shoveling off the policy check using oslo.policy's http_check functionality [0]. But, it still doesn't fix the problem that users have roles on projects, and that's the standard for relaying information from keystone to services today.
Hypothetically, the external policy system *could* be an API that allows operators to associate users to different policies that are more granular than what OpenStack offers today (I could POST to this policy system that a specific user can do everything but resize up this *specific* instance). When nova parses a policy check, it hands control to oslo.policy, which shuffles it off to this external system for enforcement. This external policy system evaluates the policies based on what information nova passes it, which would require the policy check string, context of the request like the user, and the resource they are trying to operate on (the instance in this case). The external policy system could query its own policy database for any policies matching that data, evaluate them, and return the enforcement decision per the oslo.policy API.
One thing I'm pretty sure of in nova is we do not do a great job of getting the target of the policy check before actually doing the check. In other words, our target is almost always the project/user from the request context, and not the actual resource upon which the action is being performed (the server in most cases). I know John Garbutt had a spec for this before. It always confused me.
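Roughly, the difference looks like this (simplified sketch, not what nova actually does; get_instance is a stand-in for the real lookup):

    # Simplified sketch, not actual nova code. get_instance stands in for
    # the real instance lookup (e.g. objects.Instance.get_by_uuid).
    def get_instance(ctx, server_id):
        raise NotImplementedError("stand-in for the real DB lookup")

    def resize_check_today(enforcer, ctx, server_id):
        # What usually happens: the target is built from the caller's own
        # request context, so an owner rule like "project_id:%(project_id)s"
        # just compares the caller's project against itself.
        target = {"project_id": ctx.project_id, "user_id": ctx.user_id}
        enforcer.enforce("os_compute_api:servers:resize", target,
                         ctx.to_policy_values(), do_raise=True)

    def resize_check_with_real_target(enforcer, ctx, server_id):
        # What a resource-aware check needs: fetch the server first and use
        # it as the target, so policy can reason about the thing acted on.
        instance = get_instance(ctx, server_id)
        target = {"project_id": instance.project_id,
                  "user_id": instance.user_id,
                  "uuid": instance.uuid}
        enforcer.enforce("os_compute_api:servers:resize", target,
                         ctx.to_policy_values(), do_raise=True)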
Conversely, you'll have a performance hit since the policy decision and policy enforcement points are no longer oslo.policy *within* nova, but some external system being called by oslo.policy...
Yeah. The other thing is if I'm just looking at my server, I can see if it's locked or not since it's an attribute of the server resource. With policy I would only know if I can perform a certain action if I get a 403 or not, which is fine in most cases. Being able to see via some list of locked actions per server is arguably more user friendly. This also reminds me of reporting / capabilities APIs we've talked about over the years, e.g. what can I do on this cloud, on this host, or with this specific server?
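For example (sketch; the endpoint, token and IDs are placeholders, and the locked field is only returned with a new enough microversion):

    # Sketch contrasting the two discovery models; URL/token/IDs are placeholders.
    import requests

    COMPUTE = "https://cloud.example.com/compute/v2.1"
    SERVER = "47126a2a-0000-0000-0000-000000000000"
    HEADERS = {"X-Auth-Token": "<token>"}

    # The lock state is an attribute of the server resource, so a user can
    # just look at it (with a new enough microversion).
    server = requests.get("%s/servers/%s" % (COMPUTE, SERVER),
                          headers=HEADERS).json()["server"]
    print(server.get("locked"))

    # A pure policy restriction is only discoverable by trying the action
    # and seeing whether a 403 comes back.
    resp = requests.post("%s/servers/%s/action" % (COMPUTE, SERVER),
                         json={"resize": {"flavorRef": "<flavor-id>"}},
                         headers=HEADERS)
    print(resp.status_code)  # 403 if policy forbids it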
Might not be the best idea, but food for thought based on the architecture we have today.
Definitely, thanks for the alternative. This is something one could implement per-provider based on need if we don't have a standard solution.
--
Thanks,
Matt
On Thu, Dec 20, 2018 at 3:50 PM Matt Riedemann <mriedemos@gmail.com> wrote:
On 12/20/2018 1:45 PM, Lance Bragstad wrote:
One way you might be able to do this is by shoveling off the policy check using oslo.policy's http_check functionality [0]. But, it still doesn't fix the problem that users have roles on projects, and that's the standard for relaying information from keystone to services today.
Hypothetically, the external policy system *could* be an API that allows operators to associate users to different policies that are more granular than what OpenStack offers today (I could POST to this policy system that a specific user can do everything but resize up this *specific* instance). When nova parses a policy check, it hands control to oslo.policy, which shuffles it off to this external system for enforcement. This external policy system evaluates the policies based on what information nova passes it, which would require the policy check string, context of the request like the user, and the resource they are trying to operate on (the instance in this case). The external policy system could query its own policy database for any policies matching that data, evaluate them, and return the enforcement decision per the oslo.policy API.
One thing I'm pretty sure of in nova is we do not do a great job of getting the target of the policy check before actually doing the check. In other words, our target is almost always the project/user from the request context, and not the actual resource upon which the action is being performed (the server in most cases). I know John Garbutt had a spec for this before. It always confused me.
I doubt nova is alone in this position. I would bet there are a lot of cases across OpenStack where we could be more consistent in how this information is handed to oslo.policy. We attempted to solve this for the other half of the equation, which is the `creds` dictionary. Turns out a lot of what was in this arbitrary `creds` dict was actually just information from the request context object. The oslo.policy library now supports context objects directly [0], as opposed to hoping services build the dictionary properly. Target information will be a bit harder to do because it's different across services and even APIs within the same service. But yeah, I totally sympathize with the complexity it puts on developers.
[0] https://review.openstack.org/#/c/578995/
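On the caller side the difference looks roughly like this (sketch only):

    # Sketch of the two calling conventions; "enforcer" is an oslo.policy
    # Enforcer, "ctx" an oslo.context RequestContext, and "target" a dict
    # describing the resource.
    def check_resize(enforcer, ctx, target):
        # Old style: each service hand-builds a creds dict and hopes the
        # keys line up with what the policy rules reference.
        creds = {"user_id": ctx.user_id,
                 "project_id": ctx.project_id,
                 "roles": ctx.roles}
        enforcer.enforce("os_compute_api:servers:resize", target, creds)

        # With the context support linked above ([0]), the context object
        # can be passed directly and oslo.policy derives the credential
        # values itself.
        enforcer.enforce("os_compute_api:servers:resize", target, ctx)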
Conversely, you'll have a performance hit since the policy decision and policy enforcement points are no longer oslo.policy *within* nova, but some external system being called by oslo.policy...
Yeah. The other thing is if I'm just looking at my server, I can see if it's locked or not since it's an attribute of the server resource. With policy I would only know if I can perform a certain action if I get a 403 or not, which is fine in most cases. Being able to see via some list of locked actions per server is arguably more user friendly. This also reminds me of reporting / capabilities APIs we've talked about over the years, e.g. what can I do on this cloud, on this host, or with this specific server?
Yeah - I wouldn't mind picking that conversation up, maybe in a separate thread. An idea we had with keystone was to run a user's request through all registered policies and return a list of the ones they could access (e.g., take my token and tell me what I can do with it.) There are probably other issues with this, since policy names are mostly operator facing and end users don't really care at the moment.
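Very roughly, something like this (sketch; assumes the service's policy defaults are registered on the enforcer and that contexts can be passed directly):

    # Rough sketch of a "what can this token do" helper.
    def allowed_actions(enforcer, ctx, target):
        enforcer.load_rules()
        return sorted(name for name in enforcer.registered_rules
                      if enforcer.enforce(name, target, ctx))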
Might not be the best idea, but food for thought based on the architecture we have today.
Definitely, thanks for the alternative. This is something one could implement per-provider based on need if we don't have a standard solution.
Right, I always thought it would be a good fit for people providing super-specific policy checks or who have a custom syntax they want to implement. It keeps most of that separate from the services and oslo.policy. So long as we pass target and context information consistently, they essentially have an API they can write policies against.
--
Thanks,
Matt
On 12/20/18 4:58 PM, Lance Bragstad wrote:
On Thu, Dec 20, 2018 at 3:50 PM Matt Riedemann <mriedemos@gmail.com> wrote:
On 12/20/2018 1:45 PM, Lance Bragstad wrote:
One way you might be able to do this is by shoveling off the policy check using oslo.policy's http_check functionality [0]. But, it still doesn't fix the problem that users have roles on projects, and that's the standard for relaying information from keystone to services today.
Hypothetically, the external policy system *could* be an API that allows operators to associate users to different policies that are more granular than what OpenStack offers today (I could POST to this policy system that a specific user can do everything but resize up this *specific* instance). When nova parses a policy check, it hands control to oslo.policy, which shuffles it off to this external system for enforcement. This external policy system evaluates the policies based on what information nova passes it, which would require the policy check string, context of the request like the user, and the resource they are trying to operate on (the instance in this case). The external policy system could query its own policy database for any policies matching that data, evaluate them, and return the enforcement decision per the oslo.policy API.
One thing I'm pretty sure of in nova is we do not do a great job of getting the target of the policy check before actually doing the check. In other words, our target is almost always the project/user from the request context, and not the actual resource upon which the action is being performed (the server in most cases). I know John Garbutt had a spec for this before. It always confused me.
I doubt nova is alone in this position. I would bet there are a lot of cases across OpenStack where we could be more consistent in how this information is handed to oslo.policy. We attempted to solve this for the other half of the equation, which is the `creds` dictionary. Turns out a lot of what was in this arbitrary `creds` dict was actually just information from the request context object. The oslo.policy library now supports context objects directly [0], as opposed to hoping services build the dictionary properly. Target information will be a bit harder to do because it's different across services and even APIs within the same service. But yeah, I totally sympathize with the complexity it puts on developers.
[0] https://review.openstack.org/#/c/578995/
Conversely, you'll have a performance hit since the policy decision and policy enforcement points are no longer oslo.policy *within* nova, but some external system being called by oslo.policy...
Yeah. The other thing is if I'm just looking at my server, I can see if it's locked or not since it's an attribute of the server resource. With policy I would only know if I can perform a certain action if I get a 403 or not, which is fine in most cases. Being able to see via some list of locked actions per server is arguably more user friendly. This also reminds me of reporting / capabilities APIs we've talked about over the years, e.g. what can I do on this cloud, on this host, or with this specific server?
Yeah - I wouldn't mind picking that conversation up, maybe in a separate thread. An idea we had with keystone was to run a user's request through all registered policies and return a list of the ones they could access (e.g., take my token and tell me what I can do with it.) There are probably other issues with this, since policy names are mostly operator facing and end users don't really care at the moment.
Might not be the best idea, but food for thought based on the architecture we have today.
Definitely, thanks for the alternative. This is something one could implement per-provider based on need if we don't have a standard solution.
Right, I always thought it would be a good fit for people providing super-specific policy checks or who have a custom syntax they want to implement. It keeps most of that separate from the services and oslo.policy. So long as we pass target and context information consistently, they essentially have an API they can write policies against.
I know we fixed a number of bugs in services around the time of the first Denver PTG because a user wanted to offload policy checks to an external system and used HTTPCheck for it. They ran across a number of places where the data passed to oslo.policy was either missing or incorrect, which meant their policy system didn't have enough to make a decision. I haven't heard anything new about this in a while, so it's either still working for them or they gave up on the idea.
There's also a spec proposing that we add more formal support for external policy engines to oslo.policy: https://review.openstack.org/#/c/578719/
It probably doesn't solve this problem any more than the HTTPCheck option does, but if one were to go down that path it would make external policy engines easier to use (no need to write a custom policy file to replace every rule with HTTPCheck, for example).
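For reference, the receiving end of an HTTPCheck offload can be pretty small. Something like this sketch (Flask is just an example here, and the allow logic is a stub):

    # Sketch of an external policy endpoint that HTTPCheck can call.
    # oslo.policy POSTs form-encoded "target" and "credentials" JSON blobs
    # and treats a plain-text response body of "True" as an allow.
    import json

    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/check", methods=["POST"])
    def check():
        target = json.loads(request.form["target"])
        creds = json.loads(request.form["credentials"])
        # Stub decision: a real deployment would consult its own policy
        # store, keyed on e.g. creds["user_id"] and target.get("uuid").
        allow = creds.get("project_id") == target.get("project_id")
        return "True" if allow else "False"

    if __name__ == "__main__":
        app.run(port=8080)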
--
Thanks,
Matt
On 12/20/2018 1:07 PM, Matt Riedemann wrote:
I wanted to float something that we talked about in the public cloud SIG meeting today [1] which is the concept of making the lock API more granular to lock on a list of actions rather than globally locking all actions that can be performed on a server.
The primary use case we discussed was around a pre-paid pricing model for servers. A user can pre-pay resources at a discount if, say, they are going to use them for a month at a fixed rate. However, once they do, they can't resize those servers without going through some kind of approval (billing) process to resize up. With this, the provider could lock the user from performing the resize action on the server but the user could do other things like stop/start/reboot/snapshot/etc.
On the operator side, it seems like you could just auto-switch the user from fixed-rate to variable-rate for that instance (assuming you have their billing info). It almost sounds like this is just a convenience thing for the user, so they don't accidentally resize the instance.
Looking at it more generally, are there any other user-callable Compute API calls that would make sense to selectively disable for a specific resource?
Chris
participants (4)
- Ben Nemec
- Chris Friesen
- Lance Bragstad
- Matt Riedemann