[keystone][nova] Availability zones and unified limits
Unified limits cropped up in a few discussions last week. After describing the current implementation of limits, namely the attributes that make up a limit, someone asked if availability zones were on the roadmap.
Ideally, it sounded like they had multiple AZs in a single region, and they wanted to be able to limit usage within the AZ. With the current implementation, regions would be the smallest unit of a deployment they could limit (e.g., limit project Foo to only using 64 compute cores within RegionOne). Instead, the idea would be to limit the number of resources for that project on an AZ within a region.
What do people think about adding availability zones to limits? Should it be an official attribute in keystone? What other services would need this outside of nova?
There were a few other interesting cases that popped up, but I figured we could start here. I can start another thread if needed and we can keep this specific to discussing limits + AZs.
Thoughts?
On 11/20/2018 07:33 AM, Lance Bragstad wrote:
Unified limits cropped up in a few discussions last week. After describing the current implementation of limits, namely the attributes that make up a limit, someone asked if availability zones were on the roadmap.
Ideally, it sounded like they had multiple AZs in a single region, and they wanted to be able to limit usage within the AZ. With the current implementation, regions would be the smallest unit of a deployment they could limit (e.g., limit project Foo to only using 64 compute cores within RegionOne). Instead, the idea would be to limit the number of resources for that project on an AZ within a region.
What do people think about adding availability zones to limits? Should it be an official attribute in keystone? What other services would need this outside of nova?
There were a few other interesting cases that popped up, but I figured we could start here. I can start another thread if needed and we can keep this specific to discussing limits + AZs.
Keystone should have always been the thing that stores region and availability zone information.
When I wrote the regions functionality for Keystone's catalog [1] I deliberately added the concept that a region can have zero or more sub-regions [2] to it. A region in Keystone wasn't (and AFAIK to this day isn't) specific to a geographic location. There's nothing preventing anyone from adding sub-regions that represent a nova availability zone [3] to Keystone.
And since there's nothing preventing the existing Keystone limits API from associating a set of limits with a specific region [4] (yes, even a sub-region that represents an availability zone) I don't see any reason why the existing structures in Keystone could not be used to fulfill this functionality from the Keystone side.
The problem is *always* going to be on the Nova side (and any project that made the unfortunate decision to copy nova's availability zone "implementation" [5]... hi Cinder! [6]). The way that the availability zone concept has been hacked into Nova means it's just going to be a hack on top of a hack to get per-AZ quotas working in Nova. I know this because Oath deploys this hack on top of a hack in order to divvy up resources per power domain and physical network (those two things and the site/DC essentially comprise what our availability zones are internally).
Once again, not addressing the technical debt of years past -- which in this case is the lack of solid modeling of an availability zone concept in the Nova subsystems -- is hindering the forward progress of the project, which is sad.
The long-term solution is to have Nova scrap its availability zone code that relies on host aggregate metadata to work, use Keystone's /regions endpoint (and the hierarchy possible therein) as the single source of truth about availability zones, move the association of availability zone out of the Service model and onto the ComputeNode model, and have a cache of real AvailabilityZone data models stored in the Nova API top-level service.
I fear that without truly tackling this underlying problem, we can make lots of progress on the Keystone side but things will slow to a crawl with people trying to figure out what the heck is going on in Nova with availability zones and how they could be tied in to quota handling.
Sorry to be a pessimist^realist,
-jay
[1] https://github.com/openstack/keystone/commit/7c847578c8ed6a4921a24acb8a60f92...
[2] https://developer.openstack.org/api-ref/identity/v3/index.html#regions
[3] As I've mentioned numerous times before, there's nothing "availability" about a nova availability zone. There are no guarantees about failure domains leaking across multiple nova availability zones. Nova's availability zones are an anachronism from when a nova endpoint serviced a single isolated set of compute resources.
[4] https://developer.openstack.org/api-ref/identity/v3/index.html?expanded=crea...
[5] Behold, the availability zone implementation in Nova: https://github.com/openstack/nova/blob/ea26392239e67b9504801ee9a478e066ffa29...
You'll notice there's no actual data model for an AvailabilityZone anywhere in either the nova_api or nova_cell databases:
https://github.com/openstack/nova/blob/ea26392239e67b9504801ee9a478e066ffa29...
https://github.com/openstack/nova/blob/ea26392239e67b9504801ee9a478e066ffa29...
Which puts "availability zone" in rare company inside Nova as one of the only data concepts that has no actual data model behind it. It shares this distinction with one other data concept... yep, you guessed it... the "region".
[6] Sorry, Cinder, you inherited the lack of a real data model for availability zones from Nova: https://github.com/openstack/cinder/blob/b8167a5c3e5952cc52ff8844804b7a5ab36...
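As a rough illustration of the Keystone-side half of this idea, the sketch below builds the JSON bodies one might send to the Identity v3 API: a sub-region whose parent is RegionOne, and a project limit scoped to that sub-region. The region name `RegionOne-az1`, the `cores` resource name, the helper function names, and the placeholder IDs are all assumptions made for illustration. The code only constructs request bodies and never contacts a real Keystone. As far as I know, Keystone also expects a matching registered limit to exist before a project limit can be created; that step is left out here.

```python
import json


def region_body(region_id, parent_region_id=None, description=""):
    """Build a body for POST /v3/regions.

    Setting parent_region_id is what makes the new region a sub-region.
    """
    region = {"id": region_id, "description": description}
    if parent_region_id is not None:
        region["parent_region_id"] = parent_region_id
    return {"region": region}


def limit_body(project_id, service_id, region_id, resource_name, resource_limit):
    """Build a body for POST /v3/limits (the API accepts a list of limits)."""
    return {
        "limits": [
            {
                "project_id": project_id,
                "service_id": service_id,
                "region_id": region_id,
                "resource_name": resource_name,
                "resource_limit": resource_limit,
            }
        ]
    }


# Hypothetical sub-region standing in for a nova availability zone.
az_region = region_body(
    "RegionOne-az1",
    parent_region_id="RegionOne",
    description="sub-region representing a nova AZ (illustrative)",
)

# Placeholder IDs; a real deployment would use the project and service UUIDs.
az_limit = limit_body(
    "PROJECT_FOO_ID", "NOVA_SERVICE_ID", "RegionOne-az1", "cores", 64
)

print(json.dumps(az_region, indent=2))
print(json.dumps(az_limit, indent=2))
```

Nothing here addresses the Nova side, which is where the thread expects the real work to be.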
I partially agree with your statements.
I'm currently rolling the ball on availability zones in our deployments and it's a real pain. I think if there was an easier concept for a source of truth for AZs (and aggregates as well) there would be less confusion and a much better way forward for inter-project support for such resources.
My opinion, though: quotas really make sense to store in Keystone since it's the source of truth for users and projects. However, I would say that AZs themselves are not on such a high level, and that isn't the same as Keystone knowing about the regions; it's region-specific.
I don't think Keystone should have a view on such a detailed level inside a region. I do agree that there is a void to be filled with something that gives the source of truth on AZs, host aggregates, etc., though. But moving that outside of a region defeats some of the purpose of isolating it in the first place.
Best regards
On 11/20/2018 03:10 PM, Jay Pipes wrote:
On 11/20/2018 07:33 AM, Lance Bragstad wrote:
Unified limits cropped up in a few discussions last week. After describing the current implementation of limits, namely the attributes that make up a limit, someone asked if availability zones were on the roadmap.
Ideally, it sounded like they had multiple AZs in a single region, and they wanted to be able to limit usage within the AZ. With the current implementation, regions would be the smallest unit of a deployment they could limit (e.g., limit project Foo to only using 64 compute cores within RegionOne). Instead, the idea would be to limit the number of resources for that project on an AZ within a region.
What do people think about adding availability zones to limits? Should it be an official attribute in keystone? What other services would need this outside of nova?
There were a few other interesting cases that popped up, but I figured we could start here. I can start another thread if needed and we can keep this specific to discussing limits + AZs.
Keystone should have always been the thing that stores region and availability zone information.
When I wrote the regions functionality for Keystone's catalog [1] I deliberately added the concept that a region can have zero or more sub-regions [2] to it. A region in Keystone wasn't (and AFAIK to this day isn't) specific to a geographic location. There's nothing preventing anyone from adding sub-regions that represent a nova availability zone [3] to Keystone.
And since there's nothing preventing the existing Keystone limits API from associating a set of limits with a specific region [4] (yes, even a sub-region that represents an availability zone) I don't see any reason why the existing structures in Keystone could not be used to fulfill this functionality from the Keystone side.
The problem is *always* going to be on the Nova side (and any project that made the unfortunate decision to copy nova's availability zone "implementation" [5]... hi Cinder! [6]). The way that the availability zone concept has been hacked into Nova means it's just going to be a hack on top of a hack to get per-AZ quotas working in Nova. I know this because Oath deploys this hack on top of a hack in order to divvy up resources per power domain and physical network (those two things and the site/DC essentially comprise what our availability zones are internally).
Once again, not addressing the technical debt of years past -- which in this case is the lack of solid modeling of an availability zone concept in the Nova subsystems -- is hindering the forward progress of the project, which is sad.
The long-term solution is to have Nova scrap its availability zone code that relies on host aggregate metadata to work, use Keystone's /regions endpoint (and the hierarchy possible therein) as the single source of truth about availability zones, move the association of availability zone out of the Service model and onto the ComputeNode model, and have a cache of real AvailabilityZone data models stored in the Nova API top-level service.
I fear that without truly tackling this underlying problem, we can make lots of progress on the Keystone side but things will slow to a crawl with people trying to figure out what the heck is going on in Nova with availability zones and how they could be tied in to quota handling.
Sorry to be a pessimist^realist, -jay
[1] https://github.com/openstack/keystone/commit/7c847578c8ed6a4921a24acb8a60f92...
[2] https://developer.openstack.org/api-ref/identity/v3/index.html#regions
[3] As I've mentioned numerous times before, there's nothing "availability" about a nova availability zone. There are no guarantees about failure domains leaking across multiple nova availability zones. Nova's availability zones are an anachronism from when a nova endpoint serviced a single isolated set of compute resources.
[4] https://developer.openstack.org/api-ref/identity/v3/index.html?expanded=crea...
[5] Behold, the availability zone implementation in Nova: https://github.com/openstack/nova/blob/ea26392239e67b9504801ee9a478e066ffa29...
You'll notice there's no actual data model for an AvailabilityZone anywhere in either the nova_api or nova_cell databases:
https://github.com/openstack/nova/blob/ea26392239e67b9504801ee9a478e066ffa29...
https://github.com/openstack/nova/blob/ea26392239e67b9504801ee9a478e066ffa29...
Which puts "availability zone" in rare company inside Nova as one of the only data concepts that has no actual data model behind it. It shares this distinction with one other data concept... yep, you guessed it... the "region".
[6] Sorry, Cinder, you inherited the lack of a real data model for availability zones from Nova:
https://github.com/openstack/cinder/blob/b8167a5c3e5952cc52ff8844804b7a5ab36...
On 11/20/2018 10:11 AM, Tobias Urdin wrote:
I partially agree with your statements.
I'm currently rolling the ball on availability zones in our deployments and it's a real pain. I think if there was an easier concept for a source of truth for AZs (and aggregates as well) there would be less confusion and a much better way forward for inter-project support for such resources.
Keep in mind that host aggregates are not something an end-user is aware of -- only operators can see information about host aggregates. This is not the same with availability zones, which the end user is aware of.
My opinion, though: quotas really make sense to store in Keystone since it's the source of truth for users and projects. However, I would say that AZs themselves are not on such a high level, and that isn't the same as Keystone knowing about the regions; it's region-specific.
My point was that Keystone's catalog can (and should) serve as the place to put regional and sub-region information. There is nothing in the Keystone concept of a region that denotes it as a geographic location nor anything in the concept of a region that prohibits a region from representing a smaller grouping of related compute resources -- a.k.a. a Nova availability zone. And if Keystone were used as the centralized catalog service that it was designed for, it just naturally makes sense to use the region and sub-region concept in Keystone's existing APIs to further partition quota limits for a project based on those regions. And there's nothing preventing anyone from creating a region in Keystone that represents a nova availability zone.
I don't think Keystone should have a view on such a detailed level inside a region. I do agree that there is a void to be filled with something that gives the source of truth on AZs, host aggregates, etc., though. But moving that outside of a region defeats some of the purpose of isolating it in the first place.
I haven't proposed "moving that outside of a region". Not sure where that's coming from. Could you elaborate?
Best,
-jay
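To make "partition quota limits for a project based on those regions" a bit more concrete, here is a small sketch of how a service might pick the limit that applies to a request in a sub-region. The region names, the in-memory tables, and in particular the rule of falling back to the parent region when a sub-region has no limit of its own are assumptions for illustration; neither this thread nor Keystone prescribes that lookup policy.

```python
from typing import Dict, Optional, Tuple

# Hypothetical region hierarchy: region id -> parent region id (None = top level).
PARENTS: Dict[str, Optional[str]] = {
    "RegionOne": None,
    "RegionOne-az1": "RegionOne",
    "RegionOne-az2": "RegionOne",
}

# Hypothetical project limits keyed by (project, region, resource).
LIMITS: Dict[Tuple[str, str, str], int] = {
    ("foo", "RegionOne", "cores"): 64,
    ("foo", "RegionOne-az1", "cores"): 16,
}


def effective_limit(project: str, region: str, resource: str) -> Optional[int]:
    """Return the limit from the most specific region that defines one.

    Walks from the given region up through its parents (assumed policy).
    """
    current: Optional[str] = region
    while current is not None:
        limit = LIMITS.get((project, current, resource))
        if limit is not None:
            return limit
        current = PARENTS.get(current)
    return None


def would_exceed(project: str, region: str, resource: str,
                 used: int, requested: int) -> bool:
    """True if granting the request would push usage past the effective limit."""
    limit = effective_limit(project, region, resource)
    return limit is not None and used + requested > limit
```

With these made-up numbers, a request for 8 more cores on top of 12 already used would be rejected in `RegionOne-az1` (limit 16) but allowed in `RegionOne-az2`, which inherits the 64-core limit from `RegionOne`.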
On 11/20/2018 04:42 PM, Jay Pipes wrote:
On 11/20/2018 10:11 AM, Tobias Urdin wrote:
I partially agree with your statements.
I'm currently rolling the ball on availability zones in our deployments and it's a real pain. I think if there was an easier concept for a source of truth for AZs (and aggregates as well) there would be less confusion and a much better way forward for inter-project support for such resources.
Keep in mind that host aggregates are not something an end-user is aware of -- only operators can see information about host aggregates. This is not the same with availability zones, which the end user is aware of.
My bad, keeping it to availability zones.
My opinion, though: quotas really make sense to store in Keystone since it's the source of truth for users and projects. However, I would say that AZs themselves are not on such a high level, and that isn't the same as Keystone knowing about the regions; it's region-specific.
My point was that Keystone's catalog can (and should) serve as the place to put regional and sub-region information. There is nothing in the Keystone concept of a region that denotes it as a geographic location nor anything in the concept of a region that prohibits a region from representing a smaller grouping of related compute resources -- a.k.a. a Nova availability zone. And if Keystone were used as the centralized catalog service that it was designed for, it just naturally makes sense to use the region and sub-region concept in Keystone's existing APIs to further partition quota limits for a project based on those regions. And there's nothing preventing anyone from creating a region in Keystone that represents a nova availability zone.
I agree that this might be the best solution currently possible, and perhaps even the only viable one, instead of introducing more technical debt with, for example, the addition of more services. I just want to make sure we are aware of the dependency this would create on Keystone, which is my concern. Would your proposal be that AZs cease to exist in their current form, replaced by a common community effort to move all existing projects to use Keystone as the source of truth? I really like that concept, but I'm not convinced where we should do that. There is great potential for improvement here, considering that Nova, Cinder and Neutron all provide the availability zone concept in, in some cases, somewhat different ways. Keeping it separated inside the projects themselves rather than moved out makes it painful to manage, e.g. the nova <-> cinder relationship for AZs.
I don't think Keystone should have a view on such a detailed level inside a region. I do agree that there is a void to be filled with something that gives the source of truth on AZs, host aggregates, etc., though. But moving that outside of a region defeats some of the purpose of isolating it in the first place.
I haven't proposed "moving that outside of a region". Not sure where that's coming from. Could you elaborate?
My biggest concern is that we are moving logic that is internal to a region out to a central dependency, which Keystone is. I've always viewed Keystone (and Horizon deployed alongside it) as the central part of an OpenStack cloud (yes, even though we stretch databases between regions and deploy Keystone there, or do federation) which does not interact with regions except for those high-level shared resources that come with authentication and catalog.
After sleeping on it, though, I understand your perspective, and what I really like is that it's pretty much already there. I'm not sure that this would still be considered availability zones as we see them today; it's more of a region partitioned into sub-regions, so we move down a layer further, which means you can schedule resources in a sub-region. Does that mean a gradual move away from and deprecation of availability zones, or is it just aliasing the Keystone region concept under another name?
I must say this seems like very important work. The cross-project involvement is huge, but the win for all projects here sounds substantial, and a community goal would probably benefit a lot of users.
Best regards
Tobias
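Whether sub-regions end up replacing availability zones or just aliasing them, the "Keystone regions as source of truth" idea could be sketched roughly as follows: a service reads region records and treats the direct children of its own region as availability zones, keeping them in a local cache. The `AvailabilityZone` class, the `zones_from_regions` helper, and the sample records are hypothetical; the records are only shaped like entries in a region listing (`id`, `parent_region_id`), and no real Nova or Keystone code is involved.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AvailabilityZone:
    """Hypothetical AZ model derived from a Keystone sub-region record."""
    name: str
    parent_region: str


def zones_from_regions(regions: List[dict], top_region: str) -> Dict[str, AvailabilityZone]:
    """Treat every direct child of top_region as an availability zone."""
    return {
        r["id"]: AvailabilityZone(name=r["id"], parent_region=top_region)
        for r in regions
        if r.get("parent_region_id") == top_region
    }


# Made-up records shaped like entries in a region listing.
regions = [
    {"id": "RegionOne", "parent_region_id": None},
    {"id": "RegionOne-az1", "parent_region_id": "RegionOne"},
    {"id": "RegionOne-az2", "parent_region_id": "RegionOne"},
    {"id": "RegionTwo", "parent_region_id": None},
]

# A service in RegionOne would cache only its own sub-regions.
az_cache = zones_from_regions(regions, "RegionOne")
print(sorted(az_cache))
```

This keeps the per-region view local to the service while the hierarchy itself lives in one place, which is the trade-off being debated above.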
Nectar would absolutely love for this to get off the ground. We have just the one region with many AZs, and being able to set quotas per AZ would be great.
I can help in terms of use cases, but we are pretty light on the developer front at the moment. We can throw money at someone else for this though, I think.
Cheers,
Sam
On 20 Nov 2018, at 11:33 pm, Lance Bragstad <lbragstad@gmail.com> wrote:
Unified limits cropped up in a few discussions last week. After describing the current implementation of limits, namely the attributes that make up a limit, someone asked if availability zones were on the roadmap.
Ideally, it sounded like they had multiple AZs in a single region, and they wanted to be able to limit usage within the AZ. With the current implementation, regions would be the smallest unit of a deployment they could limit (e.g., limit project Foo to only using 64 compute cores within RegionOne). Instead, the idea would be to limit the number of resources for that project on an AZ within a region.
What do people think about adding availability zones to limits? Should it be an official attribute in keystone? What other services would need this outside of nova?
There were a few other interesting cases that popped up, but I figured we could start here. I can start another thread if needed and we can keep this specific to discussing limits + AZs.
Thoughts?
participants (4)
- Jay Pipes
- Lance Bragstad
- Sam Morrison
- Tobias Urdin