[openstack-dev] More on the topic of DELIMITER, the Quota Management Library proposal

Andrew Laski andrew at lascii.com
Sat Apr 23 14:26:11 UTC 2016

On Fri, Apr 22, 2016, at 09:57 PM, Tim Bell wrote:
> I have reservations on f and g.
> On f., We have had a number of discussions in the past about
> centralising quota (e.g. Boson) and the project teams of the other
> components wanted to keep the quota contents ‘close’. This can always
> be reviewed further with them but I would hope for at least a standard
> schema structure of tables in each project for the handling of quota.
> On g., aren’t all projects now nested projects ? If we have the
> complexity of handling nested projects sorted out in the common
> library, is there a reason why a project would not want to support
> nested projects ?
> One other issue is how to do reconciliation; each project needs to
> have a mechanism to re-calculate the current allocations and reconcile
> that with the quota usage. While in an ideal world, this should not be
> necessary, it would be for the foreseeable future, especially with a
> new implementation.
One of the big reasons that Jay and I have been pushing to remove
reservations and tracking of quota in a separate place than the
resources are actually used, e.g., an instance record in the Nova db, is
so that reconciliation is not necessary. For example, if RAM quota usage
is simply tracked as sum(instances.memory_mb) then you can be sure that
usage is always up to date.
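As a rough illustration of that point (a minimal sketch, not Nova code):
the in-memory SQLite database, the simplified `instances` table and the
`ram_usage_mb` helper below are assumptions made for the example. Usage
is computed from the instance records on demand, so deleting an instance
needs no separate bookkeeping and nothing can drift out of sync.

```python
import sqlite3

# Hypothetical, simplified stand-in for the table that owns the resource.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instances ("
    " id INTEGER PRIMARY KEY, project_id TEXT, memory_mb INTEGER)"
)


def ram_usage_mb(conn, project_id):
    """Derive RAM usage directly from the instance records."""
    row = conn.execute(
        "SELECT COALESCE(SUM(memory_mb), 0) FROM instances"
        " WHERE project_id = ?",
        (project_id,),
    ).fetchone()
    return row[0]


conn.executemany(
    "INSERT INTO instances (project_id, memory_mb) VALUES (?, ?)",
    [("demo", 2048), ("demo", 4096), ("other", 1024)],
)
usage_before = ram_usage_mb(conn, "demo")

# Deleting the resource record is all that is needed; there is no
# quota_usages row to decrement and therefore nothing to reconcile.
conn.execute(
    "DELETE FROM instances WHERE project_id = 'demo' AND memory_mb = 2048"
)
usage_after = ram_usage_mb(conn, "demo")
```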
> Tim
> *From: *Amrith Kumar <amrith at tesora.com> *Reply-To: *"OpenStack
> Development Mailing List (not for usage questions)" <openstack-
> dev at lists.openstack.org> *Date: *Friday 22 April 2016 at 06:51 *To:
> *"OpenStack Development Mailing List (not for usage questions)" <openstack-
> dev at lists.openstack.org> *Subject: *Re: [openstack-dev] More on the
> topic of DELIMITER, the Quota Management Library proposal
>> I’ve thought more about Jay’s approach to enforcing quotas and I
>> think we can build on and around it. With that implementation as the
>> basic quota primitive, I think we can build a quota management API
>> that isn’t dependent on reservations. It does place some burdens on
>> the consuming projects that I had hoped to avoid and these will cause
>> heartburn for some (make sure that you always request resources in a
>> consistent order and free them in a consistent order being the most
>> obvious).
>> If it doesn’t make it harder, I would like to see if we can make the
>> quota API take care of the ordering of requests. i.e. if the quota
>> API is an extension of Jay’s example and accepts some data structure
>> (dict?) with all the claims that a project wants to make for some
>> operation, and then proceeds to make those claims for the project in
>> the consistent order, I think it would be of some value.
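A minimal sketch of what such an ordering wrapper might look like. The
names `claim_all`, `claim_one` and `release_one` are hypothetical and not
part of any proposed Delimiter API, and the in-memory `limits`/`used`
dicts stand in for whatever each project actually uses to claim a
resource. Claims are made in sorted resource-name order and undone in
reverse order if one of them fails.

```python
def claim_all(claims, claim_one, release_one):
    """Claim every entry of `claims` (resource class -> amount).

    Resources are always claimed in sorted-name order so that two
    callers never take them in conflicting orders. If any claim fails,
    the ones already made are released in reverse order.
    """
    done = []
    for resource_class in sorted(claims):
        if not claim_one(resource_class, claims[resource_class]):
            for previous in reversed(done):
                release_one(previous, claims[previous])
            return False
        done.append(resource_class)
    return True


# Toy backend used only to exercise the wrapper (an assumption).
limits = {"compute.ram_mb": 8192, "compute.vcpu": 4, "volume.disk_gb": 100}
used = {name: 0 for name in limits}
order = []  # records the order in which claims were attempted


def claim_one(resource_class, amount):
    order.append(resource_class)
    if used[resource_class] + amount > limits[resource_class]:
        return False
    used[resource_class] += amount
    return True


def release_one(resource_class, amount):
    used[resource_class] -= amount


ok = claim_all(
    {"volume.disk_gb": 50, "compute.vcpu": 2, "compute.ram_mb": 4096},
    claim_one,
    release_one,
)
# The disk claim exceeds the limit, so the vcpu claim is rolled back.
too_big = claim_all(
    {"volume.disk_gb": 80, "compute.vcpu": 1}, claim_one, release_one
)
```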
>> Beyond that, I’m on board with a-g below,
>> -amrith
>> *From:* Vilobh Meshram [mailto:vilobhmeshram.openstack at gmail.com]
>> *Sent:* Friday, April 22, 2016 4:08 AM *To:* OpenStack Development
>> Mailing List (not for usage questions) <openstack-
>> dev at lists.openstack.org> *Subject:* Re: [openstack-dev] More on the
>> topic of DELIMITER, the Quota Management Library proposal
>> I strongly agree with Jay on the points related to "no reservation" ,
>> keeping the interface simple and the role for Delimiter (impose
>> limits on resource consumption and enforce quotas).
>> The point to keep user quota, tenant quotas in Keystone sounds
>> interesting and would need support from Keystone team. We have a
>> Cross project session planned [1] and will definitely bring that up
>> in that session.
>> The main thought with which Delimiter was formed was to enforce
>> resource quota in transaction safe manner and do it in a
>> cross-project conducive manner and it still holds true. Delimiter's
>> mission is to impose limits on resource consumption and enforce
>> quotas in transaction safe manner. Few key aspects of Delimiter are :-
>> a. Delimiter will be a new Library and not a Service. Details covered
>>    in spec.
>> b. Delimiter's role will be to impose limits on resource consumption.
>> c. Delimiter will not be responsible for rate limiting.
>> d. Delimiter will not maintain data for the resources. Respective
>>    projects will take care of keeping, maintaining data for the
>>    resources and resource consumption.
>> e. Delimiter will not have the concept of "reservations". Delimiter
>>    will read or update the "actual" resource tables and will not rely
>>    on the "cached" tables. At present, the quota infrastructure in
>>    Nova, Cinder and other projects have tables such as reservations,
>>    quota_usage, etc which are used as "cached tables" to track re
>> f. Delimiter will fetch the information for project quota, user quota
>>    from a centralized place, say Keystone, or if that doesn't
>>    materialize will fetch default quota values from respective
>>    service. This information will be cached since it gets updated
>>    rarely but read many times.
>> g. Delimiter will take into consideration whether the project is a
>>    Flat or Nested and will make the calculations of allocated,
>>    available resources. Nested means project namespace is
>>    hierarchical and Flat means project namespace is not hierarchical.
>> -Vilobh
>> [1]https://www.openstack.org/summit/austin-2016/summit-schedule/events/9492
>> On Thu, Apr 21, 2016 at 11:08 PM, Joshua Harlow
>> <harlowja at fastmail.com> wrote:
>>> Since people will be on a plane soon,
>>>  I threw this together as an example of a quota engine (the zookeeper
>>>  code does even work, and yes it provides transactional semantics
>>>  due to the nice abilities of zookeeper znode versions[1] and its
>>>  inherent consistency model, yippe).
>>> https://gist.github.com/harlowja/e7175c2d76e020a82ae94467a1441d85
>>>  Someone else can fill in the db quota engine with a
>>>  similar/equivalent api if they so dare, ha. Or even feel free to say
>>>  the gist/api above is crap, cause that's ok too, lol.
>>>  [1]  https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#Data+Access
>>> Amrith Kumar wrote:
>>>> Inline below ... thread is too long, will catch you in Austin.
>>>>> -----Original Message----- From: Jay Pipes
>>>>> [mailto:jaypipes at gmail.com] Sent: Thursday, April 21, 2016 8:08 PM
>>>>> To: openstack-dev at lists.openstack.org Subject: Re: [openstack-dev]
>>>>> More on the topic of DELIMITER, the Quota Management Library
>>>>> proposal
>>>>>  Hmm, where do I start... I think I will just cut to the two
>>>>>  primary disagreements I have. And I will top-post because this
>>>>>  email is way too big.
>>>>>  1) On serializable isolation level.
>>>>>  No, you don't need it at all to prevent races in claiming. Just
>>>>>  use a compare-and-update with retries strategy. Proof is here:
>>>>> https://github.com/jaypipes/placement-bench/blob/master/placement.py#L97-L142
>>>>>  Works great and prevents multiple writers from oversubscribing
>>>>>  any resource without relying on any particular isolation level at
>>>>>  all.
>>>>>  The `generation` field in the inventories table is what allows
>>>>>  multiple writers to ensure a consistent view of the data without
>>>>>  needing to rely on heavy lock-based semantics and/or
>>>>>  RDBMS-specific isolation levels.
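For readers who do not want to follow the link, a compare-and-update
loop can be sketched roughly as below. This is not the code from
placement-bench: the single `inventories` table with a `used` column,
the SQLite backend and the `claim` helper are simplifying assumptions
for illustration. The UPDATE only succeeds if `generation` is unchanged
since the read; otherwise the writer re-reads and retries.

```python
import sqlite3

# Simplified, hypothetical schema: one row per resource class.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventories ("
    " resource_class TEXT PRIMARY KEY,"
    " total INTEGER, used INTEGER, generation INTEGER)"
)
conn.execute("INSERT INTO inventories VALUES ('VCPU', 8, 0, 0)")
conn.commit()


def claim(conn, resource_class, amount, max_retries=5):
    """Claim `amount` using compare-and-update on `generation`."""
    for _ in range(max_retries):
        total, used, generation = conn.execute(
            "SELECT total, used, generation FROM inventories"
            " WHERE resource_class = ?",
            (resource_class,),
        ).fetchone()
        if used + amount > total:
            return False  # would oversubscribe the resource
        cursor = conn.execute(
            "UPDATE inventories SET used = ?, generation = ?"
            " WHERE resource_class = ? AND generation = ?",
            (used + amount, generation + 1, resource_class, generation),
        )
        conn.commit()
        if cursor.rowcount == 1:
            return True  # nobody else changed the row since our read
        # Another writer bumped the generation first: re-read and retry.
    raise RuntimeError("too many concurrent updates, giving up")


first = claim(conn, "VCPU", 6)   # fits within the total of 8
second = claim(conn, "VCPU", 4)  # 6 + 4 > 8, so it is refused
```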
>>>> [amrith] this works for what it is doing, we can definitely do
>>>> this. This will work at any isolation level, yes. I didn't want to
>>>> go this route because it is going to still require an insert into
>>>> another table recording what the actual 'thing' is that is claiming
>>>> the resource and that insert is going to be in a different
>>>> transaction and managing those two transactions was what I wanted
>>>> to avoid. I was hoping to avoid having two tables tracking claims,
>>>> one showing the currently claimed quota and another holding the
>>>> things that claimed that quota. Have to think again whether that is
>>>> possible.
>>>>> 2) On reservations.
>>>>>  The reason I don't believe reservations are necessary to be in a
>>>>>  quota library is because reservations add a concept of a time to
>>>>>  a claim of some resource. You reserve some resource to be claimed
>>>>>  at some point in the future and release those resources at a
>>>>>  point further in time.
>>>>>  Quota checking doesn't look at what the state of some system will
>>>>>  be at some point in the future. It simply returns whether the
>>>>>  system *right now* can handle a request *right now* to claim a
>>>>>  set of resources.
>>>>>  If you want reservation semantics for some resource, that's
>>>>>  totally cool, but IMHO, a reservation service should live outside
>>>>>  of the service that is actually responsible for providing
>>>>>  resources to a consumer. Merging right-now quota checks and
>>>>>  future-based reservations into the same library just complicates
>>>>>  things unnecessarily IMHO.
>>>> [amrith] extension of the above ...
>>>>> 3) On resizes.
>>>>>  Look, I recognize some users see some value in resizing their
>>>>>  resources. That's fine. I personally think expand operations are
>>>>>  fine, and that shrink operations are really the operations that
>>>>>  should be prohibited in the API. But, whatever, I'm fine with
>>>>>  resizing of requested resource amounts. My big point is if you
>>>>>  don't have a separate table that stores quota_usages and instead
>>>>>  only have a single table that stores the actual resource usage
>>>>>  records, you don't have to do *any* quota check operations at all
>>>>>  upon deletion of a resource. For modifying resource amounts (i.e.
>>>>>  a resize) you merely need to change the calculation of requested
>>>>>  resource amounts to account for the already-consumed usage
>>>>>  amount.
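The resize arithmetic described above can be sketched as follows (a
hypothetical helper, `can_resize`, with made-up numbers): the
already-consumed amount is subtracted before the new amount is added, so
only the delta counts against the limit.

```python
def can_resize(limit, current_usage, old_amount, new_amount):
    """Return True if resizing one resource stays within `limit`.

    `current_usage` is the sum over the actual resource records and
    already includes `old_amount`, so it is subtracted before the new
    amount is added.
    """
    return current_usage - old_amount + new_amount <= limit


# A single 10GB volume under a 100GB limit.
grow_to_limit = can_resize(100, 10, 10, 100)
grow_past_limit = can_resize(100, 10, 10, 200)
shrink = can_resize(100, 10, 10, 5)
```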
>>>>>  Bottom line for me: I really won't support any proposal for a
>>>>>  complex library that takes the resource claim process out of the
>>>>>  hands of the services that own those resources. The simpler the
>>>>>  interface of this library, the better.
>>>> [amrith] my proposal would not but this email thread has got too
>>>> long. Yes, simpler interface, will catch you in Austin.
>>>>> Best, -jay
>>>>>  On 04/19/2016 09:59 PM, Amrith Kumar wrote:
>>>>>>> -----Original Message----- From: Jay Pipes
>>>>>>> [mailto:jaypipes at gmail.com] Sent: Monday, April 18, 2016 2:54 PM
>>>>>>> To: openstack-dev at lists.openstack.org Subject: Re: [openstack-
>>>>>>> dev] More on the topic of DELIMITER, the Quota Management
>>>>>>> Library proposal
>>>>>>>  On 04/16/2016 05:51 PM, Amrith Kumar wrote:
>>>>>>>> If we therefore assume that this will be a Quota Management
>>>>>>>> Library, it is safe to assume  that quotas are going to be
>>>>>>>> managed on a per-project basis, where participating projects
>>>>>>>> will use this library. I believe that it stands to reason that
>>>>>>>> any data persistence will have to be in a location decided by
>>>>>>>> the individual project.
>>>>>>> Depends on what you mean by "any data persistence". If you are
>>>>>>> referring to the storage of quota values (per user, per tenant,
>>>>>>> global, etc) I think that should be done by the Keystone
>>>>>>> service. This data is essentially an attribute of the user or
>>>>>>> the tenant or the service endpoint itself (i.e. global
>>>>>>> defaults). This data also rarely changes and logically belongs
>>>>>>> to the service that manages users, tenants, and service
>>>>>>> endpoints: Keystone.
>>>>>>> If you are referring to the storage of resource usage records,
>>>>>>> yes, each service project should own that data (and frankly, I
>>>>>>> don't see a need to persist any quota usage data at all, as I
>>>>>>> mentioned in a previous reply to Attila).
>>>>>> [amrith] You make a distinction that I had made implicitly, and
>>>>>> it is important to highlight it. Thanks for pointing it out. Yes,
>>>>>> I meant both of the above, and as stipulated. Global defaults in
>>>>>> keystone (somehow, TBD) and usage records, on a per-service
>>>>>> basis.
>>>>>>>> That may not be a very interesting statement but the corollary
>>>>>>>> is, I think, a very significant statement; it cannot be assumed
>>>>>>>> that the quota management information for all participating
>>>>>>>> projects is in the same database.
>>>>>>> It cannot be assumed that this information is even in a database
>>>>>>> at all...
>>>>>> [amrith] I don't follow. If the service in question is to be
>>>>>> scalable, I think it stands to reason that there must be some
>>>>>> mechanism by which instances of the service can share usage
>>>>>> records (as you refer to them, and I like that term). I think it
>>>>>> stands to reason that there must be some database, no?
>>>>>>>> A hypothetical service consuming the Delimiter library provides
>>>>>>>> requesters with some widgets, and wishes to track the widgets
>>>>>>>> that it has provisioned both on a per-user basis, and on the
>>>>>>>> whole. It should therefore be multi-tenant and able to track the
>>>>>>>> widgets on a per tenant basis and if required impose limits on
>>>>>>>> the number of widgets that a tenant may consume at a time,
>>>>>>>> during a course of a period of time, and so on.
>>>>>>> No, this last part is absolutely not what I think quota
>>>>>>> management should be about.
>>>>>>>  Rate limiting -- i.e. how many requests a particular user can
>>>>>>>  make of an API in a given period of time -- should *not* be
>>>>>>>  handled by OpenStack API services, IMHO. It is the
>>>>>>>  responsibility of the deployer to handle this using
>>>>>>>  off-the-shelf rate-limiting solutions (open source or
>>>>>>>  proprietary).
>>>>>>> Quotas should only be about the hard limit of different types of
>>>>>>> resources that a user or group of users can consume at a given
>>>>>>> time.
>>>>>> [amrith] OK, good point. Agreed as stipulated.
>>>>>>>> Such a hypothetical service may also consume resources from
>>>>>>>> other services that it wishes to track, and impose limits on.
>>>>>>> Yes, absolutely agreed.
>>>>>>>> It is also understood as Jay Pipes points out in [4] that the
>>>>>>>> actual process of provisioning widgets could be time consuming
>>>>>>>> and it is ill-advised to hold a database transaction of any
>>>>>>>> kind open for that duration of time. Ensuring that a user does
>>>>>>>> not exceed some limit on the number of concurrent widgets that
>>>>>>>> he or she may create therefore requires some mechanism to track
>>>>>>>> in-flight requests for widgets. I view these as "intent" but
>>>>>>>> not yet materialized.
>>>>>>> It has nothing to do with the amount of concurrent widgets that
>>>>>>> a user can create. It's just about the total number of some
>>>>>>> resource that may be consumed by that user.
>>>>>>>  As for an "intent", I don't believe tracking intent is the
>>>>>>>  right way to go at all. As I've mentioned before, the major
>>>>>>>  problem in Nova's quota system is that there are two tables
>>>>>>>  storing resource usage records: the *actual* resource usage
>>>>>>>  tables (the allocations table in the new resource-providers
>>>>>>>  modeling and the instance_extra, pci_devices and instances
>>>>>>>  table in the legacy modeling) and the *quota usage* tables
>>>>>>>  (quota_usages and reservations tables). The quota_usages table
>>>>>>>  does not need to exist at all, and neither does the
>>>>>>>  reservations table. Don't do intent-based consumption. Instead,
>>>>>>>  just consume (claim) by writing a record for the resource class
>>>>>>>  consumed on a provider into the actual resource usages table
>>>>>>>  and then "check quotas" by querying the *actual* resource
>>>>>>>  usages and comparing the SUM(used) values, grouped by resource
>>>>>>>  class, against the appropriate quota limits for the user. The
>>>>>>>  introduction of the quota_usages and reservations tables to
>>>>>>>  cache usage records is the primary reason for the race problems
>>>>>>>  in the Nova (and other) quota system because every time you
>>>>>>>  introduce a caching system for highly-volatile data (like usage
>>>>>>>  records) you introduce complexity into the write path and the
>>>>>>>  need to track the same thing across multiple writes to
>>>>>>>  different tables needlessly.
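A minimal sketch of the claim-then-check flow described above, assuming
a single simplified `allocations` table in SQLite; the `claim` and
`over_quota` helpers and the column names are illustrative, not Nova's
schema. A resource class with no configured limit is treated here as
having a limit of zero, which is an assumption of the sketch. It does
not by itself deal with concurrent writers.

```python
import sqlite3

# Hypothetical, simplified version of an "actual usage" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE allocations ("
    " consumer_id TEXT, user_id TEXT, resource_class TEXT, used INTEGER)"
)


def over_quota(conn, user_id, limits):
    """Return the resource classes whose SUM(used) exceeds the limit."""
    rows = conn.execute(
        "SELECT resource_class, SUM(used) FROM allocations"
        " WHERE user_id = ? GROUP BY resource_class",
        (user_id,),
    ).fetchall()
    return sorted(rc for rc, total in rows if total > limits.get(rc, 0))


def claim(conn, consumer_id, user_id, requested, limits):
    """Write the usage records, then check quota against the real sums."""
    conn.executemany(
        "INSERT INTO allocations VALUES (?, ?, ?, ?)",
        [(consumer_id, user_id, rc, amount)
         for rc, amount in requested.items()],
    )
    exceeded = over_quota(conn, user_id, limits)
    if exceeded:
        conn.rollback()  # undo the claim; no cached usage row to fix up
        return exceeded
    conn.commit()
    return []


limits = {"VCPU": 4, "RAM_MB": 8192}
first = claim(conn, "inst-1", "alice", {"VCPU": 2, "RAM_MB": 4096}, limits)
second = claim(conn, "inst-2", "alice", {"VCPU": 4, "RAM_MB": 2048}, limits)
```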
>>>>>> [amrith] I don't agree, I'll respond to this and the next comment
>>>>>> group together. See below.
>>>>>>>> Looking up at this whole infrastructure from the perspective of
>>>>>>>> the database, I think we should require that the database must
>>>>>>>> not be required to operate in any isolation mode higher than
>>>>>>>> READ-COMMITTED; more about that later (i.e. requiring a
>>>>>>>> database run either serializable or repeatable read is a show
>>>>>>>> stopper).
>>>>>>> This is an implementation detail that is not relevant to the
>>>>>>> discussion about what the interface of a quota library would
>>>>>>> look like.
>>>>>> [amrith] I disagree, let me give you an example of why.
>>>>>>  Earlier, I wrote:
>>>>>>>> Such a hypothetical service may also consume resources from
>>>>>>>> other services that it wishes to track, and impose limits on.
>>>>>> And you responded:
>>>>>>> Yes, absolutely agreed.
>>>>>> So let's take this hypothetical service that in response to a
>>>>>> user request, will provision a Cinder volume and a Nova instance.
>>>>>> Let's assume that the service also imposes limits on the number
>>>>>> of cinder volumes and nova instances the user may provision;
>>>>>> independent of limits that Nova and Cinder may themselves
>>>>>> maintain.
>>>>>> One way that the hypothetical service can function is this:
>>>>>>  (a) check Cinder quota, if successful, create cinder volume
>>>>>>  (b) check Nova quota, if successful, create nova instance with
>>>>>>      cinder volume attachment
>>>>>>  Now, this is sub-optimal as there are going to be some number of
>>>>>>  cases where the nova quota check fails. Now you have needlessly
>>>>>>  created and will have to release a cinder volume. It also takes
>>>>>>  longer to fail.
>>>>>> Another way to do this is this:
>>>>>>  (1) check Cinder quota, if successful, check Nova quota, if
>>>>>>      successful proceed to (2) else error out
>>>>>>  (2) create cinder volume
>>>>>>  (3) create nova instance with cinder attachment.
>>>>>>  I'm trying to get to this latter form of doing things.
>>>>>>  Easy, you might say ... theoretically this should simply be:
>>>>>>  BEGIN; -- Get data to do the Cinder check
>>>>>>  SELECT ......
>>>>>>  -- Do the cinder check
>>>>>>  INSERT INTO ....
>>>>>>  -- Get data to do the Nova check
>>>>>>  SELECT ....
>>>>>>  -- Do the Nova check
>>>>>>  INSERT INTO ...
>>>>>>  COMMIT
>>>>>>  You can only make this work if you ran at isolation level
>>>>>>  serializable.
>>>>> Why?
>>>>>> To make this run at isolation level REPEATABLE-READ, you must
>>>>>> enforce constraints at the database level that will fail the
>>>>>> commit. But wait, you can't do that because the data about the
>>>>>> global limits may not be in the same database as the usage
>>>>>> records. Later you talk about caching and stuff; all that doesn't
>>>>>> help a database constraint.
>>>>>> For this reason, I think there is going to have to be some
>>>>>> cognizance of the database isolation level in the design of the
>>>>>> library, and I think it will also impact the API that can be
>>>>>> constructed.
>>>>>>>> In general therefore, I believe that the hypothetical service
>>>>>>>> processing requests for widgets would have to handle three
>>>>>>>> kinds of operations, provision, modify, and destroy. The names
>>>>>>>> are, I believe, self-explanatory.
>>>>>>> Generally, modification of a resource doesn't come into play.
>>>>>>> The primary exception to this is for transferring of ownership
>>>>>>> of some resource.
>>>>>> [amrith] Trove RESIZE is a huge benefit for users and while it
>>>>>> may be a pain as you say, this is still a very real benefit.
>>>>>> Trove allows you to resize both your storage (resize the cinder
>>>>>> volume) and resize your instance (change the flavor).
>>>>>>>> Without loss of generality, one can say that all three of them
>>>>>>>> must validate that the operation does not violate some limit
>>>>>>>> (no more than X widgets, no fewer than X widgets, rates, and so
>>>>>>>> on).
>>>>>>> No, only the creation (and very rarely the modification) needs
>>>>>>> any validation that a limit could be violated. Destroying a
>>>>>>> resource never needs to be checked for limit violations.
>>>>>> [amrith] Well, if you are going to create a volume of 10GB and
>>>>>> your limit is 100GB, resizing it to 200GB should fail, I think.
>>>>>>>> Assuming that the service provisions resources from other
>>>>>>>> services, it is also conceivable that limits be imposed on the
>>>>>>>> quantum of those services consumed. In practice, I can imagine
>>>>>>>> a service like Trove using the Delimiter project to perform all
>>>>>>>> of these kinds of limit checks; I'm not suggesting that it does
>>>>>>>> this today, nor that there is an immediate plan to implement
>>>>>>>> all of them, just that these all seem like good uses of a Quota
>>>>>>>> Management capability.
>>>>>>>>  - User may not have more than 25 database instances at a time
>>>>>>>>  - User may not have more than 4 clusters at a time
>>>>>>>>  - User may not consume more than 3TB of SSD storage at a time
>>>>>>> Only if SSD storage is a distinct resource class from DISK_GB.
>>>>>>> Right now, Nova makes no differentiation w.r.t. SSD or HDD or
>>>>>>> shared vs. local block storage.
>>>>>> [amrith] It matters not to Trove whether Nova does or not.
>>>>>> Cinder supports volume-types and users DO want to limit based on
>>>>>> volume-type (for example).
>>>>>>>> - User may not launch more than 10 huge instances at a time
>>>>>>> What is the point of such a limit?
>>>>>> [amrith] Metering usage, placing limitations on the quantum of
>>>>>> resources that a user may provision. Same as with Nova. A flavor
>>>>>> is merely a simple way to tie together a bag of resources. It is
>>>>>> a way to restrict access, for example, to specific resources that
>>>>>> are available in the cloud. HUGE is just an example I gave, pick
>>>>>> any flavor you want, and here's how a service like Trove uses it.
>>>>>> Users can ask to launch an instance of a specific
>>>>>> database+version; MySQL 5.6-48 for example. Now, an operator can
>>>>>> restrict the instance flavors, or volume types that can be
>>>>>> associated with the specific datastore. And the flavor could be
>>>>>> used to map to, for example whether the instance is running on
>>>>>> bare metal or in a VM and if so with what kind of hardware.
>>>>>> That's a useful construct for a service like Trove.
>>>>>>>> - User may not launch more than 3 clusters an hour
>>>>>>> -1. This is rate limiting and should be handled by rate-limiting
>>>>>>> services.
>>>>>>>> - No more than 500 copies of Oracle may be run at a time
>>>>>>> Is "Oracle" a resource class?
>>>>>> [amrith] As I view it, every project should be free to define its
>>>>>> own set of resource classes and meter them as it feels fit. So,
>>>>>> while Oracle licenses may not, conceivably a lot of things that
>>>>>> Nova, Cinder, and the other core projects don't care about, are
>>>>>> in fact relevant for a consumer of this library.
>>>>>>>> While Nova would be the service that limits the number of
>>>>>>>> instances a user can have at a time, the ability for a service
>>>>>>>> to limit this further should not be underestimated.
>>>>>>>>  In turn, should Nova and Cinder also use the same Quota
>>>>>>>>  Management Library, they may each impose limitations like:
>>>>>>>>  - User may not launch more than 20 huge instances at a time
>>>>>>> Not a useful limitation IMHO.
>>>>>> [amrith] I beg to differ. Again a huge instance is just an
>>>>>> example of some flavor; and the idea is to allow a project to
>>>>>> place its own metrics and meter based on those.
>>>>>>>> - User may not launch more than 3 instances in a minute
>>>>>>> -1. This is rate limiting.
>>>>>>>> - User may not consume more than 15TB of SSD at a time
>>>>>>>>  - User may not have more than 30 volumes at a time
>>>>>>>>  Again, I'm not implying that either Nova or Cinder should
>>>>>>>>  provide these capabilities.
>>>>>>>>  With this in mind, I believe that the minimal set of
>>>>>>>>  operations that Delimiter should provide are:
>>>>>>>>  - define_resource(name, max, min, user_max, user_min, ...)
>>>>>>> What would the above do? What service would it be speaking to?
>>>>>> [amrith] I assume that this would speak with some backend (either
>>>>>> keystone or the project itself) and record these designated
>>>>>> limits. This is the way to register a project specific metric
>>>>>> like "Oracle licenses".
>>>>>>>> - update_resource_limits(name, user, user_max, user_min, ...)
>>>>>>> This doesn't belong in a quota library. It belongs as a REST API
>>>>>>> in Keystone.
>>>>>> [amrith] Fine, same place where the previous thing stores the
>>>>>> global defaults is the target of this call.
>>>>>>>> - reserve_resource(name, user, size, parent_resource, ...)
>>>>>>> This doesn't belong in a quota library at all. I think
>>>>>>> reservations are not germane to resource consumption and should
>>>>>>> be handled by an external service at the orchestration layer.
>>>>>> [amrith] Again not true, as illustrated above this library is the
>>>>>> thing that projects could use to determine whether or not to
>>>>>> honor a request. This reserve/provision process is, I believe
>>>>>> required because of the vagaries of how we want to implement this
>>>>>> in the database.
>>>>>>>> - provision_resource(resource, id)
>>>>>>> A quota library should not be provisioning anything. A quota
>>>>>>> library should simply provide a consistent interface for
>>>>>>> *checking* that a structured request for some set of resources
>>>>>>> *can* be provided by the service.
>>>>>> [amrith] This does not actually call Nova or anything; merely
>>>>>> that AFTER the hypothetical service has called NOVA, this
>>>>>> converts the reservation (which can expire) into an actual
>>>>>> allocation.
>>>>>>>> - update_resource(id or resource, newsize)
>>>>>>> Resizing resources is a bad idea, IMHO. Resources are easier to
>>>>>>> deal with when they are considered of immutable size and simple
>>>>>>> (i.e. not complex or nested). I think the problem here is in the
>>>>>>> definition of resource classes improperly.
>>>>>> [amrith] Let's leave the quota library aside. This assertion
>>>>>> strikes at the very heart of things like Nova resize, or for that
>>>>>> matter Cinder volume resize. Are those all bad ideas? I made a
>>>>>> 500GB Cinder volume and it is getting close to full. I'd like to
>>>>>> resize it to 2TB. Are you saying that's not a valid use case?
>>>>>>> For example, a "cluster" is not a resource. It is a collection
>>>>>>> of resources of type node. "Resizing" a cluster is a misnomer,
>>>>>>> because you aren't resizing a resource at all. Instead, you are
>>>>>>> creating or destroying resources inside the cluster (i.e.
>>>>>>> joining or leaving cluster nodes).
>>>>>>> BTW, this is also why the "resize instance" API in Nova is such
>>>>>>> a giant pain in the ass. It's attempting to "modify" the
>>>>>>> instance "resource"
>>>>>>> when the instance isn't really the resource at all. The VCPU,
>>>>>>> RAM_MB, DISK_GB, and PCI devices are the actual resources. The
>>>>>>> instance is a convenient way to tie those resources together,
>>>>>>> and doing a "resize" of the instance behind the scenes actually
>>>>>>> performs a *move* operation, which isn't a *change* of the
>>>>>>> original resources. Rather, it is a creation of a new set of
>>>>>>> resources (of the new amounts) and a deletion of the old set of
>>>>>>> resources.
>>>>>> [amrith] that's fine, if all we want is to handle the resize
>>>>>> operation as a new instance followed by a deletion, that's great.
>>>>>> But that semantic isn't necessarily the case for something like
>>>>>> (say) cinder.
>>>>>>> The "resize" API call adds some nasty confirmation and cancel
>>>>>>> semantics to the calling interface that hint that the underlying
>>>>>>> implementation of the "resize" operation is in actuality not a
>>>>>>> resize at all, but rather a create-new-and-delete-old-resources
>>>>>>> operation.
>>>>>> [amrith] And that isn't germane to a quota library, I don't
>>>>>> think. What is, is this. Do we want to treat the transient state
>>>>>> when there are (for example of Nova) two instances, one of the
>>>>>> new flavor and one of the old flavor, or not. But, from the
>>>>>> perspective of a quota library, a resize operation is merely a
>>>>>> reset of the quota by the delta in the resource consumed.
>>>>>>>> - release_resource(id or resource)
>>>>>>>>  - expire_reservations()
>>>>>>> I see no need to have reservations in the quota library at all,
>>>>>>> as mentioned above.
>>>>>> [amrith] Then I think the quota library must require that either
>>>>>> (a) the underlying database runs serializable or (b) database
>>>>>> constraints can be used to enforce that at commit the global
>>>>>> limits are adhered to.
>>>>>>> As for your proposed interface and calling structure below, I
>>>>>>> think a much simpler proposal would work better. I'll work on a
>>>>>>> cross-project spec that describes this simpler proposal, but the
>>>>>>> basics would be:
>>>>>>>  1) Have Keystone store quota information for defaults (per
>>>>>>>     service endpoint), for tenants and for users.
>>>>>>>  Keystone would have the set of canonical resource class names,
>>>>>>>  and each project, upon handling a new resource class, would be
>>>>>>>  responsible for a change submitted to Keystone to add the new
>>>>>>>  resource class code.
>>>>>>> Straw man REST API:
>>>>>>>  GET /quotas/resource-classes
>>>>>>>  200 OK
>>>>>>>  {
>>>>>>>    "resource_classes": {
>>>>>>>      "compute.vcpu": {
>>>>>>>        "service": "compute",
>>>>>>>        "code": "compute.vcpu",
>>>>>>>        "description": "A virtual CPU unit"
>>>>>>>      },
>>>>>>>      "compute.ram_mb": {
>>>>>>>        "service": "compute",
>>>>>>>        "code": "compute.ram_mb",
>>>>>>>        "description": "Memory in megabytes"
>>>>>>>      },
>>>>>>>      ...
>>>>>>>      "volume.disk_gb": {
>>>>>>>        "service": "volume",
>>>>>>>        "code": "volume.disk_gb",
>>>>>>>        "description": "Amount of disk space in gigabytes"
>>>>>>>      },
>>>>>>>      ...
>>>>>>>      "database.count": {
>>>>>>>        "service": "database",
>>>>>>>        "code": "database.count",
>>>>>>>        "description": "Number of database instances"
>>>>>>>      }
>>>>>>>    }
>>>>>>>  }
>>>>>> [amrith] Well, a user is allowed to have a certain compute quota
>>>>>> (which is shared by Nova and Trove) but also a Trove quota. How
>>>>>> would your representation represent that?
>>>>>>> # Get the default limits for new users...
>>>>>>>  GET /quotas/defaults
>>>>>>>  200 OK
>>>>>>>  {
>>>>>>>    "quotas": {
>>>>>>>      "compute.vcpu": 100,
>>>>>>>      "compute.ram_mb": 32768,
>>>>>>>      "volume.disk_gb": 1000,
>>>>>>>      "database.count": 25
>>>>>>>    }
>>>>>>>  }
>>>>>>>  # Get a specific user's limits...
>>>>>>>  GET /quotas/users/{UUID}
>>>>>>>  200 OK
>>>>>>>  {
>>>>>>>    "quotas": {
>>>>>>>      "compute.vcpu": 100,
>>>>>>>      "compute.ram_mb": 32768,
>>>>>>>      "volume.disk_gb": 1000,
>>>>>>>      "database.count": 25
>>>>>>>    }
>>>>>>>  }
>>>>>>>  # Get a tenant's limits...
>>>>>>>  GET /quotas/tenants/{UUID}
>>>>>>>  200 OK
>>>>>>>  {
>>>>>>>    "quotas": {
>>>>>>>      "compute.vcpu": 1000,
>>>>>>>      "compute.ram_mb": 327680,
>>>>>>>      "volume.disk_gb": 10000,
>>>>>>>      "database.count": 250
>>>>>>>    }
>>>>>>>  }
>>>>>>>  2) Have Delimiter communicate with the above proposed new
>>>>>>>     Keystone REST API and package up data into an
>>>>>>>     oslo.versioned_objects interface.
>>>>>>>  Clearly all of the above can be heavily cached both on the
>>>>>>>  server and client side since they rarely change but are read
>>>>>>>  often.
>>>>>> [amrith] Caching on the client won't save you from
>>>>>> oversubscription if you don't run serializable.
>>>>>>> The Delimiter library could be used to provide a calling
>>>>>>> interface for service projects to get a user's limits for a set
>>>>>>> of resource classes:
>>>>>>> (please excuse wrongness, typos, and other stuff below, it's
>>>>>>> just a straw- man not production working code...)
>>>>>>>  # file: delimiter/objects/limits.py
>>>>>>>  import oslo.versioned_objects.base as ovo
>>>>>>>  import oslo.versioned_objects.fields as ovo_fields
>>>>>>>
>>>>>>>  class ResourceLimit(ovo.VersionedObjectBase):
>>>>>>>      # 1.0: Initial version
>>>>>>>      VERSION = '1.0'
>>>>>>>      fields = {
>>>>>>>          'resource_class': ovo_fields.StringField(),
>>>>>>>          'amount': ovo_fields.IntegerField(),
>>>>>>>      }
>>>>>>>
>>>>>>>  class ResourceLimitList(ovo.VersionedObjectBase):
>>>>>>>      # 1.0: Initial version
>>>>>>>      VERSION = '1.0'
>>>>>>>      fields = {
>>>>>>>          'resources': ovo_fields.ListOfObjectsField(ResourceLimit),
>>>>>>>      }
>>>>>>>
>>>>>>>      @cache_this_heavily
>>>>>>>      @remotable_classmethod
>>>>>>>      def get_all_by_user(cls, user_uuid):
>>>>>>>          """Returns a Limits object that tells the caller what a
>>>>>>>          user's absolute limits are for the set of resource
>>>>>>>          classes in the system.
>>>>>>>          """
>>>>>>>          # Grab a keystone client session object and connect to
>>>>>>>          # Keystone
>>>>>>>          ks = ksclient.Session(...)
>>>>>>>          raw_limits = ks.get_limits_by_user(user_uuid)
>>>>>>>          return cls(resources=[ResourceLimit(**d) for d in raw_limits])
>>>>>>>  3) Each service project would be responsible for handling the
>>>>>>>     consumption of a set of requested resource amounts in an
>>>>>>>     atomic and consistent way.
>>>>>> [amrith] This is where the rubber meets the road. What is that
>>>>>> atomic and consistent way? And what computing infrastructure do
>>>>>> you need to deliver this?
>>>>>>> The Delimiter library would return the limits that the service
>>>>>>> would pre-check before claiming the resources and either
>>>>>>> post-check after claim or utilize a compare-and-update technique
>>>>>>> with a generation/timestamp during claiming to prevent race
>>>>>>> conditions.
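
A minimal sketch of the compare-and-update idea mentioned above, using the
standard-library sqlite3 module only so that it is self-contained. The
`usages` table, its `generation` column and the `claim()` helper are
assumptions made for illustration; they are not part of Nova or of any
Delimiter code. The write only succeeds if the generation read earlier is
still current, otherwise the usage is re-read and the check repeated.

```python
import sqlite3


def claim(conn, user_id, amount, limit):
    """Add `amount` to the user's usage unless that would exceed `limit`.

    Returns True if the claim was recorded, False if it would go over quota.
    """
    while True:
        used, generation = conn.execute(
            "SELECT used, generation FROM usages WHERE user_id = ?",
            (user_id,),
        ).fetchone()
        if used + amount > limit:
            return False  # pre-check failed, nothing written
        # Compare-and-update: the row is only changed if nobody else has
        # bumped the generation since we read it.
        cur = conn.execute(
            "UPDATE usages SET used = ?, generation = generation + 1 "
            "WHERE user_id = ? AND generation = ?",
            (used + amount, user_id, generation),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True
        # Another writer got there first; loop to re-read and re-check.


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usages "
    "(user_id TEXT PRIMARY KEY, used INTEGER, generation INTEGER)"
)
conn.execute("INSERT INTO usages VALUES ('john', 0, 0)")
conn.commit()

print(claim(conn, "john", 3, 4))  # True: 0 + 3 is within the limit of 4
print(claim(conn, "john", 2, 4))  # False: 3 + 2 would exceed the limit
```

Because the guard lives in the WHERE clause of the UPDATE itself, a stale
read can cost a retry but cannot produce a write based on outdated usage.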
>>>>>>>  For instance, in Nova with the new resource providers database
>>>>>>>  schema and doing claims in the scheduler (a proposed change),
>>>>>>>  we might do something to the effect of:
>>>>>>>  from delimiter import objects as delim_obj
>>>>>>>  from delimiter import exceptions as delim_exc
>>>>>>>  from nova import objects as nova_obj
>>>>>>>
>>>>>>>  request = nova_obj.RequestSpec.get_by_uuid(request_uuid)
>>>>>>>  requested = request.resources
>>>>>>>  limits = delim_obj.ResourceLimitList.get_all_by_user(user_uuid)
>>>>>>>  allocations = nova_obj.AllocationList.get_all_by_user(user_uuid)
>>>>>>>
>>>>>>>  # Pre-check for violations
>>>>>>>  for resource_class, requested_amount in requested.items():
>>>>>>>      limit_idx = limits.resources.index(resource_class)
>>>>>>>      resource_limit = limits.resources[limit_idx].amount
>>>>>>>      alloc_idx = allocations.resources.index(resource_class)
>>>>>>>      resource_used = allocations.resources[alloc_idx]
>>>>>>>      if (resource_used + requested_amount) > resource_limit:
>>>>>>>          raise delim_exc.QuotaExceeded
>>>>>> [amrith] Is the above code run with some global mutex to prevent
>>>>>> two people from both believing that they are good on quota at the
>>>>>> same time?
>>>>>>> # Do claims in scheduler in an atomic, consistent fashion...
>>>>>>>  claims = scheduler_client.claim_resources(request)
>>>>>> [amrith] Yes, each 'atomic' claim on a repeatable-read database
>>>>>> could result in oversubscription.
>>>>>>>  # Post-check for violations
>>>>>>>  allocations = nova_obj.AllocationList.get_all_by_user(user_uuid)
>>>>>>>  # allocations now include the claimed resources from the
>>>>>>>  # scheduler
>>>>>>>  for resource_class, requested_amount in requested.items():
>>>>>>>      limit_idx = limits.resources.index(resource_class)
>>>>>>>      resource_limit = limits.resources[limit_idx].amount
>>>>>>>      alloc_idx = allocations.resources.index(resource_class)
>>>>>>>      resource_used = allocations.resources[alloc_idx]
>>>>>>>      if resource_used > resource_limit:
>>>>>>>          # Delete the allocation records for the resources just
>>>>>>>          # claimed
>>>>>>>          delete_resources(claims)
>>>>>>>          raise delim_exc.QuotaExceeded
>>>>>> [amrith] Again, two people could drive through this code and both
>>>>>> of them could fail :(
>>>>>>> 4) The only other thing that would need to be done for a first
>>>>>>>    go of the Delimiter library is some event listener that can
>>>>>>>    listen for changes to the quota limits for a
>>>>>>>    user/tenant/default in Keystone. We'd want the services to be
>>>>>>>    able to notify someone if a reduction in quota results in an
>>>>>>>    overquota situation.
>>>>>>>  Anyway, that's my idea. Keep the Delimiter library small and
>>>>>>>  focused on describing the limits only, not on the resource
>>>>>>>  allocations. Have the Delimiter library present a versioned
>>>>>>>  object interface so the interaction between the data exposed by
>>>>>>>  the Keystone REST API for quotas can evolve naturally and
>>>>>>>  smoothly over time.
>>>>>>>  Best, -jay
>>>>>>>> Let me illustrate the way I see these things fitting together.
>>>>>>>> A hypothetical Trove system may be set up as follows:
>>>>>>>>  - No more than 2000 database instances in total, 300 clusters
>>>>>>>>    in total
>>>>>>>>  - Users may not launch more than 25 database instances, or 4
>>>>>>>>    clusters
>>>>>>>>  - The particular user 'amrith' is limited to 2 databases and
>>>>>>>>    1 cluster
>>>>>>>>  - No user may consume more than 20TB of storage at a time
>>>>>>>>  - No user may consume more than 10GB of memory at a time
>>>>>>>>  At startup, I believe that the system would make the following
>>>>>>>>  sequence of calls:
>>>>>>>>  - define_resource(databaseInstance, 2000, 0, 25, 0, ...)
>>>>>>>>  - update_resource_limits(databaseInstance, amrith, 2, 0, ...)
>>>>>>>>  - define_resource(databaseCluster, 300, 0, 4, 0, ...)
>>>>>>>>  - update_resource_limits(databaseCluster, amrith, 1, 0, ...)
>>>>>>>>  - define_resource(storage, -1, 0, 20TB, 0, ...)
>>>>>>>>  - define_resource(memory, -1, 0, 10GB, 0, ...)
>>>>>>>>  Assume that the user john comes along and asks for a cluster
>>>>>>>>  with 4 nodes, 1TB storage per node and each node having 1GB of
>>>>>>>>  memory. The system would go through the following sequence:
>>>>>>>>  - reserve_resource(databaseCluster, john, 1, None)
>>>>>>>>    o this returns a resourceID (say cluster-resource-id)
>>>>>>>>    o the cluster instance that it reserves counts against the
>>>>>>>>      limit of 300 cluster instances in total, as well as the 4
>>>>>>>>      clusters that john can provision. If 'amrith' had
>>>>>>>>      requested it, that would have been counted against the
>>>>>>>>      limit of 1 cluster for that user.
>>>>>>>>  - reserve_resource(databaseInstance, john, 1, cluster-resource-id)
>>>>>>>>  - reserve_resource(databaseInstance, john, 1, cluster-resource-id)
>>>>>>>>  - reserve_resource(databaseInstance, john, 1, cluster-resource-id)
>>>>>>>>  - reserve_resource(databaseInstance, john, 1, cluster-resource-id)
>>>>>>>>    o these return four resource IDs, let's say instance-1-id,
>>>>>>>>      instance-2-id, instance-3-id, instance-4-id
>>>>>>>>    o note that each instance is just that, an instance by
>>>>>>>>      itself. It is therefore not right to consider this as
>>>>>>>>      equivalent to a call to reserve_resource() with a size of
>>>>>>>>      4, especially because each instance could later be tracked
>>>>>>>>      as an individual Nova instance.
>>>>>>>>  - reserve_resource(storage, john, 1TB, instance-1-id)
>>>>>>>>  - reserve_resource(storage, john, 1TB, instance-2-id)
>>>>>>>>  - reserve_resource(storage, john, 1TB, instance-3-id)
>>>>>>>>  - reserve_resource(storage, john, 1TB, instance-4-id)
>>>>>>>>    o each of them returns some resourceID, let's say they
>>>>>>>>      returned cinder-1-id, cinder-2-id, cinder-3-id, cinder-4-id
>>>>>>>>    o since the storage of 1TB is a unit, it is treated as such.
>>>>>>>>      In other words, you don't need to invoke reserve_resource
>>>>>>>>      10^12 times, once per byte allocated :)
>>>>>>>>  - reserve_resource(memory, john, 1GB, instance-1-id)
>>>>>>>>  - reserve_resource(memory, john, 1GB, instance-2-id)
>>>>>>>>  - reserve_resource(memory, john, 1GB, instance-3-id)
>>>>>>>>  - reserve_resource(memory, john, 1GB, instance-4-id)
>>>>>>>>    o each of these returns something, say
>>>>>>>>      Dg4KBQcODAENBQEGBAcEDA; I have made up an arbitrary string
>>>>>>>>      just to highlight that we really don't track these
>>>>>>>>      anywhere, so we don't care about them.
>>>>>>>>  If all this works, then the system knows that John's request
>>>>>>>>  does not violate any quotas that it can enforce; it can then
>>>>>>>>  go ahead and launch the instances (calling Nova), provision
>>>>>>>>  storage, and so on.
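
The limit checks in the walkthrough above can be sketched in a few lines of
Python. This is an in-memory illustration only: the `Quotas` class and the
three-argument signatures are simplifications assumed here (the calls in the
thread carry further arguments that are elided with "..."), and reservation
IDs and parent links are left out. A total limit of -1 is read as
"unlimited", as in the define_resource(storage, -1, ...) call.

```python
class QuotaExceeded(Exception):
    """Raised when a reservation would exceed a limit."""


class Quotas:
    UNLIMITED = -1

    def __init__(self):
        self.total_limit = {}         # resource -> system-wide limit
        self.default_user_limit = {}  # resource -> default per-user limit
        self.user_limit = {}          # (resource, user) -> override
        self.total_used = {}          # resource -> amount reserved overall
        self.user_used = {}           # (resource, user) -> amount reserved

    def define_resource(self, resource, total_limit, per_user_limit):
        self.total_limit[resource] = total_limit
        self.default_user_limit[resource] = per_user_limit
        self.total_used[resource] = 0

    def update_resource_limits(self, resource, user, limit):
        self.user_limit[(resource, user)] = limit

    def reserve_resource(self, resource, user, amount):
        key = (resource, user)
        new_total = self.total_used[resource] + amount
        new_user = self.user_used.get(key, 0) + amount
        total_limit = self.total_limit[resource]
        user_limit = self.user_limit.get(
            key, self.default_user_limit[resource])
        if total_limit != self.UNLIMITED and new_total > total_limit:
            raise QuotaExceeded(f"{resource}: system-wide limit {total_limit}")
        if new_user > user_limit:
            raise QuotaExceeded(f"{resource}: limit {user_limit} for {user}")
        # Both checks passed, so record the reservation.
        self.total_used[resource] = new_total
        self.user_used[key] = new_user


quotas = Quotas()
quotas.define_resource("databaseInstance", 2000, 25)
quotas.update_resource_limits("databaseInstance", "amrith", 2)
quotas.define_resource("databaseCluster", 300, 4)
quotas.update_resource_limits("databaseCluster", "amrith", 1)
quotas.define_resource("storage_gb", Quotas.UNLIMITED, 20 * 1024)
quotas.define_resource("memory_gb", Quotas.UNLIMITED, 10)

# john's request: one cluster of 4 nodes, each with 1TB storage, 1GB memory
quotas.reserve_resource("databaseCluster", "john", 1)
for _ in range(4):
    quotas.reserve_resource("databaseInstance", "john", 1)
    quotas.reserve_resource("storage_gb", "john", 1024)
    quotas.reserve_resource("memory_gb", "john", 1)
```

With the same setup, a second cluster requested by 'amrith' would raise
QuotaExceeded, because that user's override of 1 cluster takes precedence
over the default of 4.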
>>>>>>>>  The system then goes and creates four Cinder volumes; these
>>>>>>>>  are cinder-1-uuid, cinder-2-uuid, cinder-3-uuid,
>>>>>>>>  cinder-4-uuid.
>>>>>>>>  It can then go and confirm those reservations.
>>>>>>>>  - provision_resource(cinder-1-id, cinder-1-uuid)
>>>>>>>>  - provision_resource(cinder-2-id, cinder-2-uuid)
>>>>>>>>  - provision_resource(cinder-3-id, cinder-3-uuid)
>>>>>>>>  - provision_resource(cinder-4-id, cinder-4-uuid)
>>>>>>>>  It could then go and launch 4 nova instances and similarly
>>>>>>>>  provision those resources, and so on. This process could take
>>>>>>>>  some minutes and holding a database transaction open for this
>>>>>>>>  is the issue that Jay brings up in [4]. We don't have to in
>>>>>>>>  this proposed scheme.
>>>>>>>>  Since the resources are all hierarchically linked through the
>>>>>>>>  overall cluster id, when the cluster is set up, it can finally
>>>>>>>>  go and provision that:
>>>>>>>>  - provision_resource(cluster-resource-id, cluster-uuid)
>>>>>>>>  When Trove is done with some individual resource, it can go
>>>>>>>>  and release it. Note that I'm thinking this will invoke
>>>>>>>>  release_resource with the ID of the underlying object OR the
>>>>>>>>  resource.
>>>>>>>>  - release_resource(cinder-4-id), and
>>>>>>>>  - release_resource(cinder-4-uuid)
>>>>>>>>  are therefore identical and indicate that the 4th 1TB volume
>>>>>>>>  is now released. How this will be implemented in Python,
>>>>>>>>  kwargs or some other mechanism is, I believe, an
>>>>>>>>  implementation detail.
>>>>>>>>  Finally, it releases the cluster resource by doing this:
>>>>>>>>  - release_resource(cluster-resource-id)
>>>>>>>>  This would release the cluster and all dependent resources in
>>>>>>>>  a single operation.
>>>>>>>>  A user may wish to manage a resource that was provisioned from
>>>>>>>>  the service. Assume that this results in a resizing of the
>>>>>>>>  instances, then it is a matter of updating that resource.
>>>>>>>>  Assume that the third 1TB volume is being resized to 2TB, then
>>>>>>>>  it is merely a matter of invoking:
>>>>>>>>  - update_resource(cinder-3-uuid, 2TB)
>>>>>>>>  Delimiter can go figure out that cinder-3-uuid is a 1TB device
>>>>>>>>  and therefore this is an increase of 1TB and verify that this
>>>>>>>>  is within the quotas allowed for the user.
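
As a small worked example of the resize check just described: only the
difference between the old and new size has to fit under the user's limit.
The `check_resize()` helper and the use of gigabytes as the unit are
assumptions for illustration, not a proposed Delimiter interface.

```python
class QuotaExceeded(Exception):
    """Raised when a resize would exceed the user's limit."""


def check_resize(current_size, new_size, user_used, user_limit):
    """Return the user's new usage after a resize, or raise QuotaExceeded."""
    delta = new_size - current_size
    if delta > 0 and user_used + delta > user_limit:
        raise QuotaExceeded(f"resize needs {delta} more, limit is {user_limit}")
    return user_used + delta


# john holds four 1TB volumes (4096 GB) against a 20TB (20480 GB) limit and
# grows the third volume from 1TB to 2TB: an increase of 1024 GB.
new_usage = check_resize(1024, 2048, user_used=4096, user_limit=20480)
print(new_usage)  # 5120
```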
>>>>>>>>  The thing that I find attractive about this model of
>>>>>>>>  maintaining a hierarchy of reservations is that in the event
>>>>>>>>  of an error, the service need merely call release_resource()
>>>>>>>>  on the highest level reservation and the Delimiter project can
>>>>>>>>  walk down the chain and release all the resources or
>>>>>>>>  reservations as appropriate.
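
One way to picture the hierarchy of reservations described above is a tree
keyed by reservation ID, where releasing a node also releases everything
beneath it. The `ReservationTree` class below is a hypothetical in-memory
sketch; how Delimiter would actually store these links is not specified in
the thread.

```python
import itertools


class ReservationTree:
    def __init__(self):
        self._ids = itertools.count(1)
        self._parent = {}    # reservation id -> parent id (or None)
        self._children = {}  # reservation id -> list of child ids

    def reserve(self, parent=None):
        """Create a reservation, optionally under a parent reservation."""
        rid = next(self._ids)
        self._parent[rid] = parent
        self._children[rid] = []
        if parent is not None:
            self._children[parent].append(rid)
        return rid

    def release(self, rid):
        """Release `rid` and all reservations below it; return their ids."""
        parent = self._parent[rid]
        if parent is not None:
            self._children[parent].remove(rid)
        released = []
        stack = [rid]
        while stack:
            current = stack.pop()
            stack.extend(self._children.pop(current))
            del self._parent[current]
            released.append(current)
        return released


tree = ReservationTree()
cluster = tree.reserve()
instances = [tree.reserve(cluster) for _ in range(4)]
volumes = [tree.reserve(instance) for instance in instances]

# Releasing one instance takes its volume with it; releasing the cluster
# then cleans up everything that is left in a single call.
tree.release(instances[3])
remaining = tree.release(cluster)
print(len(remaining))  # 7: the cluster, three instances, three volumes
```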
>>>>>>>>  Under the covers I believe that each of these operations
>>>>>>>>  should be atomic and may update multiple database tables but
>>>>>>>>  these will all be short lived operations.
>>>>>>>>  For example, reserving an instance resource would increment
>>>>>>>>  the number of instances for the user as well as the number of
>>>>>>>>  instances on the whole, and this would be an atomic operation.
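
A sketch of what such a short atomic operation might look like, again with
the standard-library sqlite3 module. The `totals` and `user_usage` tables
and the `reserve_instance()` helper are assumptions for illustration. Each
counter is bumped by a single conditional UPDATE, so the limit check and the
increment happen in one statement rather than in a read followed by a write,
and both updates commit or roll back together.

```python
import sqlite3


class QuotaExceeded(Exception):
    """Raised when either the per-user or the overall limit is hit."""


def reserve_instance(conn, user_id, user_limit, total_limit):
    # "with conn" commits if the block succeeds and rolls back on error.
    with conn:
        cur = conn.execute(
            "UPDATE totals SET used = used + 1 "
            "WHERE resource = 'instance' AND used < ?",
            (total_limit,),
        )
        if cur.rowcount != 1:
            raise QuotaExceeded("overall instance limit reached")
        cur = conn.execute(
            "UPDATE user_usage SET used = used + 1 "
            "WHERE user_id = ? AND used < ?",
            (user_id, user_limit),
        )
        if cur.rowcount != 1:
            # Raising here also undoes the increment of the overall count.
            raise QuotaExceeded(f"instance limit reached for {user_id}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE totals (resource TEXT PRIMARY KEY, used INTEGER)")
conn.execute("CREATE TABLE user_usage (user_id TEXT PRIMARY KEY, used INTEGER)")
conn.execute("INSERT INTO totals VALUES ('instance', 0)")
conn.execute("INSERT INTO user_usage VALUES ('amrith', 0)")
conn.commit()

reserve_instance(conn, "amrith", user_limit=2, total_limit=2000)
reserve_instance(conn, "amrith", user_limit=2, total_limit=2000)
try:
    reserve_instance(conn, "amrith", user_limit=2, total_limit=2000)
except QuotaExceeded as exc:
    print(exc)  # instance limit reached for amrith
```

After the failed third call the overall count is still 2, because the
increment made earlier in the same transaction was rolled back.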
>>>>>>>>  I have two primary areas of concern about the proposal [3].
>>>>>>>>  The first is that it makes the implicit assumption that the
>>>>>>>>  "flat mode" is implemented. That provides value to a consumer
>>>>>>>>  but I think it leaves a lot for the consumer to do. For
>>>>>>>>  example, I find it hard to see how the model proposed would
>>>>>>>>  handle the release of quotas, let alone the case of a nested
>>>>>>>>  release of a hierarchy of resources.
>>>>>>>>  The other is the notion that the implementation will begin a
>>>>>>>>  transaction, perform a query(), make some manipulations, and
>>>>>>>>  then do a save(). This makes for an interesting transaction
>>>>>>>>  management challenge as it would require the underlying
>>>>>>>>  database to run in an isolation mode of at least repeatable
>>>>>>>>  reads and maybe even serializable, which would be a
>>>>>>>>  performance bear on a heavily loaded system. If run in the
>>>>>>>>  traditional read-committed mode, this would silently lead to
>>>>>>>>  oversubscriptions, and the violation of quota limits.
>>>>>>>>  I believe that it should be a requirement that the Delimiter
>>>>>>>>  library should be able to run against a database that
>>>>>>>>  supports, and is configured for READ-COMMITTED, and should not
>>>>>>>>  require anything higher. The model proposed above can
>>>>>>>>  certainly be implemented with a database running
>>>>>>>>  READ-COMMITTED, and I believe that this is also true with the
>>>>>>>>  caveat that the operations will be performed through
>>>>>>>>  SQLAlchemy.
>>>>>>>> Thanks,
>>>>>>>>  -amrith
>>>>>>>>  [1]  http://openstack.markmail.org/thread/tkl2jcyvzgifniux
>>>>>>>>  [2]  http://openstack.markmail.org/thread/3cr7hoeqjmgyle2j
>>>>>>>>  [3] https://review.openstack.org/#/c/284454/
>>>>>>>>  [4] http://markmail.org/message/7ixvezcsj3uyiro6
>>>>>>>>  ______________________________________________________________
>>>>>>>>  OpenStack Development Mailing List (not for usage questions)
>>>>>>>>  Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>>>>>>>  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev