[openstack-dev] More on the topic of DELIMITER, the Quota Management Library proposal
Joshua Harlow
harlowja at fastmail.com
Mon Apr 25 19:05:33 UTC 2016
Is the generation stuff going to be exposed outside of the API?

I'm sort of hoping not(?), because a backend (if anyone ever wanted to
create one) for, say, zookeeper (or etcd or consul...) could use its
built-in generation equivalent (every znode has a version that you can
use to do equivalent things).

That's how the gist I posted earlier works:

https://gist.github.com/harlowja/e7175c2d76e020a82ae94467a1441d85

So it might be nice to not expose such a thing outside of the db layer.
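
(For the curious, roughly how that znode-version check-and-set looks with
kazoo. This is just a sketch; the path, data layout and retry policy are
made up here, the real thing is in the gist above.)

    import json

    from kazoo.client import KazooClient
    from kazoo.exceptions import BadVersionError

    client = KazooClient(hosts='127.0.0.1:2181')
    client.start()

    PATH = '/quotas/tenant-123/memory'   # made-up path
    client.ensure_path(PATH)

    def try_claim(amount, limit):
        # Read the current usage together with the znode version (the
        # built-in "generation").
        data, stat = client.get(PATH)
        used = json.loads(data)['used'] if data else 0
        if used + amount > limit:
            return False  # over quota
        try:
            # Write back only if nobody else touched the znode in between.
            client.set(PATH, json.dumps({'used': used + amount}).encode(),
                       version=stat.version)
            return True
        except BadVersionError:
            return try_claim(amount, limit)  # lost the race; retry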
Amrith Kumar wrote:
> On Sat, 2016-04-23 at 21:41 +0000, Amrith Kumar wrote:
>> Ok to beer and high bandwidth. FYI Jay the distributed high perf db we
>> did a couple of years ago is now open source. Just saying. Mysql plug
>> compatible ....
>
>> -amrith
>>
>>
>> --
>> Amrith Kumar
>> amrith at tesora.com
>>
>>
>> -------- Original message --------
>> From: Jay Pipes<jaypipes at gmail.com>
>> Date: 04/23/2016 4:10 PM (GMT-05:00)
>> To: Amrith Kumar<amrith at tesora.com>,
>> openstack-dev at lists.openstack.org
>> Cc: vilobhmm at yahoo-inc.com, nik.komawar at gmail.com, Ed Leafe
>> <ed at leafe.com>
>> Subject: Re: [openstack-dev] More on the topic of DELIMITER, the Quota
>> Management Library proposal
>>
>>
>> Looking forward to arriving in Austin so that I can buy you a beer,
>> Amrith, and have a high-bandwidth conversation about how you're
>> wrong. :P
>
>
> Jay and I chatted and it took a long time to come to an agreement
> because we weren't able to find any beer.
>
> Here's what I think we've agreed about. The library will store data in
> two tables:
>
> 1. the detail table, which stores the individual claims and the resource
> class.
> 2. a generation table, which stores the resource class and a generation.
>
> When a claim is received, the requestor performs the following
> operations:
>
> begin
>
>     select sum(detail.claims) as total_claims,
>            generation.resource as resource,
>            generation.generation as last_generation
>     from detail, generation
>     where detail.resource = generation.resource
>       and generation.resource = <chosen resource; memory, cpu, ...>
>     group by generation.generation, generation.resource
>
>     if total_claims + this_claim < limit
>         insert into detail values (this_claim, resource)
>
>     update generation
>     set generation = generation + 1
>     where generation = last_generation
>
>     if @@rowcount = 1
>         -- all good
>         commit
>     else
>         rollback
>         -- try again
>
>
>
> Some bootstrapping will be required for the situation where there are no
> detail records yet for a given resource, but I think we can figure that
> out easily. The easiest way I can think of is to lose the join and run
> both queries (one against the detail table and one against the
> generation table) within the same transaction.
>
> The update of the generation table is the locking mechanism that
> prevents multiple requestors from making concurrent claims.
>
> So long as people don't read or change these tables outside of the
> methods that the library provides, we can guarantee that this is all
> safe and will not oversubscribe.
>
> -amrith
>
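
(A minimal, self-contained sketch of the flow described above, using
Python's sqlite3 so it can actually be run. Table names follow the
pseudo-SQL; the retry count and the limit values are made up.)

    import sqlite3

    conn = sqlite3.connect(':memory:', isolation_level=None)
    conn.executescript("""
        CREATE TABLE detail (claim INTEGER, resource TEXT);
        CREATE TABLE generation (resource TEXT PRIMARY KEY,
                                 generation INTEGER);
        INSERT INTO generation VALUES ('memory', 0);
    """)

    def claim(resource, amount, limit, retries=5):
        for _ in range(retries):
            conn.execute('BEGIN')
            total_claims, last_generation = conn.execute(
                "SELECT COALESCE(SUM(d.claim), 0), g.generation "
                "FROM generation g "
                "LEFT JOIN detail d ON d.resource = g.resource "
                "WHERE g.resource = ?", (resource,)).fetchone()
            if total_claims + amount > limit:
                conn.execute('ROLLBACK')
                return False  # over quota
            conn.execute("INSERT INTO detail VALUES (?, ?)",
                         (amount, resource))
            cur = conn.execute(
                "UPDATE generation SET generation = generation + 1 "
                "WHERE resource = ? AND generation = ?",
                (resource, last_generation))
            if cur.rowcount == 1:
                conn.execute('COMMIT')
                return True
            conn.execute('ROLLBACK')  # concurrent claim won; try again
        return False

    print(claim('memory', 512, limit=1024))   # True
    print(claim('memory', 1024, limit=1024))  # False (would exceed limit)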
>> Comments inline.
>>
>> On 04/23/2016 11:25 AM, Amrith Kumar wrote:
>>> On Sat, 2016-04-23 at 10:26 -0400, Andrew Laski wrote:
>>>> On Fri, Apr 22, 2016, at 09:57 PM, Tim Bell wrote:
>>>>
>>>>> I have reservations on f and g.
>>>>>
>>>>>
>>>>> On f., We have had a number of discussions in the past about
>>>>> centralising quota (e.g. Boson) and the project teams of the other
>>>>> components wanted to keep the quota contents ‘close’. This can
>>>>> always be reviewed further with them but I would hope for at least
>>>>> a standard schema structure of tables in each project for the
>>>>> handling of quota.
>>>>>
>>>>> On g., aren’t all projects now nested projects? If we have the
>>>>> complexity of handling nested projects sorted out in the common
>>>>> library, is there a reason why a project would not want to support
>>>>> nested projects?
>>>>>
>>>>> One other issue is how to do reconciliation: each project needs to
>>>>> have a mechanism to re-calculate the current allocations and
>>>>> reconcile that with the quota usage. While in an ideal world this
>>>>> should not be necessary, it would be for the foreseeable future,
>>>>> especially with a new implementation.
>>>>>
>>>> One of the big reasons that Jay and I have been pushing to remove
>>>> reservations and tracking of quota in a separate place than the
>>>> resources are actually used, e.g., an instance record in the Nova
>>>> db, is so that reconciliation is not necessary. For example, if RAM
>>>> quota usage is simply tracked as sum(instances.memory_mb) then you
>>>> can be sure that usage is always up to date.
>>> Uh oh, there be gremlins here ...
>>>
>>> I am positive that this will NOT work, see earlier conversations
>>> about isolation levels, and Jay's alternate solution.
>>>
>>> The way (I understand the issue, and Jay's solution) you get around
>>> the isolation levels trap is to NOT do your quota determinations
>>> based on a SUM(column) but rather based on the rowcount on a well
>>> crafted UPDATE of a single table that stored total quota.
>> No, we would do our quota calculations by doing a SUM(used) against
>> the allocations table. There is no separate table that stores the
>> total quota (or quota usage records). That's the source of the problem
>> with the existing quota handling code in Nova. The generation field
>> value is used to provide the consistent view of the actual resource
>> usage records, so that the INSERT operations for all claimed resources
>> can be done in a transactional manner and will be rolled back if any
>> other writer changes the amount of consumed resources on a provider
>> (which of course would affect the quota check calculations).
>>
>>> You could also store a detail claim record for each claim in an
>>> independent table that is maintained in the same database transaction
>>> if you so desire; that is optional.
>> The allocations table is the "detail claim record" table that you
>> refer to above.
>>
>>> My view of how this would work (which I described earlier as building
>>> on Jay's solution) is that the claim flow would look like this:
>>>
>>> select total_used, generation
>>> from quota_claimed
>>> where tenant = <tenant> and resource = 'memory'
>> There is no need to keep a total_used value for anything. That is
>> denormalized calculated data that merely adds a point of race
>> contention. The quota check is against the *detail* table
>> (allocations), which stores the *actual resource usage records*.
>>
>>> begin transaction
>>>
>>> update quota_claimed
>>> set total_used = total_used + claim, generation = generation + 1
>>> where tenant = <tenant> and resource = 'memory'
>>> and generation = generation
>>> and total_used + claim < limit
>> This part of the transaction must always occur **after** the insertion
>> of the actual resource records, not before.
>>
>>> if @@rowcount = 1
>>> -- optional claim_detail table
>>> insert into claim_detail values (<tenant>, 'memory', claim, ...)
>>> commit
>>> else
>>> rollback
>> So, in pseudo-Python-SQLish code, my solution works like this:
>>
>>   limits = get_limits_from_delimiter()
>>   requested = get_requested_from_request_spec()
>>
>>   while True:
>>
>>       used := SELECT
>>                 resource_class,
>>                 resource_provider,
>>                 generation,
>>                 SUM(used) as total_used
>>               FROM allocations
>>               JOIN resource_providers ON (...)
>>               WHERE consumer_uuid = $USER_UUID
>>               GROUP BY
>>                 resource_class,
>>                 resource_provider,
>>                 generation;
>>
>>       # Check that our requested resource amounts don't exceed quotas
>>       if not check_requested_within_limits(requested, used, limits):
>>           raise QuotaExceeded
>>
>>       # Claim all requested resources. Note that the generation retrieved
>>       # from the above query is our consistent view marker. If the UPDATE
>>       # below succeeds and returns != 0 rows affected, that means there
>>       # was no other writer that changed our resource usage in between
>>       # this thread's claiming of resources, and therefore we prevent
>>       # any oversubscription of resources.
>>       begin_transaction:
>>
>>           provider := SELECT id, generation, ... FROM resource_providers
>>                       JOIN (...)
>>                       WHERE (<resource_usage_filters>)
>>
>>           for resource in requested:
>>               INSERT INTO allocations (
>>                   resource_provider_id,
>>                   resource_class_id,
>>                   consumer_uuid,
>>                   used
>>               ) VALUES (
>>                   $provider.id,
>>                   $resource.id,
>>                   $USER_UUID,
>>                   $resource.amount
>>               );
>>
>>           rows_affected := UPDATE resource_providers
>>                            SET generation = generation + 1
>>                            WHERE id = $provider.id
>>                            AND generation = $used[$provider.id].generation;
>>
>>           if $rows_affected == 0:
>>               ROLLBACK;
>>
>> The only reason we would need a post-claim quota check is if some of
>> the requested resources are owned and tracked by an external-to-Nova
>> system.
>>
>> BTW, note to Ed Leafe... unless your distributed data store supports
>> transactional semantics, you can't use a distributed data store for
>> these types of solutions. Instead, you will need to write a whole
>> bunch of code that does post-auditing of claims and quotas and a
>> system that accepts that oversubscription and out-of-sync quota limits
>> and usages is a fact of life. Not to mention needing to implement
>> JOINs in Python.
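
(For reference, here is roughly what the compare-and-update-with-retry
pattern from Jay's pseudo-code looks like with SQLAlchemy Core. The tables
are trimmed to just the columns the pattern needs; the names are
illustrative, not Nova's actual schema.)

    import sqlalchemy as sa

    meta = sa.MetaData()
    resource_providers = sa.Table(
        'resource_providers', meta,
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('generation', sa.Integer, nullable=False))
    allocations = sa.Table(
        'allocations', meta,
        sa.Column('resource_provider_id', sa.Integer),
        sa.Column('resource_class_id', sa.Integer),
        sa.Column('consumer_uuid', sa.String(36)),
        sa.Column('used', sa.Integer))

    engine = sa.create_engine('sqlite://')
    meta.create_all(engine)
    with engine.begin() as conn:
        conn.execute(resource_providers.insert().values(id=1, generation=0))

    def claim(engine, provider_id, consumer_uuid, requested, generation):
        """Insert allocation rows, then bump the provider generation; a
        zero rowcount on the UPDATE means another writer got in first."""
        with engine.begin() as conn:  # one transaction; rolls back on error
            for resource_class_id, amount in requested.items():
                conn.execute(allocations.insert().values(
                    resource_provider_id=provider_id,
                    resource_class_id=resource_class_id,
                    consumer_uuid=consumer_uuid,
                    used=amount))
            result = conn.execute(
                resource_providers.update()
                .where(resource_providers.c.id == provider_id)
                .where(resource_providers.c.generation == generation)
                .values(generation=generation + 1))
            if result.rowcount == 0:
                # Someone else changed usage on this provider; abort claim.
                raise RuntimeError('generation conflict, re-read and retry')

    # The caller re-reads SUM(used) and the provider generation, re-runs
    # the quota check against the limits, and calls claim() again on a
    # conflict.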
>>
>>> But, it is my understanding that
>>>
>>> (a) if you wish to do the SUM(column) approach that you propose, you
>>> must have a reservation that is committed and then you must re-read
>>> the SUM(column) to make sure you did not over-subscribe; and
>> Erm, kind of? Oversubscription is not possible in the solution I
>> describe because the compare-and-update on the
>> resource_providers.generation field allows for a consistent view of
>> the resources used -- and if that view changes during the insertion of
>> resource usage records, the transaction containing those insertions is
>> rolled back.
>>
>>> (b) to get away from reservations you must stop using the
>>> SUM(column) approach and instead use a single quota_claimed
>>> table to determine the current quota claimed.
>> No. This has nothing to do with reservations.
>>
>>> At least that's what I understand of Jay's example from earlier in
>>> this thread.
>>>
>>> Let's definitely discuss this in Austin. While I don't love Jay's
>>> solution for other reasons to do with making the quota table a
>>> hotspot and things like that, it is a perfectly workable solution, I
>>> think.
>> There is no quota table in my solution.
>>
>> If you refer to the resource_providers table (the table that has the
>> generation field), then yes, it's a hot spot. But hot spots in the DB
>> aren't necessarily a bad thing if you design the underlying schema
>> properly.
>>
>> More in Austin.
>>
>> Best,
>> -jay
>>
>>>>
>>>>> Tim
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> From: Amrith Kumar<amrith at tesora.com>
>>>>> Reply-To: "OpenStack Development Mailing List (not for usage
>>>>> questions)"<openstack-dev at lists.openstack.org>
>>>>> Date: Friday 22 April 2016 at 06:51
>>>>> To: "OpenStack Development Mailing List (not for usage questions)"
>>>>> <openstack-dev at lists.openstack.org>
>>>>> Subject: Re: [openstack-dev] More on the topic of DELIMITER, the
>>>>> Quota Management Library proposal
>>>>>
>>>>>
>>>>>
>>>>> I’ve thought more about Jay’s approach to enforcing
>> quotas
>>>>> and I think we can build on and around it. With that
>>>>> implementation as the basic quota primitive, I think we
>> can
>>>>> build a quota management API that isn’t dependent on
>>>>> reservations. It does place some burdens on the consuming
>>>>> projects that I had hoped to avoid and these will cause
>>>>> heartburn for some (make sure that you always request
>>>>> resources in a consistent order and free them in a
>>>>> consistent order being the most obvious).
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> If it doesn’t make it harder, I would like to see if we
>> can
>>>>> make the quota API take care of the ordering of requests.
>>>>> i.e. if the quota API is an extension of Jay’s example
>> and
>>>>> accepts some data structure (dict?) with all the claims
>> that
>>>>> a project wants to make for some operation, and then
>>>>> proceeds to make those claims for the project in the
>>>>> consistent order, I think it would be of some value.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Beyond that, I’m on board with a-g below,
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -amrith
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> From: Vilobh Meshram
>>>>> [mailto:vilobhmeshram.openstack at gmail.com]
>>>>> Sent: Friday, April 22, 2016 4:08 AM
>>>>> To: OpenStack Development Mailing List (not for usage
>>>>> questions)<openstack-dev at lists.openstack.org>
>>>>> Subject: Re: [openstack-dev] More on the topic of
>> DELIMITER,
>>>>> the Quota Management Library proposal
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I strongly agree with Jay on the points related to "no
>>>>> reservation" , keeping the interface simple and the role
>> for
>>>>> Delimiter (impose limits on resource consumption and
>> enforce
>>>>> quotas).
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> The point to keep user quota, tenant quotas in Keystone
>>>>> sounds interestring and would need support from Keystone
>>>>> team. We have a Cross project session planned [1] and
>> will
>>>>> definitely bring that up in that session.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> The main thought with which Delimiter was formed was to
>>>>> enforce resource quota in transaction safe manner and do
>> it
>>>>> in a cross-project conducive manner and it still holds
>>>>> true. Delimiters mission is to impose limits on
>>>>> resource consumption and enforce quotas in transaction
>> safe
>>>>> manner. Few key aspects of Delimiter are :-
>>>>>
>>>>>
>>>>>
>>>>> a. Delimiter will be a new Library and not a Service.
>>>>> Details covered in spec.
>>>>>
>>>>>
>>>>> b. Delimiter's role will be to impose limits on resource
>>>>> consumption.
>>>>>
>>>>>
>>>>> c. Delimiter will not be responsible for rate limiting.
>>>>>
>>>>>
>>>>> d. Delimiter will not maintain data for the resources.
>>>>> Respective projects will take care of keeping,
>> maintaining
>>>>> data for the resources and resource consumption.
>>>>>
>>>>>
>>>>> e. Delimiter will not have the concept of "reservations".
>>>>> Delimiter will read or update the "actual" resource
>> tables
>>>>> and will not rely on the "cached" tables. At present, the
>>>>> quota infrastructure in Nova, Cinder and other projects
>> have
>>>>> tables such as reservations, quota_usage, etc which are
>> used
>>>>> as "cached tables" to track re
>>>>>
>>>>>
>>>>> f. Delimiter will fetch the information for project
>> quota,
>>>>> user quota from a centralized place, say Keystone, or if
>>>>> that doesn't materialize will fetch default quota values
>>>>> from respective service. This information will be cached
>>>>> since it gets updated rarely but read many times.
>>>>>
>>>>>
>>>>> g. Delimiter will take into consideration whether the
>>>>> project is a Flat or Nested and will make the
>> calculations
>>>>> of allocated, available resources. Nested means project
>>>>> namespace is hierarchical and Flat means project
>> namespace
>>>>> is not hierarchical.
>>>>>
>>>>>
>>>>> -Vilobh
>>>>>
>>>>>
>>>>> [1]
>> https://www.openstack.org/summit/austin-2016/summit-schedule/events/9492
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Apr 21, 2016 at 11:08 PM, Joshua Harlow
>>>>> <harlowja at fastmail.com> wrote:
>>>>>
>>>>>
>>>>> Since people will be on a plane soon,
>>>>>
>>>>> I threw this together as a example of a quota
>> engine
>>>>> (the zookeeper code does even work, and yes it
>>>>> provides transactional semantics due to the nice
>>>>> abilities of zookeeper znode versions[1] and its
>>>>> inherent consistency model, yippe).
>>>>>
>>>>>
>> https://gist.github.com/harlowja/e7175c2d76e020a82ae94467a1441d85
>>>>> Someone else can fill in the db quota engine with
>> a
>>>>> similar/equivalent api if they so dare, ha. Or
>> even
>>>>> feel to say the gist/api above is crap, cause
>> that's
>>>>> ok to, lol.
>>>>>
>>>>> [1]
>>>>>
>> https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#Data
>> +Access
>>>>>
>>>>>
>>>>> Amrith Kumar wrote:
>>>>>
>>>>> Inline below ... thread is too long, will
>>>>> catch you in Austin.
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Jay Pipes
>>>>> [mailto:jaypipes at gmail.com]
>>>>> Sent: Thursday, April 21, 2016
>> 8:08
>>>>> PM
>>>>> To:
>>>>> openstack-dev at lists.openstack.org
>>>>> Subject: Re: [openstack-dev] More
>> on
>>>>> the topic of DELIMITER, the Quota
>>>>> Management Library proposal
>>>>>
>>>>> Hmm, where do I start... I think I will just cut to the two primary
>>>>> disagreements I have. And I will top-post because this email is way
>>>>> too big.
>>>>>
>>>>> 1) On serializable isolation level.
>>>>>
>>>>> No, you don't need it at all to prevent races in claiming. Just use
>>>>> a compare-and-update with retries strategy. Proof is here:
>>>>>
>>>>> https://github.com/jaypipes/placement-bench/blob/master/placement.py#L97-L142
>>>>>
>>>>> Works great and prevents multiple writers from oversubscribing any
>>>>> resource without relying on any particular isolation level at all.
>>>>> The `generation` field in the inventories table is what allows
>>>>> multiple writers to ensure a consistent view of the data without
>>>>> needing to rely on heavy lock-based semantics and/or RDBMS-specific
>>>>> isolation levels.
>>>>>
>>>>>
>>>>>
>>>>> [amrith] this works for what it is doing, we can definitely do this.
>>>>> This will work at any isolation level, yes. I didn't want to go this
>>>>> route because it is going to still require an insert into another
>>>>> table recording what the actual 'thing' is that is claiming the
>>>>> resource and that insert is going to be in a different transaction,
>>>>> and managing those two transactions was what I wanted to avoid. I
>>>>> was hoping to avoid having two tables tracking claims, one showing
>>>>> the currently claimed quota and another holding the things that
>>>>> claimed that quota. Have to think again whether that is possible.
>>>>>
>>>>> 2) On reservations.
>>>>>
>>>>> The reason I don't believe reservations are necessary to be in a
>>>>> quota library is because reservations add a concept of a time to a
>>>>> claim of some resource. You reserve some resource to be claimed at
>>>>> some point in the future and release those resources at a point
>>>>> further in time.
>>>>>
>>>>> Quota checking doesn't look at what the state of some system will
>>>>> be at some point in the future. It simply returns whether the
>>>>> system *right now* can handle a request *right now* to claim a set
>>>>> of resources.
>>>>>
>>>>> If you want reservation semantics for some resource, that's totally
>>>>> cool, but IMHO, a reservation service should live outside of the
>>>>> service that is actually responsible for providing resources to a
>>>>> consumer. Merging right-now quota checks and future-based
>>>>> reservations into the same library just complicates things
>>>>> unnecessarily IMHO.
>>>>>
>>>>>
>>>>>
>>>>> [amrith] extension of the above ...
>>>>>
>>>>> 3) On resizes.
>>>>>
>>>>> Look, I recognize some users see some value in resizing their
>>>>> resources. That's fine. I personally think expand operations are
>>>>> fine, and that shrink operations are really the operations that
>>>>> should be prohibited in the API. But, whatever, I'm fine with
>>>>> resizing of requested resource amounts. My big point is if you
>>>>> don't have a separate table that stores quota_usages and instead
>>>>> only have a single table that stores the actual resource usage
>>>>> records, you don't have to do *any* quota check operations at all
>>>>> upon deletion of a resource. For modifying resource amounts (i.e. a
>>>>> resize) you merely need to change the calculation of requested
>>>>> resource amounts to account for the already-consumed usage amount.
>>>>>
>>>>> Bottom line for me: I really won't support any proposal for a
>>>>> complex library that takes the resource claim process out of the
>>>>> hands of the services that own those resources. The simpler the
>>>>> interface of this library, the better.
>>>>>
>>>>> [amrith] my proposal would not but this email thread has got too
>>>>> long. Yes, simpler interface, will catch you in Austin.
>>>>>
>>>>> Best,
>>>>> -jay
>>>>>
>>>>> On 04/19/2016 09:59 PM, Amrith
>> Kumar
>>>>> wrote:
>>>>>
>>>>> -----Original
>>>>> Message-----
>>>>> From: Jay Pipes
>>>>>
>> [mailto:jaypipes at gmail.com]
>>>>> Sent: Monday,
>> April
>>>>> 18, 2016 2:54 PM
>>>>> To:
>>>>>
>> openstack-dev at lists.openstack.org
>>>>> Subject: Re:
>>>>> [openstack-dev]
>> More
>>>>> on the topic of
>>>>> DELIMITER, the
>>>>> Quota Management
>>>>> Library proposal
>>>>>
>>>>> On 04/16/2016
>> 05:51
>>>>> PM, Amrith Kumar
>>>>> wrote:
>>>>>
>>>>> If we
>>>>> therefore
>>>>> assume
>> that
>>>>> this will
>> be
>>>>> a Quota
>>>>>
>> Management
>>>>> Library,
>>>>> it is
>> safe
>>>>> to assume
>>>>> that
>> quotas
>>>>> are going
>> to
>>>>> be
>> managed
>>>>> on a
>>>>>
>> per-project
>>>>> basis,
>> where
>> participating projects will use this library.
>>>>> I believe
>>>>> that it
>>>>> stands to
>>>>> reason
>> that
>>>>> any data
>>>>>
>> persistence
>>>>> will
>>>>> have to
>> be
>>>>> in a
>>>>> location
>>>>> decided
>> by
>>>>> the
>>>>>
>> individual
>>>>> project.
>>>>>
>>>>>
>>>>> Depends on what
>> you
>>>>> mean by "any data
>>>>> persistence". If
>> you
>>>>> are
>>>>> referring to the
>>>>> storage of quota
>>>>> values (per user,
>>>>> per tenant,
>>>>> global, etc) I
>> think
>>>>> that should be
>> done
>>>>> by the Keystone
>>>>> service.
>>>>> This data is
>>>>> essentially an
>>>>> attribute of the
>>>>> user or the
>> tenant
>>>>> or the
>>>>>
>>>>>
>>>>> service endpoint itself (i.e.
>>>>>
>>>>>
>>>>> global defaults).
>>>>> This data also
>>>>> rarely changes
>> and
>>>>> logically belongs
>>>>> to the service
>> that
>>>>> manages users,
>>>>> tenants, and
>> service
>>>>> endpoints:
>>>>>
>>>>>
>>>>> Keystone.
>>>>>
>>>>>
>>>>> If you are
>> referring
>>>>> to the storage of
>>>>> resource usage
>>>>> records, yes,
>>>>> each service
>> project
>>>>> should own that
>> data
>>>>> (and frankly, I
>>>>> don't see a
>>>>> need to persist
>> any
>>>>> quota usage data
>> at
>>>>> all, as I
>> mentioned
>>>>> in a
>>>>> previous reply to
>>>>> Attila).
>>>>>
>>>>>
>>>>> [amrith] You make a
>>>>> distinction that I had
>> made
>>>>> implicitly, and it is
>>>>> important to highlight
>> it.
>>>>> Thanks for pointing it
>> out.
>>>>> Yes, I meant
>>>>> both of the above, and as
>>>>> stipulated. Global
>> defaults
>>>>> in keystone
>>>>> (somehow, TBD) and usage
>>>>> records, on a per-service
>>>>> basis.
>>>>>
>>>>> That may
>> not
>>>>> be a very
>>>>>
>> interesting
>>>>> statement
>>>>> but the
>>>>> corollary
>>>>> is, I
>>>>> think, a
>>>>> very
>>>>>
>> significant
>> statement;
>>>>> it cannot
>> be
>>>>> assumed
>> that
>>>>> the
>>>>> quota
>>>>>
>> management
>> information
>>>>> for all
>>>>>
>> participating projects is in
>>>>> the same
>>>>> database.
>>>>>
>>>>>
>>>>> It cannot be
>> assumed
>>>>> that this
>>>>> information is
>> even
>>>>> in a database at
>>>>>
>>>>>
>>>>>
>>>>> all...
>>>>>
>>>>>
>>>>> [amrith] I don't follow.
>> If
>>>>> the service in question
>> is
>>>>> to be scalable,
>>>>> I think it stands to
>> reason
>>>>> that there must be some
>>>>> mechanism by which
>>>>> instances of the service
>> can
>>>>> share usage records (as
>> you
>>>>> refer to
>>>>> them, and I like that
>> term).
>>>>> I think it stands to
>> reason
>>>>> that there
>>>>> must be some database,
>> no?
>>>>> A
>>>>>
>> hypothetical
>>>>> service
>>>>> consuming
>>>>> the
>>>>> Delimiter
>>>>> library
>>>>> provides
>>>>>
>> requesters
>>>>> with some
>>>>> widgets,
>> and
>>>>> wishes to
>>>>> track the
>>>>> widgets
>> that
>>>>> it has
>>>>>
>> provisioned
>>>>> both on a
>>>>> per-user
>>>>> basis,
>> and
>>>>> on the
>>>>> whole. It
>>>>> should
>>>>> therefore
>>>>>
>> multi-tenant
>>>>> and able
>> to
>>>>> track the
>>>>> widgets
>> on a
>>>>> per
>>>>> tenant
>> basis
>>>>> and if
>>>>> required
>>>>> impose
>>>>> limits on
>>>>> the
>> number
>>>>> of
>> widgets
>>>>> that a
>>>>> tenant
>> may
>>>>> consume
>> at a
>>>>> time,
>> during
>>>>> a course
>> of
>>>>> a period
>> of
>>>>> time, and
>> so
>>>>> on.
>>>>>
>>>>>
>>>>> No, this last
>> part
>>>>> is absolutely not
>>>>> what I think
>> quota
>>>>> management
>>>>> should be about.
>>>>>
>>>>> Rate limiting --
>>>>> i.e. how many
>>>>> requests a
>>>>> particular user
>> can
>>>>> make of
>>>>> an API in a given
>>>>> period of time --
>>>>> should *not* be
>>>>> handled by
>>>>> OpenStack API
>>>>> services, IMHO.
>> It
>>>>> is the
>>>>> responsibility of
>>>>> the
>>>>> deployer to
>> handle
>>>>> this using
>>>>> off-the-shelf
>>>>> rate-limiting
>>>>> solutions
>>>>>
>>>>>
>>>>> (open source or proprietary).
>>>>>
>>>>>
>>>>> Quotas should
>> only
>>>>> be about the hard
>>>>> limit of
>> different
>>>>> types of
>>>>> resources that a
>>>>> user or group of
>>>>> users can consume
>> at
>>>>> a given time.
>>>>>
>>>>>
>>>>> [amrith] OK, good point.
>>>>> Agreed as stipulated.
>>>>>
>>>>>
>>>>> Such a
>>>>>
>> hypothetical
>>>>> service
>> may
>>>>> also
>> consume
>>>>> resources
>>>>> from
>> other
>>>>> services
>>>>> that it
>>>>> wishes to
>>>>> track,
>> and
>>>>> impose
>>>>> limits
>> on.
>>>>>
>>>>> Yes, absolutely
>>>>> agreed.
>>>>>
>>>>>
>>>>> It is
>> also
>> understood
>>>>> as Jay
>> Pipes
>>>>> points
>> out
>>>>> in [4]
>> that
>>>>> the
>> actual
>>>>> process
>> of
>> provisioning
>>>>> widgets
>>>>> could be
>>>>> time
>>>>> consuming
>>>>> and it is
>>>>>
>> ill-advised
>>>>> to hold a
>>>>> database
>>>>>
>> transaction
>>>>> of any
>> kind
>>>>> open for
>>>>> that
>>>>> duration
>> of
>>>>> time.
>>>>> Ensuring
>>>>> that a
>> user
>>>>> does not
>>>>> exceed
>> some
>>>>> limit on
>>>>> the
>> number
>>>>> of
>>>>>
>> concurrent
>>>>> widgets
>> that
>>>>> he or she
>>>>> may
>> create
>>>>> therefore
>>>>> requires
>>>>> some
>>>>> mechanism
>> to
>>>>> track
>>>>> in-flight
>>>>> requests
>> for
>>>>> widgets.
>> I
>>>>> view
>> these
>>>>> as
>> "intent"
>>>>> but not
>> yet
>> materialized.
>>>>>
>>>>> It has nothing to
>> do
>>>>> with the amount
>> of
>>>>> concurrent
>> widgets
>>>>> that a
>>>>> user can create.
>>>>> It's just about
>> the
>>>>> total number of
>> some
>>>>> resource
>>>>> that may be
>> consumed
>>>>> by that user.
>>>>>
>>>>> As for an
>> "intent",
>>>>> I don't believe
>>>>> tracking intent
>> is
>>>>> the right way
>>>>> to go at all. As
>>>>> I've mentioned
>>>>> before, the major
>>>>> problem in Nova's
>>>>> quota system is
>> that
>>>>> there are two
>> tables
>>>>> storing resource
>>>>> usage
>>>>> records: the
>>>>> *actual* resource
>>>>> usage tables (the
>>>>> allocations table
>> in
>>>>> the new
>>>>> resource-
>> providers
>>>>> modeling and the
>>>>> instance_extra,
>>>>> pci_devices and
>>>>> instances table
>> in
>>>>> the legacy
>> modeling)
>>>>> and the *quota
>>>>> usage* tables
>>>>> (quota_usages and
>>>>> reservations
>>>>> tables). The
>>>>> quota_usages
>> table
>>>>> does
>>>>> not need to exist
>> at
>>>>> all, and neither
>>>>> does the
>>>>> reservations
>> table.
>>>>> Don't do
>>>>> intent-based
>>>>> consumption.
>>>>> Instead, just
>>>>> consume (claim)
>> by
>>>>> writing a record
>> for
>>>>> the resource
>> class
>>>>> consumed on a
>>>>> provider into
>>>>> the actual
>> resource
>>>>> usages table and
>>>>> then "check
>> quotas"
>>>>> by querying
>>>>> the *actual*
>>>>> resource usages
>> and
>>>>> comparing the
>>>>> SUM(used) values,
>>>>> grouped by
>> resource
>>>>> class, against
>> the
>>>>> appropriate quota
>>>>> limits for
>>>>> the user. The
>>>>> introduction of
>> the
>>>>> quota_usages and
>>>>> reservations
>>>>> tables to cache
>>>>> usage records is
>> the
>>>>> primary reason
>> for
>>>>> the race
>>>>> problems in the
>> Nova
>>>>> (and
>>>>> other) quota
>> system
>>>>> because every
>> time
>>>>> you introduce a
>>>>> caching system
>>>>> for
>> highly-volatile
>>>>> data (like usage
>>>>> records) you
>>>>> introduce
>>>>> complexity into
>> the
>>>>> write path and
>> the
>>>>> need to track the
>>>>> same thing
>>>>> across multiple
>>>>> writes to
>> different
>>>>> tables
>> needlessly.
>>>>>
>>>>> [amrith] I don't agree,
>> I'll
>>>>> respond to this and the
>> next
>>>>> comment group
>>>>>
>>>>>
>>>>>
>>>>> together. See below.
>>>>>
>>>>>
>>>>> Looking
>> up
>>>>> at this
>>>>> whole
>>>>>
>> infrastructure from the perspective of the
>>>>> database,
>> I
>>>>> think we
>>>>> should
>>>>> require
>> that
>>>>> the
>> database
>>>>> must not
>> be
>>>>> required
>> to
>>>>> operate
>> in
>>>>> any
>>>>> isolation
>>>>> mode
>> higher
>>>>> than
>>>>>
>> READ-COMMITTED; more about that later (i.e. requiring a database run
>>>>> either
>>>>>
>> serializable
>>>>> or
>>>>>
>> repeatable
>>>>> read is a
>>>>> show
>>>>> stopper).
>>>>>
>>>>>
>>>>> This is an
>>>>> implementation
>>>>> detail is not
>>>>> relevant to the
>>>>> discussion
>>>>> about what the
>>>>> interface of a
>> quota
>>>>> library would
>> look
>>>>> like.
>>>>>
>>>>>
>>>>> [amrith] I disagree, let
>> me
>>>>> give you an example of
>> why.
>>>>> Earlier, I wrote:
>>>>>
>>>>> Such a
>>>>>
>> hypothetical
>>>>> service
>> may
>>>>> also
>> consume
>>>>> resources
>>>>> from
>> other
>>>>> services
>>>>> that it
>>>>> wishes to
>>>>> track,
>> and
>>>>> impose
>>>>> limits
>> on.
>>>>>
>>>>> And you responded:
>>>>>
>>>>>
>>>>> Yes, absolutely
>>>>> agreed.
>>>>>
>>>>>
>>>>>
>>>>> So let's take this
>>>>> hypothetical service that
>> in
>>>>> response to a user
>>>>>
>>>>>
>>>>>
>>>>> request, will provision a Cinder
>>>>> volume and a Nova instance. Let's
>>>>> assume
>>>>> that the service also imposes
>> limits
>>>>> on the number of cinder volumes
>> and
>>>>> nova instances the user may
>>>>> provision; independent of limits
>>>>> that Nova and
>>>>> Cinder may themselves maintain.
>>>>>
>>>>> One way that the hypothetical service can function is this:
>>>>>
>>>>> (a) check Cinder quota, if successful, create cinder volume
>>>>> (b) check Nova quota, if successful, create nova instance with
>>>>> cinder volume attachment
>>>>>
>>>>> Now, this is sub-optimal as there are going to be some number of
>>>>> cases where the nova quota check fails. Now you have needlessly
>>>>> created and will have to release a cinder volume. It also takes
>>>>> longer to fail.
>>>>>
>>>>> Another way to do this is this:
>>>>>
>>>>> (1) check Cinder quota, if successful, check Nova quota, if
>>>>> successful proceed to (2) else error out
>>>>> (2) create cinder volume
>>>>> (3) create nova instance with cinder attachment.
>>>>>
>>>>> I'm trying to get to this latter form of doing things. Easy, you
>>>>> might say ... theoretically this should simply be:
>>>>>
>>>>> BEGIN;
>>>>> -- Get data to do the Cinder check
>>>>> SELECT ......
>>>>> -- Do the cinder check
>>>>> INSERT INTO ....
>>>>>
>>>>> -- Get data to do the Nova check
>>>>> SELECT ....
>>>>> -- Do the Nova check
>>>>> INSERT INTO ...
>>>>>
>>>>> COMMIT
>>>>>
>>>>> You can only make this work if you ran at isolation level
>>>>> serializable.
>>>>>
>>>>>
>>>>> Why?
>>>>>
>>>>>
>>>>> To make this run at
>>>>> isolation level
>>>>> REPEATABLE-READ, you must
>>>>> enforce
>>>>>
>>>>>
>>>>>
>>>>> constraints at the database level
>>>>> that will fail the commit. But
>> wait,
>>>>> you
>>>>> can't do that because the data
>> about
>>>>> the global limits may not be in
>> the
>>>>> same database as the usage
>> records.
>>>>> Later you talk about caching and
>>>>> stuff; all that doesn't help a
>>>>> database constraint.
>>>>>
>>>>> For this reason, I think
>>>>> there is going to have to
>> be
>>>>> some cognizance to
>>>>>
>>>>>
>>>>>
>>>>> the database isolation level in
>> the
>>>>> design of the library, and I
>> think
>>>>> it
>>>>> will also impact the API that can
>> be
>>>>> constructed.
>>>>>
>>>>> In
>> general
>> therefore, I
>>>>> believe
>> that
>>>>> the
>>>>>
>> hypothetical
>>>>> service
>>>>>
>> processing
>>>>> requests
>> for
>>>>> widgets
>>>>> would
>> have
>>>>> to handle
>>>>> three
>> kinds
>>>>> of
>>>>>
>> operations,
>> provision,
>>>>> modify,
>> and
>>>>> destroy.
>> The
>>>>> names
>> are, I
>>>>> believe,
>>>>>
>> self-explanatory.
>>>>>
>>>>> Generally,
>>>>> modification of a
>>>>> resource doesn't
>>>>> come into play.
>> The
>>>>> primary exception
>> to
>>>>> this is for
>>>>> transferring of
>>>>> ownership of some
>>>>>
>>>>>
>>>>> resource.
>>>>>
>>>>>
>>>>> [amrith] Trove RESIZE is
>> a
>>>>> huge benefit for users
>> and
>>>>> while it may be a
>>>>>
>>>>>
>>>>>
>>>>> pain as you say, this is still a
>>>>> very real benefit. Trove allows
>> you
>>>>> to
>>>>> resize both your storage (resize
>> the
>>>>> cinder volume) and resize your
>>>>> instance (change the flavor).
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Without
>> loss
>>>>> of
>>>>>
>> generality,
>>>>> one can
>> say
>>>>> that all
>>>>> three of
>>>>> them must
>>>>> validate
>>>>> that the
>>>>> operation
>>>>> does not
>>>>> violate
>> some
>>>>> limit (no
>>>>> more
>>>>> than X
>>>>> widgets,
>> no
>>>>> fewer
>> than X
>>>>> widgets,
>>>>> rates,
>> and
>>>>> so on).
>>>>>
>>>>>
>>>>> No, only the
>>>>> creation (and
>> very
>>>>> rarely the
>>>>> modification)
>> needs
>>>>> any
>>>>> validation that a
>>>>> limit could been
>>>>> violated.
>> Destroying
>>>>> a resource
>>>>> never needs to be
>>>>> checked for limit
>>>>> violations.
>>>>>
>>>>>
>>>>> [amrith] Well, if you are
>>>>> going to create a volume
>> of
>>>>> 10GB and your
>>>>>
>>>>>
>>>>>
>>>>> limit is 100GB, resizing it to
>> 200GB
>>>>> should fail, I think.
>>>>>
>>>>>
>>>>> Assuming
>>>>> that the
>>>>> service
>>>>>
>> provisions
>>>>> resources
>>>>> from
>> other
>>>>> services,
>>>>> it is
>> also
>> conceivable
>>>>> that
>> limits
>>>>> be
>> imposed
>>>>> on the
>>>>> quantum
>> of
>>>>> those
>>>>> services
>>>>> consumed.
>> In
>>>>> practice,
>> I
>>>>> can
>> imagine
>>>>> a service
>>>>> like
>>>>> Trove
>> using
>>>>> the
>>>>> Delimiter
>>>>> project
>> to
>>>>> perform
>> all
>>>>> of these
>>>>> kinds of
>>>>> limit
>>>>> checks;
>> I'm
>>>>> not
>>>>>
>> suggesting
>>>>> that it
>> does
>>>>> this
>> today,
>>>>> nor that
>>>>> there is
>> an
>>>>> immediate
>>>>> plan to
>>>>> implement
>>>>> all of
>> them,
>>>>> just that
>>>>> these
>>>>> all seem
>>>>> like good
>>>>> uses a
>> Quota
>> Management
>> capability.
>>>>> - User
>> may
>>>>> not have
>>>>> more than
>> 25
>>>>> database
>>>>> instances
>> at
>>>>> a
>>>>>
>>>>>
>>>>> time
>>>>>
>>>>>
>>>>>
>> -
>>>>> User may
>> not
>>>>> have more
>>>>> than 4
>>>>> clusters
>> at
>>>>> a time
>>>>> - User
>> may
>>>>> not
>> consume
>>>>> more than
>>>>> 3TB of
>> SSD
>>>>> storage
>> at a
>>>>> time
>>>>>
>>>>>
>>>>> Only if SSD
>> storage
>>>>> is a distinct
>>>>> resource class
>> from
>>>>> DISK_GB. Right
>>>>> now, Nova makes
>> no
>>>>> differentiation
>>>>> w.r.t. SSD or HDD
>> or
>>>>> shared vs.
>>>>> local block
>> storage.
>>>>>
>>>>> [amrith] It matters not
>> to
>>>>> Trove whether Nova does
>> nor
>>>>> not. Cinder
>>>>>
>>>>>
>>>>>
>>>>> supports volume-types and users
>> DO
>>>>> want to limit based on
>> volume-type
>>>>> (for
>>>>> example).
>>>>>
>>>>>
>> -
>>>>> User may
>> not
>>>>> launch
>> more
>>>>> than 10
>> huge
>>>>> instances
>> at
>>>>> a
>>>>> time
>>>>>
>>>>>
>>>>> What is the point
>> of
>>>>> such a limit?
>>>>>
>>>>>
>>>>>
>>>>> [amrith] Metering usage,
>>>>> placing limitations on
>> the
>>>>> quantum of resources
>>>>>
>>>>>
>>>>>
>>>>> that a user may provision. Same
>> as
>>>>> with Nova. A flavor is merely a
>>>>> simple
>>>>> way to tie together a bag of
>>>>> resources. It is a way to
>> restrict
>>>>> access,
>>>>> for example, to specific
>> resources
>>>>> that are available in the cloud.
>>>>> HUGE
>>>>> is just an example I gave, pick
>> any
>>>>> flavor you want, and here's how a
>>>>> service like Trove uses it.
>>>>>
>>>>> Users can ask to launch
>> an
>>>>> instance of a specific
>>>>> database+version;
>>>>>
>>>>>
>>>>>
>>>>> MySQL 5.6-48 for example. Now, an
>>>>> operator can restrict the
>> instance
>>>>> flavors, or volume types that can
>> be
>>>>> associated with the specific
>>>>> datastore. And the flavor could
>> be
>>>>> used to map to, for example
>> whether
>>>>> the
>>>>> instance is running on bare metal
>> or
>>>>> in a VM and if so with what kind
>> of
>>>>> hardware. That's a useful
>> construct
>>>>> for a service like Trove.
>>>>>
>>>>>
>> -
>>>>> User may
>> not
>>>>> launch
>> more
>>>>> than 3
>>>>> clusters
>> an
>>>>> hour
>>>>>
>>>>>
>>>>>
>>>>> -1. This is rate
>>>>> limiting and
>> should
>>>>> be handled by
>>>>> rate-limiting
>>>>>
>>>>>
>>>>>
>>>>> services.
>>>>>
>>>>>
>>>>>
>> -
>>>>> No more
>> than
>>>>> 500
>> copies
>>>>> of Oracle
>>>>> may be
>> run
>>>>> at a time
>>>>>
>>>>>
>>>>>
>>>>> Is "Oracle" a
>>>>> resource class?
>>>>>
>>>>>
>>>>>
>>>>> [amrith] As I view it,
>> every
>>>>> project should be free to
>>>>> define its own
>>>>>
>>>>>
>>>>>
>>>>> set of resource classes and meter
>>>>> them as it feels fit. So, while
>>>>> Oracle
>>>>> licenses may not, conceivably a
>> lot
>>>>> of things that Nova, Cinder, and
>> the
>>>>> other core projects don't care
>>>>> about, are in fact relevant for a
>>>>> consumer
>>>>> of this library.
>>>>>
>>>>> While
>> Nova
>>>>> would be
>> the
>>>>> service
>> that
>>>>> limits
>> the
>>>>> number of
>>>>> instances
>>>>> a user
>> can
>>>>> have at a
>>>>> time, the
>>>>> ability
>> for
>>>>> a service
>> to
>>>>> limit
>> this
>>>>> further
>>>>> should
>> not
>>>>> be
>>>>>
>> underestimated.
>>>>> In turn,
>>>>> should
>> Nova
>>>>> and
>> Cinder
>>>>> also use
>> the
>>>>> same
>> Quota
>> Management
>>>>> Library,
>>>>> they may
>>>>> each
>> impose
>> limitations
>>>>> like:
>>>>>
>>>>> - User
>> may
>>>>> not
>> launch
>>>>> more than
>> 20
>>>>> huge
>>>>> instances
>> at
>>>>> a
>>>>> time
>>>>>
>>>>>
>>>>> Not a useful
>>>>> limitation IMHO.
>>>>>
>>>>>
>>>>>
>>>>> [amrith] I beg to differ.
>>>>> Again a huge instance is
>>>>> just an example of
>>>>>
>>>>>
>>>>>
>>>>> some flavor; and the idea is to
>>>>> allow a project to place its own
>>>>> metrics
>>>>> and meter based on those.
>>>>>
>>>>>
>> -
>>>>> User may
>> not
>>>>> launch
>> more
>>>>> than 3
>>>>> instances
>> in
>>>>> a minute
>>>>>
>>>>>
>>>>>
>>>>> -1. This is rate
>>>>> limiting.
>>>>>
>>>>>
>>>>>
>> -
>>>>> User may
>> not
>>>>> consume
>> more
>>>>> than 15TB
>> of
>>>>> SSD at a
>>>>> time
>>>>> - User
>> may
>>>>> not have
>>>>> more than
>> 30
>>>>> volumes
>> at a
>>>>> time
>>>>>
>>>>> Again,
>> I'm
>>>>> not
>> implying
>>>>> that
>> either
>>>>> Nova or
>>>>> Cinder
>>>>> should
>>>>> provide
>>>>> these
>>>>>
>> capabilities.
>>>>> With this
>> in
>>>>> mind, I
>>>>> believe
>> that
>>>>> the
>> minimal
>>>>> set of
>>>>>
>> operations
>>>>> that
>>>>> Delimiter
>>>>> should
>>>>> provide
>> are:
>>>>> -
>>>>>
>> define_resource(name, max, min, user_max, user_min, ...)
>>>>>
>>>>> What would the
>> above
>>>>> do? What service
>>>>> would it be
>> speaking
>>>>> to?
>>>>>
>>>>>
>>>>>
>>>>> [amrith] I assume that
>> this
>>>>> would speak with some
>>>>> backend (either
>>>>>
>>>>>
>>>>>
>>>>> keystone or the project itself)
>> and
>>>>> record these designated limits.
>> This
>>>>> is the way to register a project
>>>>> specific metric like "Oracle
>>>>> licenses".
>>>>>
>>>>>
>> -
>> update_resource_limits(name, user, user_max, user_min,
>>>>> ...)
>>>>>
>>>>>
>>>>> This doesn't
>> belong
>>>>> in a quota
>> library.
>>>>> It belongs as a
>> REST
>>>>> API in
>>>>> Keystone.
>>>>>
>>>>>
>>>>> [amrith] Fine, same place
>>>>> where the previous thing
>>>>> stores the global
>>>>>
>>>>>
>>>>>
>>>>> defaults is the target of this
>> call.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>> -
>> reserve_resource(name, user, size, parent_resource, ...)
>>>>>
>>>>>
>>>>> This doesn't
>> belong
>>>>> in a quota
>> library
>>>>> at all. I think
>>>>> reservations
>>>>> are not germane
>> to
>>>>> resource
>> consumption
>>>>> and should be
>>>>> handled by an
>>>>> external service
>> at
>>>>> the orchestration
>>>>> layer.
>>>>>
>>>>>
>>>>> [amrith] Again not true,
>> as
>>>>> illustrated above this
>>>>> library is the thing
>>>>>
>>>>>
>>>>>
>>>>> that projects could use to
>> determine
>>>>> whether or not to honor a
>> request.
>>>>> This reserve/provision process
>> is, I
>>>>> believe required because of the
>>>>> vagaries of how we want to
>> implement
>>>>> this in the database.
>>>>>
>>>>>
>> -
>> provision_resource(resource, id)
>>>>>
>>>>>
>>>>> A quota library
>>>>> should not be
>>>>> provisioning
>>>>> anything. A quota
>>>>> library
>>>>> should simply
>>>>> provide a
>> consistent
>>>>> interface for
>>>>> *checking* that a
>>>>> structured
>> request
>>>>> for some set of
>>>>> resources *can*
>> be
>>>>> provided by the
>>>>> service.
>>>>>
>>>>>
>>>>> [amrith] This does not
>>>>> actually call Nova or
>>>>> anything; merely that
>> AFTER
>>>>>
>>>>>
>>>>> the hypothetical service has
>> called
>>>>> NOVA, this converts the
>> reservation
>>>>> (which can expire) into an actual
>>>>> allocation.
>>>>>
>>>>>
>> -
>> update_resource(id or resource, newsize)
>>>>>
>>>>>
>>>>> Resizing
>> resources
>>>>> is a bad idea,
>> IMHO.
>>>>> Resources are
>> easier
>>>>> to deal
>>>>> with when they
>> are
>>>>> considered of
>>>>> immutable size
>> and
>>>>> simple (i.e. not
>>>>> complex or
>> nested).
>>>>> I think the
>> problem
>>>>> here is in the
>>>>> definition of
>>>>> resource classes
>>>>> improperly.
>>>>>
>>>>>
>>>>> [amrith] Let's leave the
>>>>> quota library aside. This
>>>>> assertion strikes at
>>>>>
>>>>>
>>>>>
>>>>> the very heart of things like
>> Nova
>>>>> resize, or for that matter Cinder
>>>>> volume resize. Are those all bad
>>>>> ideas? I made a 500GB Cinder
>> volume
>>>>> and
>>>>> it is getting close to full. I'd
>>>>> like to resize it to 2TB. Are you
>>>>> saying
>>>>> that's not a valid use case?
>>>>>
>>>>> For example, a
>>>>> "cluster" is not
>> a
>>>>> resource. It is a
>>>>> collection of
>>>>> resources of type
>>>>> node. "Resizing"
>> a
>>>>> cluster is a
>>>>> misnomer, because
>>>>> you aren't
>> resizing
>>>>> a resource at
>> all.
>>>>> Instead, you are
>>>>> creating or
>>>>> destroying
>> resources
>>>>> inside the
>> cluster
>>>>> (i.e. joining or
>>>>> leaving
>>>>>
>>>>>
>>>>> cluster nodes).
>>>>>
>>>>>
>>>>> BTW, this is also
>>>>> why the "resize
>>>>> instance" API in
>>>>> Nova is such a
>>>>> giant pain in the
>>>>> ass. It's
>> attempting
>>>>> to "modify" the
>>>>> instance
>>>>>
>>>>>
>>>>> "resource"
>>>>>
>>>>>
>>>>> when the instance
>>>>> isn't really the
>>>>> resource at all.
>> The
>>>>> VCPU, RAM_MB,
>>>>> DISK_GB, and PCI
>>>>> devices are the
>>>>> actual resources.
>>>>> The instance is a
>>>>> convenient way to
>>>>> tie those
>> resources
>>>>> together, and
>> doing
>>>>> a "resize"
>>>>> of the instance
>>>>> behind the scenes
>>>>> actually performs
>> a
>>>>> *move*
>>>>> operation, which
>>>>> isn't a *change*
>> of
>>>>> the original
>>>>> resources.
>> Rather,
>>>>> it is a creation
>> of
>>>>> a new set of
>>>>> resources (of the
>>>>> new amounts) and
>> a
>>>>> deletion of the
>> old
>>>>> set of resources.
>>>>>
>>>>>
>>>>> [amrith] that's fine, if
>> all
>>>>> we want is to handle the
>>>>> resize operation
>>>>>
>>>>>
>>>>>
>>>>> as a new instance followed by a
>>>>> deletion, that's great. But that
>>>>> semantic
>>>>> isn't necessarily the case for
>>>>> something like (say) cinder.
>>>>>
>>>>> The "resize" API
>>>>> call adds some
>> nasty
>>>>> confirmation and
>>>>> cancel
>>>>> semantics to the
>>>>> calling interface
>>>>> that hint that
>> the
>>>>> underlying
>>>>> implementation of
>>>>> the "resize"
>>>>> operation is in
>>>>> actuality not a
>>>>> resize
>>>>> at all, but
>> rather a
>> create-new-and-delete-old-resources operation.
>>>>>
>>>>> [amrith] And that isn't
>>>>> germane to a quota
>> library,
>>>>> I don't think. What
>>>>>
>>>>>
>>>>>
>>>>> is, is this. Do we want to treat
>> the
>>>>> transient state when there are
>> (for
>>>>> example of Nova) two instances,
>> one
>>>>> of the new flavor and one of the
>> old
>>>>> flavor, or not. But, from the
>>>>> perspective of a quota library, a
>>>>> resize
>>>>> operation is merely a reset of
>> the
>>>>> quota by the delta in the
>> resource
>>>>> consumed.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>> -
>> release_resource(id or resource)
>>>>> -
>>>>>
>> expire_reservations()
>>>>>
>>>>> I see no need to
>>>>> have reservations
>> in
>>>>> the quota library
>> at
>>>>> all, as
>>>>> mentioned above.
>>>>>
>>>>>
>>>>> [amrith] Then I think the
>>>>> quota library must
>> require
>>>>> that either (a) the
>>>>>
>>>>>
>>>>>
>>>>> underlying database runs
>>>>> serializable or (b) database
>>>>> constraints can be
>>>>> used to enforce that at commit
>> the
>>>>> global limits are adhered to.
>>>>>
>>>>> As for your
>> proposed
>>>>> interface and
>>>>> calling structure
>>>>> below, I think a
>>>>> much simpler
>>>>> proposal would
>> work
>>>>> better. I'll work
>> on
>>>>> a cross-project
>>>>> spec that
>> describes
>>>>> this simpler
>>>>> proposal, but the
>>>>> basics would be:
>>>>>
>>>>> 1) Have Keystone
>>>>> store quota
>>>>> information for
>>>>> defaults (per
>>>>> service
>>>>> endpoint), for
>>>>> tenants and for
>>>>> users.
>>>>>
>>>>> Keystone would
>> have
>>>>> the set of
>> canonical
>>>>> resource class
>>>>> names, and
>>>>> each project,
>> upon
>>>>> handling a new
>>>>> resource class,
>>>>> would be
>>>>> responsible for a
>>>>> change submitted
>> to
>>>>> Keystone to add
>> the
>>>>> new resource
>>>>>
>>>>>
>>>>> class code.
>>>>>
>>>>>
>>>>> Straw man REST API:
>>>>>
>>>>> GET /quotas/resource-classes
>>>>> 200 OK
>>>>> {
>>>>>     "resource_classes": {
>>>>>         "compute.vcpu": {
>>>>>             "service": "compute",
>>>>>             "code": "compute.vcpu",
>>>>>             "description": "A virtual CPU unit"
>>>>>         },
>>>>>         "compute.ram_mb": {
>>>>>             "service": "compute",
>>>>>             "code": "compute.ram_mb",
>>>>>             "description": "Memory in megabytes"
>>>>>         },
>>>>>         ...
>>>>>         "volume.disk_gb": {
>>>>>             "service": "volume",
>>>>>             "code": "volume.disk_gb",
>>>>>             "description": "Amount of disk space in gigabytes"
>>>>>         },
>>>>>         ...
>>>>>         "database.count": {
>>>>>             "service": "database",
>>>>>             "code": "database.count",
>>>>>             "description": "Number of database instances"
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>>> [amrith] Well, a user is
>>>>> allowed to have a certain
>>>>> compute quota (which
>>>>>
>>>>>
>>>>>
>>>>> is shared by Nova and Trove) but
>>>>> also a Trove quota. How would
>> your
>>>>> representation represent that?
>>>>>
>>>>> # Get the default limits for new users...
>>>>>
>>>>> GET /quotas/defaults
>>>>> 200 OK
>>>>> {
>>>>>     "quotas": {
>>>>>         "compute.vcpu": 100,
>>>>>         "compute.ram_mb": 32768,
>>>>>         "volume.disk_gb": 1000,
>>>>>         "database.count": 25
>>>>>     }
>>>>> }
>>>>>
>>>>> # Get a specific user's limits...
>>>>>
>>>>> GET /quotas/users/{UUID}
>>>>> 200 OK
>>>>> {
>>>>>     "quotas": {
>>>>>         "compute.vcpu": 100,
>>>>>         "compute.ram_mb": 32768,
>>>>>         "volume.disk_gb": 1000,
>>>>>         "database.count": 25
>>>>>     }
>>>>> }
>>>>>
>>>>> # Get a tenant's limits...
>>>>>
>>>>> GET /quotas/tenants/{UUID}
>>>>> 200 OK
>>>>> {
>>>>>     "quotas": {
>>>>>         "compute.vcpu": 1000,
>>>>>         "compute.ram_mb": 327680,
>>>>>         "volume.disk_gb": 10000,
>>>>>         "database.count": 250
>>>>>     }
>>>>> }
>>>>>
>>>>> 2) Have Delimiter
>>>>> communicate with
>> the
>>>>> above proposed
>> new
>>>>> Keystone
>>>>> REST API and
>> package
>>>>> up data into an
>>>>>
>> oslo.versioned_objects interface.
>>>>> Clearly all of
>> the
>>>>> above can be
>> heavily
>>>>> cached both on
>> the
>>>>> server and
>>>>> client side since
>>>>> they rarely
>> change
>>>>> but are read
>> often.
>>>>>
>>>>> [amrith] Caching on the
>>>>> client won't save you
>> from
>>>>> oversubscription if
>>>>>
>>>>>
>>>>>
>>>>> you don't run serializable.
>>>>>
>>>>>
>>>>> The Delimiter
>>>>> library could be
>>>>> used to provide a
>>>>> calling interface
>>>>> for service
>> projects
>>>>> to get a user's
>>>>> limits for a set
>> of
>>>>> resource
>>>>>
>>>>>
>>>>> classes:
>>>>>
>>>>>
>>>>> (please excuse
>>>>> wrongness, typos,
>>>>> and other stuff
>>>>> below, it's just
>> a
>>>>> straw- man not
>>>>> production
>> working
>>>>> code...)
>>>>>
>>>>> # file: delimiter/objects/limits.py
>>>>>
>>>>> import oslo.versioned_objects.base as ovo
>>>>> import oslo.versioned_objects.fields as ovo_fields
>>>>>
>>>>>
>>>>> class ResourceLimit(ovo.VersionedObjectBase):
>>>>>     # 1.0: Initial version
>>>>>     VERSION = '1.0'
>>>>>
>>>>>     fields = {
>>>>>         'resource_class': ovo_fields.StringField(),
>>>>>         'amount': ovo_fields.IntegerField(),
>>>>>     }
>>>>>
>>>>>
>>>>> class ResourceLimitList(ovo.VersionedObjectBase):
>>>>>     # 1.0: Initial version
>>>>>     VERSION = '1.0'
>>>>>
>>>>>     fields = {
>>>>>         'resources': ListOfObjectsField(ResourceLimit),
>>>>>     }
>>>>>
>>>>>     @cache_this_heavily
>>>>>     @remotable_classmethod
>>>>>     def get_all_by_user(cls, user_uuid):
>>>>>         """Returns a Limits object that tells the caller what a
>>>>>         user's absolute limits are for the set of resource classes
>>>>>         in the system.
>>>>>         """
>>>>>         # Grab a keystone client session object and connect to Keystone
>>>>>         ks = ksclient.Session(...)
>>>>>         raw_limits = ksclient.get_limits_by_user()
>>>>>         return cls(resources=[ResourceLimit(**d) for d in raw_limits])
>>>>> 3) Each service
>>>>> project would be
>>>>> responsible for
>>>>> handling the
>>>>> consumption of a
>> set
>>>>> of requested
>>>>> resource amounts
>> in
>>>>> an atomic and
>>>>>
>>>>>
>>>>> consistent way.
>>>>>
>>>>>
>>>>> [amrith] This is where
>> the
>>>>> rubber meets the road.
>> What
>>>>> is that atomic
>>>>>
>>>>>
>>>>>
>>>>> and consistent way? And what
>>>>> computing infrastructure do you
>> need
>>>>> to
>>>>> deliver this?
>>>>>
>>>>> The Delimiter
>>>>> library would
>> return
>>>>> the limits that
>> the
>>>>> service would
>>>>> pre- check before
>>>>> claiming the
>>>>> resources and
>> either
>>>>> post-check after
>>>>> claim or utilize
>> a
>> compare-and-update
>>>>> technique with a
>>>>>
>> generation/timestamp
>>>>> during claiming
>> to
>>>>> prevent race
>>>>> conditions.
>>>>>
>>>>> For instance, in
>>>>> Nova with the new
>>>>> resource
>> providers
>>>>> database schema
>>>>> and doing claims
>> in
>>>>> the scheduler (a
>>>>> proposed change),
>> we
>>>>> might do
>>>>> something to the
>>>>> effect of:
>>>>>
>>>>> from delimiter import objects as delim_obj
>>>>> from delimiter import exceptions as delim_exc
>>>>> from nova import objects as nova_obj
>>>>>
>>>>> request = nova_obj.RequestSpec.get_by_uuid(request_uuid)
>>>>> requested = request.resources
>>>>> limits = delim_obj.ResourceLimitList.get_all_by_user(user_uuid)
>>>>> allocations = nova_obj.AllocationList.get_all_by_user(user_uuid)
>>>>>
>>>>> # Pre-check for violations
>>>>> for resource_class, requested_amount in requested.items():
>>>>>     limit_idx = limits.resources.index(resource_class)
>>>>>     resource_limit = limits.resources[limit_idx].amount
>>>>>     alloc_idx = allocations.resources.index(resource_class)
>>>>>     resource_used = allocations.resources[alloc_idx]
>>>>>     if (resource_used + requested_amount) > resource_limit:
>>>>>         raise delim_exc.QuotaExceeded
>>>>>
>>>>> [amrith] Is the above code run with some global mutex to prevent
>>>>> that two people don't believe that they are good on quota at the
>>>>> same time?
>>>>>
>>>>> # Do claims in scheduler in an atomic, consistent fashion...
>>>>> claims = scheduler_client.claim_resources(request)
>>>>>
>>>>> [amrith] Yes, each 'atomic' claim on a repeatable-read database
>>>>> could result in oversubscription.
>>>>>
>>>>> # Post-check for violations
>>>>> allocations = nova_obj.AllocationList.get_all_by_user(user_uuid)
>>>>> # allocations now include the claimed resources from the scheduler
>>>>>
>>>>> for resource_class, requested_amount in requested.items():
>>>>>     limit_idx = limits.resources.index(resource_class)
>>>>>     resource_limit = limits.resources[limit_idx].amount
>>>>>     alloc_idx = allocations.resources.index(resource_class)
>>>>>     resource_used = allocations.resources[alloc_idx]
>>>>>     if resource_used > resource_limit:
>>>>>         # Delete the allocation records for the resources just claimed
>>>>>         delete_resources(claims)
>>>>>         raise delim_exc.QuotaExceeded
>>>>>
>>>>> [amrith] Again, two
>> people
>>>>> could drive through this
>>>>> code and both of
>>>>> them could fail :(
>>>>>
>>>>> 4) The only other
>>>>> thing that would
>>>>> need to be done
>> for
>>>>> a first go of
>>>>> the Delimiter
>>>>> library is some
>>>>> event listener
>> that
>>>>> can listen for
>>>>> changes to the
>> quota
>>>>> limits for a
>>>>>
>> user/tenant/default
>>>>> in Keystone.
>>>>> We'd want the
>>>>> services to be
>> able
>>>>> notify someone if
>> a
>>>>> reduction in
>>>>> quota results in
>> an
>>>>> overquota
>> situation.
>>>>> Anyway, that's my
>>>>> idea. Keep the
>>>>> Delimiter library
>>>>> small and focused
>>>>> on describing the
>>>>> limits only, not
>> on
>>>>> the resource
>>>>> allocations. Have
>>>>> the Delimiter
>>>>> library present a
>>>>> versioned object
>>>>> interface so the
>>>>> interaction
>> between
>>>>> the data exposed
>> by
>>>>> the Keystone REST
>>>>> API for
>>>>> quotas can evolve
>>>>> naturally and
>>>>> smoothly over
>> time.
>>>>> Best,
>>>>> -jay
>>>>>
>>>>> Let me illustrate the way I see these things fitting together. A
>>>>> hypothetical Trove system may be setup as follows:
>>>>>
>>>>> - No more than 2000 database instances in total, 300 clusters in
>>>>>   total
>>>>> - Users may not launch more than 25 database instances, or 4
>>>>>   clusters
>>>>> - The particular user 'amrith' is limited to 2 databases and 1
>>>>>   cluster
>>>>> - No user may consume more than 20TB of storage at a time
>>>>> - No user may consume more than 10GB of memory at a time
>>>>>
>>>>> At startup, I believe that the system would make the following
>>>>> sequence of calls:
>>>>>
>>>>> - define_resource(databaseInstance, 2000, 0, 25, 0, ...)
>>>>> - update_resource_limits(databaseInstance, amrith, 2, 0, ...)
>>>>> - define_resource(databaseCluster, 300, 0, 4, 0, ...)
>>>>> - update_resource_limits(databaseCluster, amrith, 1, 0, ...)
>>>>> - define_resource(storage, -1, 0, 20TB, 0, ...)
>>>>> - define_resource(memory, -1, 0, 10GB, 0, ...)
>>>>> Assume that the user john comes along and asks for a cluster with 4
>>>>> nodes, 1TB of storage per node, and 1GB of memory per node; the system
>>>>> would go through the following sequence:
>>>>>
>>>>> - reserve_resource(databaseCluster, john, 1, None)
>>>>>   o this returns a resourceID (say cluster-resource-id)
>>>>>   o the cluster instance that it reserves counts against the limit of
>>>>>     300 cluster instances in total, as well as the 4 clusters that
>>>>>     john can provision. If 'amrith' had requested it, it would have
>>>>>     been counted against that user's limit of 1 cluster.
>>>>>
>>>>> - reserve_resource(databaseInstance, john, 1, cluster-resource-id)
>>>>> - reserve_resource(databaseInstance, john, 1, cluster-resource-id)
>>>>> - reserve_resource(databaseInstance, john, 1, cluster-resource-id)
>>>>> - reserve_resource(databaseInstance, john, 1, cluster-resource-id)
>>>>>   o this returns four resource id's, let's say instance-1-id,
>>>>>     instance-2-id, instance-3-id, instance-4-id
>>>>>   o note that each instance is just that, an instance by itself. It is
>>>>>     therefore not right to consider this as equivalent to a call to
>>>>>     reserve_resource() with a size of 4, especially because each
>>>>>     instance could later be tracked as an individual Nova instance.
>>>>>
>>>>> - reserve_resource(storage, john, 1TB, instance-1-id)
>>>>> - reserve_resource(storage, john, 1TB, instance-2-id)
>>>>> - reserve_resource(storage, john, 1TB, instance-3-id)
>>>>> - reserve_resource(storage, john, 1TB, instance-4-id)
>>>>>   o each of them returns some resourceID, let's say they returned
>>>>>     cinder-1-id, cinder-2-id, cinder-3-id, cinder-4-id
>>>>>   o since the storage of 1TB is a unit, it is treated as such. In
>>>>>     other words, you don't need to invoke reserve_resource 10^12
>>>>>     times, once per byte allocated :)
>>>>>
>>>>> - reserve_resource(memory, john, 1GB, instance-1-id)
>>>>> - reserve_resource(memory, john, 1GB, instance-2-id)
>>>>> - reserve_resource(memory, john, 1GB, instance-3-id)
>>>>> - reserve_resource(memory, john, 1GB, instance-4-id)
>>>>>   o each of these returns something, say Dg4KBQcODAENBQEGBAcEDA,
>>>>>     CgMJAg8FBQ8GDwgLBA8FAg, BAQJBwYMDwAIAA0DBAkNAg,
>>>>>     AQMLDA4OAgEBCQ0MBAMGCA. I have made up arbitrary strings just to
>>>>>     highlight that we really don't track these anywhere, so we don't
>>>>>     care about them.
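
Taken together, the reservation calls above build a small tree: a cluster,
four instances under it, and storage and memory under each instance. A
sketch of a driver for that sequence follows, with reserve_resource()
stubbed out since no such call exists yet; all names are taken from the
walkthrough, not from real Delimiter code.

    # Sketch only: drives the hypothetical reserve_resource() sequence and
    # keeps the returned ids so the hierarchy (cluster -> instances ->
    # storage/memory) can be provisioned or released later.
    import uuid

    TB = 1024 ** 4
    GB = 1024 ** 3

    def reserve_resource(resource_class, user, amount, parent_id):
        """Placeholder for the real call; returns an opaque reservation id."""
        return uuid.uuid4().hex

    def reserve_cluster(user, nodes, storage_per_node, memory_per_node):
        cluster_id = reserve_resource('databaseCluster', user, 1, None)
        reservations = {'cluster': cluster_id, 'instances': []}
        for _ in range(nodes):
            instance_id = reserve_resource('databaseInstance', user, 1,
                                           cluster_id)
            storage_id = reserve_resource('storage', user, storage_per_node,
                                          instance_id)
            memory_id = reserve_resource('memory', user, memory_per_node,
                                         instance_id)
            reservations['instances'].append({'instance': instance_id,
                                              'storage': storage_id,
                                              'memory': memory_id})
        return reservations

    # John's request: 4 nodes, 1TB of storage and 1GB of memory per node.
    print(reserve_cluster('john', 4, 1 * TB, 1 * GB))
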
>>>>> If all this works, then the system knows that John's request does not
>>>>> violate any quotas that it can enforce; it can then go ahead and
>>>>> launch the instances (calling Nova), provision storage, and so on.
>>>>>
>>>>> The system then goes and creates four Cinder volumes; these are
>>>>> cinder-1-uuid, cinder-2-uuid, cinder-3-uuid, cinder-4-uuid. It can
>>>>> then go and confirm those reservations:
>>>>>
>>>>> - provision_resource(cinder-1-id, cinder-1-uuid)
>>>>> - provision_resource(cinder-2-id, cinder-2-uuid)
>>>>> - provision_resource(cinder-3-id, cinder-3-uuid)
>>>>> - provision_resource(cinder-4-id, cinder-4-uuid)
>>>>>
>>>>> It could then go and launch 4 Nova instances and similarly provision
>>>>> those resources, and so on.
>>>>>
>>>>> This process could take some minutes, and holding a database
>>>>> transaction open for this is the issue that Jay brings up in [4]. We
>>>>> don't have to in this proposed scheme.
>>>>>
>>>>> Since the resources are all hierarchically linked through the overall
>>>>> cluster id, when the cluster is set up it can finally go and provision
>>>>> that:
>>>>>
>>>>> - provision_resource(cluster-resource-id, cluster-uuid)
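
A sketch of what provision_resource() might amount to internally, assuming
Delimiter keeps a record per reservation: bind the reservation to the real
resource's identifier and mark it provisioned. The dictionary below is a
stand-in for whatever storage the library would actually use.

    # Sketch only: confirm a reservation by recording the real resource's
    # uuid against it.
    RESERVATIONS = {}   # reservation_id -> {'state': ..., 'uuid': ...}

    def provision_resource(reservation_id, resource_uuid):
        record = RESERVATIONS[reservation_id]   # KeyError if unknown
        record['uuid'] = resource_uuid          # bind to the real object
        record['state'] = 'provisioned'         # no longer just reserved

    # Example: confirm the four Cinder volumes from the walkthrough.
    for n in range(1, 5):
        RESERVATIONS['cinder-%d-id' % n] = {'state': 'reserved', 'uuid': None}
        provision_resource('cinder-%d-id' % n, 'cinder-%d-uuid' % n)
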
>>>>> When Trove is done with some individual resource, it can go and
>>>>> release it. Note that I'm thinking this will invoke release_resource
>>>>> with the ID of the underlying object OR the resource:
>>>>>
>>>>> - release_resource(cinder-4-id), and
>>>>> - release_resource(cinder-4-uuid)
>>>>>
>>>>> are therefore identical and indicate that the 4th 1TB volume is now
>>>>> released. How this will be implemented in Python, kwargs or some
>>>>> other mechanism, is, I believe, an implementation detail.
>>>>>
>>>>> Finally, it releases the cluster resource by doing this:
>>>>>
>>>>> - release_resource(cluster-resource-id)
>>>>>
>>>>> This would release the cluster and all dependent resources in a
>>>>> single operation.
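
A minimal sketch of such a cascading release, assuming Delimiter records
which reservations were made with which parent; the parent/child bookkeeping
shown here is an assumption, not something the proposal specifies.

    # Sketch only: releasing a reservation and everything reserved under it.
    from collections import defaultdict

    CHILDREN = defaultdict(list)   # reservation id -> ids reserved under it
    RELEASED = set()

    def reserve(reservation_id, parent_id=None):
        if parent_id is not None:
            CHILDREN[parent_id].append(reservation_id)

    def release_resource(reservation_id):
        # Walk down the chain, releasing children before the parent.
        for child in CHILDREN.pop(reservation_id, []):
            release_resource(child)
        RELEASED.add(reservation_id)

    # Rebuild john's hierarchy and release it from the top.
    reserve('cluster-resource-id')
    for n in range(1, 5):
        reserve('instance-%d-id' % n, 'cluster-resource-id')
        reserve('cinder-%d-id' % n, 'instance-%d-id' % n)
    release_resource('cluster-resource-id')
    print(len(RELEASED))   # 9: the cluster, 4 instances, 4 volumes
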
>>>>> A user may wish to manage a resource that was provisioned from the
>>>>> service. Assume that this results in a resizing of the instances;
>>>>> then it is a matter of updating that resource.
>>>>>
>>>>> Assume that the third 1TB volume is being resized to 2TB; then it is
>>>>> merely a matter of invoking:
>>>>>
>>>>> - update_resource(cinder-3-uuid, 2TB)
>>>>>
>>>>> Delimiter can go figure out that cinder-3-uuid is a 1TB device and
>>>>> that this is therefore an increase of 1TB, and verify that this is
>>>>> within the quotas allowed for the user.
>>>>>
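
A sketch of that delta calculation, with in-memory dictionaries standing in
for Delimiter's own records of provisioned sizes and per-user usage; the
names and table shapes are assumptions made only for the illustration.

    # Sketch only: update_resource() computes the delta between the current
    # and requested size and checks it against the remaining quota.
    TB = 1024 ** 4

    PROVISIONED = {'cinder-3-uuid': {'user': 'john', 'size': 1 * TB}}
    STORAGE_LIMIT = 20 * TB
    STORAGE_USED = {'john': 4 * TB}      # the four 1TB volumes

    class QuotaExceeded(Exception):
        pass

    def update_resource(resource_uuid, new_size):
        record = PROVISIONED[resource_uuid]
        delta = new_size - record['size']        # e.g. 2TB - 1TB = +1TB
        user = record['user']
        if STORAGE_USED[user] + delta > STORAGE_LIMIT:
            raise QuotaExceeded(resource_uuid)
        STORAGE_USED[user] += delta
        record['size'] = new_size

    update_resource('cinder-3-uuid', 2 * TB)
    print(STORAGE_USED['john'] // TB)    # 5
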
>>>>> The thing that I find attractive about this model of maintaining a
>>>>> hierarchy of reservations is that in the event of an error, the
>>>>> service need merely call release_resource() on the highest-level
>>>>> reservation, and the Delimiter project can walk down the chain and
>>>>> release all the resources or reservations as appropriate.
>>>>>
>>>>> Under the covers I believe that each of these operations should be
>>>>> atomic and may update multiple database tables, but these will all be
>>>>> short-lived operations. For example, reserving an instance resource
>>>>> would increment the number of instances for the user as well as the
>>>>> number of instances on the whole, and this would be an atomic
>>>>> operation.
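
One way such an atomic, short-lived increment could be done at the database
level is a pair of conditional UPDATEs inside a single small transaction,
where the statement's rowcount reveals whether a limit would have been
exceeded. The sketch below uses sqlite3 and an invented two-row usage table
purely for illustration; it is not the proposed Delimiter schema.

    # Sketch only: bump the per-user and global counters in one short
    # transaction, refusing the reservation if either cap would be exceeded.
    import sqlite3

    conn = sqlite3.connect(':memory:', isolation_level=None)  # explicit txns
    conn.executescript("""
        CREATE TABLE usage (
            scope TEXT PRIMARY KEY,   -- 'user:john' or 'global'
            used  INTEGER NOT NULL,
            cap   INTEGER NOT NULL
        );
        INSERT INTO usage VALUES ('user:john', 0, 25), ('global', 0, 2000);
    """)

    def reserve_instance(user):
        cur = conn.cursor()
        cur.execute("BEGIN")
        for scope in ('user:%s' % user, 'global'):
            cur.execute(
                "UPDATE usage SET used = used + 1 "
                "WHERE scope = ? AND used + 1 <= cap", (scope,))
            if cur.rowcount != 1:          # the limit would have been exceeded
                cur.execute("ROLLBACK")
                return False
        cur.execute("COMMIT")
        return True

    print(reserve_instance('john'))   # True while john is under his limit
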
>>>>> I have two primary areas of concern about the proposal [3].
>>>>>
>>>>> The first is that it makes the implicit assumption that the "flat
>>>>> mode" is implemented. That provides value to a consumer, but I think
>>>>> it leaves a lot for the consumer to do. For example, I find it hard
>>>>> to see how the model proposed would handle the release of quotas, let
>>>>> alone the case of a nested release of a hierarchy of resources.
>>>>>
>>>>> The other is the notion that the implementation will begin a
>>>>> transaction, perform a query(), make some manipulations, and then do
>>>>> a save(). This makes for an interesting transaction-management
>>>>> challenge, as it would require the underlying database to run at an
>>>>> isolation level of at least REPEATABLE READ, and maybe even
>>>>> SERIALIZABLE, which would be a performance bear on a heavily loaded
>>>>> system. If run in the traditional READ COMMITTED mode, this would
>>>>> silently lead to oversubscription and the violation of quota limits.
>>>>>
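
The lost-update race behind that concern can be shown with nothing more than
two threads sharing a counter: both read the same committed value, both see
headroom, and both write back, so more is handed out than the limit allows
while the stored usage silently under-counts. A sketch (plain Python, no
database):

    # Sketch only: query()-then-save() as a classic lost update. A dict and
    # two threads stand in for concurrent READ COMMITTED transactions.
    import threading
    import time

    LIMIT = 10
    row = {'used': 9}        # pretend this is the quota-usage row
    grants = []

    def reserve(amount):
        used = row['used']   # query(): each "transaction" sees committed 9
        time.sleep(0.01)     # the other transaction's read happens here
        if used + amount <= LIMIT:
            row['used'] = used + amount   # save(): second writer clobbers first
            grants.append(amount)

    threads = [threading.Thread(target=reserve, args=(1,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Typically prints "2 10": both requests were granted, so 11 units are
    # really in use against a limit of 10, yet the row shows only 10.
    print(len(grants), row['used'])
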
>>>>> I believe that it should be a requirement that the Delimiter library
>>>>> be able to run against a database that supports, and is configured
>>>>> for, READ COMMITTED, and that it should not require anything higher.
>>>>>
>>>>> The model proposed above can certainly be implemented with a database
>>>>> running READ COMMITTED, and I believe that this is also true with the
>>>>> caveat that the operations will be performed through SQLAlchemy.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -amrith
>>>>>
>>>>> [1] http://openstack.markmail.org/thread/tkl2jcyvzgifniux
>>>>> [2] http://openstack.markmail.org/thread/3cr7hoeqjmgyle2j
>>>>> [3] https://review.openstack.org/#/c/284454/
>>>>> [4] http://markmail.org/message/7ixvezcsj3uyiro6
>>>>>
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
More information about the OpenStack-dev
mailing list