Ok to beer and high bandwidth. FYI Jay, the distributed high-perf DB we did a couple of years ago is now open source. Just saying. MySQL plug-compatible ...
-amrith

--
Amrith Kumar
amrith@tesora.com

-------- Original message --------
From: Jay Pipes <jaypipes@gmail.com>
Date: 04/23/2016 4:10 PM (GMT-05:00)
To: Amrith Kumar <amrith@tesora.com>, openstack-dev@lists.openstack.org
Cc: vilobhmm@yahoo-inc.com, nik.komawar@gmail.com, Ed Leafe <ed@leafe.com>
Subject: Re: [openstack-dev] More on the topic of DELIMITER, the Quota Management Library proposal

Looking forward to arriving in Austin so that I can buy you a beer, Amrith, and have a high-bandwidth conversation about how you're wrong. :P

Comments inline.

On 04/23/2016 11:25 AM, Amrith Kumar wrote:
> On Sat, 2016-04-23 at 10:26 -0400, Andrew Laski wrote:
>> On Fri, Apr 22, 2016, at 09:57 PM, Tim Bell wrote:
>>
>>> I have reservations on f and g.
>>>
>>> On f., we have had a number of discussions in the past about centralising quota (e.g. Boson) and the project teams of the other components wanted to keep the quota contents ‘close’. This can always be reviewed further with them but I would hope for at least a standard schema structure of tables in each project for the handling of quota.
>>>
>>> On g., aren’t all projects now nested projects? If we have the complexity of handling nested projects sorted out in the common library, is there a reason why a project would not want to support nested projects?
>>>
>>> One other issue is how to do reconciliation; each project needs to have a mechanism to re-calculate the current allocations and reconcile that with the quota usage. While in an ideal world this should not be necessary, it would be for the foreseeable future, especially with a new implementation.
>>
>> One of the big reasons that Jay and I have been pushing to remove reservations and tracking of quota in a separate place than the resources are actually used, e.g., an instance record in the Nova db, is so that reconciliation is not necessary. For example, if RAM quota usage is simply tracked as sum(instances.memory_mb) then you can be sure that usage is always up to date.
>
> Uh oh, there be gremlins here ...
>
> I am positive that this will NOT work, see earlier conversations about isolation levels, and Jay's alternate solution.
>
> The way (I understand the issue, and Jay's solution) you get around the isolation levels trap is to NOT do your quota determinations based on a SUM(column) but rather based on the rowcount on a well crafted UPDATE of a single table that stored total quota.

No, we would do our quota calculations by doing a SUM(used) against the allocations table. There is no separate table that stores the total quota (or quota usage records); that's the source of the problems with the existing quota handling code in Nova. The generation field value is used to provide a consistent view of the actual resource usage records, so that the INSERT operations for all claimed resources can be done in a transactional manner and will be rolled back if any other writer changes the amount of consumed resources on a provider (which of course would affect the quota check calculations).
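
For concreteness, here is a minimal sketch of the two tables involved; the column set is an assumption pieced together from the pseudo-code further down, not the actual Nova schema:

import sqlite3

# Assumed, simplified schema for illustration only; the real resource-providers
# schema has more columns and constraints.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource_providers (
    id INTEGER PRIMARY KEY,
    uuid TEXT NOT NULL,
    generation INTEGER NOT NULL DEFAULT 0   -- bumped on every successful claim
);
CREATE TABLE allocations (
    id INTEGER PRIMARY KEY,
    resource_provider_id INTEGER NOT NULL REFERENCES resource_providers (id),
    resource_class_id TEXT NOT NULL,        -- kept as a readable string here, e.g. 'VCPU'
    consumer_uuid TEXT NOT NULL,            -- who is consuming the resource
    used INTEGER NOT NULL                   -- amount of the class consumed
);
""")

# The quota check is just an aggregate over the *actual* usage records:
used = dict(conn.execute(
    """SELECT resource_class_id, SUM(used)
         FROM allocations
        WHERE consumer_uuid = ?
        GROUP BY resource_class_id""", ("some-user-uuid",)))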

> You could also store a detail claim record for each claim in an independent table that is maintained in the same database transaction if you so desire, that is optional.

The allocations table is the "detail claim record" table that you refer to above.

> My view of how this would work (which I described earlier as building on Jay's solution) is that the claim flow would look like this:
>
>     select total_used, generation
>       from quota_claimed
>      where tenant = <tenant> and resource = 'memory'

There is no need to keep a total_used value for anything. That is denormalized calculated data that merely adds a point of race contention. The quota check is against the *detail* table (allocations), which stores the *actual resource usage records*.

>     begin transaction
>
>     update quota_claimed
>        set total_used = total_used + claim, generation = generation + 1
>      where tenant = <tenant> and resource = 'memory'
>        and generation = generation
>        and total_used + claim < limit

This part of the transaction must always occur **after** the insertion of the actual resource records, not before.

>     if @@rowcount = 1
>         -- optional claim_detail table
>         insert into claim_detail values ( <tenant>, 'memory', claim, ...)
>         commit
>     else
>         rollback

So, in pseudo-Python-SQLish code, my solution works like this:

limits = get_limits_from_delimiter()
requested = get_requested_from_request_spec()

while True:

    used := SELECT
                resource_class,
                resource_provider,
                generation,
                SUM(used) as total_used
            FROM allocations
            JOIN resource_providers ON (...)
            WHERE consumer_uuid = $USER_UUID
            GROUP BY
                resource_class,
                resource_provider,
                generation;

    # Check that our requested resource amounts don't exceed quotas
    if not check_requested_within_limits(requested, used, limits):
        raise QuotaExceeded

    # Claim all requested resources. Note that the generation retrieved
    # from the above query is our consistent view marker. If the UPDATE
    # below succeeds and returns != 0 rows affected, that means there was
    # no other writer that changed our resource usage in between this
    # thread's claiming of resources, and therefore we prevent any
    # oversubscription of resources.
    begin_transaction:

        provider := SELECT id, generation, ... FROM resource_providers
                    JOIN (...)
                    WHERE (<resource_usage_filters>)

        for resource in requested:
            INSERT INTO allocations (
                resource_provider_id,
                resource_class_id,
                consumer_uuid,
                used
            ) VALUES (
                $provider.id,
                $resource.id,
                $USER_UUID,
                $resource.amount
            );

        rows_affected := UPDATE resource_providers
                         SET generation = generation + 1
                         WHERE id = $provider.id
                         AND generation = $used[$provider.id].generation;

        if $rows_affected == 0:
            ROLLBACK;

The only reason we would need a post-claim quota check is if some of the requested resources are owned and tracked by an external-to-Nova system.
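
To make the retry behaviour explicit (the pseudo-code above only shows the ROLLBACK), here is a small runnable sketch of the same compare-and-update loop, continuing the sqlite3 sketch above. The function and table layout are illustrative assumptions, not Nova's actual code:

class Conflict(Exception):
    """Another writer consumed resources on the provider while we were claiming."""


def claim(conn, consumer_uuid, provider_id, requested, limits, max_retries=10):
    """Claim `requested` ({resource_class: amount}) for a consumer, or raise."""
    for _ in range(max_retries):
        # Consistent-view marker: remember the provider generation we read.
        (generation,) = conn.execute(
            "SELECT generation FROM resource_providers WHERE id = ?",
            (provider_id,)).fetchone()

        # Quota check against the *actual* usage records; no cache table.
        used = dict(conn.execute(
            "SELECT resource_class_id, SUM(used) FROM allocations"
            " WHERE consumer_uuid = ? GROUP BY resource_class_id",
            (consumer_uuid,)))
        if any(used.get(rc, 0) + amount > limits[rc]
               for rc, amount in requested.items()):
            raise Exception("QuotaExceeded")

        try:
            with conn:  # one transaction: INSERT allocations, then bump generation
                for rc, amount in requested.items():
                    conn.execute(
                        "INSERT INTO allocations (resource_provider_id,"
                        " resource_class_id, consumer_uuid, used)"
                        " VALUES (?, ?, ?, ?)",
                        (provider_id, rc, consumer_uuid, amount))
                cur = conn.execute(
                    "UPDATE resource_providers SET generation = generation + 1"
                    " WHERE id = ? AND generation = ?",
                    (provider_id, generation))
                if cur.rowcount == 0:
                    # Someone else claimed in the meantime: abort this transaction
                    # (the context manager rolls back the INSERTs) and retry.
                    raise Conflict()
            return  # claim recorded without oversubscription
        except Conflict:
            continue
    raise Exception("could not claim after %d retries" % max_retries)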

BTW, note to Ed Leafe... unless your distributed data store supports transactional semantics, you can't use a distributed data store for these types of solutions. Instead, you will need to write a whole bunch of code that does post-auditing of claims and quotas, and a system that accepts that oversubscription and out-of-sync quota limits and usages are a fact of life. Not to mention needing to implement JOINs in Python.
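
(For what it's worth, the "JOINs in Python" bit would look something like this toy snippet: fetch the two record sets separately and stitch them together client-side.)

# Toy illustration only: with no server-side JOIN, the client joins the two
# record sets itself and aggregates usage per provider and resource class.
providers = [{"id": 1, "uuid": "prov-1", "generation": 7}]
allocations = [
    {"resource_provider_id": 1, "resource_class_id": "VCPU", "used": 4},
    {"resource_provider_id": 1, "resource_class_id": "MEMORY_MB", "used": 2048},
]

providers_by_id = {p["id"]: p for p in providers}
usage = {}
for alloc in allocations:
    prov = providers_by_id[alloc["resource_provider_id"]]
    key = (prov["uuid"], alloc["resource_class_id"])
    usage[key] = usage.get(key, 0) + alloc["used"]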

>     (b) to get away from reservations you must stop using the
>     SUM(column) approach and instead use a single quota_claimed
>     table to determine the current quota claimed.

No. This has nothing to do with reservations.

> At least that's what I understand of Jay's example from earlier in this thread.
>
> Let's definitely discuss this in Austin. While I don't love Jay's solution for other reasons to do with making the quota table a hotspot and things like that, it is a perfectly workable solution, I think.

There is no quota table in my solution.

If you refer to the resource_providers table (the table that has the generation field), then yes, it's a hot spot. But hot spots in the DB aren't necessarily a bad thing if you design the underlying schema properly.

More in Austin.

Best,
-jay

>>
>>> Tim
>>>
>>> From: Amrith Kumar <amrith@tesora.com>
>>> Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev@lists.openstack.org>
>>> Date: Friday 22 April 2016 at 06:51
>>> To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev@lists.openstack.org>
>>> Subject: Re: [openstack-dev] More on the topic of DELIMITER, the Quota Management Library proposal
>>>
>>> I’ve thought more about Jay’s approach to enforcing quotas and I think we can build on and around it. With that implementation as the basic quota primitive, I think we can build a quota management API that isn’t dependent on reservations. It does place some burdens on the consuming projects that I had hoped to avoid and these will cause heartburn for some (make sure that you always request resources in a consistent order and free them in a consistent order being the most obvious).
>>>
>>> If it doesn’t make it harder, I would like to see if we can make the quota API take care of the ordering of requests. i.e. if the quota API is an extension of Jay’s example and accepts some data structure (dict?) with all the claims that a project wants to make for some operation, and then proceeds to make those claims for the project in the consistent order, I think it would be of some value.
>>>
>>> Beyond that, I’m on board with a-g below,
>>>
>>> -amrith
>>>
>>> From: Vilobh Meshram [mailto:vilobhmeshram.openstack@gmail.com]
>>> Sent: Friday, April 22, 2016 4:08 AM
>>> To: OpenStack Development Mailing List (not for usage questions) <openstack-dev@lists.openstack.org>
>>> Subject: Re: [openstack-dev] More on the topic of DELIMITER, the Quota Management Library proposal
>>>
>>> I strongly agree with Jay on the points related to "no reservation", keeping the interface simple and the role for Delimiter (impose limits on resource consumption and enforce quotas).
>>>
>>> The point to keep user quotas, tenant quotas in Keystone sounds interesting and would need support from the Keystone team. We have a cross-project session planned [1] and will definitely bring that up in that session.
>>>
>>> The main thought with which Delimiter was formed was to enforce resource quota in a transaction-safe manner and do it in a cross-project conducive manner, and it still holds true. Delimiter's mission is to impose limits on resource consumption and enforce quotas in a transaction-safe manner. Few key aspects of Delimiter are:
>>>
>>> a. Delimiter will be a new Library and not a Service. Details covered in spec.
>>>
>>> b. Delimiter's role will be to impose limits on resource consumption.
>>>
>>> c. Delimiter will not be responsible for rate limiting.
>>>
>>> d. Delimiter will not maintain data for the resources. Respective projects will take care of keeping, maintaining data for the resources and resource consumption.
>>>
>>> e. Delimiter will not have the concept of "reservations". Delimiter will read or update the "actual" resource tables and will not rely on the "cached" tables. At present, the quota infrastructure in Nova, Cinder and other projects have tables such as reservations, quota_usage, etc which are used as "cached tables" to track re
>>>
>>> f. Delimiter will fetch the information for project quota, user quota from a centralized place, say Keystone, or if that doesn't materialize will fetch default quota values from the respective service. This information will be cached since it gets updated rarely but read many times.
>>>
>>> g. Delimiter will take into consideration whether the project is a Flat or Nested and will make the calculations of allocated, available resources. Nested means project namespace is hierarchical and Flat means project namespace is not hierarchical.
>>>
>>> -Vilobh
>>>
>>> [1] https://www.openstack.org/summit/austin-2016/summit-schedule/events/9492
>>>
>>>
>>> On Thu, Apr 21, 2016 at 11:08 PM, Joshua Harlow <harlowja@fastmail.com> wrote:
>>>
>>>     Since people will be on a plane soon,
>>>
>>>     I threw this together as an example of a quota engine (the zookeeper code does even work, and yes it provides transactional semantics due to the nice abilities of zookeeper znode versions[1] and its inherent consistency model, yippe).
>>>
>>>     https://gist.github.com/harlowja/e7175c2d76e020a82ae94467a1441d85
>>>
>>>     Someone else can fill in the db quota engine with a similar/equivalent api if they so dare, ha. Or even feel to say the gist/api above is crap, cause that's ok to, lol.
>>>
>>>     [1] https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#Data+Access
>>>
>>>     Amrith Kumar wrote:
>>>
>>>         Inline below ... thread is too long, will catch you in Austin.
>>>
>>>         -----Original Message-----
>>>         From: Jay Pipes [mailto:jaypipes@gmail.com]
>>>         Sent: Thursday, April 21, 2016 8:08 PM
>>>         To: openstack-dev@lists.openstack.org
>>>         Subject: Re: [openstack-dev] More on the topic of DELIMITER, the Quota Management Library proposal
>>>
>>>             Hmm, where do I start... I think I will just cut to the two primary disagreements I have. And I will top-post because this email is way too big.
>>>
>>>             1) On serializable isolation level.
>>>
>>>             No, you don't need it at all to prevent races in claiming. Just use a compare-and-update with retries strategy. Proof is here:
>>>
>>>             https://github.com/jaypipes/placement-bench/blob/master/placement.py#L97-L142
>>>
>>>             Works great and prevents multiple writers from oversubscribing any resource without relying on any particular isolation level at all.
>>>
>>>             The `generation` field in the inventories table is what allows multiple writers to ensure a consistent view of the data without needing to rely on heavy lock-based semantics and/or RDBMS-specific isolation levels.
>>>
>>>             [amrith] this works for what it is doing, we can definitely do this. This will work at any isolation level, yes. I didn't want to go this route because it is going to still require an insert into another table recording what the actual 'thing' is that is claiming the resource and that insert is going to be in a different transaction and managing those two transactions was what I wanted to avoid. I was hoping to avoid having two tables tracking claims, one showing the currently claimed quota and another holding the things that claimed that quota. Have to think again whether that is possible.
>>>
>>>             2) On reservations.
>>>
>>>             The reason I don't believe reservations are necessary to be in a quota library is because reservations add a concept of a time to a claim of some resource. You reserve some resource to be claimed at some point in the future and release those resources at a point further in time.
>>>
>>>             Quota checking doesn't look at what the state of some system will be at some point in the future. It simply returns whether the system *right now* can handle a request *right now* to claim a set of resources.
>>>
>>>             If you want reservation semantics for some resource, that's totally cool, but IMHO, a reservation service should live outside of the service that is actually responsible for providing resources to a consumer. Merging right-now quota checks and future-based reservations into the same library just complicates things unnecessarily IMHO.
>>>
>>>             [amrith] extension of the above ...
>>>
>>>             3) On resizes.
>>>
>>>             Look, I recognize some users see some value in resizing their resources. That's fine. I personally think expand operations are fine, and that shrink operations are really the operations that should be prohibited in the API. But, whatever, I'm fine with resizing of requested resource amounts. My big point is if you don't have a separate table that stores quota_usages and instead only have a single table that stores the actual resource usage records, you don't have to do *any* quota check operations at all upon deletion of a resource. For modifying resource amounts (i.e. a resize) you merely need to change the calculation of requested resource amounts to account for the already-consumed usage amount.
>>>
>>>             Bottom line for me: I really won't support any proposal for a complex library that takes the resource claim process out of the hands of the services that own those resources. The simpler the interface of this library, the better.
>>>
>>>             [amrith] my proposal would not, but this email thread has got too long. Yes, simpler interface, will catch you in Austin.
>>>
>>>             Best,
>>>             -jay
>>>
>>>             On 04/19/2016 09:59 PM, Amrith Kumar wrote:
>>>
>>>                 -----Original Message-----
>>>                 From: Jay Pipes [mailto:jaypipes@gmail.com]
>>>                 Sent: Monday, April 18, 2016 2:54 PM
>>>                 To: openstack-dev@lists.openstack.org
>>>                 Subject: Re: [openstack-dev] More on the topic of DELIMITER, the Quota Management Library proposal
>>>
>>>                 On 04/16/2016 05:51 PM, Amrith Kumar wrote:
>>>
>>>                     If we therefore assume that this will be a Quota Management Library, it is safe to assume that quotas are going to be managed on a per-project basis, where participating projects will use this library. I believe that it stands to reason that any data persistence will have to be in a location decided by the individual project.
>>>
>>>                 Depends on what you mean by "any data persistence". If you are referring to the storage of quota values (per user, per tenant, global, etc) I think that should be done by the Keystone service. This data is essentially an attribute of the user or the tenant or the service endpoint itself (i.e. global defaults). This data also rarely changes and logically belongs to the service that manages users, tenants, and service endpoints: Keystone.
>>>
>>>                 If you are referring to the storage of resource usage records, yes, each service project should own that data (and frankly, I don't see a need to persist any quota usage data at all, as I mentioned in a previous reply to Attila).
>>>
>>>                 [amrith] You make a distinction that I had made implicitly, and it is important to highlight it. Thanks for pointing it out. Yes, I meant both of the above, and as stipulated. Global defaults in keystone (somehow, TBD) and usage records, on a per-service basis.
>>>
>>>                     That may not be a very interesting statement but the corollary is, I think, a very significant statement; it cannot be assumed that the quota management information for all participating projects is in the same database.
>>>
>>>                 It cannot be assumed that this information is even in a database at all...
>>>
>>>                 [amrith] I don't follow. If the service in question is to be scalable, I think it stands to reason that there must be some mechanism by which instances of the service can share usage records (as you refer to them, and I like that term). I think it stands to reason that there must be some database, no?
>>>
>>>                     A hypothetical service consuming the Delimiter library provides requesters with some widgets, and wishes to track the widgets that it has provisioned both on a per-user basis, and on the whole. It should therefore be multi-tenant and able to track the widgets on a per-tenant basis and if required impose limits on the number of widgets that a tenant may consume at a time, during a course of a period of time, and so on.
>>>
>>>                 No, this last part is absolutely not what I think quota management should be about.
>>>
>>>                 Rate limiting -- i.e. how many requests a particular user can make of an API in a given period of time -- should *not* be handled by OpenStack API services, IMHO. It is the responsibility of the deployer to handle this using off-the-shelf rate-limiting solutions (open source or proprietary).
>>>
>>>                 Quotas should only be about the hard limit of different types of resources that a user or group of users can consume at a given time.
>>>
>>>                 [amrith] OK, good point. Agreed as stipulated.
>>>
>>>                     Such a hypothetical service may also consume resources from other services that it wishes to track, and impose limits on.
>>>
>>>                 Yes, absolutely agreed.
>>>
>>>                     It is also understood as Jay Pipes points out in [4] that the actual process of provisioning widgets could be time consuming and it is ill-advised to hold a database transaction of any kind open for that duration of time. Ensuring that a user does not exceed some limit on the number of concurrent widgets that he or she may create therefore requires some mechanism to track in-flight requests for widgets. I view these as "intent" but not yet materialized.
>>>
>>>                 It has nothing to do with the amount of concurrent widgets that a user can create. It's just about the total number of some resource that may be consumed by that user.
>>>
>>>                 As for an "intent", I don't believe tracking intent is the right way to go at all. As I've mentioned before, the major problem in Nova's quota system is that there are two tables storing resource usage records: the *actual* resource usage tables (the allocations table in the new resource-providers modeling and the instance_extra, pci_devices and instances table in the legacy modeling) and the *quota usage* tables (quota_usages and reservations tables). The quota_usages table does not need to exist at all, and neither does the reservations table. Don't do intent-based consumption. Instead, just consume (claim) by writing a record for the resource class consumed on a provider into the actual resource usages table and then "check quotas" by querying the *actual* resource usages and comparing the SUM(used) values, grouped by resource class, against the appropriate quota limits for the user. The introduction of the quota_usages and reservations tables to cache usage records is the primary reason for the race problems in the Nova (and other) quota system because every time you introduce a caching system for highly-volatile data (like usage records) you introduce complexity into the write path and the need to track the same thing across multiple writes to different tables needlessly.
>>>
>>>                 [amrith] I don't agree, I'll respond to this and the next comment group together. See below.
>>>
>>>                     Looking up at this whole infrastructure from the perspective of the database, I think we should require that the database must not be required to operate in any isolation mode higher than READ-COMMITTED; more about that later (i.e. requiring a database run either serializable or repeatable read is a show stopper).
>>>
>>>                 This is an implementation detail that is not relevant to the discussion about what the interface of a quota library would look like.
>>>
>>>                 [amrith] I disagree, let me give you an example of why.
>>>
>>>                 Earlier, I wrote:
>>>
>>>                     Such a hypothetical service may also consume resources from other services that it wishes to track, and impose limits on.
>>>
>>>                 And you responded:
>>>
>>>                     Yes, absolutely agreed.
>>>
>>>                 So let's take this hypothetical service that, in response to a user request, will provision a Cinder volume and a Nova instance. Let's assume that the service also imposes limits on the number of cinder volumes and nova instances the user may provision; independent of limits that Nova and Cinder may themselves maintain.
>>>
>>>                 One way that the hypothetical service can function is this:
>>>
>>>                 (a) check Cinder quota, if successful, create cinder volume
>>>                 (b) check Nova quota, if successful, create nova instance with cinder volume attachment
>>>
>>>                 Now, this is sub-optimal as there are going to be some number of cases where the nova quota check fails. Now you have needlessly created and will have to release a cinder volume. It also takes longer to fail.
>>>
>>>                 Another way to do this is this:
>>>
>>>                 (1) check Cinder quota, if successful, check Nova quota, if successful proceed to (2) else error out
>>>                 (2) create cinder volume
>>>                 (3) create nova instance with cinder attachment.
>>>
>>>                 I'm trying to get to this latter form of doing things.
>>>
>>>                 Easy, you might say ... theoretically this should simply be:
>>>
>>>                     BEGIN;
>>>                     -- Get data to do the Cinder check
>>>                     SELECT ......
>>>                     -- Do the cinder check
>>>                     INSERT INTO ....
>>>                     -- Get data to do the Nova check
>>>                     SELECT ....
>>>                     -- Do the Nova check
>>>                     INSERT INTO ...
>>>                     COMMIT
>>>
>>>                 You can only make this work if you ran at isolation level serializable.
>>>
>>>             Why?
>>>
>>>                 To make this run at isolation level REPEATABLE-READ, you must enforce constraints at the database level that will fail the commit. But wait, you can't do that because the data about the global limits may not be in the same database as the usage records. Later you talk about caching and stuff; all that doesn't help a database constraint.
>>>
>>>                 For this reason, I think there is going to have to be some cognizance of the database isolation level in the design of the library, and I think it will also impact the API that can be constructed.
>>>
>>>                     In general therefore, I believe that the hypothetical service processing requests for widgets would have to handle three kinds of operations, provision, modify, and destroy. The names are, I believe, self-explanatory.
>>>
>>>                 Generally, modification of a resource doesn't come into play. The primary exception to this is for transferring of ownership of some resource.
>>>
>>>                 [amrith] Trove RESIZE is a huge benefit for users and while it may be a pain as you say, this is still a very real benefit. Trove allows you to resize both your storage (resize the cinder volume) and resize your instance (change the flavor).
>>>
>>>                     Without loss of generality, one can say that all three of them must validate that the operation does not violate some limit (no more than X widgets, no fewer than X widgets, rates, and so on).
>>>
>>>                 No, only the creation (and very rarely the modification) needs any validation that a limit could have been violated. Destroying a resource never needs to be checked for limit violations.
>>>
>>>                 [amrith] Well, if you are going to create a volume of 10GB and your limit is 100GB, resizing it to 200GB should fail, I think.
>>>
>>>                     Assuming that the service provisions resources from other services, it is also conceivable that limits be imposed on the quantum of those services consumed. In practice, I can imagine a service like Trove using the Delimiter project to perform all of these kinds of limit checks; I'm not suggesting that it does this today, nor that there is an immediate plan to implement all of them, just that these all seem like good uses of a Quota Management capability.
>>>
>>>                     - User may not have more than 25 database instances at a time
>>>                     - User may not have more than 4 clusters at a time
>>>                     - User may not consume more than 3TB of SSD storage at a time
>>>
>>>                 Only if SSD storage is a distinct resource class from DISK_GB. Right now, Nova makes no differentiation w.r.t. SSD or HDD or shared vs. local block storage.
>>>
>>>                 [amrith] It matters not to Trove whether Nova does or not. Cinder supports volume-types and users DO want to limit based on volume-type (for example).
>>>
>>>                     - User may not launch more than 10 huge instances at a time
>>>
>>>                 What is the point of such a limit?
>>>
>>>                 [amrith] Metering usage, placing limitations on the quantum of resources that a user may provision. Same as with Nova. A flavor is merely a simple way to tie together a bag of resources. It is a way to restrict access, for example, to specific resources that are available in the cloud. HUGE is just an example I gave, pick any flavor you want, and here's how a service like Trove uses it.
>>>
>>>                 Users can ask to launch an instance of a specific database+version; MySQL 5.6-48 for example. Now, an operator can restrict the instance flavors, or volume types, that can be associated with the specific datastore. And the flavor could be used to map to, for example, whether the instance is running on bare metal or in a VM and if so with what kind of hardware. That's a useful construct for a service like Trove.
>>>
>>>                     - User may not launch more than 3 clusters an hour
>>>
>>>                 -1. This is rate limiting and should be handled by rate-limiting services.
>>>
>>>                     - No more than 500 copies of Oracle may be run at a time
>>>
>>>                 Is "Oracle" a resource class?
>>>
>>>                 [amrith] As I view it, every project should be free to define its own set of resource classes and meter them as it feels fit. So, while Oracle licenses may not, conceivably a lot of things that Nova, Cinder, and the other core projects don't care about, are in fact relevant for a consumer of this library.
>>>
>>>                     While Nova would be the service that limits the number of instances a user can have at a time, the ability for a service to limit this further should not be underestimated.
>>>
>>>                     In turn, should Nova and Cinder also use the same Quota Management Library, they may each impose limitations like:
>>>
>>>                     - User may not launch more than 20 huge instances at a time
>>>
>>>                 Not a useful limitation IMHO.
>>>
>>>                 [amrith] I beg to differ. Again a huge instance is just an example of some flavor; and the idea is to allow a project to place its own metrics and meter based on those.
>>>
>>>                     - User may not launch more than 3 instances in a minute
>>>
>>>                 -1. This is rate limiting.
>>>
>>>                     - User may not consume more than 15TB of SSD at a time
>>>                     - User may not have more than 30 volumes at a time
>>>
>>>                     Again, I'm not implying that either Nova or Cinder should provide these capabilities.
>>>
>>>                     With this in mind, I believe that the minimal set of operations that Delimiter should provide are:
>>>
>>>                     - define_resource(name, max, min, user_max, user_min, ...)
>>>
>>>                 What would the above do? What service would it be speaking to?
>>>
>>>                 [amrith] I assume that this would speak with some backend (either keystone or the project itself) and record these designated limits. This is the way to register a project-specific metric like "Oracle licenses".
>>>
>>>                     - update_resource_limits(name, user, user_max, user_min, ...)
>>>
>>>                 This doesn't belong in a quota library. It belongs as a REST API in Keystone.
>>>
>>>                 [amrith] Fine, same place where the previous thing stores the global defaults is the target of this call.
>>>
>>>                     - reserve_resource(name, user, size, parent_resource, ...)
>>>
>>>                 This doesn't belong in a quota library at all. I think reservations are not germane to resource consumption and should be handled by an external service at the orchestration layer.
>>>
>>>                 [amrith] Again not true; as illustrated above, this library is the thing that projects could use to determine whether or not to honor a request. This reserve/provision process is, I believe, required because of the vagaries of how we want to implement this in the database.
>>>
>>>                     - provision_resource(resource, id)
>>>
>>>                 A quota library should not be provisioning anything. A quota library should simply provide a consistent interface for *checking* that a structured request for some set of resources *can* be provided by the service.
>>>
>>>                 [amrith] This does not actually call Nova or anything; merely that AFTER the hypothetical service has called NOVA, this converts the reservation (which can expire) into an actual allocation.
>>>
>>>                     - update_resource(id or resource, newsize)
>>>
>>>                 Resizing resources is a bad idea, IMHO. Resources are easier to deal with when they are considered of immutable size and simple (i.e. not complex or nested). I think the problem here is in the definition of resource classes improperly.
>>>
>>>                 [amrith] Let's leave the quota library aside. This assertion strikes at the very heart of things like Nova resize, or for that matter Cinder volume resize. Are those all bad ideas? I made a 500GB Cinder volume and it is getting close to full. I'd like to resize it to 2TB. Are you saying that's not a valid use case?
>>>
>>>                 For example, a "cluster" is not a resource. It is a collection of resources of type node. "Resizing" a cluster is a misnomer, because you aren't resizing a resource at all. Instead, you are creating or destroying resources inside the cluster (i.e. joining or leaving cluster nodes).
>>>
>>>                 BTW, this is also why the "resize instance" API in Nova is such a giant pain in the ass. It's attempting to "modify" the instance "resource" when the instance isn't really the resource at all. The VCPU, RAM_MB, DISK_GB, and PCI devices are the actual resources. The instance is a convenient way to tie those resources together, and doing a "resize" of the instance behind the scenes actually performs a *move* operation, which isn't a *change* of the original resources. Rather, it is a creation of a new set of resources (of the new amounts) and a deletion of the old set of resources.
>>>
>>>                 [amrith] that's fine, if all we want is to handle the resize operation as a new instance followed by a deletion, that's great. But that semantic isn't necessarily the case for something like (say) cinder.
>>>
>>>                 The "resize" API call adds some nasty confirmation and cancel semantics to the calling interface that hint that the underlying implementation of the "resize" operation is in actuality not a resize at all, but rather a create-new-and-delete-old-resources operation.
>>>
>>>                 [amrith] And that isn't germane to a quota library, I don't think. What is, is this: do we want to treat the transient state when there are (for example, for Nova) two instances, one of the new flavor and one of the old flavor, or not? But, from the perspective of a quota library, a resize operation is merely a reset of the quota by the delta in the resource consumed.
>>>
>>>                     - release_resource(id or resource)
>>>                     - expire_reservations()
>>>
>>>                 I see no need to have reservations in the quota library at all, as mentioned above.
>>>
>>>                 [amrith] Then I think the quota library must require that either (a) the underlying database runs serializable or (b) database constraints can be used to enforce that at commit the global limits are adhered to.
>>>
>>>                 As for your proposed interface and calling structure below, I think a much simpler proposal would work better. I'll work on a cross-project spec that describes this simpler proposal, but the basics would be:
>>>
>>>                 1) Have Keystone store quota information for defaults (per service endpoint), for tenants and for users.
>>>
>>>                 Keystone would have the set of canonical resource class names, and each project, upon handling a new resource class, would be responsible for a change submitted to Keystone to add the new resource class code.
>>>
>>>                 Straw man REST API:
>>>
>>>                 GET /quotas/resource-classes
>>>                 200 OK
>>>                 {
>>>                     "resource_classes": {
>>>                         "compute.vcpu": {
>>>                             "service": "compute",
>>>                             "code": "compute.vcpu",
>>>                             "description": "A virtual CPU unit"
>>>                         },
>>>                         "compute.ram_mb": {
>>>                             "service": "compute",
>>>                             "code": "compute.ram_mb",
>>>                             "description": "Memory in megabytes"
>>>                         },
>>>                         ...
>>>                         "volume.disk_gb": {
>>>                             "service": "volume",
>>>                             "code": "volume.disk_gb",
>>>                             "description": "Amount of disk space in gigabytes"
>>>                         },
>>>                         ...
>>>                         "database.count": {
>>>                             "service": "database",
>>>                             "code": "database.count",
>>>                             "description": "Number of database instances"
>>>                         }
>>>                     }
>>>                 }
>>>
>>>                 [amrith] Well, a user is allowed to have a certain compute quota (which is shared by Nova and Trove) but also a Trove quota. How would your representation represent that?
>>>
>>>                 # Get the default limits for new users...
>>>                 GET /quotas/defaults
>>>                 200 OK
>>>                 {
>>>                     "quotas": {
>>>                         "compute.vcpu": 100,
>>>                         "compute.ram_mb": 32768,
>>>                         "volume.disk_gb": 1000,
>>>                         "database.count": 25
>>>                     }
>>>                 }
>>>
>>>                 # Get a specific user's limits...
>>>                 GET /quotas/users/{UUID}
>>>                 200 OK
>>>                 {
>>>                     "quotas": {
>>>                         "compute.vcpu": 100,
>>>                         "compute.ram_mb": 32768,
>>>                         "volume.disk_gb": 1000,
>>>                         "database.count": 25
>>>                     }
>>>                 }
>>>
>>>                 # Get a tenant's limits...
>>>                 GET /quotas/tenants/{UUID}
>>>                 200 OK
>>>                 {
>>>                     "quotas": {
>>>                         "compute.vcpu": 1000,
>>>                         "compute.ram_mb": 327680,
>>>                         "volume.disk_gb": 10000,
>>>                         "database.count": 250
>>>                     }
>>>                 }
>>>
>>>                 2) Have Delimiter communicate with the above proposed new Keystone REST API and package up data into an oslo.versioned_objects interface.
>>>
>>>                 Clearly all of the above can be heavily cached both on the server and client side since they rarely change but are read often.
>>>
>>>                 [amrith] Caching on the client won't save you from oversubscription if you don't run serializable.
>>>
>>>                 The Delimiter library could be used to provide a calling interface for service projects to get a user's limits for a set of resource classes:
>>>
>>>                 (please excuse wrongness, typos, and other stuff below, it's just a straw-man, not production working code...)
>>>
>>>                 # file: delimiter/objects/limits.py
>>>                 import oslo.versioned_objects.base as ovo
>>>                 import oslo.versioned_objects.fields as ovo_fields
>>>
>>>                 class ResourceLimit(ovo.VersionedObjectBase):
>>>                     # 1.0: Initial version
>>>                     VERSION = '1.0'
>>>
>>>                     fields = {
>>>                         'resource_class': ovo_fields.StringField(),
>>>                         'amount': ovo_fields.IntegerField(),
>>>                     }
>>>
>>>                 class ResourceLimitList(ovo.VersionedObjectBase):
>>>                     # 1.0: Initial version
>>>                     VERSION = '1.0'
>>>
>>>                     fields = {
>>>                         'resources': ListOfObjectsField(ResourceLimit),
>>>                     }
>>>
>>>                     @cache_this_heavily
>>>                     @remotable_classmethod
>>>                     def get_all_by_user(cls, user_uuid):
>>>                         """Returns a Limits object that tells the caller what a user's
>>>                         absolute limits for the set of resource classes in the system.
>>>                         """
>>>                         # Grab a keystone client session object and connect to Keystone
>>>                         ks = ksclient.Session(...)
>>>                         raw_limits = ksclient.get_limits_by_user()
>>>                         return cls(resources=[ResourceLimit(**d) for d in raw_limits])
>>>
>>>                 3) Each service project would be responsible for handling the consumption of a set of requested resource amounts in an atomic and consistent way.
>>>
>>>                 [amrith] This is where the rubber meets the road. What is that atomic and consistent way? And what computing infrastructure do you need to deliver this?
>>>
>>>                 The Delimiter library would return the limits that the service would pre-check before claiming the resources, and either post-check after claim or utilize a compare-and-update technique with a generation/timestamp during claiming to prevent race conditions.
>>>
>>>                 For instance, in Nova with the new resource providers database schema and doing claims in the scheduler (a proposed change), we might do something to the effect of:
>>>
>>>                 from delimiter import objects as delim_obj
>>>                 from delimiter import exceptions as delim_exc
>>>                 from nova import objects as nova_obj
>>>
>>>                 request = nova_obj.RequestSpec.get_by_uuid(request_uuid)
>>>                 requested = request.resources
>>>                 limits = delim_obj.ResourceLimitList.get_all_by_user(user_uuid)
>>>                 allocations = nova_obj.AllocationList.get_all_by_user(user_uuid)
>>>
>>>                 # Pre-check for violations
>>>                 for resource_class, requested_amount in requested.items():
>>>                     limit_idx = limits.resources.index(resource_class)
>>>                     resource_limit = limits.resources[limit_idx].amount
>>>                     alloc_idx = allocations.resources.index(resource_class)
>>>                     resource_used = allocations.resources[alloc_idx]
>>>                     if (resource_used + requested_amount) > resource_limit:
>>>                         raise delim_exc.QuotaExceeded
>>>
>>>                 [amrith] Is the above code run with some global mutex to prevent that two people don't believe that they are good on quota at the same time?
>>>
>>>                 # Do claims in scheduler in an atomic, consistent fashion...
>>>                 claims = scheduler_client.claim_resources(request)
>>>
>>>                 [amrith] Yes, each 'atomic' claim on a repeatable-read database could result in oversubscription.
>>>
>>>                 # Post-check for violations
>>>                 allocations = nova_obj.AllocationList.get_all_by_user(user_uuid)
>>>                 # allocations now include the claimed resources from the scheduler
>>>
>>>                 for resource_class, requested_amount in requested.items():
>>>                     limit_idx = limits.resources.index(resource_class)
>>>                     resource_limit = limits.resources[limit_idx].amount
>>>                     alloc_idx = allocations.resources.index(resource_class)
>>>                     resource_used = allocations.resources[alloc_idx]
>>>                     if resource_used > resource_limit:
>>>                         # Delete the allocation records for the resources just claimed
>>>                         delete_resources(claims)
>>>                         raise delim_exc.QuotaExceeded
>>>
>>>                 [amrith] Again, two people could drive through this code and both of them could fail :(
>>>
>>>                 4) The only other thing that would need to be done for a first go of the Delimiter library is some event listener that can listen for changes to the quota limits for a user/tenant/default in Keystone. We'd want the services to be able to notify someone if a reduction in quota results in an overquota situation.
>>>
>>>                 Anyway, that's my idea. Keep the Delimiter library small and focused on describing the limits only, not on the resource allocations. Have the Delimiter library present a versioned object interface so the interaction between the data exposed by the Keystone REST API for quotas can evolve naturally and smoothly over time.
>>>
>>>                 Best,
>>>                 -jay
>>>
>>>                     Let me illustrate the way I see these things fitting together. A hypothetical Trove system may be setup as follows:
>>>
>>>                     - No more than 2000 database instances in total, 300 clusters in total
>>>                     - Users may not launch more than 25 database instances, or 4 clusters
>>>                     - The particular user 'amrith' is limited to 2 databases and 1 cluster
>>>                     - No user may consume more than 20TB of storage at a time
>>>                     - No user may consume more than 10GB of memory at a time
>>>
>>>                     At startup, I believe that the system would make the following sequence of calls:
>>>
>>> - define_resource(databaseInstance, 2000, 0, 25, 0, ...)<br>
>>> - update_resource_limits(databaseInstance, amrith, 2, 0, ...)<br>
>>> - define_resource(databaseCluster, 300, 0, 4, 0, ...)<br>
>>> - update_resource_limits(databaseCluster, amrith, 1, 0, ...)<br>
>>> - define_resource(storage, -1, 0, 20TB, 0, ...)<br>
>>> - define_resource(memory, -1, 0, 10GB, 0, ...)<br>
>>><br>
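>>> To make the argument order above concrete, here is a toy, self-contained<br>
>>> sketch of one way those calls could be interpreted. The class, the<br>
>>> assumed argument meanings (global limit, global used, per-user limit,<br>
>>> per-user used) and the in-memory bookkeeping are all illustrative, not<br>
>>> the Delimiter API.<br>
>>><br>
>>> class QuotaRegistry(object):<br>
>>>     """Toy in-memory stand-in for the hypothetical calls above."""<br>
>>><br>
>>>     def __init__(self):<br>
>>>         self.resources = {}    # resource -> global limit / default per-user limit<br>
>>>         self.user_limits = {}  # (resource, user) -> per-user override<br>
>>><br>
>>>     def define_resource(self, resource, global_limit, global_used,<br>
>>>                         user_limit, user_used):<br>
>>>         # -1 means "no global cap", as in the storage and memory examples;<br>
>>>         # the "used" arguments start at 0 and are tracked elsewhere here.<br>
>>>         self.resources[resource] = {'global_limit': global_limit,<br>
>>>                                     'default_user_limit': user_limit}<br>
>>><br>
>>>     def update_resource_limits(self, resource, user, user_limit, user_used):<br>
>>>         self.user_limits[(resource, user)] = user_limit<br>
>>><br>
>>>     def limit_for(self, resource, user):<br>
>>>         return self.user_limits.get(<br>
>>>             (resource, user),<br>
>>>             self.resources[resource]['default_user_limit'])<br>
>>><br>
>>> TB, GB = 1024 ** 4, 1024 ** 3<br>
>>> registry = QuotaRegistry()<br>
>>> registry.define_resource('databaseInstance', 2000, 0, 25, 0)<br>
>>> registry.update_resource_limits('databaseInstance', 'amrith', 2, 0)<br>
>>> registry.define_resource('databaseCluster', 300, 0, 4, 0)<br>
>>> registry.update_resource_limits('databaseCluster', 'amrith', 1, 0)<br>
>>> registry.define_resource('storage', -1, 0, 20 * TB, 0)<br>
>>> registry.define_resource('memory', -1, 0, 10 * GB, 0)<br>
>>><br>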
>>> Assume that the user john comes along and asks for a cluster with 4<br>
>>> nodes, 1TB storage per node, and each node having 1GB of memory. The<br>
>>> system would go through the following sequence:<br>
>>><br>
>>> - reserve_resource(databaseCluster, john, 1, None)<br>
>>>   o this returns a resourceID (say cluster-resource-id)<br>
>>>   o the cluster instance that it reserves counts against the limit of 300<br>
>>>     clusters in total, as well as the 4 clusters that john can provision.<br>
>>>     If 'amrith' had requested it, that would have been counted against<br>
>>>     the limit of 1 cluster for the user.<br>
>>><br>
>>> - reserve_resource(databaseInstance, john, 1, cluster-resource-id)<br>
>>> - reserve_resource(databaseInstance, john, 1, cluster-resource-id)<br>
>>> - reserve_resource(databaseInstance, john, 1, cluster-resource-id)<br>
>>> - reserve_resource(databaseInstance, john, 1, cluster-resource-id)<br>
>>>   o this returns four resource IDs, let's say instance-1-id,<br>
>>>     instance-2-id, instance-3-id, instance-4-id<br>
>>>   o note that each instance is just that, an instance by itself. It is<br>
>>>     therefore not right to consider this as equivalent to a single call<br>
>>>     to reserve_resource() with a size of 4, especially because each<br>
>>>     instance could later be tracked as an individual Nova instance.<br>
>>><br>
>>> - reserve_resource(storage, john, 1TB, instance-1-id)<br>
>>> - reserve_resource(storage, john, 1TB, instance-2-id)<br>
>>> - reserve_resource(storage, john, 1TB, instance-3-id)<br>
>>> - reserve_resource(storage, john, 1TB, instance-4-id)<br>
>>>   o each of them returns some resourceID, let's say they returned<br>
>>>     cinder-1-id, cinder-2-id, cinder-3-id, cinder-4-id<br>
>>>   o since the storage of 1TB is a unit, it is treated as such. In other<br>
>>>     words, you don't need to invoke reserve_resource 10^12 times, once<br>
>>>     per byte allocated :)<br>
>>><br>
>>> - reserve_resource(memory, john, 1GB, instance-1-id)<br>
>>> - reserve_resource(memory, john, 1GB, instance-2-id)<br>
>>> - reserve_resource(memory, john, 1GB, instance-3-id)<br>
>>> - reserve_resource(memory, john, 1GB, instance-4-id)<br>
>>>   o each of these returns something, say Dg4KBQcODAENBQEGBAcEDA,<br>
>>>     CgMJAg8FBQ8GDwgLBA8FAg, BAQJBwYMDwAIAA0DBAkNAg, AQMLDA4OAgEBCQ0MBAMGCA.<br>
>>>     I have made up arbitrary strings just to highlight that we really<br>
>>>     don't track these anywhere, so we don't care about them.<br>
>>><br>
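>>> Building on the toy registry sketched earlier, a reserve_resource could<br>
>>> check the per-user and global limits, record the usage, and remember the<br>
>>> parent reservation so the hierarchy can be walked later. Everything here<br>
>>> is an illustrative assumption, not the proposed implementation.<br>
>>><br>
>>> import uuid<br>
>>><br>
>>> class QuotaExceeded(Exception):<br>
>>>     pass<br>
>>><br>
>>> usage = {}          # (resource, user) -> amount used; user None == global<br>
>>> reservations = {}   # reservation id -> details, including parent/children<br>
>>><br>
>>> def reserve_resource(resource, user, amount, parent_id):<br>
>>>     user_used = usage.get((resource, user), 0)<br>
>>>     global_used = usage.get((resource, None), 0)<br>
>>>     global_limit = registry.resources[resource]['global_limit']<br>
>>><br>
>>>     if user_used + amount > registry.limit_for(resource, user):<br>
>>>         raise QuotaExceeded(resource)<br>
>>>     if global_limit != -1 and global_used + amount > global_limit:<br>
>>>         raise QuotaExceeded(resource)<br>
>>><br>
>>>     usage[(resource, user)] = user_used + amount<br>
>>>     usage[(resource, None)] = global_used + amount<br>
>>><br>
>>>     res_id = str(uuid.uuid4())<br>
>>>     reservations[res_id] = {'resource': resource, 'user': user,<br>
>>>                             'amount': amount, 'parent': parent_id,<br>
>>>                             'children': []}<br>
>>>     if parent_id is not None:<br>
>>>         reservations[parent_id]['children'].append(res_id)<br>
>>>     return res_id<br>
>>><br>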
>>> If all this works, then the system knows that John's request does not<br>
>>> violate any quotas that it can enforce. It can then go ahead and launch<br>
>>> the instances (calling Nova), provision storage, and so on.<br>
>>><br>
>>> The system then goes and creates four Cinder volumes; these are<br>
>>> cinder-1-uuid, cinder-2-uuid, cinder-3-uuid, cinder-4-uuid.<br>
>>><br>
>>> It can then go and confirm those reservations:<br>
>>><br>
>>> - provision_resource(cinder-1-id, cinder-1-uuid)<br>
>>> - provision_resource(cinder-2-id, cinder-2-uuid)<br>
>>> - provision_resource(cinder-3-id, cinder-3-uuid)<br>
>>> - provision_resource(cinder-4-id, cinder-4-uuid)<br>
>>><br>
>>> It could then go and launch 4 Nova instances and similarly provision<br>
>>> those resources, and so on. This process could take some minutes, and<br>
>>> holding a database transaction open for that long is the issue that Jay<br>
>>> brings up in [4]. We don't have to do so in this proposed scheme.<br>
>>><br>
>>> Since the resources are all hierarchically linked through the overall<br>
>>> cluster id, when the cluster is set up, it can finally go and provision<br>
>>> that:<br>
>>><br>
>>> - provision_resource(cluster-resource-id, cluster-uuid)<br>
>>><br>
>>> When Trove is done with some individual resource, it can go and release<br>
>>> it. Note that I'm thinking this will invoke release_resource with the ID<br>
>>> of the underlying object OR the resource.<br>
>>><br>
>>> - release_resource(cinder-4-id), and<br>
>>> - release_resource(cinder-4-uuid)<br>
>>><br>
>>> are therefore identical and indicate that the 4th 1TB volume is now<br>
>>> released. How this will be implemented in Python, kwargs or some other<br>
>>> mechanism, is, I believe, an implementation detail.<br>
>>><br>
>>> Finally, it releases the cluster resource by doing this:<br>
>>><br>
>>> - release_resource(cluster-resource-id)<br>
>>><br>
>>> This would release the cluster and all dependent resources in a single<br>
>>> operation.<br>
>>><br>
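>>> Continuing the same toy sketch, provision_resource could simply attach<br>
>>> the real object's ID to the reservation, and release_resource could walk<br>
>>> the children first, so releasing the cluster reservation tears down<br>
>>> everything under it. Again, every name here is illustrative.<br>
>>><br>
>>> def provision_resource(res_id, object_uuid):<br>
>>>     # Confirm a reservation by linking it to the provisioned object.<br>
>>>     reservations[res_id]['object_uuid'] = object_uuid<br>
>>><br>
>>> def release_resource(res_id):<br>
>>>     # Release a reservation and, recursively, everything reserved under it.<br>
>>>     node = reservations.pop(res_id, None)<br>
>>>     if node is None:<br>
>>>         return                      # already released, e.g. via its parent<br>
>>>     for child_id in node['children']:<br>
>>>         release_resource(child_id)<br>
>>>     usage[(node['resource'], node['user'])] -= node['amount']<br>
>>>     usage[(node['resource'], None)] -= node['amount']<br>
>>><br>
>>> A lookup from object UUID back to reservation ID would let<br>
>>> release_resource(cinder-4-uuid) behave identically to<br>
>>> release_resource(cinder-4-id), as described above.<br>
>>><br>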
>>> A user may wish to manage a resource that was provisioned from the<br>
>>> service. Assume that this results in a resizing of the instances; then<br>
>>> it is a matter of updating that resource.<br>
>>><br>
>>> Assume that the third 1TB volume is being resized to 2TB; then it is<br>
>>> merely a matter of invoking:<br>
>>><br>
>>> - update_resource(cinder-3-uuid, 2TB)<br>
>>><br>
>>> Delimiter can go figure out that cinder-3-uuid is a 1TB device, and<br>
>>> therefore this is an increase of 1TB, and verify that this is within the<br>
>>> quotas allowed for the user.<br>
>>><br>
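>>> Still in the same toy sketch, the resize case could find the reservation<br>
>>> behind cinder-3-uuid, compute the 1TB delta, and check only that delta<br>
>>> against the user's quota (the global storage limit above is unlimited<br>
>>> anyway, so it is omitted here).<br>
>>><br>
>>> def update_resource(object_uuid, new_amount):<br>
>>>     # Locate the reservation that was provisioned as object_uuid.<br>
>>>     res_id, node = next((rid, n) for rid, n in reservations.items()<br>
>>>                         if n.get('object_uuid') == object_uuid)<br>
>>>     delta = new_amount - node['amount']          # e.g. 2TB - 1TB = 1TB<br>
>>>     user_key = (node['resource'], node['user'])<br>
>>>     if delta > 0 and usage[user_key] + delta > registry.limit_for(*user_key):<br>
>>>         raise QuotaExceeded(node['resource'])<br>
>>>     usage[user_key] += delta<br>
>>>     usage[(node['resource'], None)] += delta<br>
>>>     node['amount'] = new_amount<br>
>>><br>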
>>> The thing that I find attractive about this model of maintaining a<br>
>>> hierarchy of reservations is that in the event of an error, the service<br>
>>> need merely call release_resource() on the highest-level reservation and<br>
>>> the Delimiter project can walk down the chain and release all the<br>
>>> resources or reservations as appropriate.<br>
>>><br>
>>> Under the covers I believe that each of these operations should be<br>
>>> atomic; they may update multiple database tables, but these will all be<br>
>>> short-lived operations.<br>
>>><br>
>>> For example, reserving an instance resource would increment the number<br>
>>> of instances for the user as well as the number of instances on the<br>
>>> whole, and this would be an atomic operation.<br>
>>><br>
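>>> In database terms, that short-lived atomic update could look roughly<br>
>>> like the following SQLAlchemy sketch, assuming a hypothetical<br>
>>> quota_usages table with one row per (resource, user) plus a global row<br>
>>> whose user_id is NULL; the schema is an assumption, not part of the<br>
>>> proposal.<br>
>>><br>
>>> from sqlalchemy import text<br>
>>><br>
>>> def reserve_instance_usage(conn, user_id):<br>
>>>     # One short transaction covering both counters: either both<br>
>>>     # increments become visible, or neither does.<br>
>>>     with conn.begin():<br>
>>>         conn.execute(<br>
>>>             text("UPDATE quota_usages SET used = used + 1 "<br>
>>>                  "WHERE resource = 'databaseInstance' AND user_id = :uid"),<br>
>>>             {"uid": user_id})<br>
>>>         conn.execute(<br>
>>>             text("UPDATE quota_usages SET used = used + 1 "<br>
>>>                  "WHERE resource = 'databaseInstance' AND user_id IS NULL"))<br>
>>><br>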
>>> I have two primary areas of concern about the proposal [3].<br>
>>><br>
>>> The first is that it makes the implicit assumption that the "flat mode"<br>
>>> is implemented. That provides value to a consumer, but I think it leaves<br>
>>> a lot for the consumer to do. For example, I find it hard to see how the<br>
>>> model proposed would handle the release of quotas, let alone the case of<br>
>>> a nested release of a hierarchy of resources.<br>
>>><br>
>>> The other is the notion that the implementation will begin a<br>
>>> transaction, perform a query(), make some manipulations, and then do a<br>
>>> save(). This makes for an interesting transaction management challenge,<br>
>>> as it would require the underlying database to run at an isolation level<br>
>>> of at least REPEATABLE READ, and maybe even SERIALIZABLE, which would be<br>
>>> a performance bear on a heavily loaded system. If run in the traditional<br>
>>> READ-COMMITTED mode, this would silently lead to oversubscription and<br>
>>> the violation of quota limits.<br>
>>><br>
>>> I believe that it should be a requirement that the Delimiter library be<br>
>>> able to run against a database that supports, and is configured for,<br>
>>> READ-COMMITTED, and that it should not require anything higher. The<br>
>>> model proposed above can certainly be implemented with a database<br>
>>> running READ-COMMITTED, and I believe that this is also true with the<br>
>>> caveat that the operations will be performed through SQLAlchemy.<br>
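>>><br>
>>> One way to do that under READ-COMMITTED is to fold the limit check into<br>
>>> the UPDATE itself and look at the rowcount, instead of reading, computing<br>
>>> in Python, and saving. A sketch against the same hypothetical<br>
>>> quota_usages table (not the proposed API):<br>
>>><br>
>>> from sqlalchemy import text<br>
>>><br>
>>> def try_claim(conn, resource, user_id, amount):<br>
>>>     # Returns True if the claim fit within the limit, False otherwise.<br>
>>>     with conn.begin():<br>
>>>         result = conn.execute(<br>
>>>             text("UPDATE quota_usages "<br>
>>>                  "SET used = used + :amount "<br>
>>>                  "WHERE resource = :resource AND user_id = :uid "<br>
>>>                  "AND hard_limit >= used + :amount"),<br>
>>>             {"amount": amount, "resource": resource, "uid": user_id})<br>
>>>     # Concurrent claims serialize on the row lock, so a rowcount of 0<br>
>>>     # means the claim did not fit; no higher isolation level is needed.<br>
>>>     return result.rowcount == 1<br>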
>>><br>
>>><br>
>>> Thanks,<br>
>>><br>
>>> -amrith<br>
>>><br>
>>> [1] <a href="http://openstack.markmail.org/thread/tkl2jcyvzgifniux">http://openstack.markmail.org/thread/tkl2jcyvzgifniux</a><br>
>>> [2] <a href="http://openstack.markmail.org/thread/3cr7hoeqjmgyle2j">http://openstack.markmail.org/thread/3cr7hoeqjmgyle2j</a><br>
>>> [3] <a href="https://review.openstack.org/#/c/284454/">https://review.openstack.org/#/c/284454/</a><br>
>>> [4] <a href="http://markmail.org/message/7ixvezcsj3uyiro6">http://markmail.org/message/7ixvezcsj3uyiro6</a><br>
>><br>
>> __________________________________________________________________________<br>
>> OpenStack Development Mailing List (not for usage questions)<br>
>> Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe<br>
>> <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
><br>
</div>
</span></font>
</body>
</html>