[openstack-dev] [cross-project] [all] Quotas -- service vs. library

Jay Pipes jaypipes at gmail.com
Sun Mar 27 19:51:23 UTC 2016


On 03/16/2016 07:10 AM, Attila Fazekas wrote:
> NO : For any kind of extra quota service.
>
> In other places I saw other reasons for a quota service or similar,
>   the actual cost of this approach is higher than most people would think so NO.
>
> Maybe Library,
> But I do not want to see for example the bad pattern used in nova to spread everywhere.
>
> The quota usage handling MUST happen in the same DB transaction as the
> resource record (volume, server..) create/update/delete.

No, it doesn't.

This isn't even a remotely realistic thing for a distributed system like 
Nova to try. There often isn't a single DB transaction that creates or 
updates or deletes something in Nova. API calls typically traverse 3 or 
more separate compute and controller node services, saving state 
transitions to a resource record sometimes *dozens* of times over the 
course of the API call -- many of which are entirely asynchronous.

In addition to the already-distributed, mostly-asynchronous nature of 
Nova, keep in mind that a Nova deployment can have multiple cells each 
having its own entirely different database, multiple availability zones 
each with multiple cells, and multiple regions each with multiple 
availability zones.

You still want to keep a database transaction (two phase commit or 
otherwise) open for the duration of events that cross any of those 
boundaries?

No, I think not.

> There is no need for:
> - reservation-expirer services or periodic tasks ..
> - there is no need for quota usage correcter shell scripts or whatever
> - multiple commits
>
> We have a transaction capable DB, to help us,
> not using it would be lame.

We *do* use transactions in the quota system. That's not the primary 
problem with the Nova quota system.

The major problem with the Nova quota system is that it keeps a cache of 
usage records in its quota_usages (and reservations) tables instead of 
querying the primary tables that *actually* store real resources.

Any time you introduce cache tables, you introduce greater potential for 
races to occur.

Either you accept the inevitability of those race conditions, or you 
relax the constraints that you put on the locking system, and/or you get 
rid of the caching and accept some small degradation of raw performance 
of quota checks.

In my opinion, we should have a system that issues a single quota check 
against the actual resource records. [1] This check should occur right 
up front before any operation that would consume a resource. [2]
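
As a rough illustration of the single up-front check described above, 
here is a minimal Python sketch using the standard-library sqlite3 
module. The instances table layout, the check_quota function, and the 
QuotaExceeded exception are hypothetical names made up for this example; 
they are not Nova's actual schema or API.

```python
import sqlite3


class QuotaExceeded(Exception):
    """Raised when a request would push usage past the limit (hypothetical)."""


def check_quota(conn, project_id, requested_vcpus, vcpu_limit):
    """Compute usage from the real resource rows, not from a cache table."""
    row = conn.execute(
        "SELECT COALESCE(SUM(vcpus), 0) FROM instances "
        "WHERE project_id = ? AND deleted = 0",
        (project_id,),
    ).fetchone()
    used = row[0]
    if used + requested_vcpus > vcpu_limit:
        raise QuotaExceeded(
            f"project {project_id}: {used} used + {requested_vcpus} "
            f"requested > limit {vcpu_limit}"
        )
    return used


# Toy schema standing in for a service's primary resource table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instances (id INTEGER PRIMARY KEY, project_id TEXT, "
    "vcpus INTEGER, deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO instances (project_id, vcpus, deleted) VALUES (?, ?, ?)",
    [("demo", 2, 0), ("demo", 4, 0), ("demo", 8, 1)],
)

# Two live instances use 6 vCPUs; the deleted row is simply not counted.
used_before = check_quota(conn, "demo", requested_vcpus=2, vcpu_limit=8)
```

Because usage is derived from the live rows, a deleted instance stops 
counting without any separate bookkeeping. The sketch says nothing about 
what happens between the check and the eventual resource creation; that 
window is a separate design question.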

I don't actually think that this proposed quota thing should be a service. 
Why? Because a separate service would inevitably be implemented by 
creating cache tables about quota usages. And that's at the root of the 
problem with race conditions in existing quota solutions in OpenStack.

Better to create a library that actually has no database of its own at 
all. Instead, have the library expose a simple and consistent API that 
relies on an implementation plugin that would live in each OpenStack 
component that needed quota checks. These plugins would query the 
*actual* resource tables that exist in their own databases to determine 
if a user's request can be satisfied.
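
One way such a library and its per-service plugins might fit together is 
sketched below. All names here (UsageDriver, QuotaEnforcer, 
InMemoryVolumeDriver) are hypothetical, and the limits are passed in as a 
plain dict purely for illustration; a real plugin would run a query 
against its own service's resource tables.

```python
from abc import ABC, abstractmethod


class UsageDriver(ABC):
    """Plugin interface each service would implement (hypothetical)."""

    @abstractmethod
    def get_usage(self, project_id, resource):
        """Return current usage, computed from the service's own tables."""


class QuotaEnforcer:
    """Library side: holds no usage state and no database of its own."""

    def __init__(self, driver, limits):
        self._driver = driver
        self._limits = limits  # e.g. {("demo", "volumes"): 10}

    def can_consume(self, project_id, resource, amount):
        limit = self._limits.get((project_id, resource))
        if limit is None:
            return True  # assumption: no configured limit means unlimited
        used = self._driver.get_usage(project_id, resource)
        return used + amount <= limit


class InMemoryVolumeDriver(UsageDriver):
    """Stand-in for a service plugin; a real one would run a SQL query."""

    def __init__(self, volumes):
        self._volumes = volumes  # list of (project_id, size_gb)

    def get_usage(self, project_id, resource):
        if resource == "volumes":
            return sum(1 for p, _ in self._volumes if p == project_id)
        if resource == "gigabytes":
            return sum(size for p, size in self._volumes if p == project_id)
        raise ValueError(f"unknown resource: {resource}")


driver = InMemoryVolumeDriver([("demo", 100), ("demo", 50), ("other", 10)])
limits = {("demo", "volumes"): 3, ("demo", "gigabytes"): 200}
enforcer = QuotaEnforcer(driver, limits)

ok_volume = enforcer.can_consume("demo", "volumes", 1)     # 2 + 1 <= 3
ok_space = enforcer.can_consume("demo", "gigabytes", 100)  # 150 + 100 > 200
```

The enforcer only compares a limit against whatever the driver reports, 
so the library itself never stores usage and has nothing that can drift 
out of sync with the real resource records.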

We'd still need some central-ish place to *update* quota amounts for 
individual users/tenants/defaults. IMHO, Keystone is the logical place 
to put this kind of information. No need for a new service when we 
already have one that can serve this purpose. The earlier suggestion of 
using the auth token to carry either the quotas themselves or a link for 
fetching a user's quota information is a good one.

Finally, someone had mentioned rate limiting in an earlier response. I 
don't believe OpenStack should be in the business of rate limiting. 
There are tested and scalable solutions that do rate limiting for HTTP 
requests. [3] Those should be used instead of Python code in a WSGI 
application or middleware service.

My two cents,
-jay

[1] In Nova, this would be the new allocations table in the 
resource-providers modeling or the instances table in the legacy modeling.

[2] If there's no cache table of quota usages, there's NO reason why 
quota calculations need to be made on ANY request that doesn't actually 
consume a new resource. This means no quota calculations on DELETE 
operations and none on most UPDATE operations (since the amount of 
consumed resources generally doesn't change at all).

[3]
https://lincolnloop.com/blog/rate-limiting-nginx/
http://www.openrepose.org/
https://tyk.io/


