[openstack-dev] [cross-project] [all] Quotas -- service vs. library
Jay Pipes
jaypipes at gmail.com
Sun Mar 27 19:51:23 UTC 2016
On 03/16/2016 07:10 AM, Attila Fazekas wrote:
> NO : For any kind of extra quota service.
>
> In other places I saw other reasons for a quota service or similar,
> the actual cost of this approach is higher than most people would think so NO.
>
> Maybe Library,
> But I do not want to see for example the bad pattern used in nova to spread everywhere.
>
> The quota usage handling MUST happen in the same DB transaction as the
> resource record (volume, server..) create/update/delete .
No, it doesn't.
This isn't even a remotely realistic thing for a distributed system like
Nova to try. There often isn't a single DB transaction that creates or
updates or deletes something in Nova. API calls typically traverse 3 or
more separate compute and controller node services, saving state
transitions to a resource record sometimes *dozens* of times over the
course of the API call -- many of which are entirely asynchronous.
In addition to the already-distributed, mostly-asynchronous nature of
Nova, keep in mind that a Nova deployment can have multiple cells each
having its own entirely different database, multiple availability zones
each with multiple cells, and multiple regions each with multiple
availability zones.
You still want to keep a database transaction (two-phase commit or
otherwise) open for the duration of events that cross any of those
boundaries?
No, I think not.
> There is no need for.:
> - reservation-expirer services or periodic tasks ..
> - there is no need for quota usage correcter shell scripts or whatever
> - multiple commits
>
> We have a transaction capable DB, to help us,
> not using it would be lame.
We *do* use transactions in the quota system. That's not the primary
problem with the Nova quota system.
The major problem with the Nova quota system is that it keeps a cache of
usage records in its quota_usages (and reservations) tables instead of
querying the primary tables that *actually* store real resources.
Any time you introduce cache tables, you introduce greater potential for
races to occur.
You can accept the inevitability of those race conditions, relax the
constraints that you put on the locking system, get rid of the caching
and accept some small degradation in the raw performance of quota
checks, or combine the last two.
In my opinion, we should have a system that issues a single quota check
against the actual resource records. [1] This check should occur right
up front before any operation that would consume a resource. [2]
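
A minimal sketch of what such an up-front check might look like, assuming a
toy `instances` table with `project_id` and `deleted` columns. The table
layout, the function names and the `OverQuota` exception are all hypothetical
and are not Nova's actual schema or API. The sketch only shows the counting
query against the real resource rows; it does not by itself address
concurrent requests.

```python
import sqlite3


class OverQuota(Exception):
    """Raised when a request would push a project past its limit."""


def check_instance_quota(conn, project_id, requested, limit):
    """Count live rows in the real resource table; no usage cache involved."""
    (in_use,) = conn.execute(
        "SELECT COUNT(*) FROM instances WHERE project_id = ? AND deleted = 0",
        (project_id,),
    ).fetchone()
    if in_use + requested > limit:
        raise OverQuota(
            f"{project_id}: {in_use} in use + {requested} requested > {limit}"
        )
    return in_use


def make_demo_db():
    """Build a throwaway in-memory database with a toy instances table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instances ("
        "id INTEGER PRIMARY KEY, project_id TEXT, deleted INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO instances (project_id, deleted) VALUES (?, ?)",
        [("p1", 0), ("p1", 0), ("p1", 1)],  # two live rows, one deleted
    )
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    # Called once, before the operation that would consume the resource.
    print(check_instance_quota(db, "p1", requested=1, limit=3))
```

The check runs once per resource-consuming request and reads only the table
that already records the resource, so there is no second copy of the usage
numbers to drift out of sync.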
I don't actually think that this proposed quota thing should be a service.
Why? Because a separate service would inevitably be implemented by
creating cache tables about quota usages. And that's at the root of the
problem with race conditions in existing quota solutions in OpenStack.
Better to create a library that actually has no database of its own at
all. Instead, have the library expose a simple and consistent API that
relies on an implementation plugin that would live in each OpenStack
component that needed quota checks. These plugins would query the
*actual* resource tables that exist in their own databases to determine
if a user's request can be satisfied.
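
To make the shape of that idea concrete, here is a rough sketch of a
database-less library with a per-component plugin. Every name in it
(`QuotaPlugin`, `count_in_use`, `QuotaChecker`, `QuotaExceeded`) is made up
for illustration and does not correspond to an existing OpenStack library.
The limits are passed in as a plain dict here; where they would really come
from is a separate question.

```python
import abc


class QuotaExceeded(Exception):
    """Raised when a request cannot be satisfied within the limit."""


class QuotaPlugin(abc.ABC):
    """Hypothetical interface each component would implement itself."""

    @abc.abstractmethod
    def count_in_use(self, project_id, resource):
        """Return current usage by querying the component's own tables."""


class QuotaChecker:
    """The library side: holds no database, only limits and a plugin."""

    def __init__(self, plugin, limits):
        self._plugin = plugin
        self._limits = limits  # {(project_id, resource): limit}

    def check(self, project_id, resource, requested):
        limit = self._limits.get((project_id, resource))
        if limit is None:
            return  # no limit configured: treated as unlimited in this sketch
        in_use = self._plugin.count_in_use(project_id, resource)
        if in_use + requested > limit:
            raise QuotaExceeded(
                f"{resource}: {in_use} in use + {requested} requested > {limit}"
            )


class InMemoryPlugin(QuotaPlugin):
    """Stand-in plugin for the demo; a real one would run SQL instead."""

    def __init__(self, usage):
        self._usage = usage

    def count_in_use(self, project_id, resource):
        return self._usage.get((project_id, resource), 0)


if __name__ == "__main__":
    checker = QuotaChecker(
        InMemoryPlugin({("p1", "instances"): 2}),
        {("p1", "instances"): 3},
    )
    checker.check("p1", "instances", requested=1)  # fits, returns quietly
```

The point of the split is that the library only compares numbers, while the
component that owns the resource tables is the only code that counts them.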
We'd still need some central-ish place to *update* quota amounts for
individual users/tenants/defaults. IMHO, Keystone is the logical place
to put this kind of information. No need for a new service when we
already have one that can serve this purpose. The earlier suggestion of
using the auth token to carry either the quotas themselves or a link for
fetching a user's quota information is a good one.
Finally, someone had mentioned rate limiting in an earlier response. I
don't believe OpenStack should be in the business of rate limiting.
There are tested and scalable solutions that do rate limiting for HTTP
requests. [3] Those should be used instead of Python code in a WSGI
application or middleware service.
My two cents,
-jay
[1] In Nova, this would be the new allocations table in the
resource-providers modeling, or the instances table in the legacy modeling.
[2] If there's no cache table of quota usages, there's NO reason why
quota calculations need to be made on ANY request that doesn't actually
consume a new resource. This means no quota calculations on DELETE
operations, no quota calculations on most UPDATE operations (since the
amount of consumed resources generally doesn't change at all).
[3]
https://lincolnloop.com/blog/rate-limiting-nginx/
http://www.openrepose.org/
https://tyk.io/