[openstack-dev] [cyborg] [nova] Cyborg quotas
Jay Pipes
jaypipes at gmail.com
Sat May 19 13:30:35 UTC 2018
On 05/18/2018 07:58 AM, Nadathur, Sundar wrote:
> Agreed. Not sure how other projects handle it, but here's the situation
> for Cyborg. A request may get scheduled on a compute node with no
> intervention by Cyborg. So, the earliest check that can be made today is
> in the selected compute node. A simple approach can result in quota
> violations as in this example.
>
> Say there are 5 devices in a cluster. A tenant has a quota of 4 and
> is currently using 3. That leaves 2 unused devices, of which the
> tenant is permitted to use only one. But he may submit two
> concurrent requests, and they may land on two different compute
> nodes. The Cyborg agent in each node will see the current tenant
> usage as 3 and let the request go through, resulting in quota violation.
>
> To prevent this, we need some kind of atomic update, like SQLAlchemy's
> with_lockmode():
> https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Pessimistic_Locking_-_SELECT_FOR_UPDATE
>
> That seems to have issues, as documented in the link above. Also, since
> every compute node would do that, it would serialize the bringup of all
> instances with accelerators across the cluster.
>
> If there is a better solution, I'll be happy to hear it.
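For the record, the pessimistic-locking pattern you describe would look
roughly like the sketch below. The device_usages table, column names and
quota value are invented for illustration, and note that with_lockmode()
has since been superseded by with_for_update(). It also shows why every
accelerator claim in the cluster would serialize on a single row:

# Sketch of the SELECT ... FOR UPDATE pattern described above. Names are
# illustrative only, not actual Cyborg code.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class DeviceUsage(Base):
    __tablename__ = 'device_usages'
    project_id = Column(String(36), primary_key=True)
    in_use = Column(Integer, nullable=False, default=0)

# In-memory SQLite keeps the sketch self-contained; a real deployment would
# point at MySQL/PostgreSQL, where FOR UPDATE actually takes a row lock.
engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

def claim_device(project_id, quota_limit):
    session = Session()
    try:
        # Lock the project's usage row so concurrent agents serialize here --
        # which is exactly the cluster-wide bottleneck described above.
        usage = (session.query(DeviceUsage)
                 .filter_by(project_id=project_id)
                 .with_for_update()
                 .one())
        if usage.in_use + 1 > quota_limit:
            raise ValueError('quota exceeded for %s' % project_id)
        usage.in_use += 1
        session.commit()
    finally:
        session.close()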
The solution is to implement the following two specs:
https://review.openstack.org/#/c/509042/
https://review.openstack.org/#/c/569011/
The problem of consuming more resources than a user/project has quota
for is not new. Users have been able to go over their quota in all of
the services for as long as I can remember -- they can do this by
essentially DDoS'ing the API with lots of concurrent single-instance
build requests [1] all at once. The tenant then ends up in an over-quota
situation and is essentially unable to do anything at all until they
delete resources.
The only operators that I can remember that complained about this issue
were the public cloud operators -- and rightfully so since quota abuse
in public clouds meant their reputation for fairness might be
questioned. Most operators I know of solved this problem with
*rate-limiting*, which is not the same thing as quota limits. By
rate-limiting requests to the APIs, they alleviated the problem by
treating a symptom: high rates of concurrent requests leading to
over-quota situations.
Nobody is using Cyborg separately from Nova at the moment (or ever?).
It's not as if a user will be consuming an accelerator outside of a Nova
instance -- since it is the Nova instance that is the workload that uses
the accelerator.
That means that Cyborg resources should be treated as just another
resource class whose usage should be checked in a single query to the
/usages placement API endpoint before attempting to spawn the instance
(again, via Nova) that ends up consuming those resources.
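To make that concrete, a rough sketch of such a pre-spawn check against
placement's /usages endpoint might look like this (the endpoint URL, token
and CUSTOM_FPGA resource class name are placeholders):

# Ask placement for the project's current usage of the accelerator
# resource class and compare it to the quota limit before booting.
import requests

PLACEMENT = 'http://placement.example.com/placement'  # placeholder endpoint
HEADERS = {
    'X-Auth-Token': '<token>',                         # placeholder token
    'OpenStack-API-Version': 'placement 1.9',          # /usages needs >= 1.9
}

def fpga_quota_headroom(project_id, quota_limit):
    resp = requests.get('%s/usages' % PLACEMENT,
                        params={'project_id': project_id},
                        headers=HEADERS)
    resp.raise_for_status()
    used = resp.json()['usages'].get('CUSTOM_FPGA', 0)
    return quota_limit - used

# Reject the request up front if the headroom is less than what was asked for.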
The claiming of all resources that are consumed by a Nova instance
(which would include any accelerator resources) is an atomic operation
that prevents over-allocation of any provider involved in the claim
transaction. [2]
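For illustration, this is roughly what that single claim looks like at the
placement API level (Nova does this on your behalf; the UUIDs and the
CUSTOM_FPGA class below are placeholders). Placement either records every
allocation in the payload or rejects the whole request with a 409 -- that
is the atomicity I mean:

import requests

PLACEMENT = 'http://placement.example.com/placement'  # placeholder endpoint
HEADERS = {
    'X-Auth-Token': '<token>',                         # placeholder token
    'OpenStack-API-Version': 'placement 1.12',
}

def claim(consumer_uuid, compute_rp, fpga_rp, project_id, user_id):
    payload = {
        'allocations': {
            compute_rp: {'resources': {'VCPU': 1, 'MEMORY_MB': 2048}},
            fpga_rp: {'resources': {'CUSTOM_FPGA': 1}},
        },
        'project_id': project_id,
        'user_id': user_id,
    }
    resp = requests.put('%s/allocations/%s' % (PLACEMENT, consumer_uuid),
                        json=payload, headers=HEADERS)
    # 204 -> every allocation landed; 409 -> nothing was allocated.
    return resp.status_code == 204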
This atomic operation in Nova/Placement *significantly* cuts down on the
chances of a user/project exceeding its quota because it shrinks the time
needed to get an accurate read of resource usage from seconds (or tens of
seconds) down to milliseconds.
So, to sum up, my recommendation is to get involved in the two Nova
specs above and help to see them to completion in Rocky. Doing so will
free Cyborg developers up to focus on integration with the virt driver
layer via the os-acc library, implementing the update_provider_tree()
interface, and coming up with some standard resource classes for
describing accelerated resources.
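To give a flavor of the virt driver side, a rough sketch of reporting
accelerator inventory through update_provider_tree() might look like the
following; the CUSTOM_FPGA class, the child-provider naming and the
_count_fpgas() helper are illustrative, not actual Cyborg/os-acc code:

class AcceleratorAwareDriver(object):
    def update_provider_tree(self, provider_tree, nodename):
        # Report FPGAs as a nested provider under the compute node so a
        # single placement claim covers them along with VCPU/MEMORY_MB.
        fpga_rp = '%s_fpga' % nodename
        if not provider_tree.exists(fpga_rp):
            provider_tree.new_child(fpga_rp, nodename)
        provider_tree.update_inventory(fpga_rp, {
            'CUSTOM_FPGA': {
                'total': self._count_fpgas(),  # hypothetical helper
                'reserved': 0,
                'min_unit': 1,
                'max_unit': 1,
                'step_size': 1,
                'allocation_ratio': 1.0,
            },
        })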
Best,
-jay
[1] I'm explicitly calling out multiple concurrent single-instance build
requests here, since a single build request for multiple instances is
actually not a cause of over-quota: the entire set of requested instances
is considered as a single unit for the usage calculation.
[2] Technically, NUMA topology resources and PCI devices do not
currently participate in this single claim transaction. This is not
ideal, and it is something we are actively working on addressing. Keep in
mind, though, that there are also no quota classes for PCI devices or NUMA
topologies, so the over-quota problem doesn't exist for those resource classes.