[openstack-dev] [cyborg] [nova] Cyborg quotas

Jay Pipes jaypipes at gmail.com
Sat May 19 13:30:35 UTC 2018


On 05/18/2018 07:58 AM, Nadathur, Sundar wrote:
> Agreed. Not sure how other projects handle it, but here's the situation 
> for Cyborg. A request may get scheduled on a compute node with no 
> intervention by Cyborg. So, the earliest check that can be made today is 
> on the selected compute node. A simple approach can result in quota
> violations, as in this example.
> 
>     Say there are 5 devices in a cluster. A tenant has a quota of 4 and
>     is currently using 3. That leaves 2 unused devices, of which the
>     tenant is permitted to use only one. But he may submit two
>     concurrent requests, and they may land on two different compute
>     nodes. The Cyborg agent in each node will see the current tenant
>     usage as 3 and let the request go through, resulting in a quota violation.
>
> To prevent this, we need some kind of atomic update, like SQLAlchemy's
> with_lockmode():
> https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Pessimistic_Locking_-_SELECT_FOR_UPDATE 
> 
> That seems to have issues, as documented in the link above. Also, since
> every compute node would do this, it would serialize the bringup of all
> instances with accelerators across the cluster.
>
> If there is a better solution, I'll be happy to hear it.
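
(For context, the pessimistic-locking approach described above would look
roughly like the sketch below; with_lockmode() is the older ORM spelling of
what current SQLAlchemy calls with_for_update(). The device_allocations
table and its columns are hypothetical, purely for illustration.)

    import sqlalchemy as sa
    from sqlalchemy.orm import Session

    metadata = sa.MetaData()

    # Hypothetical table: one row per device claimed by a project.
    device_allocations = sa.Table(
        'device_allocations', metadata,
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('project_id', sa.String(36), index=True),
    )

    def claim_device(session: Session, project_id: str, quota: int) -> bool:
        """Claim one device, locking the project's existing rows so that
        concurrent claims from other compute nodes serialize behind us."""
        with session.begin():
            rows = session.execute(
                sa.select(device_allocations)
                .where(device_allocations.c.project_id == project_id)
                .with_for_update()    # SELECT ... FOR UPDATE
            ).fetchall()
            if len(rows) >= quota:
                return False          # claim would exceed the quota
            session.execute(
                device_allocations.insert().values(project_id=project_id))
            return True

Besides serializing every accelerator claim in the cluster, FOR UPDATE only
locks rows that already exist, so two concurrent first-time claims by a
project can still race past each other.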

The solution is to implement the following two specs:

https://review.openstack.org/#/c/509042/
https://review.openstack.org/#/c/569011/

The problem of consuming more resources than a user/project has quota
for is not a new one. Users have been able to go over their quota in
all of the services for as long as I can remember -- they can do this by
essentially DDoS'ing the API with lots of concurrent single-instance
build requests [1]. The tenant then ends up in an over-quota situation
and is unable to do anything at all until they delete some resources.

The only operators I can remember complaining about this issue were the
public cloud operators -- and rightfully so, since quota abuse in public
clouds meant their reputation for fairness might be questioned. Most
operators I know of solved this problem with *rate-limiting*, which is
not the same thing as quota limits. By rate-limiting requests to the
APIs, operators alleviated the problem by treating its symptom: high
rates of concurrent requests leading to over-quota situations.

Nobody is using Cyborg separately from Nova at the moment (or ever?).
It's not as if a user will be consuming an accelerator outside of a Nova
instance -- the Nova instance *is* the workload that uses the
accelerator.

That means that Cyborg resources should be treated as just another 
resource class whose usage should be checked in a single query to the 
/usages placement API endpoint before attempting to spawn the instance 
(again, via Nova) that ends up consuming those resources.
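
As a rough illustration of that check (the resource class name, the quota
value, and the keystoneauth adapter pointed at placement are all assumptions
for the sake of the sketch):

    # 'placement' is assumed to be a keystoneauth1 Adapter configured for
    # the placement service; the resource class and quota are made up.
    ACCEL_RC = 'CUSTOM_ACCELERATOR_FPGA'
    ACCEL_QUOTA = 4

    def would_exceed_quota(placement, project_id, requested=1):
        """Single GET /usages call; True if the request would push the
        project over its accelerator quota."""
        resp = placement.get(
            '/usages', params={'project_id': project_id},
            headers={'OpenStack-API-Version': 'placement 1.9'})
        used = resp.json().get('usages', {}).get(ACCEL_RC, 0)
        return used + requested > ACCEL_QUOTA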

The claiming of all resources that are consumed by a Nova instance 
(which would include any accelerator resources) is an atomic operation 
that prevents over-allocation of any provider involved in the claim 
transaction. [2]
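
For example, the single allocation request that placement applies (or
rejects) as a unit might look like this -- the provider UUIDs, project/user
IDs and the accelerator resource class below are illustrative placeholders:

    # One PUT /allocations/{consumer_uuid} call claims everything the
    # instance consumes; if any provider lacks capacity, nothing is written.
    allocation_request = {
        'allocations': {
            compute_node_rp_uuid: {
                'resources': {'VCPU': 2, 'MEMORY_MB': 4096},
            },
            fpga_rp_uuid: {
                'resources': {'CUSTOM_ACCELERATOR_FPGA': 1},
            },
        },
        'project_id': project_id,
        'user_id': user_id,
    }
    placement.put(
        '/allocations/%s' % instance_uuid, json=allocation_request,
        headers={'OpenStack-API-Version': 'placement 1.12'})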

This atomic operation in Nova/Placement *significantly* cuts down on the
chances of a user/project exceeding its quota because it reduces the
time it takes to get an accurate read of resource usage from seconds
(or tens of seconds) to milliseconds.

So, to sum up, my recommendation is to get involved in the two Nova 
specs above and help to see them to completion in Rocky. Doing so will 
free Cyborg developers up to focus on integration with the virt driver 
layer via the os-acc library, implementing the update_provider_tree() 
interface, and coming up with some standard resource classes for 
describing accelerated resources.
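
To make that last point concrete, here is a sketch of the provider-tree
side -- the driver class, the _count_local_fpgas() helper and the
CUSTOM_ACCELERATOR_FPGA resource class are all hypothetical, not an
existing Cyborg or os-acc interface:

    class AcceleratorDriverSketch(object):
        """Not a real driver; just shows the update_provider_tree() shape."""

        def update_provider_tree(self, provider_tree, nodename,
                                 allocations=None):
            # Hypothetical discovery of accelerators on this host.
            fpga_count = self._count_local_fpgas()

            # Nested provider for the accelerators under the compute node.
            rp_name = '%s_fpga' % nodename
            if not provider_tree.exists(rp_name):
                provider_tree.new_child(rp_name, nodename)

            provider_tree.update_inventory(rp_name, {
                'CUSTOM_ACCELERATOR_FPGA': {
                    'total': fpga_count,
                    'reserved': 0,
                    'min_unit': 1,
                    'max_unit': fpga_count,
                    'step_size': 1,
                    'allocation_ratio': 1.0,
                },
            })

An instance could then request the device via a flavor extra spec such as
resources:CUSTOM_ACCELERATOR_FPGA=1, and the atomic claim described above
would cover it along with VCPU and memory.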

Best,
-jay

[1] I'm explicitly calling out multiple concurrent single-instance build
requests here, since a single build request for multiple instances is
actually not a cause of over-quota: the entire set of requested
instances is considered as a single unit for usage calculation.

[2] technically, NUMA topology resources and PCI devices do not 
currently participate in this single claim transaction. This is not 
ideal, and is something we are actively working on addressing. Keep in 
mind there are also no quota classes for PCI devices or NUMA topologies, 
though, so the over-quota problems don't exist for those resource classes.


