[openstack-dev] [cyborg] [nova] Cyborg quotas

少合冯 lvmxhster at gmail.com
Sat May 19 10:34:05 UTC 2018


2018-05-18 19:58 GMT+08:00 Nadathur, Sundar <sundar.nadathur at intel.com>:

> Hi Matt,
> On 5/17/2018 3:18 PM, Matt Riedemann wrote:
>
> On 5/17/2018 3:36 PM, Nadathur, Sundar wrote:
>
> IIUC, this applies only to resources that Nova handles, and Nova does not
> handle accelerators. The generic method that Alex talks about is obviously
> preferable but, if that is not available in Rocky, is the filter an option?
>
>
> If nova isn't creating accelerator resources managed by cyborg, I have no
> idea why nova would be doing quota checks on those types of resources. And
> no, I don't think adding a scheduler filter to nova for checking
> accelerator quota is something we'd add either. I'm not sure that would
> even make sense - the quota for the resource is per tenant, not per host,
> is it? The scheduler filters work on a per-host basis.
>
> Can we not extend BaseFilter.filter_all() to get all the hosts in a
> filter?
>           https://github.com/openstack/nova/blob/master/nova/filters.py#L36
>
> I should have made it clearer that this putative filter would be
> out-of-tree, and needed only until better solutions become available.
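>
> Roughly along these lines (an untested sketch; the Cyborg usage lookup
> _get_accel_quota() is hypothetical, only filter_all() itself is existing
> Nova API):
>
>     from nova.scheduler import filters
>
>     class AcceleratorQuotaFilter(filters.BaseHostFilter):
>         """Out-of-tree filter: reject every host once the tenant's
>         accelerator quota is exhausted."""
>
>         def filter_all(self, filter_obj_list, spec_obj):
>             # Hypothetical call into Cyborg for tenant-wide usage/limit.
>             usage, limit = self._get_accel_quota(spec_obj.project_id)
>             if usage >= limit:
>                 return []  # quota exhausted: no host passes
>             # The quota is per tenant, not per host, so all hosts pass.
>             return list(filter_obj_list)
>
>         def host_passes(self, host_state, spec_obj):
>             # Not reached: filter_all() is overridden above.
>             return True
>
>         def _get_accel_quota(self, project_id):
>             raise NotImplementedError  # placeholder for a Cyborg API call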
>
>
> Like any other resource in openstack, the project that manages that
> resource should be in charge of enforcing quota limits for it.
>
> Agreed. Not sure how other projects handle it, but here's the situation
> for Cyborg. A request may get scheduled on a compute node with no
> intervention by Cyborg. So, the earliest check that can be made today is on
> the selected compute node. A simple approach can result in quota violations,
> as in this example.
>
> Say there are 5 devices in a cluster. A tenant has a quota of 4 and is
> currently using 3. That leaves 2 unused devices, of which the tenant is
> permitted to use only one. But the tenant may submit two concurrent
> requests, and they may land on two different compute nodes. The Cyborg
> agent on each node will see the current tenant usage as 3 and let the
> request go through, resulting in a quota violation.
>
That's a bad design, if the Cyborg agent in each node lets the request go
through. And the current Cyborg quota design does not have this issue.

> To prevent this, we need some kind of atomic update, like SQLAlchemy's
> with_lockmode():
>      https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Pessimistic_Locking_-_SELECT_FOR_UPDATE
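>
> A sketch of that atomic check-and-increment (the QuotaUsage model is an
> assumption, not Cyborg code; with_for_update() is the current SQLAlchemy
> spelling of the older with_lockmode('update')):
>
>     from sqlalchemy import Column, Integer, String
>     from sqlalchemy.ext.declarative import declarative_base
>
>     Base = declarative_base()
>
>     class QuotaUsage(Base):
>         __tablename__ = 'accel_quota_usage'
>         project_id = Column(String(64), primary_key=True)
>         in_use = Column(Integer, nullable=False)
>         hard_limit = Column(Integer, nullable=False)
>
>     def claim_device(session, project_id):
>         # SELECT ... FOR UPDATE: concurrent claims for the same tenant
>         # block here until the holding transaction commits.
>         row = (session.query(QuotaUsage)
>                       .filter_by(project_id=project_id)
>                       .with_for_update()
>                       .one())
>         if row.in_use >= row.hard_limit:
>             session.rollback()  # over quota: release the lock
>             return False
>         row.in_use += 1
>         session.commit()
>         return True
>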
> That seems to have issues, as documented in the link above. Also, since
> every compute node would do this, it would serialize the bring-up of all
> instances with accelerators across the cluster.
>
> If there is a better solution, I'll be happy to hear it.
>
> Thanks,
> Sundar
>