[openstack-dev] [cyborg] [nova] Cyborg quotas

Sylvain Bauza sbauza at redhat.com
Fri May 18 12:06:51 UTC 2018


On Fri, May 18, 2018 at 13:59, Nadathur, Sundar <sundar.nadathur at intel.com>
wrote:

> Hi Matt,
> On 5/17/2018 3:18 PM, Matt Riedemann wrote:
>
> On 5/17/2018 3:36 PM, Nadathur, Sundar wrote:
>
> This applies only to the resources that Nova handles, IIUC, and Nova does
> not handle accelerators. The generic method that Alex talks about is
> obviously preferable, but if that is not available in Rocky, is the filter
> an option?
>
>
> If nova isn't creating accelerator resources managed by cyborg, I have no
> idea why nova would be doing quota checks on those types of resources. And
> no, I don't think a scheduler filter in nova for checking accelerator
> quota is something we'd add either. I'm not sure that would even make
> sense - the quota for the resource is per tenant, not per host, is it?
> The scheduler filters work on a per-host basis.
>
> Can we not extend BaseFilter.filter_all() to get all the hosts in a
> filter?
>
> https://github.com/openstack/nova/blob/master/nova/filters.py#L36
>
> I should have made it clearer that this putative filter would be
> out-of-tree, and needed only until better solutions become available.
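>
> For illustration, a rough sketch of what I have in mind; the class name
> CyborgQuotaFilter and the quota-lookup helper are hypothetical, assuming
> the usual BaseHostFilter contract:
>
>     from nova.scheduler import filters
>
>     class CyborgQuotaFilter(filters.BaseHostFilter):
>         # Overrides filter_all() so the filter sees every candidate
>         # host at once, instead of one host at a time via
>         # host_passes().
>         def filter_all(self, filter_obj_list, spec_obj):
>             hosts = list(filter_obj_list)   # materialize all hosts
>             # A cluster-wide quota check would go here, e.g. asking
>             # Cyborg for the tenant's current accelerator usage
>             # (get_tenant_accelerator_usage() is a made-up helper).
>             for host in hosts:
>                 yield host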
>

No, a filter takes two clearly defined parameters (the host and the request
spec), and changing that signature would mean a new paradigm for the
FilterScheduler.
If you need a check that spans all the hosts, it should be either a
pre-filter for Placement or a post-filter, but we don't accept out-of-tree
pre- or post-filters yet.
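
To illustrate the shape of that, a rough sketch of a Placement pre-filter
in the style of nova's request filters; the function name and the quota
check are hypothetical:

    # Runs once per boot request, before the Placement query, rather
    # than once per candidate host.
    def require_accelerator_quota(ctxt, request_spec):
        # A cluster-wide, per-tenant accelerator quota check could be
        # done here, e.g. by asking Cyborg for the project's current
        # usage (hypothetical call), failing the request if over quota.
        return True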


> Like any other resource in OpenStack, the project that manages that
> resource should be in charge of enforcing quota limits for it.
>
> Agreed. Not sure how other projects handle it, but here's the situation
> for Cyborg. A request may get scheduled on a compute node with no
> intervention by Cyborg, so the earliest check that can be made today is on
> the selected compute node. A simple approach can result in quota
> violations, as in this example.
>
> Say there are 5 devices in a cluster. A tenant has a quota of 4 and is
> currently using 3. That leaves 2 unused devices, of which the tenant is
> permitted to use only one. But the tenant may submit two concurrent
> requests, and they may land on two different compute nodes. The Cyborg
> agent on each node will see the current tenant usage as 3 and let the
> request go through, resulting in a quota violation, as sketched below.
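>
> A rough sketch of the racy check-then-act that each agent would perform
> (pseudo-Python; the helper names are made up):
>
>     usage = get_tenant_usage(tenant)    # both nodes read 3
>     if usage + 1 <= quota:              # 3 + 1 <= 4, so both pass
>         allocate_device(tenant)         # usage ends up at 5 > 4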
>
> To prevent this, we need some kind of atomic update, like SQLAlchemy's
> with_lockmode():
>
> https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Pessimistic_Locking_-_SELECT_FOR_UPDATE
>
> That approach has its own issues, as documented in the link above. Also,
> since every compute node would do this, it would serialize the bringup of
> all instances with accelerators across the cluster.
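>
> For concreteness, a minimal sketch of such an atomic check; the
> QuotaUsage table is hypothetical, and note that recent SQLAlchemy spells
> the lock as with_for_update() instead of the deprecated with_lockmode():
>
>     from sqlalchemy import Column, Integer, String
>     from sqlalchemy.orm import declarative_base
>
>     Base = declarative_base()
>
>     class QuotaUsage(Base):                  # hypothetical cyborg table
>         __tablename__ = 'quota_usages'
>         tenant_id = Column(String, primary_key=True)
>         in_use = Column(Integer, nullable=False, default=0)
>
>     class QuotaExceeded(Exception):
>         pass
>
>     def consume_device(session, tenant_id, limit):
>         with session.begin():
>             # SELECT ... FOR UPDATE: the row stays locked until
>             # commit, so concurrent agents serialize on this
>             # tenant's row instead of racing past the check.
>             usage = (session.query(QuotaUsage)
>                      .filter_by(tenant_id=tenant_id)
>                      .with_for_update()
>                      .one())
>             if usage.in_use + 1 > limit:
>                 raise QuotaExceeded()
>             usage.in_use += 1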
>
> If there is a better solution, I'll be happy to hear it.
>
> Thanks,
> Sundar
>