Open Stack

Fri Jun 9 23:26:16 UTC 2017

On Fri, 9 Jun 2017, Jay Pipes wrote:

> Sorry, been in a three-hour meeting. Comments inline...

Thanks for getting to this, it's very helpful to me.

>> * Part of the reason for having nested resource providers is because
>>   it can allow affinity/anti-affinity below the compute node (e.g.,
>>   workloads on the same host but different numa cells).
>
> Mmm, kinda, yeah.

What I meant by this was that if it didn't matter which of more than
one nested rp was used, then it would be easier to simply consider
the group of them as members of an inventory (that came out a bit
more in one of the later questions).

>> * Does a claim made in the scheduler need to be complete? Is there
>>   value in making a partial claim from the scheduler that consumes a
>>   vcpu and some ram, and then in the resource tracker is corrected
>>   to consume a specific pci device, numa cell, gpu and/or fpga?
>>   Would this be better or worse than what we have now? Why?
>
> Good question. I think the answer to this is probably pretty theoretical at
> this point. My gut instinct is that we should treat the consumption of
> resources in an atomic fashion, and that the transactional nature of
> allocation will result in fewer race conditions and cleaner code. But,
> admittedly, this is just my gut reaction.

I suppose if we were more spread-oriented than pack-oriented, an
allocation of vcpu and ram would almost operate as a proxy for a
lock, allowing the later correcting allocation proposed above to be
somewhat safe because other near-concurrent emplacements would be
happening on some other machine. But we don't have that reality.
I've always been in favor of making the allocation as early as
possible. I remember those halcyon days when we even thought it
might be possible to make a request and claim of resources in one
HTTP request.
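
To make the atomic-versus-partial distinction concrete, here is a
minimal sketch of an all-or-nothing claim across a compute node and one
of its nested providers. Everything in it is an assumption made for
illustration: the Provider class, the claim_atomically function and the
provider names are hypothetical and are not the placement API. It is a
single-process model, so it says nothing about how a real service would
keep concurrent writers apart.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Hypothetical stand-in for a resource provider and its free inventory."""
    name: str
    free: dict = field(default_factory=dict)


def claim_atomically(providers, request):
    """Consume every requested amount, or none of them.

    `request` is a list of (provider name, resource class, amount)
    tuples. All names are illustrative only.
    """
    by_name = {p.name: p for p in providers}

    # Total up what is being asked of each provider and resource class.
    needed = {}
    for name, rc, amount in request:
        needed[(name, rc)] = needed.get((name, rc), 0) + amount

    # First pass: check that the whole request fits before touching anything.
    for (name, rc), amount in needed.items():
        provider = by_name.get(name)
        if provider is None or provider.free.get(rc, 0) < amount:
            return False  # nothing has been consumed

    # Second pass: the request is known to fit, so consume all of it.
    for (name, rc), amount in needed.items():
        by_name[name].free[rc] -= amount
    return True


cn = Provider("cn1", {"VCPU": 4, "MEMORY_MB": 2048})
numa0 = Provider("cn1_numa0", {"SRIOV_NET_VF": 1})

first = claim_atomically(
    [cn, numa0],
    [("cn1", "VCPU", 2), ("cn1", "MEMORY_MB", 1024),
     ("cn1_numa0", "SRIOV_NET_VF", 1)],
)
# The second request fits on the compute node but not on the nested
# provider, so the vcpu on cn1 must be left alone.
second = claim_atomically(
    [cn, numa0],
    [("cn1", "VCPU", 1), ("cn1_numa0", "SRIOV_NET_VF", 1)],
)
```

A partial claim would correspond to consuming the vcpu in the second
request anyway and leaving the nested provider to be sorted out later,
which is the window the atomic approach closes.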

>>   that makes it difficult or impossible for an allocation against a
>>   parent provider to be able to determine the correct child
>>   providers to which to cascade some of the allocation? (And by
>>   extension make the earlier scheduling decision.)
>
> See above. The sorting/weighing logic, which is very much deployer-defined
> and reeks of customization, is what would need to be added to the placement
> API.

And enough of that sorting/weighing logic is likely to have to do with
child or shared providers that it's not possible to constrain the
weighing and sorting to solely compute nodes? Not just whether the host
is on fire, but the shared disk farm too?
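
As a rough illustration of why the weighing may not stop at the compute
node, the sketch below scores each candidate from every provider it
would draw on, including a shared disk provider. The candidate layout,
the weigh and sort_candidates functions and the multipliers are all
hypothetical, invented for this example; they are not the nova weigher
interface or the format of any placement response.

```python
def weigh(candidate, multipliers):
    """Score a candidate from the free capacity of every provider it touches."""
    score = 0.0
    for provider in candidate["providers"]:
        for rc, free in provider["free"].items():
            # Resource classes without a multiplier do not affect the score.
            score += multipliers.get(rc, 0.0) * free
    return score


def sort_candidates(candidates, multipliers):
    """Return candidates with the highest score first (spread-style)."""
    return sorted(candidates, key=lambda c: weigh(c, multipliers), reverse=True)


candidates = [
    {"host": "cn1", "providers": [
        {"name": "cn1", "free": {"VCPU": 8}},
        {"name": "shared1", "free": {"DISK_GB": 100}},
    ]},
    {"host": "cn2", "providers": [
        {"name": "cn2", "free": {"VCPU": 8}},
        {"name": "shared2", "free": {"DISK_GB": 900}},
    ]},
]

# Looking only at compute-node resources, the two hosts tie.
compute_only = sort_candidates(candidates, {"VCPU": 1.0})
# Once the shared disk provider counts, cn2 is preferred.
with_shared = sort_candidates(candidates, {"VCPU": 1.0, "DISK_GB": 0.01})
```

With only the compute node in view the two hosts are indistinguishable;
the ordering changes only when the shared provider's state is available
to whatever does the weighing.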

Okay, thank you, that helps set the stage more clearly and leads
straight to my remaining big question, which is asked on the spec
you've proposed:

     https://review.openstack.org/#/c/471927/

What are the broad-strokes mechanisms for connecting the non-allocation
data in the response to GET /allocation_requests to the
sorting/weighing logic? Answering on the spec works fine for me, I'm
just repeating it here in case people following along want to
transition over to the spec.

Thanks again.

-- 
Chris Dent                  ┬──┬◡ﾉ(° -°ﾉ)       https://anticdent.org/
freenode: cdent                                         tw: @anticdent

[openstack-dev] [nova][scheduler][placement] Allocating Complex Resources
