[openstack-dev] [nova][scheduler][placement] Allocating Complex Resources

Jay Pipes jaypipes at gmail.com
Mon Jun 12 15:20:47 UTC 2017


On 06/09/2017 06:31 PM, Ed Leafe wrote:
> On Jun 9, 2017, at 4:35 PM, Jay Pipes <jaypipes at gmail.com> wrote:
> 
>>> We can declare that allocating for shared disk is fairly deterministic
>>> if we assume that any given compute node is only associated with one
>>> shared disk provider.
>>
>> a) we can't assume that
>> b) a compute node could very well have both local disk and shared disk. How would the placement API know which one to pick? This is a sorting/weighing decision and thus is something the scheduler is responsible for.
> 
> I remember having this discussion, and we concluded that a compute node could have either local or shared resources, but not both. There would be a trait to indicate shared disk. Has this changed?

I'm not sure it's changed per se :) It's just that there's nothing
preventing this from happening. A compute node can theoretically have
local disk and also be associated with a shared storage pool.
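To make that concrete, here is a rough sketch of how such a topology
might look in placement terms. All names and UUIDs below are purely
illustrative, and the sharing-provider mechanics are still being
worked out:

    # Compute node provider with its own local disk inventory
    compute_node = {
        'uuid': 'CN_UUID',
        'inventories': {
            'VCPU': {'total': 16},
            'MEMORY_MB': {'total': 65536},
            'DISK_GB': {'total': 500},  # local disk
        },
    }

    # Shared storage pool, associated with the compute node through a
    # common aggregate and marked as a sharing provider via a trait
    shared_storage = {
        'uuid': 'SS_UUID',
        'inventories': {
            'DISK_GB': {'total': 100000},
        },
        'traits': ['MISC_SHARES_VIA_AGGREGATE'],
        'aggregates': ['AGG_UUID'],  # CN_UUID is also in AGG_UUID
    }

A request for DISK_GB could then legitimately be satisfied by either
provider. That's exactly the ambiguity I'm pointing at: placement has
no way of knowing which one the deployer prefers, so that
sorting/weighing choice belongs in the scheduler.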

>>> * We already have the information the filter scheduler needs now by
>>>   some other means, right?  What are the reasons we don't want to
>>>   use that anymore?
>>
>> The filter scheduler has most of the information, yes. What it doesn't have is the *identifier* (UUID) for things like SRIOV PFs or NUMA cells that the Placement API will use to distinguish between things. In other words, the filter scheduler currently does things like unpack a NUMATopology object into memory and determine a NUMA cell on which to place an instance. However, it has no concept that that NUMA cell is (or will soon be once nested-resource-providers is done) a resource provider in the placement API. Same for SRIOV PFs. Same for VGPUs. Same for FPGAs, etc. That's why we need to return information to the scheduler from the placement API that will allow the scheduler to understand "hey, this NUMA cell on compute node X is resource provider $UUID".
> 
> I guess that this was the point that confused me. The RP uuid is part of the provider: the compute node's uuid, and (after https://review.openstack.org/#/c/469147/ merges) the PCI device's uuid. So in the code that passes the PCI device information to the scheduler, we could add that new uuid field, and then the scheduler would have the information to a) select the best fit and then b) claim it with the specific uuid. Same for all the other nested/shared devices.

How would the scheduler know that a particular SRIOV PF resource 
provider UUID is on a particular compute node unless the placement API 
returns information indicating that SRIOV PF is a child of a particular 
compute node resource provider?
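
Put differently, the response from placement would need to carry the
parent/child relationship for the scheduler to make any use of those
UUIDs. Something along these lines (and this is only a sketch, since
the actual response format is precisely what we're trying to design
here):

    # Hypothetical per-provider summary returned to the scheduler.
    # Child providers point back at their parent compute node, so the
    # scheduler can tell which host each PF lives on.
    provider_summaries = {
        'CN_UUID': {
            'parent': None,
            'resources': {'VCPU': 16, 'MEMORY_MB': 65536},
        },
        'PF_1_UUID': {
            'parent': 'CN_UUID',  # this SRIOV PF is on compute node CN
            'resources': {'SRIOV_NET_VF': 8},
        },
    }

With something like that in hand, the scheduler can weigh PF_1 against
PFs on other hosts and then claim against PF_1_UUID specifically.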

> I don't mean to belabor this, but to my mind this seems a lot less disruptive to the existing code.

Belabor away :) I don't mind talking through the details. It's important 
to do.

Best,
-jay


