[openstack-dev] [nova][scheduler][placement] Allocating Complex Resources

Chris Dent cdent+os at anticdent.org
Tue Jun 6 14:56:42 UTC 2017

On Mon, 5 Jun 2017, Ed Leafe wrote:

> One proposal is to essentially use the same logic in placement
> that was used to include that host in those matching the
> requirements. In other words, when it tries to allocate the amount
> of disk, it would determine that that host is in a shared storage
> aggregate, and be smart enough to allocate against that provider.
> This was referred to in our discussion as "Plan A".

What would help me is a clearer explanation of whether, and if so
how and why, "Plan A" doesn't work for nested resource providers.

We can declare that allocating for shared disk is fairly deterministic
if we assume that any given compute node is only associated with one
shared disk provider.
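That determinism can be sketched roughly as follows. This is an
illustrative Python sketch, not placement's actual code; the function
name, provider names, and the assumption of at most one shared disk
provider per compute node are all mine:

```python
# Hedged sketch: if each compute node maps to at most one shared
# storage provider, deciding where DISK_GB lands is deterministic --
# no extra information is needed from the caller.

def allocation_targets(compute_node, request, shared_disk_provider=None):
    """Split a requested set of resources between the compute node and
    its (at most one) shared storage provider. Illustrative only."""
    allocations = {}
    for resource_class, amount in request.items():
        if resource_class == 'DISK_GB' and shared_disk_provider:
            target = shared_disk_provider
        else:
            target = compute_node
        allocations.setdefault(target, {})[resource_class] = amount
    return allocations

# DISK_GB lands on the shared provider; everything else on the node.
targets = allocation_targets(
    'compute1',
    {'VCPU': 2, 'MEMORY_MB': 2048, 'DISK_GB': 20},
    shared_disk_provider='nfs_pool')
```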

My understanding is that this determinism is not the case with
nested resource providers, because the choice of which PCI device
or which NUMA cell gets used happens fairly late in the game.
The existing resource tracking doesn't have this problem because the
claim of those resources is made very late in the game. <- Is this
correct?

The problem comes into play when we want to claim from the scheduler
(or conductor). Additional information is required to choose which
child providers to use. <- Is this correct?

Plan B overcomes the information deficit by including more
information in the response from placement (as straw-manned in the
etherpad [1]) allowing code in the filter scheduler to make accurate
claims. <- Is this correct?
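To make that concrete, here is a purely illustrative shape for such a
richer response (the actual straw man is in the etherpad; the field
names and provider names here are invented for the example):

```python
# Hypothetical "Plan B"-style response: placement spells out, per
# candidate, exactly which providers each resource would be allocated
# from, so the filter scheduler can claim without re-deriving topology.
candidates = [
    {'allocations': {
        'compute1': {'VCPU': 2, 'MEMORY_MB': 2048},
        'shared_storage_x': {'DISK_GB': 20}}},
    {'allocations': {
        'compute2': {'VCPU': 2, 'MEMORY_MB': 2048, 'DISK_GB': 20}}},
]

def claim(candidate):
    """A scheduler receiving such a candidate can submit the
    allocations as-is, with no further topology knowledge."""
    return candidate['allocations']
```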

For clarity and completeness in the discussion some questions for
which we have explicit answers would be useful. Some of these may
appear ignorant or obtuse and are mostly things we've been over
before. The goal is to draw out some clear statements in the present
day to be sure we are all talking about the same thing (or get us
there if not) modified for what we know now, compared to what we
knew a week or month ago.

* We already have the information the filter scheduler needs now by
   some other means, right?  What are the reasons we don't want to
   use that anymore?

* Part of the reason for having nested resource providers is because
   it can allow affinity/anti-affinity below the compute node (e.g.,
   workloads on the same host but different numa cells). If I
   remember correctly, the modelling and tracking of this kind of
   information in this way comes out of the time when we imagined the
   placement service would be doing considerably more filtering than
   is planned now. Plan B appears to be an acknowledgement of "on
   some of this stuff, we can't actually do anything but provide you
   some info, you need to decide". If that's the case, is the
   topological modelling on the placement DB side of things solely a
   convenient place to store information? If there were some other
   way to model that topology could things currently being considered
   for modelling as nested providers be instead simply modelled as
   inventories of a particular class of resource?
   (I'm not suggesting we do this, rather that the answer that says
   why we don't want to do this is useful for understanding the

* Does a claim made in the scheduler need to be complete? Is there
   value in making a partial claim from the scheduler that consumes a
   vcpu and some ram, and then in the resource tracker is corrected
   to consume a specific pci device, numa cell, gpu and/or fpga?
   Would this be better or worse than what we have now? Why?
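The two-phase idea in that question might look something like this;
all structures, names, and the 'PCI_DEVICE' class are hypothetical:

```python
# Sketch of "partial claim in the scheduler, corrected later in the
# resource tracker". Not real nova/placement code.

def scheduler_partial_claim(request):
    # Claim only what the scheduler can place unambiguously.
    return {k: v for k, v in request.items()
            if k in ('VCPU', 'MEMORY_MB')}

def resource_tracker_correct(partial, request, chosen_children):
    # Later, on the host, attribute device-ish resources to the
    # specific child providers that were actually chosen.
    full = {'_root': dict(partial)}
    for resource_class, provider in chosen_children.items():
        full.setdefault(provider, {})[resource_class] = request[resource_class]
    return full

request = {'VCPU': 2, 'MEMORY_MB': 2048, 'PCI_DEVICE': 1}
partial = scheduler_partial_claim(request)
full = resource_tracker_correct(
    partial, request, {'PCI_DEVICE': 'pci_0000_81_00_0'})
```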

* What is lacking in placement's representation of resource providers
   that makes it difficult or impossible for an allocation against a
   parent provider to be able to determine the correct child
   providers to which to cascade some of the allocation? (And by
   extension make the earlier scheduling decision.)
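A naive cascade illustrates the gap that question is probing. This is
my sketch, not a proposal: first-fit works mechanically, but when two
children both fit (e.g. two NUMA cells), the choice is arbitrary, and
that missing policy is exactly the information deficit:

```python
def cascade(request, children):
    """Naively cascade an allocation: for each requested class, pick
    the first child provider with enough capacity. Illustrative only;
    a real implementation would need a policy to break ties."""
    result = {}
    for resource_class, amount in request.items():
        for name, inventory in children.items():
            if inventory.get(resource_class, 0) >= amount:
                result.setdefault(name, {})[resource_class] = amount
                break
        else:
            return None  # no child can satisfy this class

    return result

# Both cells could satisfy this request; first-fit silently picks numa0.
children = {'numa0': {'VCPU': 8, 'MEMORY_MB': 16384},
            'numa1': {'VCPU': 8, 'MEMORY_MB': 16384}}
picked = cascade({'VCPU': 2, 'MEMORY_MB': 2048}, children)
```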

That's a start. With answers to at least some of these questions I
think the straw man in the etherpad can be more effectively
evaluated. As things stand right now it is a proposed solution
without a clear problem statement, and I feel we could do with a
clearer problem statement.


[1] https://etherpad.openstack.org/p/placement-allocations-straw-man

Chris Dent                  ┬──┬◡ノ(° -°ノ)       https://anticdent.org/
freenode: cdent                                         tw: @anticdent
