[openstack-dev] [nova][scheduler][placement] Allocating Complex Resources
cdent+os at anticdent.org
Tue Jun 6 14:56:42 UTC 2017
On Mon, 5 Jun 2017, Ed Leafe wrote:
> One proposal is to essentially use the same logic in placement
> that was used to include that host in those matching the
> requirements. In other words, when it tries to allocate the amount
> of disk, it would determine that that host is in a shared storage
> aggregate, and be smart enough to allocate against that provider.
> This was referred to in our discussion as "Plan A".
What would help me is a fuller explanation of whether, and if so how and
why, "Plan A" doesn't work for nested resource providers.
We can declare that allocating for shared disk is fairly deterministic
if we assume that any given compute node is only associated with one
shared disk provider.
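As a concrete sketch of that deterministic case (provider UUIDs are
placeholders and the payload shape is only roughly the current
PUT /allocations/{consumer_uuid} format, so treat this as illustrative),
the resulting allocation would split across the compute node and the
shared disk provider:

    # Illustrative allocation for one instance: VCPU and MEMORY_MB come
    # from the compute node provider, DISK_GB from the shared disk
    # provider associated with it via an aggregate. UUIDs are fake.
    allocation_request = {
        "allocations": [
            {
                "resource_provider": {"uuid": "compute-node-uuid"},
                "resources": {"VCPU": 2, "MEMORY_MB": 2048},
            },
            {
                "resource_provider": {"uuid": "shared-disk-provider-uuid"},
                "resources": {"DISK_GB": 20},
            },
        ],
    }

Because there is (by assumption) exactly one shared disk provider
associated with the host, there is nothing left to choose.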
My understanding is that this determinism does not hold with nested
resource providers, because the choice of which PCI device or which
NUMA cell gets used is made fairly late in the game.
The existing resource tracking doesn't have this problem because the
claim of those resources is made very late in the game. <- Is this
correct?
The problem comes into play when we want to claim from the scheduler
(or conductor). Additional information is required to choose which
child providers to use. <- Is this correct?
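To illustrate that deficit as I understand it (a made-up tree, not
actual placement output; the names and the SRIOV_NET_VF example are my
assumptions), a request for one VF can be satisfied by either NUMA
child, and nothing in the request says which one to pick:

    # Hypothetical nested provider tree. An allocation asking only for
    # "1 SRIOV_NET_VF" against this host leaves the choice between the
    # two NUMA cell children unresolved.
    compute_node = {
        "name": "compute1",
        "inventory": {"VCPU": 16, "MEMORY_MB": 65536},
        "children": [
            {"name": "compute1_numa0", "inventory": {"SRIOV_NET_VF": 4}},
            {"name": "compute1_numa1", "inventory": {"SRIOV_NET_VF": 4}},
        ],
    }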
Plan B overcomes the information deficit by including more
information in the response from placement (as straw-manned in the
etherpad), allowing code in the filter scheduler to make accurate
claims. <- Is this correct?
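If I've understood it, the richer response might enumerate complete,
claimable sets of allocations, one per viable combination of providers,
roughly like this (my guess at the shape, not the etherpad's exact
format; UUIDs and structure are invented):

    # Rough guess at a "Plan B" response: each candidate is a full set
    # of allocations the filter scheduler could claim as-is.
    candidates = [
        {"allocations": [
            {"resource_provider": {"uuid": "compute1-uuid"},
             "resources": {"VCPU": 2, "MEMORY_MB": 2048}},
            {"resource_provider": {"uuid": "compute1-numa0-uuid"},
             "resources": {"SRIOV_NET_VF": 1}},
        ]},
        {"allocations": [
            {"resource_provider": {"uuid": "compute1-uuid"},
             "resources": {"VCPU": 2, "MEMORY_MB": 2048}},
            {"resource_provider": {"uuid": "compute1-numa1-uuid"},
             "resources": {"SRIOV_NET_VF": 1}},
        ]},
    ]

The scheduler would then pick one candidate and claim it whole, rather
than working out the child providers itself.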
For clarity and completeness in the discussion, some questions for
which we have explicit answers would be useful. Some of these may
appear ignorant or obtuse and are mostly things we've been over
before. The goal is to draw out some clear statements in the present
day to be sure we are all talking about the same thing (or to get us
there if not), updated for what we know now compared to what we knew
a week or a month ago.
* We already have the information the filter scheduler needs now by
some other means, right? What are the reasons we don't want to
use that anymore?
* Part of the reason for having nested resource providers is that
it can allow affinity/anti-affinity below the compute node (e.g.,
workloads on the same host but in different NUMA cells). If I
remember correctly, the modelling and tracking of this kind of
information in this way comes out of the time when we imagined the
placement service would be doing considerably more filtering than
is planned now. Plan B appears to be an acknowledgement of "on
some of this stuff, we can't actually do anything but provide you
some info, you need to decide". If that's the case, is the
topological modelling on the placement DB side of things solely a
convenient place to store information? If there were some other
way to model that topology could things currently being considered
for modelling as nested providers be instead simply modelled as
inventories of a particular class of resource?
(I'm not suggesting we do this; rather, the answer to why we don't
want to do this is useful for understanding the overall picture.)
* Does a claim made in the scheduler need to be complete? Is there
value in making a partial claim from the scheduler that consumes a
VCPU and some RAM, and is then corrected in the resource tracker to
consume a specific PCI device, NUMA cell, GPU and/or FPGA?
Would this be better or worse than what we have now? Why?
* What is lacking in placement's representation of resource providers
that makes it difficult or impossible for an allocation against a
parent provider to determine the correct child providers to which
to cascade some of the allocation? (And by extension to make the
earlier scheduling decision.) A naive sketch of such a cascade, and
what it lacks, follows this list.
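To make that last question concrete, here is a deliberately naive
sketch (all names and structures are invented, this is not placement
code) of what cascading an allocation from a parent to its children
might look like, and where it falls short:

    # Naive, illustrative cascade: given an allocation made against a
    # parent provider, pick child providers that can satisfy each
    # resource class.
    def cascade_to_children(parent, resources):
        chosen = []
        for rc, amount in resources.items():
            for child in parent["children"]:
                if child["inventory"].get(rc, 0) >= amount:
                    chosen.append((child["name"], rc, amount))
                    break
            else:
                raise ValueError("no child can satisfy %s=%d" % (rc, amount))
        return chosen

    parent = {
        "name": "compute1",
        "children": [
            {"name": "compute1_numa0", "inventory": {"SRIOV_NET_VF": 4}},
            {"name": "compute1_numa1", "inventory": {"SRIOV_NET_VF": 4}},
        ],
    }

    # Picks numa0's VF simply because it comes first; nothing here knows
    # whether the instance's CPUs and memory also need to land on numa0.
    print(cascade_to_children(parent, {"SRIOV_NET_VF": 1}))

The point being that the allocation alone doesn't carry the affinity or
device-specific constraints needed to make that choice well, which is
(I think) the information deficit in question.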
That's a start. With answers to at least some of these questions I
think the straw man in the etherpad can be more effectively
evaluated. As things stand right now it is a proposed solution
without a clear problem statement, and I feel we could do with a
clearer problem statement.
Chris Dent ┬──┬◡ﾉ(° -°ﾉ) https://anticdent.org/
freenode: cdent tw: @anticdent