[openstack-dev] [nova][scheduler][placement] Allocating Complex Resources
Edward Leafe
ed at leafe.com
Thu Jun 8 14:57:35 UTC 2017
Sorry for the top-post, but it seems that nobody has responded to this, and there are a lot of important questions that need answers. So I’m simply re-posting this so that we don’t get too far ahead of ourselves by planning implementations before we fully understand the problem and the implications of any proposed solution.
-- Ed Leafe
> On Jun 6, 2017, at 9:56 AM, Chris Dent <cdent+os at anticdent.org> wrote:
>
> On Mon, 5 Jun 2017, Ed Leafe wrote:
>
>> One proposal is to essentially use the same logic in placement
>> that was used to include that host in those matching the
>> requirements. In other words, when it tries to allocate the amount
>> of disk, it would determine that that host is in a shared storage
>> aggregate, and be smart enough to allocate against that provider.
>> This was referred to in our discussion as "Plan A".
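>>
>> (As a purely illustrative sketch of "Plan A": a single request for
>> vcpu, ram and disk would end up recorded against two providers,
>> roughly like the Python structure below. The UUIDs are made up and
>> the field names only approximate the placement API.)
>>
>>     # Hypothetical result of a "Plan A" allocation: placement itself
>>     # notices the compute node is in a shared storage aggregate and
>>     # records the DISK_GB against the shared provider rather than
>>     # against the compute node.
>>     allocation = {
>>         "allocations": [
>>             {
>>                 "resource_provider": {"uuid": "compute-node-uuid"},
>>                 "resources": {"VCPU": 2, "MEMORY_MB": 4096},
>>             },
>>             {
>>                 "resource_provider": {"uuid": "shared-disk-provider-uuid"},
>>                 "resources": {"DISK_GB": 100},
>>             },
>>         ]
>>     }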
>
> What would help me is a fuller explanation of whether, and if so
> how and why, "Plan A" doesn't work for nested resource providers.
>
> We can declare that allocating for shared disk is fairly deterministic
> if we assume that any given compute node is only associated with one
> shared disk provider.
>
> My understanding is that this determinism does not hold for nested
> resource providers, because the choice of which pci device or which
> numa cell to use is made fairly late in the game. The existing
> resource tracking doesn't have this problem because the claim of
> those resources is made very late in the game. <- Is this correct?
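>
> (To illustrate the ambiguity: two child providers that can each
> satisfy the same request. The structure and names here are made up.)
>
>     # Two identical PCI device providers under one compute node.
>     children = [
>         {"uuid": "pci-dev-1", "inventory": {"SRIOV_NET_VF": 4}},
>         {"uuid": "pci-dev-2", "inventory": {"SRIOV_NET_VF": 4}},
>     ]
>     # A request for {"SRIOV_NET_VF": 1} is satisfiable by either
>     # child; something has to choose, and today that choice is
>     # made late, on the compute host.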
>
> The problem comes into play when we want to claim from the scheduler
> (or conductor). Additional information is required to choose which
> child providers to use. <- Is this correct?
>
> Plan B overcomes the information deficit by including more
> information in the response from placement (as straw-manned in the
> etherpad [1]), allowing code in the filter scheduler to make accurate
> claims. <- Is this correct?
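>
> (Strictly as illustration of the idea, not the etherpad's actual
> schema: a richer response might enumerate, per compute node, the
> concrete allocations that would satisfy the request, so the filter
> scheduler can pick one and claim it verbatim.)
>
>     # Hypothetical "Plan B" response shape.
>     candidates = [
>         {
>             # Host with shared storage: disk recorded elsewhere.
>             "allocations": {
>                 "cn1-uuid": {"VCPU": 2, "MEMORY_MB": 4096},
>                 "shared-disk-uuid": {"DISK_GB": 100},
>             },
>         },
>         {
>             # Host with local disk: everything on one provider.
>             "allocations": {
>                 "cn2-uuid": {"VCPU": 2, "MEMORY_MB": 4096,
>                              "DISK_GB": 100},
>             },
>         },
>     ]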
>
> For clarity and completeness in the discussion, explicit answers to
> some questions would be useful. Some of these may appear ignorant or
> obtuse, and most are things we've been over before. The goal is to
> draw out some clear statements in the present day, updated for what
> we know now compared with what we knew a week or a month ago, to be
> sure we are all talking about the same thing (or to get us there if
> we are not).
>
> * We already have the information the filter scheduler needs now by
> some other means, right? What are the reasons we don't want to
> use that anymore?
>
> * Part of the reason for having nested resource providers is because
> it can allow affinity/anti-affinity below the compute node (e.g.,
> workloads on the same host but different numa cells). If I
> remember correctly, the modelling and tracking of this kind of
> information in this way comes out of the time when we imagined the
> placement service would be doing considerably more filtering than
> is planned now. Plan B appears to be an acknowledgement of "on
> some of this stuff, we can't actually do anything but provide you
> some info, you need to decide". If that's the case, is the
> topological modelling on the placement DB side of things solely a
> convenient place to store information? If there were some other
> way to model that topology, could things currently being considered
> for modelling as nested providers instead simply be modelled as
> inventories of a particular class of resource?
> (I'm not suggesting we do this, rather that the answer that says
> why we don't want to do this is useful for understanding the
> picture.)
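>
> (A sketch of the two modelling options this question contrasts;
> all names are invented.)
>
>     # (a) Nested: each numa cell is its own provider in a tree.
>     nested = {
>         "uuid": "cn1-uuid",
>         "children": [
>             {"uuid": "cn1-numa0", "inventory": {"VCPU": 8}},
>             {"uuid": "cn1-numa1", "inventory": {"VCPU": 8}},
>         ],
>     }
>     # (b) Flat: the host advertises plain inventories of a
>     # particular resource class, and topology lives elsewhere.
>     flat = {
>         "uuid": "cn1-uuid",
>         "inventory": {"VCPU": 16, "NUMA_CELL": 2},
>     }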
>
> * Does a claim made in the scheduler need to be complete? Is there
> value in making a partial claim from the scheduler that consumes a
> vcpu and some ram, which is then corrected in the resource tracker
> to consume a specific pci device, numa cell, gpu and/or fpga?
> Would this be better or worse than what we have now? Why?
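>
> (A sketch of the partial-claim flow this question imagines; the
> client object and method names are made up.)
>
>     def claim_in_scheduler(placement, consumer, node_uuid):
>         # Claim only the unambiguous resources up front.
>         placement.put_allocations(consumer, {
>             node_uuid: {"VCPU": 2, "MEMORY_MB": 4096},
>         })
>
>     def correct_in_resource_tracker(placement, consumer,
>                                     node_uuid, pci_uuid):
>         # Later, on the host, replace the partial claim with the
>         # full one once a specific child provider is chosen.
>         placement.put_allocations(consumer, {
>             node_uuid: {"VCPU": 2, "MEMORY_MB": 4096},
>             pci_uuid: {"SRIOV_NET_VF": 1},
>         })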
>
> * What is lacking in placement's representation of resource providers
> that makes it difficult or impossible for an allocation against a
> parent provider to be able to determine the correct child
> providers to which to cascade some of the allocation? (And by
> extension make the earlier scheduling decision.)
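>
> (The "cascade" in miniature, as invented pseudo-logic rather than
> anything placement implements today:)
>
>     def cascade(parent, resource_class, amount):
>         # Pick the first child provider with enough spare capacity;
>         # the open question is what extra information placement
>         # would need to make this choice well.
>         for child in parent["children"]:
>             free = (child["inventory"].get(resource_class, 0)
>                     - child["used"].get(resource_class, 0))
>             if free >= amount:
>                 return child["uuid"]
>         raise ValueError("no child can satisfy the request")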
>
> That's a start. With answers to at least some of these questions I
> think the straw man in the etherpad can be more effectively
> evaluated. As things stand right now it is a proposed solution
> without a clear problem statement; I feel like we could do with a
> clearer one.
>
> Thanks.
>
> [1] https://etherpad.openstack.org/p/placement-allocations-straw-man
>
> --
> Chris Dent ┬──┬◡ノ(° -°ノ) https://anticdent.org/
> freenode: cdent tw: @anticdent