[openstack-dev] [nova][scheduler][placement] Allocating Complex Resources
Edward Leafe
ed at leafe.com
Thu Jun 8 14:57:35 UTC 2017
Sorry for the top-post, but it seems that nobody has responded to this, and there are a lot of important questions that need answers. So I’m simply re-posting this so that we don’t get too far ahead of ourselves by planning implementations before we fully understand the problem and the implications of any proposed solution.
-- Ed Leafe
> On Jun 6, 2017, at 9:56 AM, Chris Dent <cdent+os at anticdent.org> wrote:
>
> On Mon, 5 Jun 2017, Ed Leafe wrote:
>
>> One proposal is to essentially use the same logic in placement
>> that was used to include that host in those matching the
>> requirements. In other words, when it tries to allocate the amount
>> of disk, it would determine that that host is in a shared storage
>> aggregate, and be smart enough to allocate against that provider.
>> This was referred to in our discussion as "Plan A".
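>>
>> (As a purely illustrative sketch of "Plan A": a single request for
>> vcpu, ram and disk would end up recorded against two providers,
>> roughly like the Python structure below. The UUIDs are made up and
>> the field names only approximate the placement API.)
>>
>>     # Hypothetical result of a "Plan A" allocation: placement itself
>>     # notices the compute node is in a shared storage aggregate and
>>     # records the DISK_GB against the shared provider rather than
>>     # against the compute node.
>>     allocation = {
>>         "allocations": [
>>             {
>>                 "resource_provider": {"uuid": "compute-node-uuid"},
>>                 "resources": {"VCPU": 2, "MEMORY_MB": 4096},
>>             },
>>             {
>>                 "resource_provider": {"uuid": "shared-disk-provider-uuid"},
>>                 "resources": {"DISK_GB": 100},
>>             },
>>         ]
>>     }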
>
> What would help me is a fuller explanation of whether, and if so
> how and why, "Plan A" doesn't work for nested resource providers.
>
> We can declare that allocating for shared disk is fairly deterministic
> if we assume that any given compute node is only associated with one
> shared disk provider.
>
> My understanding is that this determinism does not hold for nested
> resource providers, because the choice of which pci device or which
> numa cell to use is made fairly late in the game. The existing
> resource tracking doesn't have this problem because the claim of
> those resources is made very late in the game. <- Is this correct?
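>
> (To illustrate the ambiguity: two child providers that can each
> satisfy the same request. The structure and names here are made up.)
>
>     # Two identical PCI device providers under one compute node.
>     children = [
>         {"uuid": "pci-dev-1", "inventory": {"SRIOV_NET_VF": 4}},
>         {"uuid": "pci-dev-2", "inventory": {"SRIOV_NET_VF": 4}},
>     ]
>     # A request for {"SRIOV_NET_VF": 1} is satisfiable by either
>     # child; something has to choose, and today that choice is
>     # made late, on the compute host.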
>
> The problem comes into play when we want to claim from the scheduler
> (or conductor). Additional information is required to choose which
> child providers to use. <- Is this correct?
>
> Plan B overcomes the information deficit by including more
> information in the response from placement (as straw-manned in the
> etherpad [1]), allowing code in the filter scheduler to make accurate
> claims. <- Is this correct?
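>
> (Strictly as illustration of the idea, not the etherpad's actual
> schema: a richer response might enumerate, per compute node, the
> concrete allocations that would satisfy the request, so the filter
> scheduler can pick one and claim it verbatim.)
>
>     # Hypothetical "Plan B" response shape.
>     candidates = [
>         {
>             # Host with shared storage: disk recorded elsewhere.
>             "allocations": {
>                 "cn1-uuid": {"VCPU": 2, "MEMORY_MB": 4096},
>                 "shared-disk-uuid": {"DISK_GB": 100},
>             },
>         },
>         {
>             # Host with local disk: everything on one provider.
>             "allocations": {
>                 "cn2-uuid": {"VCPU": 2, "MEMORY_MB": 4096,
>                              "DISK_GB": 100},
>             },
>         },
>     ]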
>
> For clarity and completeness in the discussion, explicit answers to
> some questions would be useful. Some of these may appear ignorant or
> obtuse, and most are things we've been over before. The goal is to
> draw out some clear statements in the present day, updated for what
> we know now compared with what we knew a week or a month ago, to be
> sure we are all talking about the same thing (or to get us there if
> we are not).
>
> * We already have the information the filter scheduler needs now by
> some other means, right? What are the reasons we don't want to
> use that anymore?
>
> * Part of the reason for having nested resource providers is because
> it can allow affinity/anti-affinity below the compute node (e.g.,
> workloads on the same host but different numa cells). If I
> remember correctly, the modelling and tracking of this kind of
> information in this way comes out of the time when we imagined the
> placement service would be doing considerably more filtering than
> is planned now. Plan B appears to be an acknowledgement of "on
> some of this stuff, we can't actually do anything but provide you
> some info, you need to decide". If that's the case, is the
> topological modelling on the placement DB side of things solely a
> convenient place to store information? If there were some other
> way to model that topology, could things currently being considered
> for modelling as nested providers instead simply be modelled as
> inventories of a particular class of resource?
> (I'm not suggesting we do this, rather that the answer that says
> why we don't want to do this is useful for understanding the
> picture.)
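>
> (A sketch of the two modelling options this question contrasts;
> all names are invented.)
>
>     # (a) Nested: each numa cell is its own provider in a tree.
>     nested = {
>         "uuid": "cn1-uuid",
>         "children": [
>             {"uuid": "cn1-numa0", "inventory": {"VCPU": 8}},
>             {"uuid": "cn1-numa1", "inventory": {"VCPU": 8}},
>         ],
>     }
>     # (b) Flat: the host advertises plain inventories of a
>     # particular resource class, and topology lives elsewhere.
>     flat = {
>         "uuid": "cn1-uuid",
>         "inventory": {"VCPU": 16, "NUMA_CELL": 2},
>     }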
>
> * Does a claim made in the scheduler need to be complete? Is there
> value in making a partial claim from the scheduler that consumes a
> vcpu and some ram, which is then corrected in the resource tracker
> to consume a specific pci device, numa cell, gpu and/or fpga?
> Would this be better or worse than what we have now? Why?
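>
> (A sketch of the partial-claim flow this question imagines; the
> client object and method names are made up.)
>
>     def claim_in_scheduler(placement, consumer, node_uuid):
>         # Claim only the unambiguous resources up front.
>         placement.put_allocations(consumer, {
>             node_uuid: {"VCPU": 2, "MEMORY_MB": 4096},
>         })
>
>     def correct_in_resource_tracker(placement, consumer,
>                                     node_uuid, pci_uuid):
>         # Later, on the host, replace the partial claim with the
>         # full one once a specific child provider is chosen.
>         placement.put_allocations(consumer, {
>             node_uuid: {"VCPU": 2, "MEMORY_MB": 4096},
>             pci_uuid: {"SRIOV_NET_VF": 1},
>         })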
>
> * What is lacking in placement's representation of resource providers
> that makes it difficult or impossible for an allocation against a
> parent provider to be able to determine the correct child
> providers to which to cascade some of the allocation? (And by
> extension make the earlier scheduling decision.)
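>
> (The "cascade" in miniature, as invented pseudo-logic rather than
> anything placement implements today:)
>
>     def cascade(parent, resource_class, amount):
>         # Pick the first child provider with enough spare capacity;
>         # the open question is what extra information placement
>         # would need to make this choice well.
>         for child in parent["children"]:
>             free = (child["inventory"].get(resource_class, 0)
>                     - child["used"].get(resource_class, 0))
>             if free >= amount:
>                 return child["uuid"]
>         raise ValueError("no child can satisfy the request")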
>
> That's a start. With answers to at least some of these questions I
> think the straw man in the etherpad can be more effectively
> evaluated. As things stand right now it is a proposed solution
> without a clear problem statement; I feel like we could do with a
> clearer one.
>
> Thanks.
>
> [1] https://etherpad.openstack.org/p/placement-allocations-straw-man
>
> --
> Chris Dent ┬──┬◡ノ(° -°ノ) https://anticdent.org/
> freenode: cdent tw: @anticdent