[openstack-dev] [nova][scheduler][placement] Allocating Complex Resources

Jay Pipes jaypipes at gmail.com
Fri Jun 9 14:56:58 UTC 2017

On 06/05/2017 05:22 PM, Ed Leafe wrote:
> Another proposal involved a change to how placement responds to the
> scheduler. Instead of just returning the UUIDs of the compute nodes
> that satisfy the required resources, it would include a whole bunch
> of additional information in a structured response. A straw man
> example of such a response is here:
> https://etherpad.openstack.org/p/placement-allocations-straw-man.
> This was referred to as "Plan B".

Actually, this was Plan "C". Plan "B" was to modify the return of the 
GET /resource_providers Placement REST API endpoint.

> The main feature of this approach
> is that part of that response would be the JSON dict for the
> allocation call, containing the specific resource provider UUID for
> each resource. This way, when the scheduler selects a host

An important clarification is needed here. The proposal is to have the 
scheduler actually select *more than just the compute host*. The 
scheduler would select the host, any sharing providers and any child 
providers within a host that actually contained the resources/traits 
that the request demanded.
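
To make that concrete, here is a sketch of what one such candidate 
might look like. The field names, resource classes and UUIDs below 
are invented for illustration; they are not the final API schema:

```python
# Illustrative sketch only: field names, resource classes and UUIDs
# here are invented, not the final Placement API schema. One candidate
# pairs the compute host with the sharing providers that would
# actually supply the resources.
candidate = {
    "allocations": [
        {
            # The compute host itself supplies CPU and RAM.
            "resource_provider": {"uuid": "compute-node-uuid"},
            "resources": {"VCPU": 2, "MEMORY_MB": 4096},
        },
        {
            # A sharing provider (e.g. a shared storage pool
            # associated via an aggregate) supplies the disk.
            "resource_provider": {"uuid": "shared-storage-pool-uuid"},
            "resources": {"DISK_GB": 100},
        },
    ],
}

# Once the scheduler picks this candidate, it can hand the
# "allocations" dict back to placement unchanged, so placement knows
# exactly which providers to consume from.
providers = [a["resource_provider"]["uuid"]
             for a in candidate["allocations"]]
print(providers)
```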

>, it would
> simply pass that dict back to the /allocations call, and placement
> would be able to do the allocations directly against that
> information.
> There was another issue raised: simply providing the host UUIDs
> didn't give the scheduler enough information in order to run its
> filters and weighers. Since the scheduler uses those UUIDs to
> construct HostState objects, the specific missing information was
> never completely clarified, so I'm just including this aspect of the
> conversation for completeness. It is orthogonal to the question of
> how to allocate when the resource provider is not "simple".

The specific missing information includes, but is not limited to, the 
following:

* Whether a resource can be provided by a sharing provider, by a 
"local provider", or by either. For example, assume a compute node 
that is associated with a shared storage pool via an aggregate but 
that also has local disk for instances. The Placement API currently 
returns just the compute host UUID with no indication of whether the 
compute host has local disk to consume from, shared disk to consume 
from, or both. The scheduler is the thing that must weigh these 
alternatives and make a choice. The placement API gives the scheduler 
the choices, and the scheduler makes a decision based on its 
sorting/weighing algorithms.
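
As a purely hypothetical illustration (invented dict shapes and 
UUIDs), here are the two ways that same request could be satisfied:

```python
# Purely hypothetical shapes and UUIDs. The same request, satisfied
# two different ways by one compute host that has local disk AND is
# aggregated with a shared storage pool. A bare host UUID cannot
# distinguish these two candidates.
local_disk = {
    "allocations": {
        "compute-host-uuid": {"VCPU": 2, "MEMORY_MB": 4096,
                              "DISK_GB": 100},
    },
}
shared_disk = {
    "allocations": {
        "compute-host-uuid": {"VCPU": 2, "MEMORY_MB": 4096},
        "shared-pool-uuid": {"DISK_GB": 100},
    },
}

# Both candidates are valid; only the scheduler's weighers can decide
# between them, which is why placement should return both.
print(sorted(shared_disk["allocations"]))
```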

It is imperative to remember the reason *why* we decided (way back in 
Portland at the Nova mid-cycle last year) to keep sorting/weighing in 
the Nova scheduler. The reason is that operators (and some 
developers) insisted on being able to weigh the possible choices in 
ways that "could not be pre-determined". In other words, folks wanted 
to keep the existing uber-flexibility and customizability that the 
scheduler weighers (and home-grown weigher plugins) currently allow, 
including being able to sort possible compute hosts by such things as 
the average thermal temperature of the power supply the hardware was 
connected to over the last five minutes (I kid you friggin not.)

* Which SR-IOV physical function should provide an SRIOV_NET_VF 
resource to an instance. Imagine a situation where a compute host has 4 
SR-IOV physical functions, each having some traits representing hardware 
offload support and each having an inventory of 8 SRIOV_NET_VF. 
Currently the scheduler absolutely has the information to pick one of 
these SRIOV physical functions to assign to a workload. What the 
scheduler does *not* have, however, is a way to tell the Placement API 
to consume an SRIOV_NET_VF from that particular physical function. Why? 
Because the scheduler doesn't know that a particular physical function 
even *is* a resource provider in the placement API. *Something* needs to 
inform the scheduler that the physical function is a resource provider 
and has a particular UUID to identify it. This is precisely what the 
proposed GET /allocation_requests HTTP response data provides to the 
scheduler.

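Here's a sketch of that situation, using invented UUIDs and a made-up 
trait name:

```python
# Invented UUIDs and a made-up trait name, for illustration only.
# A compute host with four SR-IOV physical functions, each modeled as
# its own resource provider with an inventory of 8 SRIOV_NET_VF.
pf_providers = [
    {
        "uuid": "pf-%d-uuid" % i,
        "traits": ["CUSTOM_HW_OFFLOAD"],   # hypothetical trait
        "inventory": {"SRIOV_NET_VF": 8},
    }
    for i in range(4)
]

# Because each physical function is a provider with its own UUID, the
# scheduler's choice can be expressed precisely in the allocation,
# rather than as "one VF from somewhere on this host":
chosen_pf = pf_providers[2]
allocation = {chosen_pf["uuid"]: {"SRIOV_NET_VF": 1}}
print(allocation)
```
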
> My current feeling is that we got ourselves into our existing mess of
> ugly, convoluted code when we tried to add these complex
> relationships into the resource tracker and the scheduler. We set out
> to create the placement engine to bring some sanity back to how we
> think about things we need to virtualize.

Sorry, I completely disagree with your assessment of why the placement 
engine exists. We didn't create it to bring some sanity back to how we 
think about things we need to virtualize. We created it to add 
consistency and structure to the representation of resources in the system.

I don't believe that exposing this structured representation of 
resources is a bad thing or that it is leaking "implementation details" 
out of the placement API. It's not an implementation detail that a 
resource provider is a child of another or that a different resource 
provider is supplying some resource to a group of other providers. 
That's simply an accurate representation of the underlying data structures.

> I would really hate to see
> us make the same mistake again, by adding a good deal of complexity
> to handle a few non-simple cases. What I would like to avoid, no
> matter what the eventual solution chosen, is representing this
> complexity in multiple places. Currently the only two candidates for
> this logic are the placement engine, which knows about these
> relationships already, or the compute service itself, which has to
> handle the management of these complex virtualized resources.

The compute service will need to know about the hierarchies of 
providers on a particular compute node. That isn't complexity; it's 
simply an accurate representation of the underlying data structures. 
Instead of random dicts of key/value pairs and different serialized 
JSON blobs for each particular class of resources, we now have a 
single, consistent way of describing the providers of those resources.
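
A small sketch of what I mean; the helper and field names here are 
mine, not the API's:

```python
# A hedged illustration: one uniform shape can describe the root
# compute node and every child provider, instead of a different
# serialized JSON blob per resource class.
def provider(uuid, inventory, children=()):
    return {"uuid": uuid, "inventory": inventory,
            "children": list(children)}

tree = provider(
    "compute-node-uuid",
    {"VCPU": 16, "MEMORY_MB": 65536, "DISK_GB": 500},
    children=[
        provider("pf-0-uuid", {"SRIOV_NET_VF": 8}),
        provider("pf-1-uuid", {"SRIOV_NET_VF": 8}),
    ],
)

def count_providers(node):
    # The same walk handles any class of resource in the hierarchy.
    return 1 + sum(count_providers(c) for c in node["children"])

print(count_providers(tree))
```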

> I don't know the answer. I'm hoping that we can have a discussion
> that might uncover a clear approach, or, at the very least, one that
> is less murky than the others.

I really like Dan's idea of returning a list of HTTP request bodies for 
POST /allocations/{consumer_uuid} calls along with a list of provider 
information that the scheduler can use in its sorting/weighing algorithms.
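
Here is a minimal sketch of how the scheduler loop might consume such 
a response; the field names and the toy weigher are invented for 
illustration:

```python
import json

# Hypothetical response shape with invented field names: a list of
# ready-made request bodies for POST /allocations/{consumer_uuid},
# plus per-provider summaries the filters/weighers can inspect.
response = {
    "allocation_requests": [
        {"allocations": {"host-a-uuid": {"VCPU": 2}}},
        {"allocations": {"host-b-uuid": {"VCPU": 2}}},
    ],
    "provider_summaries": {
        "host-a-uuid": {"VCPU": {"capacity": 16, "used": 10}},
        "host-b-uuid": {"VCPU": {"capacity": 16, "used": 2}},
    },
}

def weigh(request, summaries):
    # Toy weigher standing in for the operator-configurable ones:
    # prefer candidates whose providers have the most free VCPU.
    return sum(s["VCPU"]["capacity"] - s["VCPU"]["used"]
               for uuid, s in summaries.items()
               if uuid in request["allocations"])

best = max(response["allocation_requests"],
           key=lambda r: weigh(r, response["provider_summaries"]))
# The scheduler then POSTs `best` back to placement verbatim.
print(json.dumps(best))
```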

We've put this straw-man proposal here:

https://etherpad.openstack.org/p/placement-allocations-straw-man
I'm hoping to keep the conversation going there.

