[openstack-dev] [nova][scheduler][placement] Allocating Complex Resources
Jay Pipes
jaypipes at gmail.com
Fri Jun 9 14:56:58 UTC 2017
On 06/05/2017 05:22 PM, Ed Leafe wrote:
> Another proposal involved a change to how placement responds to the
> scheduler. Instead of just returning the UUIDs of the compute nodes
> that satisfy the required resources, it would include a whole bunch
> of additional information in a structured response. A straw man
> example of such a response is here:
> https://etherpad.openstack.org/p/placement-allocations-straw-man.
> This was referred to as "Plan B".
Actually, this was Plan "C". Plan "B" was to modify the return of the
GET /resource_providers Placement REST API endpoint.
> The main feature of this approach
> is that part of that response would be the JSON dict for the
> allocation call, containing the specific resource provider UUID for
> each resource. This way, when the scheduler selects a host
An important clarification is needed here. The proposal is to have the
scheduler actually select *more than just the compute host*. The
scheduler would select the host, any sharing providers, and any child
providers within a host that actually contain the resources/traits
that the request demands.
>, it would
> simply pass that dict back to the /allocations call, and placement
> would be able to do the allocations directly against that
> information.
>
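As a rough sketch of what that dict could look like (the exact shape is
still being debated in the straw man, and every UUID and value below is
invented for illustration):

```python
# Hypothetical allocation request body that the scheduler would pass
# straight back to the /allocations call. All UUIDs, resource classes
# and amounts here are made up for illustration.
allocation_request = {
    "allocations": [
        {
            # CPU and RAM come from the compute node provider itself.
            "resource_provider": {"uuid": "30742363-f65e-4012-a60a-43e0bec38f0e"},
            "resources": {"VCPU": 2, "MEMORY_MB": 4096},
        },
        {
            # Disk comes from a different provider, e.g. a shared
            # storage pool associated with the host via an aggregate.
            "resource_provider": {"uuid": "a8e5f1e0-4f1c-4a4e-9c6a-1f0f8b1c2d3e"},
            "resources": {"DISK_GB": 100},
        },
    ]
}
```

The key point is that each resource is pinned to a specific provider
UUID, so placement does not have to re-derive where anything came from.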
> There was another issue raised: simply providing the host UUIDs
> didn't give the scheduler enough information in order to run its
> filters and weighers. Since the scheduler uses those UUIDs to
> construct HostState objects, the specific missing information was
> never completely clarified, so I'm just including this aspect of the
> conversation for completeness. It is orthogonal to the question of
> how to allocate when the resource provider is not "simple".
The specific missing information includes, but is not limited to, the following:
* Whether a resource can be provided by a sharing provider, a
"local provider", or either. For example, assume a compute node that is
associated with a shared storage pool via an aggregate but that also has
local disk for instances. The Placement API currently returns just the
compute host UUID but no indication of whether the compute host has
local disk to consume from, has shared disk to consume from, or both.
The scheduler is the component that must weigh these alternatives and
make a choice. The placement API gives the scheduler the choices, and
the scheduler makes a decision based on its sorting/weighing algorithms.
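To make that concrete, here is a hedged sketch of the two alternatives
placement could hand back for a single disk request in that scenario
(all UUIDs invented):

```python
# Hypothetical alternatives for a request of 1 VCPU, 2048 MB RAM and
# 50 GB disk. All UUIDs are invented for illustration.
COMPUTE = "b6b0fd6f-0000-0000-0000-000000000001"  # compute node w/ local disk
SHARED = "b6b0fd6f-0000-0000-0000-000000000002"   # shared storage pool

choices = [
    # Alternative 1: consume everything, including disk, from the host.
    {"allocations": [
        {"resource_provider": {"uuid": COMPUTE},
         "resources": {"VCPU": 1, "MEMORY_MB": 2048, "DISK_GB": 50}},
    ]},
    # Alternative 2: CPU/RAM from the host, disk from the shared pool.
    {"allocations": [
        {"resource_provider": {"uuid": COMPUTE},
         "resources": {"VCPU": 1, "MEMORY_MB": 2048}},
        {"resource_provider": {"uuid": SHARED},
         "resources": {"DISK_GB": 50}},
    ]},
]

# A weigher might, for example, prefer alternatives that consume
# local disk on the compute host:
local_disk_choices = [
    c for c in choices
    if any(a["resource_provider"]["uuid"] == COMPUTE
           and "DISK_GB" in a["resources"]
           for a in c["allocations"])
]
```

Today's API collapses both alternatives into a single compute host UUID,
which is exactly the information loss described above.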
It is imperative to remember the reason *why* we decided (way back in
Portland at the Nova mid-cycle last year) to keep sorting/weighing in
the Nova scheduler. The reason is that operators (and some
developers) insisted on being able to weigh the possible choices in ways
that "could not be pre-determined". In other words, folks wanted to keep
the existing uber-flexibility and customizability that the scheduler
weighers (and home-grown weigher plugins) currently allow, including
being able to sort possible compute hosts by such things as the average
thermal temperature of the power supply the hardware was connected to
over the last five minutes (I kid you friggin not.)
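The pattern those weigher plugins follow looks roughly like this; a
deliberately simplified, standalone sketch (the real plugin interface
lives in nova.scheduler.weights and differs in detail):

```python
# Toy standalone sketch of a custom weigher. This is NOT the actual
# nova plugin interface, just an illustration of the pattern: hosts
# go in, a custom sort order comes out.
class PowerSupplyTempWeigher:
    """Prefer hosts whose power supply ran cooler over the last five
    minutes -- the deliberately absurd metric from the example above."""

    def weigh(self, hosts):
        # Lower average temperature sorts first, i.e. wins.
        return sorted(hosts, key=lambda h: h["avg_psu_temp_5min"])


hosts = [
    {"name": "cn1", "avg_psu_temp_5min": 41.2},
    {"name": "cn2", "avg_psu_temp_5min": 38.7},
]
ranked = PowerSupplyTempWeigher().weigh(hosts)
```

Because the sort key can be literally anything an operator can measure,
this logic cannot move into placement; placement can only supply the
candidate list.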
* Which SR-IOV physical function should provide an SRIOV_NET_VF
resource to an instance. Imagine a situation where a compute host has 4
SR-IOV physical functions, each having some traits representing hardware
offload support and each having an inventory of 8 SRIOV_NET_VF.
Currently the scheduler absolutely has the information needed to pick
one of these SR-IOV physical functions to assign to a workload. What the
scheduler does *not* have, however, is a way to tell the Placement API
to consume an SRIOV_NET_VF from that particular physical function. Why?
Because the scheduler doesn't know that a particular physical function
even *is* a resource provider in the placement API. *Something* needs to
inform the scheduler that the physical function is a resource provider
and has a particular UUID to identify it. This is precisely what the
proposed GET /allocation_requests HTTP response data provides to the
scheduler.
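A hedged sketch of the idea (field names and UUIDs are invented, not the
final API): each physical function appears as its own provider with its
own inventory and traits, so the scheduler can name it explicitly.

```python
# Hypothetical per-provider summary data for the four physical
# functions on one host. All identifiers are invented for illustration.
provider_summaries = {
    "pf-uuid-1": {"resources": {"SRIOV_NET_VF": {"capacity": 8, "used": 2}},
                  "traits": ["CUSTOM_HW_OFFLOAD"]},
    "pf-uuid-2": {"resources": {"SRIOV_NET_VF": {"capacity": 8, "used": 8}},
                  "traits": []},
}

# The scheduler can now pick a PF that still has free VFs and refer to
# it by UUID in the allocation it sends back to placement.
usable_pfs = [
    uuid for uuid, info in provider_summaries.items()
    if info["resources"]["SRIOV_NET_VF"]["used"]
    < info["resources"]["SRIOV_NET_VF"]["capacity"]
]
```

Without this, the PF's provider UUID simply never reaches the scheduler,
which is the gap described above.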
> My current feeling is that we got ourselves into our existing mess of
> ugly, convoluted code when we tried to add these complex
> relationships into the resource tracker and the scheduler. We set out
> to create the placement engine to bring some sanity back to how we
> think about things we need to virtualize.
Sorry, I completely disagree with your assessment of why the placement
engine exists. We didn't create it to bring some sanity back to how we
think about things we need to virtualize. We created it to add
consistency and structure to the representation of resources in the system.
I don't believe that exposing this structured representation of
resources is a bad thing or that it is leaking "implementation details"
out of the placement API. It's not an implementation detail that a
resource provider is a child of another or that a different resource
provider is supplying some resource to a group of other providers.
That's simply an accurate representation of the underlying data structures.
> I would really hate to see
> us make the same mistake again, by adding a good deal of complexity
> to handle a few non-simple cases. What I would like to avoid, no
> matter what the eventual solution chosen, is representing this
> complexity in multiple places. Currently the only two candidates for
> this logic are the placement engine, which knows about these
> relationships already, or the compute service itself, which has to
> handle the management of these complex virtualized resources.
The compute service will need to know about the hierarchies of providers
on a particular compute node. That isn't complexity. It's simply an
accurate representation of the underlying data structures. Instead of
random dicts of key/value pairs and different serialized JSON blobs for
each particular class of resources, we now have a single, consistent way
of describing the providers of those resources.
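For instance, a compute node with an SR-IOV physical function can be
described as one uniform tree of providers; a sketch only, with invented
names and field layout:

```python
# Illustrative only: one consistent parent/child structure in place of
# per-resource-class ad-hoc JSON blobs. Names and fields are invented.
compute_node = {
    "uuid": "cn-uuid",
    "parent": None,
    "inventories": {"VCPU": 16, "MEMORY_MB": 65536},
    "children": [
        {"uuid": "pf1-uuid",
         "parent": "cn-uuid",
         "inventories": {"SRIOV_NET_VF": 8},
         "traits": ["CUSTOM_HW_OFFLOAD"],
         "children": []},
    ],
}
```

Every provider, whatever kind of resource it supplies, is described the
same way: a UUID, a parent, inventories, and traits.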
> I don't know the answer. I'm hoping that we can have a discussion
> that might uncover a clear approach, or, at the very least, one that
> is less murky than the others.
I really like Dan's idea of returning a list of HTTP request bodies for
POST /allocations/{consumer_uuid} calls along with a list of provider
information that the scheduler can use in its sorting/weighing algorithms.
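Roughly, and with all shapes and UUIDs invented for illustration, that
response would pair ready-to-send allocation bodies with the provider
details needed for weighing:

```python
# Hedged sketch of the proposed response shape: a list of bodies the
# scheduler can POST to /allocations/{consumer_uuid} as-is, plus
# summaries of the providers involved. Everything here is invented.
response = {
    "allocation_requests": [
        {"allocations": [
            {"resource_provider": {"uuid": "cn-uuid"},
             "resources": {"VCPU": 2, "MEMORY_MB": 4096}},
        ]},
    ],
    "provider_summaries": {
        "cn-uuid": {"resources": {"VCPU": {"capacity": 16, "used": 4}}},
    },
}

# The scheduler weighs candidates using provider_summaries, picks one
# entry from allocation_requests, and sends it back unmodified.
chosen = response["allocation_requests"][0]
```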
We've put this straw-man proposal here:
https://review.openstack.org/#/c/471927/
I'm hoping to keep the conversation going there.
Best,
-jay