[openstack-dev] [scheduler] [heat] Policy specifics

Zane Bitter zbitter at redhat.com
Mon Sep 30 10:33:32 UTC 2013


On 27/09/13 17:58, Clint Byrum wrote:
> Excerpts from Zane Bitter's message of 2013-09-27 06:58:40 -0700:
>> On 27/09/13 08:58, Mike Spreitzer wrote:
>>> I have begun to draft some specifics about the sorts of policies that
>>> might be added to infrastructure to inform a smart unified placement
>>> engine.  These are cast as an extension to Heat templates.  See
>>> https://wiki.openstack.org/wiki/Heat/PolicyExtension.  Comments solicited.
>>
>> Mike,
>> These are not the kinds of specifics that are of any help at all in
>> figuring out how (or, indeed, whether) to incorporate holistic
>> scheduling into OpenStack.
>
> I agree that the things in that page are a wet dream of logical deployment
> fun. However, I think one can target just a few of the basic ones,
> and see a real achievable case forming. I think I grasp Mike's ideas,
> so I'll respond to your concerns with what I think. Note that it is
> highly likely I've gotten some of this wrong.

Thanks for having a crack at this Clint. However, I think your example 
is not apposite, because it doesn't actually require any holistic 
scheduling. You can easily do anti-colocation of a bunch of servers just 
using scheduler hints to the Nova API (stick one in each zone until you 
run out of zones). This just requires Heat to expose the scheduler hints 
portion of the Nova API. To my mind this stuff is so basic that it falls 
squarely in the category of what you said in a previous thread:

> There is
> definitely a need for Heat to be able to communicate to the API's any
> placement details that can be communicated. However, Heat should not
> actually be "scheduling" anything.
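The zone round-robin described above (one server per zone until the zones run out, then wrap around) is simple enough to sketch. This is a minimal illustration, not anything in Heat or Nova; the function name and inputs are hypothetical:

```python
from itertools import cycle

def assign_zones(servers, zones):
    """Hypothetical sketch: round-robin servers across availability
    zones -- stick one in each zone until you run out, then wrap."""
    zone_iter = cycle(zones)
    return {server: next(zone_iter) for server in servers}

# Each assignment becomes a per-server availability-zone scheduler hint
# on the Nova boot call; no holistic scheduler is involved.
placement = assign_zones(["web-0", "web-1", "web-2"], ["zone-A", "zone-B"])
# {"web-0": "zone-A", "web-1": "zone-B", "web-2": "zone-A"}
```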

But in any event, most of your answers appear to be predicated on this 
very simple case, not on a holistic scheduler. I think you are vastly 
underestimating the complexity of the problem.

What Mike is proposing is something more sophisticated, whereby you can 
solve for the optimal scheduling of resources of different types across 
different APIs. There may be a case for including this in Heat, but it 
needs to be made, and IMO it needs to be made by answering these kinds 
of questions at a similar level of detail to the symmetric dyadic 
primitives wiki page.

BTW there is one more question I should add:

- Who will implement and maintain this service/feature, and the 
associated changes to existing services?

>> - What would a holistic scheduling service look like? A standalone
>> service? Part of heat-engine?
>
> I see it as a preprocessor of sorts for the current infrastructure engine.
> It would take the logical expression of the cluster and either turn
> it into actual deployment instructions or respond to the user that it
> cannot succeed. Ideally it would just extend the same Heat API.
>
>> - How will the scheduling service reserve slots for resources in advance
>> of them being created? How will those reservations be accounted for and
>> billed?
>> - In the event that slots are reserved but those reservations are not
>> taken up, what will happen?
>
> I don't see the word "reserve" in Mike's proposal, and I don't think this
> is necessary for the more basic models like Collocation and Anti-Collocation.

Right, but we're not talking about only the basic models. Reservations 
are very much needed according to my understanding of the proposal, 
because the whole point is to co-ordinate across multiple services in a 
way that is impossible to do atomically.

> Reservations would of course make the scheduling decisions more likely to
> succeed, but it isn't necessary if we do things optimistically. If the
> stack create or update fails, we can retry with better parameters.
>
>> - Once scheduled, how will resources be created in their proper slots as
>> part of a Heat template?
>
> In goes a Heat template (sorry for not using HOT.. still learning it. ;)
>
> Resources:
>    ServerTemplate:
>      Type: Some::Defined::ProviderType
>    HAThing1:
>      Type: OS::Heat::HACluster
>      Properties:
>        ClusterSize: 3
>        MaxPerAZ: 1
>        PlacementStrategy: anti-collocation
>        Resources: [ ServerTemplate ]
>
> And if we have at least 2 AZ's available, it feeds to the heat engine:
>
> Resources:
>    HAThing1-0:
>      Type: Some::Defined::ProviderType
>        Parameters:
>          availability-zone: zone-A
>    HAThing1-1:
>      Type: Some::Defined::ProviderType
>        Parameters:
>          availability-zone: zone-B
>    HAThing1-2:
>      Type: Some::Defined::ProviderType
>        Parameters:
>          availability-zone: zone-A
>
> If not, holistic scheduler says back "I don't have enough AZ's to
> satisfy MaxPerAZ".
>
> Now, if Nova grows anti-affinity under the covers that it can manage
> directly, a later version can just spit out:
>
> Resources:
>    HAThing1-0:
>      Type: Some::Defined::ProviderType
>        Parameters:
>          instance-group: 0
>          affinity-type: anti
>    HAThing1-1:
>      Type: Some::Defined::ProviderType
>        Parameters:
>          instance-group: 1
>          affinity-type: anti
>    HAThing1-2:
>      Type: Some::Defined::ProviderType
>        Parameters:
>          instance-group: 0
>          affinity-type: anti
>
> The point is that the user cares about their servers not being in the
> same failure domain, not how that happens.
>
>> - What about when the user calls the APIs directly? (i.e. does their own
>> orchestration - either hand-rolled or using their own standalone Heat.)
>
> This has come up with autoscaling too. "Undefined" - that's not your stack.

Well, when we have the new autoscaling service you'll still be able to 
create an autoscaling group using your own standalone Heat engine. If 
the provider has a scheduling service, why shouldn't you be able to use 
that with your own standalone Heat engine too?

>> - How and from where will the scheduling service obtain the utilisation
>> data needed to perform the scheduling? What mechanism will segregate
>> this information from the end user?
>
> I do think this is a big missing piece. Right now it is spread out
> all over the place. Keystone at least has regions, so that could be
> incorporated now. I briefly dug through the other API's and don't see
> a way to enumerate AZ's or cells. Perhaps it is hiding in extensions?
>
> I don't think this must be segregated from end users. An API for "show
> me the placement decisions I can make" seems useful for anybody trying
> to automate deployments. Anyway, probably best to keep it decentralized
> and just make it so that each service can respond with lists of arguments
> to their API that are likely to succeed.

I think you're thinking about the very simplest case still (e.g. list of 
AZs - we have that already). To implement a completely general 
scheduling service you're going to need data down to the level of e.g. 
which machines are overcommitted and by how much. Good luck convincing 
public cloud providers to make this available through a user-facing API. 
The unintended consequences only _begin_ with pathological user 
behaviour, and end somewhere in the realm of lawsuits, financial 
reporting and competitive analysis.

As Mike pointed out downthread, the scheduler primarily serves the cloud 
provider's interest. That means the raw input data is at best (when 
compared to the actual scheduler output) a record of exactly how much 
the provider does or does not care about users, and at worst a basis for 
users building their own scheduler that serves only their own interest.

So the scheduler service needs some privileged access to the internals 
of each service. Heat is unprivileged (it just calls public APIs - you 
can run your own locally). How to resolve that mismatch is a key 
question if scheduling is to become part of Heat.

cheers,
Zane.
