[openstack-dev] [scheduler] [heat] Policy specifics

Clint Byrum clint at fewbar.com
Mon Sep 30 15:33:06 UTC 2013


Excerpts from Zane Bitter's message of 2013-09-30 03:33:32 -0700:
> On 27/09/13 17:58, Clint Byrum wrote:
> > Excerpts from Zane Bitter's message of 2013-09-27 06:58:40 -0700:
> >> On 27/09/13 08:58, Mike Spreitzer wrote:
> >>> I have begun to draft some specifics about the sorts of policies that
> >>> might be added to infrastructure to inform a smart unified placement
> >>> engine.  These are cast as an extension to Heat templates.  See
> >>> https://wiki.openstack.org/wiki/Heat/PolicyExtension.  Comments solicited.
> >>
> >> Mike,
> >> These are not the kinds of specifics that are of any help at all in
> >> figuring out how (or, indeed, whether) to incorporate holistic
> >> scheduling into OpenStack.
> >
> > I agree that the things in that page are a wet dream of logical deployment
> > fun. However, I think one can target just a few of the basic ones,
> > and see a real achievable case forming. I think I grasp Mike's ideas,
> > so I'll respond to your concerns with what I think. Note that it is
> > highly likely I've gotten some of this wrong.
> 
> Thanks for having a crack at this Clint. However, I think your example 
> is not apposite, because it doesn't actually require any holistic 
> scheduling. You can easily do anti-colocation of a bunch of servers just 
> using scheduler hints to the Nova API (stick one in each zone until you 
> run out of zones). This just requires Heat to expose the scheduler hints 
> portion of the Nova API. To my mind this stuff is so basic that it falls 
> squarely in the category of what you said in a previous thread:
> 

The implementation of the basic case is quite simple, but still useful
to implement. The point is that by making it easier to write templates
that express this, the cloud provider makes it easier to balance load
across AZ's based on real load, rather than leaving the user to
round-robin on their own. Healthy balancing across AZ's serves both
provider and user. Could we do this as a resource in Heat? Yes! But we
could do all of it as a resource in Heat, and I'm not entirely convinced
that is a good idea. I don't want to get into the actual implementation
here, but rather ask "what are the roadblocks to a minimally useful
solution?"

> > There is
> > definitely a need for Heat to be able to communicate to the API's any
> > placement details that can be communicated. However, Heat should not
> > actually be "scheduling" anything.
> 
> But in any event, most of your answers appear to be predicated on this 
> very simple case, not on a holistic scheduler. I think you are vastly 
> underestimating the complexity of the problem.
> 

They're predicated on how simply the feature has been discussed so far.
Reading through all of them, I see the same problem repeated over and
over: "Given input from the cloud provider and a logical expression of
the end goal from the user, produce a concrete plan". What Mike started
with was "That sounds like what Nova does, but there should be something
that coordinates it across all services."

> What Mike is proposing is something more sophisticated, whereby you can 
> solve for the optimal scheduling of resources of different types across 
> different APIs. There may be a case for including this in Heat, but it 
> needs to be made, and IMO it needs to be made by answering these kinds 
> of questions at a similar level of detail to the symmetric dyadic 
> primitives wiki page.
> 
> BTW there is one more question I should add:
> 
> - Who will implement and maintain this service/feature, and the 
> associated changes to existing services?
> 

I was thinking Mike was suggesting he'd be interested in working on it.
Then again, maybe Mike is just probing to see if OpenStack would be open
to a commercial version of this. :)

Talking about it here helps the rest of the OpenStack contributing
companies think about it. Ultimately, until a user wants it, it should
not get done, as we have plenty of users who want plenty of things that
are not yet done. But perhaps talking about the idea will convince users
this would solve their problems.

That said, I do think we should wrap up the discussion if we don't have
any volunteers and/or enthusiastic users soon. :)

> >> - What would a holistic scheduling service look like? A standalone
> >> service? Part of heat-engine?
> >
> > I see it as a preprocessor of sorts for the current infrastructure engine.
> > It would take the logical expression of the cluster and either turn
> > it into actual deployment instructions or respond to the user that it
> > cannot succeed. Ideally it would just extend the same Heat API.
> >
> >> - How will the scheduling service reserve slots for resources in advance
> >> of them being created? How will those reservations be accounted for and
> >> billed?
> >> - In the event that slots are reserved but those reservations are not
> >> taken up, what will happen?
> >
> > I don't see the word "reserve" in Mike's proposal, and I don't think this
> > is necessary for the more basic models like Collocation and Anti-Collocation.
> 
> Right, but we're not talking about only the basic models. Reservations 
> are very much needed according to my understanding of the proposal, 
> because the whole point is to co-ordinate across multiple services in a 
> way that is impossible to do atomically.
> 

I'm not sure why one would want to even _try_ to do something atomically
in a massively distributed system spanning disparate services. Also, the
point is not to coordinate these things, but to produce a plan for
coordinating them. If you look at it optimistically, you deal with the
setbacks as the concrete plan goes into place. AZ full now? Fail the
plan, recalculate, try again. Route shut down? Fail the plan, recalculate.
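
A minimal sketch of that optimistic loop, with every name below being
hypothetical rather than any real Heat interface:

    class PlacementFailed(Exception):
        """An AZ filled up, a route went down, etc. (hypothetical)."""

    def deploy(logical_template, gather_facts, plan, execute,
               max_attempts=3):
        # Optimism means no reservations held anywhere: build a
        # concrete plan from fresh provider facts, try it, and on any
        # setback recalculate instead of locking slots up front.
        for _ in range(max_attempts):
            facts = gather_facts()            # AZ lists, quotas, etc.
            concrete = plan(logical_template, facts)
            if concrete is None:
                raise RuntimeError('constraints cannot be satisfied')
            try:
                return execute(concrete)      # hand off to heat-engine
            except PlacementFailed:
                continue                      # fail the plan, recalculate
        raise RuntimeError('gave up after %d attempts' % max_attempts)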

> > Reservations would of course make the scheduling decisions more likely to
> > succeed, but it isn't necessary if we do things optimistically. If the
> > stack create or update fails, we can retry with better parameters.
> >
> >> - Once scheduled, how will resources be created in their proper slots as
> >> part of a Heat template?
> >
> > In goes a Heat template (sorry for not using HOT.. still learning it. ;)
> >
> > Resources:
> >    ServerTemplate:
> >      Type: Some::Defined::ProviderType
> >    HAThing1:
> >      Type: OS::Heat::HACluster
> >      Properties:
> >        ClusterSize: 3
> >        MaxPerAZ: 1
> >        PlacementStrategy: anti-collocation
> >        Resources: [ ServerTemplate ]
> >
> > And if we have at least 3 AZ's available, it feeds to the heat engine:
> >
> > Resources:
> >    HAThing1-0:
> >      Type: Some::Defined::ProviderType
> >      Properties:
> >        availability-zone: zone-A
> >    HAThing1-1:
> >      Type: Some::Defined::ProviderType
> >      Properties:
> >        availability-zone: zone-B
> >    HAThing1-2:
> >      Type: Some::Defined::ProviderType
> >      Properties:
> >        availability-zone: zone-C
> >
> > If not, holistic scheduler says back "I don't have enough AZ's to
> > satisfy MaxPerAZ".
> >
> > Now, if Nova grows anti-affinity under the covers that it can manage
> > directly, a later version can just spit out:
> >
> > Resources:
> >    HAThing1-0:
> >      Type: Some::Defined::ProviderType
> >      Properties:
> >        instance-group: 0
> >        affinity-type: anti
> >    HAThing1-1:
> >      Type: Some::Defined::ProviderType
> >      Properties:
> >        instance-group: 0
> >        affinity-type: anti
> >    HAThing1-2:
> >      Type: Some::Defined::ProviderType
> >      Properties:
> >        instance-group: 0
> >        affinity-type: anti
> >
> > The point is that the user cares about their servers not being in the
> > same failure domain, not how that happens.
> >
> >> - What about when the user calls the APIs directly? (i.e. does their own
> >> orchestration - either hand-rolled or using their own standalone Heat.)
> >
> > This has come up with autoscaling too. "Undefined" - that's not your stack.
> 
> Well, when we have the new autoscaling service you'll still be able to 
> create an autoscaling group using your own standalone Heat engine. If 
> the provider has a scheduling service, why shouldn't you be able to use 
> that with your own standalone Heat engine too?
> 

I misunderstood your question originally. Agreed that this is an interesting
use case and one that would be good to support.

> >> - How and from where will the scheduling service obtain the utilisation
> >> data needed to perform the scheduling? What mechanism will segregate
> >> this information from the end user?
> >
> > I do think this is a big missing piece. Right now it is spread out
> > all over the place. Keystone at least has regions, so that could be
> > incorporated now. I briefly dug through the other API's and don't see
> > a way to enumerate AZ's or cells. Perhaps it is hiding in extensions?
> >
> > I don't think this must be segregated from end users. An API for "show
> > me the placement decisions I can make" seems useful for anybody trying
> > to automate deployments. Anyway, probably best to keep it decentralized
> > and just make it so that each service can respond with lists of arguments
> > to their API that are likely to succeed.
> 
> I think you're thinking about the very simplest case still (e.g. list of 
> AZs - we have that already). To implement a completely general 
> scheduling service you're going to need data down to the level of e.g. 
> which machines are overcommitted and by how much. Good luck convincing 
> public cloud providers to make this available through a user-facing API. 
> The unintended consequences only _begin_ with pathological user 
> behaviour, and end somewhere in the realm of lawsuits, financial 
> reporting and competitive analysis.
> 

I don't expect the public cloud providers to expose "which machines are
overcommitted". I did gloss over those two things as one in my response,
and I have to agree that there are things you want to expose to users
and things you don't. I believe we have an RBAC system in OpenStack to
satisfy this requirement.

Users are mining this data out of Amazon right now anyway, by the way.
I have seen graphite installations with "instances are doing X IOPS
today in Y region". But I digress; mined data is not the same as
fetched data.

Anyway, the API for end users is just "show me the networks I can attach
to" and "show me the volume services", most or all of which we probably
have today. The scheduler itself, of course, would also need access to
the metrics behind those.
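
To be concrete, most of that user-facing half exists already; a sketch
with the current clients (signatures from memory, so treat them as
approximate):

    from novaclient.v1_1 import client as nova_client
    from neutronclient.v2_0 import client as neutron_client

    nova = nova_client.Client('user', 'password', 'tenant',
                              'http://keystone:5000/v2.0')
    neutron = neutron_client.Client(username='user',
                                    password='password',
                                    tenant_name='tenant',
                                    auth_url='http://keystone:5000/v2.0')

    # "Show me the placement decisions I can make", per service:
    zones = nova.availability_zones.list()
    networks = neutron.list_networks()['networks']

The privileged half, the utilization metrics behind those choices, is
the part that still needs a home.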

> As Mike pointed out downthread, the scheduler primarily serves the cloud 
> provider's interest. That means the raw input data is at best (when 
> compared to the actual scheduler output) a record of exactly how much 
> the provider does or does not care about users, and at worst a basis for 
> users building their own scheduler that serves only their own interest.
> 
> So the scheduler service needs some privileged access to the internals 
> of each service. Heat is unprivileged (it just calls public APIs - you 
> can run your own locally). How to resolve that mismatch is a key 
> question if scheduling is to become part of Heat.
> 

You are painting cloud providers as uncaring slum lords. Of course there
will be slum lords in any ecosystem, but there will also be high quality
service providers and private cloud operators with high expectations who
can use this type of feature purely to benefit users, even at a high
cost to themselves.


