[openstack-dev] [scheduler] [heat] Policy specifics (for holistic infrastructure scheduling)
Mike Spreitzer
mspreitz at us.ibm.com
Tue Oct 1 02:21:22 UTC 2013
OK, let's take the holistic infrastructure scheduling out of Heat. It
really belongs at a lower level anyway. Think of it as something you slap
on top of Nova, Cinder, Neutron, etc., so that everything that is going to use
them goes first through the holistic scheduler, giving it a chance to
make some joint decisions. Zane has been worried about conflicting
decisions being made, but if everything goes through the holistic
infrastructure scheduling service then there does not need to be an issue
with other parallel decision-making services (more on this below). For a
public cloud, think of this holistic infrastructure scheduling as part of
the service that the cloud offers to the public; the public says what it
wants, and the various levels of schedulers work on delivering it; the
internals are not exposed to the public. For example, a cloud user may
say "spread my cluster across at least two racks, not too unevenly"; you
do not want that public cloud customer to be in the business of knowing
how many racks are in the cloud, knowing how much each one is currently
being used, and picking which rack will contain which members of his
cluster. For a private cloud, the holistic infrastructure scheduler
should have the same humility as the lower schedulers: offer enough
visibility and control to the clients that they can make decisions if they
want to (thus, nobody needs to "go around" the holistic infrastructure
scheduler if they already know what they want).
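To make the rack example concrete, here is a minimal Python sketch of how a scheduler might turn a request like "at least two racks, not too unevenly" into a placement without the user ever seeing rack inventory. The function name `spread_placement`, the `rack_free` capacity map, and the reading of "not too unevenly" as "member counts differ by at most one" are all hypothetical assumptions, not anything Nova or Heat provides.

```python
from typing import Dict, Optional


def spread_placement(
    n_members: int,
    rack_free: Dict[str, int],
    min_racks: int = 2,
) -> Optional[Dict[str, int]]:
    """Return {rack: member_count} spreading n_members over >= min_racks racks.

    Hypothetical policy: members are split as evenly as possible, so counts
    on the chosen racks differ by at most 1 and every chosen rack gets >= 1.
    """
    # Prefer racks with the most free capacity (stable for ties).
    ranked = sorted(rack_free, key=lambda r: rack_free[r], reverse=True)
    # Try the smallest acceptable number of racks first; never use more
    # racks than members, so no chosen rack ends up empty.
    for k in range(min_racks, min(len(ranked), n_members) + 1):
        chosen = ranked[:k]
        base, extra = divmod(n_members, k)
        counts = {r: base + (1 if i < extra else 0) for i, r in enumerate(chosen)}
        if all(counts[r] <= rack_free[r] for r in chosen):
            return counts
    return None  # no feasible even spread under these assumptions
```

For example, `spread_placement(5, {"rack-a": 4, "rack-b": 2, "rack-c": 1})` returns `{"rack-a": 3, "rack-b": 2}`; the user states only the policy, and the rack names and loads stay internal to the cloud.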
You do not want to ask the holistic infrastructure scheduler to schedule
resources one by one; you want to ask it to allocate a whole
pattern/template/topology. There is thus no need for infrastructure
orchestration prior to holistic infrastructure scheduling.
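As an illustration of why the whole pattern matters, here is a rough Python sketch, assuming a hypothetical `Pattern` request type and per-rack free-capacity maps for compute and storage. Because the colocation constraint between a VM and its volume is visible up front, the scheduler can pick a rack with room for both; this is a toy first-fit placement, not a proposed API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Pattern:
    """Hypothetical request: the whole topology is submitted in one call."""
    vms: List[str]
    volumes: List[str]
    colocate: List[Tuple[str, str]] = field(default_factory=list)  # (vm, volume)


def _first_rack(*pools: Dict[str, int]) -> Optional[str]:
    """First rack (in the first pool's order) with spare capacity in every pool."""
    for rack in pools[0]:
        if all(pool.get(rack, 0) > 0 for pool in pools):
            return rack
    return None


def place_pattern(p: Pattern, compute_free: Dict[str, int],
                  storage_free: Dict[str, int]) -> Dict[str, str]:
    compute, storage = dict(compute_free), dict(storage_free)
    placement: Dict[str, str] = {}
    # Joint decisions first: each colocated pair needs a rack with room for both.
    for vm, vol in p.colocate:
        rack = _first_rack(compute, storage)
        if rack is None:
            raise ValueError(f"no rack can host both {vm} and {vol}")
        compute[rack] -= 1
        storage[rack] -= 1
        placement[vm] = placement[vol] = rack
    # Members with no joint constraint are placed independently afterwards.
    for name, pool in [(v, compute) for v in p.vms] + [(v, storage) for v in p.volumes]:
        if name in placement:
            continue
        rack = _first_rack(pool)
        if rack is None:
            raise ValueError(f"no capacity left for {name}")
        pool[rack] -= 1
        placement[name] = rack
    return placement
```

For example, with `compute_free={"r1": 1, "r2": 1}`, `storage_free={"r2": 1, "r3": 1}` and `db` colocated with `db-vol`, this returns `{"db": "r2", "db-vol": "r2", "web": "r1"}`. A compute-only first-fit choice for `db` made in isolation would have picked `r1`, which has no storage for its volume.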
Once the holistic infrastructure scheduler has done its job, there is a
need for infrastructure orchestration. What should we use for that?
OK, more on the business of conflicting decisions. For the sake of
scalability and modularity, the holistic infrastructure scheduler should
delegate as much decision-making as it can to more specific services. The
job of the holistic infrastructure scheduler is to make joint decisions
when there are strong interactions between services. You can fudge this
either way (have the holistic infrastructure scheduler make more or fewer
decisions than is ideal), but if you want the best results then I think the
principle I just stated should be the guide. So what if a delegated decision conflicts
with a holistic decision? Don't do that. Divide the decision-making
responsibilities into distinct domains, for example with the holistic
scheduler making relatively big-picture decisions and individual resource
services filling in the details.
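One way to picture that division of labor, sketched in Python under hypothetical names (`holistic_decide`, `service_fill_in`, and a rack-to-host inventory map): the holistic layer only chooses racks, and the per-service layer only chooses hosts within the rack it was handed. If the lower layer cannot honor the big-picture decision, it reports failure rather than quietly overriding it.

```python
from typing import Dict, List

# Hypothetical inventory: rack -> {host: free_slots}
Inventory = Dict[str, Dict[str, int]]


def holistic_decide(members: List[str], racks: List[str]) -> Dict[str, str]:
    """Big-picture decision: assign each member a rack (round-robin spread here)."""
    return {m: racks[i % len(racks)] for i, m in enumerate(members)}


def service_fill_in(member: str, rack: str, inventory: Inventory) -> str:
    """Detail decision: pick a host, but only inside the rack it was given."""
    hosts = inventory[rack]
    host = max(hosts, key=hosts.get)  # host with the most free slots in that rack
    if hosts[host] <= 0:
        # Stay inside our domain: escalate instead of picking another rack.
        raise RuntimeError(f"rack {rack} is full; cannot place {member}")
    hosts[host] -= 1
    return host
```

Because `service_fill_in` never looks outside `inventory[rack]`, the two layers cannot make contradictory choices about which rack a member lands in.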
That said, there can still be nasty surprises from lower layers. Even if
the design has carefully partitioned decision-making responsibilities,
irregular things can still happen (e.g., authorized people can do
something unexpected). Even if nothing intentionally does anything
irregular, there remains the possibility of bugs. The holistic
infrastructure scheduler should be prepared for nasty surprises, and it
should get information that is as authoritative as possible to begin with
(promptness doesn't hurt either).
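A minimal sketch of the "be prepared for surprises" part, assuming the scheduler keeps its own model of where resources are and can periodically fetch an authoritative view from the lower services. The names and the three surprise categories are hypothetical; the point is only that the observed state wins and the discrepancies get surfaced.

```python
from typing import Dict, Optional, Tuple

Placement = Dict[str, str]  # resource id -> host (hypothetical model)


def classify(believed: Optional[str], observed: Optional[str]) -> str:
    """Label one discrepancy between belief and observation."""
    if believed is None:
        return "unexpected"  # created outside the scheduler
    if observed is None:
        return "vanished"    # deleted or lost behind the scheduler's back
    return "moved"           # e.g. an authorized operator migrated it


def reconcile(believed: Placement, observed: Placement) -> Tuple[Placement, Dict[str, str]]:
    """Return the corrected model (observation wins) and the surprises found."""
    surprises = {
        rid: classify(believed.get(rid), observed.get(rid))
        for rid in sorted(believed.keys() | observed.keys())
        if believed.get(rid) != observed.get(rid)
    }
    # The authoritative observation replaces the scheduler's belief.
    return dict(observed), surprises
```

For example, believing `{"vm1": "host-a", "vm2": "host-b"}` but observing `{"vm1": "host-c", "vm3": "host-a"}` yields surprises `{"vm1": "moved", "vm2": "vanished", "vm3": "unexpected"}`.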
Then there is the question of the scalability of the holistic
infrastructure scheduler. One hard kernel of that is solving the
optimization problem. Nobody should expect the scheduler to find the
truly optimal solution; this is an NP-hard problem. However, there exist
optimization algorithms that produce pretty good approximations in modest
amounts of time. Additionally, if the patterns are small relative to the
size of the whole zone being scheduled, then it should be possible to make
decisions concurrently using optimistic concurrency control (as Clint
has mentioned).
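To illustrate optimistic concurrency control here, a Python sketch with hypothetical names: each scheduler copies the zone state, computes a cheap greedy placement (standing in for a real approximation algorithm) without holding a lock, and then commits only if the hosts it claims have not changed in the meantime; otherwise it retries. Validating only the claimed hosts is a simplification, and it is what lets small patterns touching different hosts proceed concurrently without conflicting.

```python
import threading
from typing import Dict, Optional, Tuple


class ZoneState:
    """Hypothetical shared capacity model: host -> free slots, with per-host versions."""

    def __init__(self, free: Dict[str, int]):
        self._free = dict(free)
        self._version = {h: 0 for h in free}
        self._lock = threading.Lock()  # held only for the brief copy and commit steps

    def snapshot(self) -> Tuple[Dict[str, int], Dict[str, int]]:
        with self._lock:
            return dict(self._free), dict(self._version)

    def commit(self, seen: Dict[str, int], claims: Dict[str, int]) -> bool:
        """Apply claims iff none of the claimed hosts changed since the snapshot."""
        with self._lock:
            if any(self._version[h] != seen[h] for h in claims):
                return False  # conflict: another scheduler touched one of our hosts
            for h, n in claims.items():
                self._free[h] -= n
                self._version[h] += 1
            return True


def greedy_claims(free: Dict[str, int], size: int) -> Optional[Dict[str, int]]:
    """Cheap approximate decision: fill the emptiest hosts first."""
    claims: Dict[str, int] = {}
    for h in sorted(free, key=free.get, reverse=True):
        if size == 0:
            break
        take = min(free[h], size)
        if take > 0:
            claims[h] = take
            size -= take
    return claims if size == 0 else None


def schedule(zone: ZoneState, size: int, max_retries: int = 5) -> Optional[Dict[str, int]]:
    for _ in range(max_retries):
        free, seen = zone.snapshot()        # consistent copy of the zone
        claims = greedy_claims(free, size)  # decide on a possibly stale copy, no lock held
        if claims is None:
            return None                     # not enough capacity at all
        if zone.commit(seen, claims):       # validate, then write
            return claims
    return None                             # gave up after repeated conflicts
```

A claim on a host whose version is unchanged is safe because that host's free count is also unchanged since the snapshot, so capacity can never go negative.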
You would not want one holistic infrastructure scheduler for a whole
geographically distributed cloud. You could use a hierarchical
arrangement, with one top-level decision-maker dividing a pattern between
availability zones (by which I mean the sort of large independent domains
that are typically known by that term) and then a subsidiary scheduler for
each availability zone.
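A rough Python sketch of the hierarchical arrangement, with hypothetical names throughout: the top level sees only an aggregate capacity number per availability zone and decides which zone each member goes to, and each zone's subsidiary scheduler (here a trivial round-robin over hosts) fills in the details. Real zone schedulers would be far richer.

```python
from typing import Callable, Dict, List

ZoneScheduler = Callable[[List[str]], Dict[str, str]]  # members -> {member: host}


def split_across_zones(members: List[str],
                       zone_capacity: Dict[str, int]) -> Dict[str, List[str]]:
    """Top level: coarse decision, only picks a zone for each member."""
    remaining = dict(zone_capacity)
    shares: Dict[str, List[str]] = {z: [] for z in remaining}
    zones = list(remaining)
    i = 0
    for m in members:
        # Round-robin to the next zone that still has spare capacity.
        for _ in range(len(zones)):
            z = zones[i % len(zones)]
            i += 1
            if remaining[z] > 0:
                break
        else:
            raise ValueError("pattern does not fit in the available zones")
        remaining[z] -= 1
        shares[z].append(m)
    return shares


def make_zone_scheduler(hosts: List[str]) -> ZoneScheduler:
    """Toy subsidiary scheduler that knows only its own zone's hosts."""
    def schedule(share: List[str]) -> Dict[str, str]:
        return {m: hosts[i % len(hosts)] for i, m in enumerate(share)}
    return schedule


def schedule_hierarchically(members: List[str], zone_capacity: Dict[str, int],
                            zone_schedulers: Dict[str, ZoneScheduler]) -> Dict[str, str]:
    placement: Dict[str, str] = {}
    for zone, share in split_across_zones(members, zone_capacity).items():
        if share:
            placement.update(zone_schedulers[zone](share))  # each zone fills in details
    return placement
```

With capacities `{"az1": 1, "az2": 5}`, members `m0`, `m1`, `m2` are split as `az1: [m0]` and `az2: [m1, m2]`, and only then does each zone choose hosts.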
Regards,
Mike