[openstack-dev] Scheduler proposal

Ian Wells ijw.ubuntu at cack.org.uk
Tue Oct 13 02:30:28 UTC 2015

On 10 October 2015 at 23:47, Clint Byrum <clint at fewbar.com> wrote:

> > Per before, my suggestion was that every scheduler tries to maintain
> > a copy of the cloud's state in memory (in much the same way, per the
> > previous example, as every router on the internet tries to make a
> > route table out of what it learns from BGP).  They don't have to be
> > perfect.  They don't have to be in sync.  As long as there's some
> > variability in the decision making, they don't have to update when
> > another scheduler schedules something (and you can make the compute
> > node send an immediate update when a new VM is run, anyway).  They
> > all stand a good chance of scheduling VMs well simultaneously.
> >
> I'm quite in favor of eventual consistency and retries. Even if we had
> a system of perfect updating of all state records everywhere, it would
> break sometimes and I'd still want to not trust any record of state as
> being correct for the entire distributed system. However, there is an
> efficiency win gained by staying _close_ to correct. It is actually a
> function of the expected entropy. The more concurrent schedulers, the
> more entropy there will be to deal with.

... and the fewer the servers in total, the larger the entropy as a
proportion of the whole system (if that's a thing, it's a long time since I
did physical chemistry).  But consider the use cases:

1. I have a small cloud, and I run two schedulers for redundancy.  There's
a good possibility that, when the cloud is loaded, the schedulers will
occasionally make poor decisions.  We'd have to consider how likely that
is.

2. I have a large cloud, and I run 20 schedulers for redundancy.  There's
a good chance that any one scheduler's information is out of date.  But
there could be several hundred hosts willing to satisfy a scheduling
request, and even among the hosts a scheduler has stale information about,
there's a low chance that any are close to the threshold where they can't
run the VM in question - so the odds are good that it will pick a host
that's happy to satisfy the request (a rough sketch of the scheme follows).
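
To make that concrete, here's a minimal sketch in Python of the scheme as
I imagine it.  Every name in it (HostState, pick_host, the retry loop) is
invented for illustration - none of this is real Nova code:

    import random

    class HostState(object):
        def __init__(self, name, free_ram_mb):
            self.name = name
            self.free_ram_mb = free_ram_mb

    class Scheduler(object):
        def __init__(self):
            # name -> HostState; eventually consistent, updated
            # asynchronously from compute-node notifications
            self.hosts = {}

        def on_host_update(self, name, free_ram_mb):
            # Called when a compute node broadcasts its state, e.g.
            # immediately after a new VM is run on it.
            self.hosts[name] = HostState(name, free_ram_mb)

        def pick_host(self, ram_mb):
            # Filter on our (possibly stale) view, then choose randomly
            # among acceptable hosts rather than always taking the
            # "best" one - the variability is what stops concurrent
            # schedulers piling onto the same host.
            candidates = [h for h in self.hosts.values()
                          if h.free_ram_mb >= ram_mb]
            if not candidates:
                raise LookupError('no host found')
            return random.choice(candidates)

    def schedule_with_retry(scheduler, ram_mb, claim, retries=3):
        # The compute node is the final arbiter: if our stale view led
        # us to a host that can't actually take the VM, the claim fails
        # there and we simply try somewhere else.
        for _ in range(retries):
            host = scheduler.pick_host(ram_mb)
            if claim(host.name, ram_mb):
                return host
        raise RuntimeError('gave up after %d retries' % retries)

The point of random.choice over "always pick the best-weighted host" is
exactly the variability mentioned above: two schedulers holding identical
stale views will still mostly land on different hosts.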

> > But to be fair, we're throwing made up numbers around at this point.
> > Maybe it's time to work out how to test this for scale in a harness -
> > which is the bit of work we all really need to do this properly, or
> > there's no proof we've actually helped - and leave people to code
> > their ideas up?
>
> I'm working on adding meters for rates and amounts of messages and
> queries that the system does right now for performance purposes. Rally,
> though, is the place where I'd go to ask "how fast can we schedule
> things right now?".

My only concern is that this means testing a real cloud at scale, and I
haven't got any more firstborn to sell for hardware, so I wonder if we
can fake up a compute node in our test harness.
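
Something like this might be a starting point - a fake compute node that
holds fabricated inventory, accepts or rejects claims, and broadcasts the
same state updates a real node would, so one process can pretend to be
thousands of them.  Again, all the interfaces here are invented for the
sketch, not Nova's actual RPC API:

    import threading

    class FakeComputeNode(object):
        def __init__(self, name, total_ram_mb, notify):
            self.name = name
            self.free_ram_mb = total_ram_mb
            # callback used to broadcast state to the schedulers
            self.notify = notify
            self.lock = threading.Lock()

        def claim(self, ram_mb):
            # The node is the source of truth: a claim made against
            # stale scheduler state simply fails here, and the
            # scheduler retries elsewhere.
            with self.lock:
                if self.free_ram_mb < ram_mb:
                    return False
                self.free_ram_mb -= ram_mb
                self.notify(self.name, self.free_ram_mb)
                return True

    # e.g. spin up a few thousand of these in a single process:
    # nodes = [FakeComputeNode('node%d' % i, 65536,
    #                          scheduler.on_host_update)
    #          for i in range(5000)]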
