Open Stack

Thu Oct 8 18:38:33 UTC 2015

On 8 October 2015 at 09:10, Ed Leafe <ed at leafe.com> wrote:

> You've hit upon the problem with the current design: multiple, and
> potentially out-of-sync copies of the data.

Arguably, this is the *intent* of the current design, not a problem with
it.  The data can never be perfect (ever) so go with 'good enough' and run
with it, and deal with the corner cases.  Truth be told, storing that data
in MySQL is secondary to the correct functioning of the scheduler.  The one
thing it helps with is when the scheduler restarts - it stands a chance of
making sensible decisions before it gets its full picture back.  (This is
all very like route distribution protocols, you know: make the best
decision on the information you have to hand, assuming the rest of the
system will deal with your mistakes.  And hold times, and graceful restart,
and...)

> What you're proposing doesn't really sound all that different than the
> current design, which has the compute nodes send the updates in their state
> to the scheduler both on a scheduled task, and in response to changes. The
> impetus for the Cassandra proposal was to eliminate this duplication, and
> have the resources being scheduled and the scheduler all working with the
> same data.

Is there any reason why the duplication (given it's not a huge amount of
data - megabytes, not gigabytes) is a problem?  Is there any reason why
inconsistency is a problem?

What you propose is a change in behaviour.  The scheduler today is intended
to make the best decision based on the available information, without
locks, and on the assumption that other things might be scheduling at the
same time.  Your proposal comes across as making all schedulers work on one
accurate copy of the information, which they all keep updated (not, I think,
entirely synchronously, so they can still be working on outdated information,
just much less outdated).  But when you have hundreds of hosts willing to take
a machine then there's typically no one answer to a scheduling decision and
we can tolerate really quite a lot of variability.
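
For illustration, the "decide on what you know, no locks, let the rest of the system catch your mistakes" behaviour can be sketched roughly as below.  This is not Nova code: the Host class, claim(), schedule() and the stale_view dict are hypothetical names invented for the example, and "free slots" stands in for whatever resources a real host tracks.

```python
import random


class Host:
    """Hypothetical compute host; it alone knows its true free capacity."""

    def __init__(self, name, free_slots):
        self.name = name
        self.free_slots = free_slots

    def claim(self):
        """Authoritative check made on the host, not in the scheduler."""
        if self.free_slots > 0:
            self.free_slots -= 1
            return True
        return False


def schedule(hosts, stale_view, rng, max_attempts=3):
    """Pick a host from a possibly outdated view and retry on failure.

    hosts:      dict of name -> Host (the real state)
    stale_view: dict of name -> free slots as last reported to the scheduler
    Returns the chosen host name, or None if every attempt failed.
    """
    tried = set()
    for _ in range(max_attempts):
        # Filter on the scheduler's own (possibly wrong) information.
        candidates = [
            name for name, free in stale_view.items()
            if free > 0 and name not in tried
        ]
        if not candidates:
            return None
        name = rng.choice(candidates)
        # No lock is taken: the host accepts or rejects the request itself.
        if hosts[name].claim():
            return name
        tried.add(name)
    return None
```

The scheduler never needs a consistent copy of the data here; a wrong guess costs one rejected claim and a retry, which is the trade-off being argued for.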

I do sympathise with your point in the following email where you have 5 VMs
scheduled by 5 schedulers to the same host, but consider:

1. if only one host suits the 5 VMs this results in the same behaviour: 1
VM runs, the rest don't.  There's more work to discover that but arguably
less work than maintaining a consistent database.
2. if many hosts suit the 5 VMs then this is *very* unlucky, because we
should be choosing a host at random from the set of suitable hosts, and all
5 schedulers landing on the same one is a huge coincidence - so this is a
tiny corner case that we shouldn't be designing around.

The worst case, however, is:

3. we attempt to pick the optimal host, and the optimal host for all 5 VMs
is the same despite there being other less perfect choices out there.  That
would get you a stampeding herd and a bunch of retries.

I admit that the current system does not solve well for (3).
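
To put rough numbers on (2) and (3), here is a small sketch, assuming every scheduler sees the same set of suitable hosts and the same (possibly stale) metric.  The function names and the free_ram metric are made up for the example and are not taken from the scheduler code.

```python
import random
from collections import Counter


def pick_random(suitable_hosts, rng):
    """Case (2): choose uniformly at random among the suitable hosts."""
    return rng.choice(suitable_hosts)


def pick_best(suitable_hosts, free_ram):
    """Case (3): always choose the single 'optimal' host (most free RAM)."""
    return max(suitable_hosts, key=lambda host: free_ram[host])


def max_pileup(choices):
    """Largest number of schedulers that chose the same host."""
    return max(Counter(choices).values())


def simulate(num_hosts=100, num_schedulers=5, seed=0):
    """Return (pile-up with random choice, pile-up with optimal choice)."""
    rng = random.Random(seed)
    hosts = [f"host{i}" for i in range(num_hosts)]
    # Every scheduler shares the same view of free RAM (an assumption).
    free_ram = {host: rng.randint(4, 64) for host in hosts}
    random_choices = [pick_random(hosts, rng) for _ in range(num_schedulers)]
    best_choices = [pick_best(hosts, free_ram) for _ in range(num_schedulers)]
    return max_pileup(random_choices), max_pileup(best_choices)


def prob_all_same(num_hosts, num_schedulers):
    """Chance that independent uniform picks all land on one host."""
    return num_hosts * (1.0 / num_hosts) ** num_schedulers
```

With an identical view, pick_best sends every scheduler to the same host every time, which is the herd in (3).  With random choice over N suitable hosts, the chance of all 5 picking the same one is 1/N^4, so about 1 in 100 million for N = 100.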
-- 
Ian.

[openstack-dev] Scheduler proposal
