[openstack-dev] Scheduler proposal

Ed Leafe ed at leafe.com
Thu Oct 8 20:28:50 UTC 2015

On Oct 8, 2015, at 1:38 PM, Ian Wells <ijw.ubuntu at cack.org.uk> wrote:

>> You've hit upon the problem with the current design: multiple, and potentially out-of-sync copies of the data.
> Arguably, this is the *intent* of the current design, not a problem with it.

It may have been the intent, but that doesn't mean that we are where we need to be.

> The data can never be perfect (ever) so go with 'good enough' and run with it, and deal with the corner cases.

Defining what counts as "good enough" is exactly what is problematic.

> Truth be told, storing that data in MySQL is secondary to the correct functioning of the scheduler.

I have no problem with MySQL (well, I do, but that's not relevant to this discussion). My issue is that the current system poorly replicates its data from MySQL to the places where it is needed.

> The one thing it helps with is when the scheduler restarts - it stands a chance of making sensible decisions before it gets its full picture back.  (This is all very like route distribution protocols, you know: make the best decision on the information you have to hand, assuming the rest of the system will deal with your mistakes.  And hold times, and graceful restart, and…)

Yes, this is all well and good. My focus is on improving the information in hand when making that best decision.

> Is there any reason why the duplication (given it's not a huge amount of data - megabytes, not gigabytes) is a problem?  Is there any reason why inconsistency is a problem?

I'm sure that many of the larger deployments may have issues with the amount of data that must be managed in-memory by so many different parts of the system. Inconsistency is a problem, but one that has workarounds. The primary issue is scalability: with the current design, increasing the number of scheduler processes increases the raciness of the system.

> I do sympathise with your point in the following email where you have 5 VMs scheduled by 5 schedulers to the same host, but consider:
> 1. if only one host suits the 5 VMs this results in the same behaviour: 1 VM runs, the rest don't.  There's more work to discover that but arguably less work than maintaining a consistent database.

True, but in a large scale deployment this is an extremely rare case.

> 2. if many hosts suit the 5 VMs then this is *very* unlucky, because we should be choosing a host at random from the set of suitable hosts and that's a huge coincidence - so this is a tiny corner case that we shouldn't be designing around

Here is where we differ in our understanding. With the current system of filters and weighers, 5 schedulers getting requests for identical VMs and having identical information are *expected* to select the same host. It is not a tiny corner case; it is the most likely result for the current system design. By catching this situation early (in the scheduling process) we can avoid multiple RPC round-trips to handle the fail/retry mechanism.
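To illustrate why identical inputs produce identical picks, here is a minimal sketch of deterministic filter-and-weigh scheduling. The host data, the 2 GB RAM filter, and the free-RAM weigher are all hypothetical stand-ins, not Nova's actual filter/weigher code, but they show the collision: every scheduler with the same view of the data selects the same host.

```python
# Hypothetical sketch of deterministic filter-and-weigh scheduling.
# Not Nova's real code; illustrates why identical schedulers with
# identical data all converge on the same "best" host.

def weigh(host):
    # Example weigher: prefer the host with the most free RAM.
    return host["free_ram_mb"]

def schedule(hosts):
    # Filter out hosts that can't fit the flavor (assumed 2048 MB),
    # then deterministically pick the top-weighted survivor.
    suitable = [h for h in hosts if h["free_ram_mb"] >= 2048]
    return max(suitable, key=weigh)["name"]

host_state = [
    {"name": "node1", "free_ram_mb": 8192},
    {"name": "node2", "free_ram_mb": 4096},
    {"name": "node3", "free_ram_mb": 16384},
]

# Five schedulers, each with its own identical copy of the data:
picks = [schedule(list(host_state)) for _ in range(5)]
print(picks)  # all five select node3 -> collision on one host
```

Since filtering and weighing are pure functions of the host state, randomness only enters if you deliberately add it (e.g., picking randomly among the top N weighted hosts), which is why the "huge coincidence" framing above does not hold for the default behaviour.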

> The worst case is, however:
> 3. we attempt to pick the optimal host, and the optimal host for all 5 VMs is the same despite there being other less perfect choices out there.  That would get you a stampeding herd and a bunch of retries.
> I admit that the current system does not solve well for (3).

IMO, this is identical to (2).

-- Ed Leafe
