[openstack-dev] A simple way to improve nova scheduler

Russell Bryant rbryant at redhat.com
Wed Jul 24 16:24:06 UTC 2013

On 07/23/2013 06:00 PM, Clint Byrum wrote:
> This is really interesting work, thanks for sharing it with us. The
> discussion that has followed has brought up some thoughts I've had for
> a while about this choke point in what is supposed to be an extremely
> scalable cloud platform (OpenStack).
> I feel like the discussions have all been centered around making "the"
> scheduler(s) intelligent.  There seems to be a commonly held belief that
> scheduling is a single step, and should be done with as much knowledge
> of the system as possible by a well informed entity.
> Can you name for me one large scale system that has a single entity,
> human or computer, that knows everything about the system and can make
> good decisions quickly?
> This problem is screaming to be broken up, de-coupled, and distributed.
> I keep asking myself these questions:
> Why are all of the compute nodes informing all of the schedulers?
> Why are all of the schedulers expecting to know about all of the compute nodes?
> Can we break this problem up into simpler problems and distribute the load to
> the entire system?
> This has been bouncing around in my head for a while now, but as a
> shallow observer of nova dev, I feel like there are some well known
> scaling techniques which have not been brought up. Here is my idea,
> forgive me if I have glossed over something or missed a huge hole:
> * Schedulers break up compute nodes by hash table, only caring about
>   those in their hash table.
> * Schedulers, upon claiming a compute node by hash table, poll compute
>   node directly for its information.
> * Requests to boot go into fanout.
> * Schedulers get request and try to satisfy using only their own compute
>   nodes.
> * Failure to boot results in re-insertion in the fanout.
> This gives up the certainty that the scheduler will find a compute node
> for a boot request on the first try. It is also possible that a request
> gets unlucky and takes a long time to find the one scheduler that has
> the one last "X" resource that it is looking for. There are some further
> optimization strategies that can be employed (like queues based on hashes
> already tried, etc.).
> Anyway, I don't see any point in trying to hot-rod the intelligent
> scheduler to go super fast, when we can just optimize for having many
> many schedulers doing the same body of work without blocking and without
> pounding a database.
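The partitioned scheme in the quoted proposal can be sketched as a small single-process simulation. This is a minimal illustration under several assumptions that are not in the original message: compute nodes are assigned to schedulers with a stable `zlib.crc32` hash modulo the scheduler count, the "fanout" is modeled as one shared work queue that schedulers take turns consuming, free RAM is the only resource, and direct polling of compute nodes is replaced by a pre-filled dictionary. All names (`owner_of`, `Scheduler`, `BootRequest`, `run`) are hypothetical and are not part of nova.

```python
# Hypothetical sketch of hash-partitioned scheduling with retry; not nova code.
import zlib
from collections import deque
from dataclasses import dataclass, field


def owner_of(node: str, num_schedulers: int) -> int:
    """Stable hash partition: which scheduler owns this compute node."""
    return zlib.crc32(node.encode()) % num_schedulers


@dataclass
class Scheduler:
    sid: int
    # Free RAM (MB) per owned compute node. In the proposal this would come
    # from polling the node directly; here it is just a dict (assumption).
    nodes: dict = field(default_factory=dict)

    def try_place(self, ram_mb: int):
        """Claim RAM on the owned node with the most free RAM that fits, else None."""
        fits = [n for n, free in self.nodes.items() if free >= ram_mb]
        if not fits:
            return None
        best = max(fits, key=lambda n: self.nodes[n])
        self.nodes[best] -= ram_mb
        return best


@dataclass
class BootRequest:
    name: str
    ram_mb: int
    tried: set = field(default_factory=set)  # ids of schedulers that already failed it


def run(node_ram: dict, requests: list, num_schedulers: int):
    schedulers = [Scheduler(i) for i in range(num_schedulers)]
    # Each scheduler only ever knows about the nodes in its own partition.
    for node, free in node_ram.items():
        schedulers[owner_of(node, num_schedulers)].nodes[node] = free

    queue = deque(requests)  # stands in for the shared "fanout" (assumption)
    placed, failed = {}, []
    turn = 0
    while queue:
        req = queue.popleft()
        # Only schedulers that have not yet tried this request are eligible.
        candidates = [s for s in schedulers if s.sid not in req.tried]
        if not candidates:
            failed.append(req.name)  # every partition said no
            continue
        sched = candidates[turn % len(candidates)]
        turn += 1
        host = sched.try_place(req.ram_mb)
        if host is None:
            req.tried.add(sched.sid)
            queue.append(req)  # failure: re-insert, as in the proposal
        else:
            placed[req.name] = host
    return placed, failed
```

For example, `run({"c1": 2048, "c2": 2048}, [BootRequest(f"vm{i}", 1024) for i in range(4)], 3)` places all four requests, two per node, without any scheduler looking at a node outside its partition. The `tried` set is one possible form of the "queues based on hashes already tried" optimization: a request that no partition can satisfy fails after one attempt per scheduler instead of circulating indefinitely.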

These are some *very* good observations.  I'd like all of the nova folks
interested in this to give some deep consideration to this type of approach.

Russell Bryant
