[openstack-dev] A simple way to improve nova scheduler

Clint Byrum clint at fewbar.com
Fri Jul 26 16:22:57 UTC 2013


Excerpts from Joe Gordon's message of 2013-07-24 11:43:46 -0700:
> On Wed, Jul 24, 2013 at 12:24 PM, Russell Bryant <rbryant at redhat.com> wrote:
> 
> > On 07/23/2013 06:00 PM, Clint Byrum wrote:
> > > This is really interesting work, thanks for sharing it with us. The
> > > discussion that has followed has brought up some thoughts I've had for
> > > a while about this choke point in what is supposed to be an extremely
> > > scalable cloud platform (OpenStack).
> > >
> > > I feel like the discussions have all been centered around making "the"
> > > scheduler(s) intelligent.  There seems to be a commonly held belief that
> > > scheduling is a single step, and should be done with as much knowledge
> > > of the system as possible by a well informed entity.
> > >
> > > Can you name for me one large scale system that has a single entity,
> > > human or computer, that knows everything about the system and can make
> > > good decisions quickly?
> > >
> > > This problem is screaming to be broken up, de-coupled, and distributed.
> > >
> > > I keep asking myself these questions:
> > >
> > > Why are all of the compute nodes informing all of the schedulers?
> > >
> > > Why are all of the schedulers expecting to know about all of the
> > > compute nodes?
> >
> 
> So the scheduler can try to find the globally optimum solution; see below.
> 

Right, but that seems like a costly requirement that most deployments won't need.

> > >
> > > Can we break this problem up into simpler problems and distribute the
> > > load to the entire system?
> > >
> > > This has been bouncing around in my head for a while now, but as a
> > > shallow observer of nova dev, I feel like there are some well known
> > > scaling techniques which have not been brought up. Here is my idea,
> > > forgive me if I have glossed over something or missed a huge hole:
> > >
> > > * Schedulers break up compute nodes by hash table, only caring about
> > >   those in their hash table.
> > > * Schedulers, upon claiming a compute node by hash table, poll compute
> > >   node directly for its information.
> >
> 
> For people who want to schedule on information that is constantly changing
> (such as CPU load, memory usage, etc.), how often would you poll?
> 

That's a great question. The initial poll is mostly a "how are you
now?" check. After that, I'm not sure polling would be the best strategy,
so perhaps a per-scheduler broadcast topic would still make sense.
That broadcast topic might even remove the need for the initial
handshake.

> > > * Requests to boot go into fanout.
> > > * Schedulers get request and try to satisfy using only their own compute
> > >   nodes.
> > > * Failure to boot results in re-insertion in the fanout.
> >
> 
> With this model we lose the ability to find the globally optimum host to
> schedule on, and can only find a locally optimal solution, which sounds
> like a reasonable scale trade-off. Going forward I can imagine nova having
> several different schedulers for different requirements. Someone who is
> deploying at a massive scale will probably accept a locally optimal
> solution (and a scheduler that scales better), but someone with a smaller
> cloud will want the globally optimum solution.
> 

What you may have missed on first pass is that if you have just one
scheduler, you do get globally optimum scheduling. So it is not lost;
it is just factored out as you add schedulers. It is also quite simple
to know when to add schedulers: when your boot request latency gets
too high.
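Below is a toy Python sketch of that trade-off. It rests on several assumptions that do not come from the thread. Hosts are ranked only by free RAM. The fanout is modeled as a queue in which each retry goes to the next scheduler in turn. Boot latency is judged by its 95th percentile against a threshold you choose. `Host`, `pick_local`, `schedule`, and `needs_more_schedulers` are hypothetical names, not nova code. With one partition, the local choice is the global optimum. With two partitions, a request can land on a host that is only the best within its own partition.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_ram_mb: int


def pick_local(hosts, ram_mb):
    """Best fit inside one scheduler's partition: the host with the most free RAM."""
    fits = [h for h in hosts if h.free_ram_mb >= ram_mb]
    return max(fits, key=lambda h: h.free_ram_mb, default=None)


def schedule(requests, partitions):
    """Each request is consumed by one scheduler, which looks only at its own
    hosts. On failure the request is re-inserted and the next scheduler tries;
    it is rejected once every scheduler has tried. The rotation below is a
    hypothetical stand-in for whichever scheduler consumes the fanout message."""
    n = len(partitions)
    queue = deque((req_id, ram, 0) for req_id, ram in enumerate(requests))
    placed, rejected = {}, []
    while queue:
        req_id, ram, attempt = queue.popleft()
        host = pick_local(partitions[(req_id + attempt) % n], ram)
        if host is not None:
            host.free_ram_mb -= ram                   # claim capacity on the host
            placed[req_id] = host.name
        elif attempt + 1 < n:
            queue.append((req_id, ram, attempt + 1))  # re-insert into the fanout
        else:
            rejected.append(req_id)
    return placed, rejected


def needs_more_schedulers(latencies_ms, threshold_ms, pct=0.95):
    """Hypothetical trigger: add a scheduler when the pct-quantile of recent
    boot request latencies exceeds threshold_ms."""
    if not latencies_ms:
        return False
    ordered = sorted(latencies_ms)
    idx = min(len(ordered) - 1, int(pct * len(ordered)))
    return ordered[idx] > threshold_ms


if __name__ == "__main__":
    def fresh():
        return [Host("h1", 4096), Host("h2", 16384), Host("h3", 8192)]

    # One scheduler sees every host, so its local best is the global best.
    print(schedule([2048], [fresh()]))                  # ({0: 'h2'}, [])
    # Two schedulers: request 0 lands on the partition holding h1 and h3.
    hs = fresh()
    print(schedule([2048], [[hs[0], hs[2]], [hs[1]]]))  # ({0: 'h3'}, [])
```

Nothing in this toy model releases capacity, so re-insertion only helps when another scheduler's partition has room. In a live cloud, a retry might also succeed because capacity was freed in the meantime.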


