[openstack-dev] [Nova] [Gantt] Scheduler split status (updated)

Robert Collins robertc at robertcollins.net
Wed Jul 16 23:24:03 UTC 2014


On 15 July 2014 06:10, Jay Pipes <jaypipes at gmail.com> wrote:

> Frankly, I don't think a lot of the NFV use cases are well-defined.
>
> Even more frankly, I don't see any benefit to a split-out scheduler to a
> single NFV use case.
>
>
>> Don't you see each Summit the lots of talks (and people attending
>> them) talking about how OpenStack should look at Pets vs. Cattle and
>> saying that the scheduler should be out of Nova ?
>
>
> There's been no concrete benefits discussed to having the scheduler outside
> of Nova.
>
> I don't really care how many people say that the scheduler should be out of
> Nova unless those same people come to the table with concrete reasons why.
> Just saying something is a benefit does not make it a benefit, and I think
> I've outlined some of the very real dangers -- in terms of code and payload
> complexity -- of breaking the scheduler out of Nova until the interfaces are
> cleaned up and the scheduler actually owns the resources upon which it
> exercises placement decisions.

I agree with the risks if we get it wrong.

In terms of benefits, I want to do cross-domain scheduling: 'Give me
five Galera servers with no shared non-HA infrastructure and
resiliency to no less than 2 separate failures'. By far the largest
push back I get is 'how do I make Ironic pick the servers I want it
to' when talking to ops folk about using Ironic. And when you dig into
that, it falls into two buckets:
 - role based mappings (e.g. storage optimised vs cpu optimised) -
   which Ironic can trivially do
 - failure domain and performance domain optimisation -
   which Nova cannot do at all today.
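
To make the second bucket concrete, here is a minimal sketch of the "no
shared non-HA infrastructure" part of such a request. It is not anything
Nova, Ironic or gantt implements; `Host`, `failure_domains` and
`pick_disjoint` are hypothetical names, and it assumes each host can be
tagged with the non-HA things it depends on (rack, PDU, switch, ...).

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence


@dataclass(frozen=True)
class Host:
    """Hypothetical host record; not a Nova or Ironic object."""
    name: str
    # Non-HA infrastructure this host depends on, e.g. {"rack:1", "pdu:3"}.
    failure_domains: FrozenSet[str]


def pick_disjoint(hosts: Sequence[Host], count: int) -> Optional[List[Host]]:
    """Return `count` hosts that pairwise share no failure domain, or None.

    Plain backtracking search: fine for a sketch, exponential in the
    worst case, so a real scheduler would need something smarter.
    """
    def search(start: int, chosen: List[Host],
               used: FrozenSet[str]) -> Optional[List[Host]]:
        if len(chosen) == count:
            return list(chosen)
        for i in range(start, len(hosts)):
            host = hosts[i]
            if used & host.failure_domains:
                continue  # shares a rack/PDU/etc. with an earlier pick
            found = search(i + 1, chosen + [host],
                           used | host.failure_domains)
            if found is not None:
                return found
        return None

    return search(0, [], frozenset())
```

Since no two chosen hosts share a domain, one domain failure takes out at
most one of them; five such hosts would keep three running after two
separate failures. Performance domains are not modelled here at all.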

I want this very very very badly, and I would love to be pushing
directly on it, but it's just under a few other key features like
'working HA' and 'efficient updates' that sadly matter more in the
short term.



> Sorry, I'm not following you. Who is saying to Gantt "I want to store this
> data"?
>
> All I am saying is that the thing that places a resource on some provider of
> that resource should be the thing that owns the process of a requester
> *claiming* the resources on that provider, and in order to properly track
> resources in a race-free way in such a system, then the system needs to
> contain the resource tracker.

Trying to translate that:
 - the scheduler (thing that places a resource)
 - should own the act of claiming a resource
 - to avoid races the scheduler should own the tracker
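
A toy sketch of that translation, assuming a single scheduler process
and RAM as the only resource: placement and claiming happen under one
lock, so two requests can never both be handed the same free capacity.
`ClaimingScheduler` and `place_and_claim` are made-up names, and this is
not how Nova's resource tracker works.

```python
import threading
from typing import Dict, Optional


class ClaimingScheduler:
    """Hypothetical scheduler that owns both placement and the tracker."""

    def __init__(self, free_ram_mb: Dict[str, int]) -> None:
        self._free = dict(free_ram_mb)  # the "resource tracker"
        self._lock = threading.Lock()

    def place_and_claim(self, ram_mb: int) -> Optional[str]:
        """Pick a host and claim the RAM on it as one atomic step."""
        with self._lock:
            candidates = [h for h, free in self._free.items()
                          if free >= ram_mb]
            if not candidates:
                return None  # nothing fits; no partial claim is left behind
            # Simple spread policy: take the host with the most free RAM.
            host = max(candidates, key=lambda h: self._free[h])
            self._free[host] -= ram_mb
            return host

    def free_ram(self, host: str) -> int:
        with self._lock:
            return self._free[host]
```

Multiple scheduler instances would need the same atomicity from a shared
store (e.g. a compare-and-swap style update) rather than an in-process
lock; that part is not shown.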

So I think we need to acknowledge that Right Now we have massive races.
We can choose where we put our efforts - we can try to fix them in the
current architecture, or we can try to fix them by changing the
architecture.

I think you agree that the current architecture is wrong; and that
from a risk perspective the gantt extraction should not change the
architecture - as part of making it graceful and cinder-like with
immediate use by Nova.

But once extracted the architecture can change - versioned APIs FTW.

To my mind the key question is not whether the thing will be *better*
with gantt extracted, it is whether it will be *no worse*, while
simultaneously enabling a bunch of pent up demand in another part of
the community.

That seems hard to answer categorically, but it seems to me the key
risk is whether changing the architecture will be too hard / unsafe
post extraction.

However in Nova it takes months and months to land things (and I'm not
poking here - TripleO has the same issue at the moment) - I think
there is a very real possibility that gantt can improve much faster
and more efficiently as a new project, once forklifted out. Patches to
Nova to move to newer APIs can be posted and worked on while folk work
on other bits of key plumbing like performance (e.g. not loading every
host in the entire cloud into RAM on every scheduling request),
scalability (e.g. elegantly solving the current racy behaviour between
different scheduler instances) and begin the work to expose the
scheduler to neutron and cinder.

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud


