[openstack-dev] [Nova] [Gantt] Scheduler split status (updated)

Sylvain Bauza sbauza at redhat.com
Thu Jul 17 11:04:11 UTC 2014


On 17/07/2014 01:24, Robert Collins wrote:
> On 15 July 2014 06:10, Jay Pipes <jaypipes at gmail.com> wrote:
>
>> Frankly, I don't think a lot of the NFV use cases are well-defined.
>>
>> Even more frankly, I don't see any benefit to a split-out scheduler to a
>> single NFV use case.
>>
>>
>>> Don't you see, at each Summit, the many talks (and people attending
>>> them) about how OpenStack should look at Pets vs. Cattle and
>>> saying that the scheduler should be out of Nova?
>>
>> There have been no concrete benefits discussed to having the scheduler
>> outside of Nova.
>>
>> I don't really care how many people say that the scheduler should be out of
>> Nova unless those same people come to the table with concrete reasons why.
>> Just saying something is a benefit does not make it a benefit, and I think
>> I've outlined some of the very real dangers -- in terms of code and payload
>> complexity -- of breaking the scheduler out of Nova until the interfaces are
>> cleaned up and the scheduler actually owns the resources upon which it
>> exercises placement decisions.
> I agree with the risks if we get it wrong.
>
> In terms of benefits, I want to do cross-domain scheduling: 'Give me
> five Galera servers with no shared non-HA infrastructure and
> resiliency to no less than 2 separate failures'. By far the largest
> push back I get is 'how do I make Ironic pick the servers I want it
> to' when talking to ops folk about using Ironic. And when you dig into
> that, it falls into two buckets:
>  - role based mappings (e.g. storage optimised vs cpu optimised) -
> which Ironic can trivially do
>  - failure domain and performance domain optimisation
>    - which Nova cannot do at all today.
>
> I want this very very very badly, and I would love to be pushing
> directly on it, but it's just under a few other key features like
> 'working HA' and 'efficient updates' that sadly matter more in the
> short term.
>

I share your view of what the scheduler should be once it is pushed out
of Nova. As I said, there are various concerns about the scheduler, and
requested features that are missing today and that are too big to fit
into Nova.
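
To make the failure-domain case above a bit more concrete, here is a
minimal sketch of the kind of placement constraint being asked for. It is
only an illustration: the function name, the host tuples and the domain
labels are all hypothetical, nothing here is Nova, Ironic or Gantt code,
and the greedy strategy can miss a valid placement that a smarter solver
would find.

```python
def pick_spread(hosts, count):
    """Greedily pick `count` hosts such that no two share a failure domain.

    `hosts` is an iterable of (name, set_of_failure_domains) pairs.
    Both the shape of the data and the domain names are assumptions
    made for this sketch only.
    Returns a list of host names, or None if the greedy pass fails.
    """
    chosen = []
    used_domains = set()
    for name, domains in hosts:
        # Accept a host only if it shares nothing with hosts already chosen.
        if used_domains.isdisjoint(domains):
            chosen.append(name)
            used_domains |= domains
            if len(chosen) == count:
                return chosen
    return None


# Hypothetical inventory: each host lists the rack and power feed it uses.
inventory = [
    ("node1", {"rack1", "psu1"}),
    ("node2", {"rack1", "psu2"}),
    ("node3", {"rack2", "psu2"}),
    ("node4", {"rack3", "psu3"}),
]

print(pick_spread(inventory, 3))
```

The point of the sketch is only that the decision needs data (racks, power
feeds, and so on) that spans more than one project.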

To make sure we're going in the right direction, we decided during the
last Gantt meeting to collect use cases where an external scheduler
would be useful. Don't hesitate to add your baremetal use cases (for
deployment with TripleO or others) there:
https://etherpad.openstack.org/p/SchedulerUseCases

Take it as a first attempt to identify what the mission statement for
Gantt would be, if you wish.


>
>> Sorry, I'm not following you. Who is saying to Gantt "I want to store this
>> data"?
>>
>> All I am saying is that the thing that places a resource on some provider of
>> that resource should be the thing that owns the process of a requester
>> *claiming* the resources on that provider, and in order to properly track
>> resources in a race-free way in such a system, then the system needs to
>> contain the resource tracker.
> Trying to translate that:
>  - the scheduler (thing that places a resource)
>  - should own the act of claiming a resource
>  - to avoid races the scheduler should own the tracker
>
> So I think we need to acknowledge that Right Now we have massive races.
> We can choose where we put our efforts - we can try to fix them in the
> current architecture, we can try to fix them by changing the
> architecture.
>
> I think you agree that the current architecture is wrong; and that
> from a risk perspective the gantt extraction should not change the
> architecture - as part of making it graceful and cinder-like with
> immediate use by Nova.
>
> But once extracted the architecture can change - versioned APIs FTW.
>
> To my mind the key question is not whether the thing will be *better*
> with gantt extracted, it is whether it will be *no worse*, while
> simultaneously enabling a bunch of pent up demand in another part of
> the community.
>
> That seems hard to answer categorically, but it seems to me the key
> risk is whether changing the architecture will be too hard / unsafe
> post extraction.
>
> However in Nova it takes months and months to land things (and I'm not
> poking here - TripleO has the same issue at the moment) - I think
> there is a very real possibility that gantt can improve much faster
> and more efficiently as a new project, once forklifted out. Patches to Nova
> to move to newer APIs can be posted and worked with while folk work on
> other bits of key plumbing like performance (e.g. not loading every
> host in the entire cloud into ram on every scheduling request),
> scalability (e.g. elegantly solving the current racy behaviour between
> different scheduler instances) and begin the work to expose the
> scheduler to neutron and cinder.


Unless I misunderstood (and that happens, I'm all too human, and French),
I'm giving a +2 to your statement: yes, there are race conditions in the
scheduler (I saw the bug you filed and I'm hesitating over whether to
handle it now); yes, the scheduler is far from perfect today; yes, we
should ensure that claiming a resource is 'ACID' (well, sort of, I'm
using the term loosely for emphasis); but *also* yes, it could take a
while before that is fixed.
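
As a rough illustration of what "the scheduler owns the claim" means, here
is a minimal single-process sketch in which choosing a host and claiming
its resources happen as one atomic step. Everything in it (TinyScheduler,
Host, ClaimError, the RAM-only model) is hypothetical and is not the Nova
or Gantt implementation; a real deployment with several scheduler
instances would need a shared store with transactions or compare-and-swap
rather than an in-process lock.

```python
import threading
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical host record; only free RAM is tracked in this sketch."""
    name: str
    free_ram_mb: int


class ClaimError(Exception):
    """Raised when no host has enough free RAM for the request."""


class TinyScheduler:
    """Toy scheduler that owns its own resource tracker."""

    def __init__(self, hosts):
        self._hosts = {h.name: h for h in hosts}
        self._lock = threading.Lock()

    def place_and_claim(self, ram_mb):
        # Placement and claim happen under the same lock, so two concurrent
        # requests can never both be granted the same free RAM.
        with self._lock:
            candidates = [h for h in self._hosts.values()
                          if h.free_ram_mb >= ram_mb]
            if not candidates:
                raise ClaimError("no host with %d MB free" % ram_mb)
            best = max(candidates, key=lambda h: h.free_ram_mb)
            best.free_ram_mb -= ram_mb
            return best.name
```

With two hosts of 1024 MB each, ten concurrent requests for 512 MB end up
with exactly four successful claims and six ClaimError failures, which is
the race-free behaviour being discussed.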

Take it as an Agile or Lean approach if you wish: I'm just saying that
we would be far more effective if we did the split (once feature parity
with the Nova scheduler is ensured) and then reworked what needs
reworking to improve the scheduler, even if that touches the Gantt API
(although I'm not sure the interfaces would change; see my earlier emails).


-Sylvain


> -Rob
>



