[openstack-dev] [Nova] [Gantt][Scheduler-split] Why we need a Smart Placement Engine as a Service! (was: Scheduler split status (updated))

Debojyoti Dutta ddutta at gmail.com
Tue Jul 15 15:44:27 UTC 2014


https://etherpad.openstack.org/p/SchedulerUseCases

[08:43:35] <n0ano> #action all update the use case etherpad
at https://etherpad.openstack.org/p/SchedulerUseCases

Please update your use cases here ......

debo

On Mon, Jul 14, 2014 at 7:25 PM, Yathiraj Udupi (yudupi)
<yudupi at cisco.com> wrote:
> Hi all,
>
> Adding to the interesting discussion thread regarding the scheduler split
> and its importance, I would like to pitch in a couple of thoughts in favor
> of Gantt.  At the Icehouse summit in HKG, in one of the scheduler design
> sessions, I along with a few others (cc’d) pitched a session on Smart
> Resource Placement
> (https://etherpad.openstack.org/p/NovaIcehouse-Smart-Resource-Placement),
> where we proposed a Smart Placement Decision Engine as a Service,
> addressing cross-service scheduling as one of the use cases.  We showed
> how a stand-alone service can act as a smart resource placement engine
> (see figure:
> https://docs.google.com/drawings/d/1BgK1q7gl5nkKWy3zLkP1t_SNmjl6nh66S0jHdP0-zbY/edit?pli=1)
> that uses state data from all the services to make a unified placement
> decision.  We have also proposed a separate blueprint
> (https://blueprints.launchpad.net/nova/+spec/solver-scheduler, with working
> code here: https://github.com/CiscoSystems/nova-solver-scheduler) called
> Smart Scheduler (Solver Scheduler), whose goal is smart resource placement
> that takes into account complex constraints spanning compute (nova),
> storage (cinder), and network.  The existing Filter Scheduler, or projects
> like Smart (Solver) Scheduler (for the complex-constraint scenarios), could
> easily fulfill the decision-making aspects of the placement engine.
>
> I believe the Gantt project is the right direction in terms of separating
> out the placement decision concern, and creating a separate scheduler as a
> service, so that it can freely talk to any of the other services, or use a
> unified global state repository and make the unified decision.  Projects
> like Smart(Solver) Scheduler can easily fit into the Gantt Project as
> pluggable drivers to add the additional smarts required.
>
> To offer our Smart Scheduler as a service, we have prototyped a service
> that provides a RESTful interface to the smart scheduler and is detached
> from Nova (loosely connected).
> For example, a RESTful request like this (where I am requesting 2 VMs with
> a requirement of 1 GB disk, and separately 1 VM of flavor 'm1.tiny' with
> the special requirement that it should be close to the volume with uuid
> "ef6348300bc511e4bc4cc03fd564d1bc" (Compute-Volume affinity constraint)):
>
>
> curl -i -H "Content-Type: application/json" -X POST -d \
> '{"instance_requests": [{"num_instances": 2, "request_properties":
> {"instance_type": {"root_gb": 1}}}, {"num_instances": 1,
> "request_properties": {"flavor": "m1.tiny", "volume_affinity":
> "ef6348300bc511e4bc4cc03fd564d1bc"}}]}' \
> http://<x.x.x.x>/smart-scheduler-as-a-service/v1.0/placement
>
>
> provides a placement decision something like this:
>
> {
>   "result": [
>     [
>       {
>         "host": {
>           "host": "Host1",
>           "nodename": "Node1"
>         },
>         "instance_uuid": "VM_ID_0_0"
>       },
>       {
>         "host": {
>           "host": "Host2",
>           "nodename": "Node2"
>         },
>         "instance_uuid": "VM_ID_0_1"
>       }
>     ],
>     [
>       {
>         "host": {
>           "host": "Host1",
>           "nodename": "Node1"
>         },
>         "instance_uuid": "VM_ID_1_0"
>       }
>     ]
>   ]
> }
>
>
> This placement result can be used by Nova to proceed and complete the
> scheduling.
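
A minimal Python sketch of how a caller might consume such a placement result, assuming the response shape shown above (a "result" list holding one inner list per instance request). The helper name placements_by_instance is hypothetical and is not part of the prototype; no network call is made here.

```python
import json

# Sample response body, copied from the example placement decision above.
RESPONSE = """
{"result": [
  [{"host": {"host": "Host1", "nodename": "Node1"}, "instance_uuid": "VM_ID_0_0"},
   {"host": {"host": "Host2", "nodename": "Node2"}, "instance_uuid": "VM_ID_0_1"}],
  [{"host": {"host": "Host1", "nodename": "Node1"}, "instance_uuid": "VM_ID_1_0"}]
]}
"""


def placements_by_instance(response_text):
    """Map each instance_uuid to its (host, nodename) pair.

    Hypothetical helper: it only assumes the JSON layout shown in the mail.
    """
    placements = {}
    # One inner list per entry of "instance_requests" in the original request.
    for request_group in json.loads(response_text)["result"]:
        for entry in request_group:
            host = entry["host"]
            placements[entry["instance_uuid"]] = (host["host"], host["nodename"])
    return placements


if __name__ == "__main__":
    for uuid, (host, node) in sorted(placements_by_instance(RESPONSE).items()):
        print(uuid, "->", host, node)
```

Flattening the nested result into a per-instance mapping is just one convenient view; a caller that needs to keep the per-request grouping could iterate the outer list directly.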
>
>
> This is where I see the potential for Gantt, which will be a stand-alone
> placement decision engine and can easily accommodate different pluggable
> engines (such as Smart Scheduler
> (https://blueprints.launchpad.net/nova/+spec/solver-scheduler)) to make
> smart placement decisions.
>
>
> Pointers:
> Smart Resource Placement overview:
> https://docs.google.com/document/d/1IiPI0sfaWb1bdYiMWzAAx0HYR6UqzOan_Utgml5W1HI/edit?pli=1
> Figure:
> https://docs.google.com/drawings/d/1BgK1q7gl5nkKWy3zLkP1t_SNmjl6nh66S0jHdP0-zbY/edit?pli=1
> Nova Design Session Etherpad:
> https://etherpad.openstack.org/p/NovaIcehouse-Smart-Resource-Placement
> https://etherpad.openstack.org/p/IceHouse-Nova-Scheduler-Sessions
> Smart Scheduler Blueprint:
> https://blueprints.launchpad.net/nova/+spec/solver-scheduler
> Working code: https://github.com/CiscoSystems/nova-solver-scheduler
>
>
> Thanks,
>
> Yathi.
>
>
>
>
>
>
> On 7/14/14, 1:40 PM, "Murray, Paul (HP Cloud)" <pmurray at hp.com> wrote:
>
> Hi All,
>
>
>
> I’m sorry I am so late to this lively discussion – it looks a good one! Jay
> has been driving the debate a bit so most of this is in response to his
> comments. But please, anyone should chip in.
>
>
>
> On extensible resource tracking
>
>
>
> Jay, I am surprised to hear you say no one has explained to you why there is
> an extensible resource tracking blueprint. It’s simple: there was a
> succession of blueprints wanting to add data about this and that to the
> resource tracker, the scheduler, and the database tables used to
> communicate. These included capabilities, all the stuff in the stats,
> rxtx_factor, the equivalent for cpu (which only works on one hypervisor, I
> think), and pci_stats, and more were coming, including:
>
>
>
> https://blueprints.launchpad.net/nova/+spec/network-bandwidth-entitlement
>
> https://blueprints.launchpad.net/nova/+spec/cpu-entitlement
>
>
>
> So, in short, your claim that there are no operators asking for additional
> stuff is simply not true.
>
>
>
> Around about the Icehouse summit (I think) it was suggested that we should
> stop the obvious trend and add a way to make resource tracking extensible,
> similar to metrics, which had just been added as an extensible way of
> collecting ongoing usage data (because that was also wanted).
>
>
>
> The json blob you refer to was down to the bad experience of the
> compute_node_stats table implemented for stats – which had a particular
> performance hit because it required an expensive join. This was dealt with
> by removing the table and adding a string field to contain the data as a
> json blob. A pure performance optimization. Clearly there is no need to
> store things in this way and with Nova objects being introduced there is a
> means to provide strict type checking on the data even if it is stored as
> json blobs in the database.
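
To illustrate the point about type checking data that is stored as a json blob, here is a small Python sketch. It does not use the Nova objects API; the class ExtraResources and its fields (rxtx_factor, cpu_shares) are assumptions chosen for illustration only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExtraResources:
    """Hypothetical typed view over a string column holding a json blob."""

    rxtx_factor: float
    cpu_shares: int

    def __post_init__(self):
        # Validate field types on construction, including when loading a blob.
        for name, expected in (("rxtx_factor", float), ("cpu_shares", int)):
            value = getattr(self, name)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}")

    def to_blob(self):
        """Serialize to the string that would be stored in the database."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_blob(cls, blob):
        """Rebuild the typed object; bad or unknown fields raise TypeError."""
        return cls(**json.loads(blob))


if __name__ == "__main__":
    blob = ExtraResources(rxtx_factor=1.5, cpu_shares=2).to_blob()
    print(blob)
    print(ExtraResources.from_blob(blob))
```

The storage format stays a single string field (so no expensive join is needed), while the typed wrapper rejects malformed data at the boundary.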
>
>
>
> On scheduler split
>
>
>
> I have no particular position on splitting the scheduler. However, there was
> an interesting reaction to the network bandwidth entitlement blueprint
> listed above. The nova community felt it was a network thing and so nova
> should not provide it – neutron should. Of course, in nova, the nova
> scheduler makes placement decisions… can you see where this is going…? Nova
> needs to coordinate its placement decision with neutron to decide if a host
> has sufficient bandwidth available. Similar points are made about cinder –
> nova has no idea about cinder, but in some environments the location of a
> volume matters when you come to place an instance.
>
>
>
> I should re-iterate that I have no position on splitting out the scheduler,
> but some way to deal with information from outside nova is certainly
> desirable. Maybe other services have the same dilemma.
>
>
>
> On global resource tracker
>
>
>
> I have to say I am inclined to be against the idea of turning the scheduler
> into a “global resource tracker”. I do see the benefit of obtaining a
> resource claim up front, we have all seen that the scheduler can make
> incorrect choices because of the delay in reflecting resource allocation to
> the database and so to the scheduler – it operates on imperfect information.
> However, it is best to avoid a global service relying on synchronous
> interaction with compute nodes during the process of servicing a request. I
> have looked at your example code for the scheduler (global resource tracker)
> and it seems to make a choice from local information and then interact with
> the chosen compute node to obtain a claim and then try again if the claim
> fails. I get it – I see that it deals with the same list of hosts on the
> retry. I also see it has no better chance of getting it right.
>
>
>
> Your desire to have a claim is borne out by the persistent claims spec (I
> love the spec; I just don’t see why the claims have to be persistent). I think
> that is a great idea. Why not let the scheduler make placement suggestions
> (as a global service) and then allow conductors to obtain the claim and
> retry if the claim fails? Similar process to your code, but the scheduler
> only does its part and the conductors scale out the process by acting more
> locally and with more parallelism. (Of course, you could also be optimistic
> and allow the compute node to do the claim as part of the create as the
> degenerate case).
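
The flow suggested above (the scheduler only suggests, a conductor obtains the claim and retries) can be sketched in Python as follows. This is an illustration under stated assumptions, not Nova code: ComputeNode, suggest_hosts and place_with_retry are hypothetical names, and RAM is the only resource modelled.

```python
class ComputeNode:
    """Hypothetical compute node that is the source of truth for its resources."""

    def __init__(self, name, free_ram_mb):
        self.name = name
        self.free_ram_mb = free_ram_mb

    def claim(self, ram_mb):
        """Claim RAM locally; return False if the node cannot satisfy it."""
        if ram_mb > self.free_ram_mb:
            return False
        self.free_ram_mb -= ram_mb
        return True


def suggest_hosts(scheduler_view, ram_mb):
    """Scheduler part: rank candidates from a possibly stale view, no claim."""
    candidates = [name for name, free in scheduler_view.items() if free >= ram_mb]
    return sorted(candidates, key=lambda name: scheduler_view[name], reverse=True)


def place_with_retry(nodes, scheduler_view, ram_mb):
    """Conductor part: walk the suggestions until one claim succeeds."""
    for name in suggest_hosts(scheduler_view, ram_mb):
        if nodes[name].claim(ram_mb):
            return name
    return None  # every suggested host rejected the claim


if __name__ == "__main__":
    nodes = {"a": ComputeNode("a", 512), "b": ComputeNode("b", 2048)}
    stale_view = {"a": 4096, "b": 2048}  # the scheduler still thinks "a" is empty
    print(place_with_retry(nodes, stale_view, 1024))
```

In the usage example the scheduler's view of host "a" is out of date, so the first claim fails and the conductor falls through to host "b" without going back to the scheduler.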
>
>
>
> To emphasize the point further, what would a cells scheduler do? Would that
> also make a synchronous operation to obtain the claim?
>
>
>
> My reaction to the global resource tracker idea has been quite negative. I
> want to like the idea because I like the thought of knowing I have the
> resources when I get my answer. It’s just that I think the persistent claims
> (without the persistent part :) ) give us a lot of what we need. But I am
> still open to being convinced.
>
>
>
> Paul
>
>
>
>
>
>
>
> On 07/14/2014 10:16 AM, Sylvain Bauza wrote:
>> On 12/07/2014 06:07, Jay Pipes wrote:
>>> On 07/11/2014 07:14 AM, John Garbutt wrote:
>>>> On 10 July 2014 16:59, Sylvain Bauza <sbauza at redhat.com> wrote:
>>>>> On 10/07/2014 15:47, Russell Bryant wrote:
>>>>>> On 07/10/2014 05:06 AM, Sylvain Bauza wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> === tl;dr: Now that we agree on waiting for the split
>>>>>>> prereqs to be done, we are debating whether ResourceTracker
>>>>>>> should be part of the scheduler code and consequently
>>>>>>> Scheduler should expose ResourceTracker APIs so that Nova
>>>>>>> wouldn't own compute nodes resources. I'm proposing to first
>>>>>>> go with RT as a Nova resource in Juno and move
>>>>>>> ResourceTracker into Scheduler for K, so we at least merge
>>>>>>> some patches by Juno. ===
>>>>>>>
>>>>>>> Some debates occurred recently about the scheduler split, so
>>>>>>> I think it's important to loop back with you all to see
>>>>>>> where we are and what the discussions are. Again, feel free
>>>>>>> to express your opinions, they are welcome.
>>>>>> Where did this resource tracker discussion come up?  Do you
>>>>>> have any references that I can read to catch up on it?  I
>>>>>> would like to see more detail on the proposal for what should
>>>>>> stay in Nova vs. be moved.  What is the interface between
>>>>>> Nova and the scheduler here?
>>>>>
>>>>> Oh, missed the most important question you asked. So, about
>>>>> the interface between the scheduler and Nova, the original
>>>>> agreed proposal is in the spec
>>>>> https://review.openstack.org/82133 (approved) where the
>>>>> Scheduler exposes:
>>>>> - select_destinations(): for querying the scheduler to provide
>>>>>   candidates
>>>>> - update_resource_stats(): for updating the scheduler internal
>>>>>   state (i.e. HostState)
>>>>>
>>>>> Here, update_resource_stats() is called by the
>>>>> ResourceTracker, see the implementations (in review)
>>>>> https://review.openstack.org/82778 and
>>>>> https://review.openstack.org/104556.
>>>>>
>>>>> The alternative that has just been raised this week is to
>>>>> provide a new interface where ComputeNode claims resources
>>>>> and frees these resources, so that all the resources are fully
>>>>> owned by the Scheduler. An initial PoC has been raised here
>>>>> https://review.openstack.org/103598 but I tried to see what
>>>>> a ResourceTracker proxied by a Scheduler client would look
>>>>> like here: https://review.openstack.org/105747. As the spec
>>>>> hasn't been written, the names of the interfaces are not
>>>>> properly defined, but I made a proposal as:
>>>>> - select_destinations(): same as above
>>>>> - usage_claim(): claim a resource amount
>>>>> - usage_update(): update a resource amount
>>>>> - usage_drop(): frees the resource amount
>>>>>
>>>>> Again, this is a dummy proposal, a spec has to be written if we
>>>>> consider moving the RT.
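
As a rough illustration of the alternative interface proposed above, here is a toy in-memory Python sketch. Only the four method names (select_destinations, usage_claim, usage_update, usage_drop) come from the thread; the signatures, the claim ids, the single abstract "amount" resource and the class name InMemoryScheduler are all assumptions, since no spec had been written.

```python
class InMemoryScheduler:
    """Toy scheduler that owns resource accounting (hypothetical sketch)."""

    def __init__(self, capacity):
        self._capacity = dict(capacity)  # host -> total resource units
        self._claims = {}                # claim id -> (host, amount)
        self._next_id = 0

    def _free(self, host):
        used = sum(amount for h, amount in self._claims.values() if h == host)
        return self._capacity[host] - used

    def select_destinations(self, amount):
        """Return hosts that could currently fit the requested amount."""
        return [h for h in sorted(self._capacity) if self._free(h) >= amount]

    def usage_claim(self, host, amount):
        """Claim a resource amount on a host and return a claim id."""
        if self._free(host) < amount:
            raise ValueError("insufficient resources on %s" % host)
        self._next_id += 1
        self._claims[self._next_id] = (host, amount)
        return self._next_id

    def usage_update(self, claim_id, amount):
        """Change the amount held by an existing claim."""
        host, old_amount = self._claims[claim_id]
        if self._free(host) + old_amount < amount:
            raise ValueError("insufficient resources on %s" % host)
        self._claims[claim_id] = (host, amount)

    def usage_drop(self, claim_id):
        """Free the resource amount held by a claim."""
        del self._claims[claim_id]


if __name__ == "__main__":
    scheduler = InMemoryScheduler({"h1": 4, "h2": 2})
    claim = scheduler.usage_claim("h1", 3)
    print(scheduler.select_destinations(2))
    scheduler.usage_drop(claim)
    print(scheduler.select_destinations(4))
```

Because claims and placement share one state here, select_destinations() always sees the effect of earlier claims; that is the property the "scheduler owns the resources" alternative is after.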
>>>>
>>>> While I am not against moving the resource tracker, I feel we
>>>> could move this to Gantt after the core scheduling has been
>>>> moved.
>>>
>>> Big -1 from me on this, John.
>>>
>>> Frankly, I see no urgency whatsoever -- and actually very little
>>> benefit -- to moving the scheduler out of Nova. The Gantt project I
>>> think is getting ahead of itself by focusing on a split instead of
>>> focusing on cleaning up the interfaces between nova-conductor,
>>> nova-scheduler, and nova-compute.
>>>
>>
>> -1 on saying there is no urgency. Don't you see the NFV group asking
>> at each meeting what the status of the scheduler split is?
>
> Frankly, I don't think a lot of the NFV use cases are well-defined.
>
> Even more frankly, I don't see any benefit to a split-out scheduler to a
> single NFV use case.
>
>> Don't you see at each Summit the many talks (and people attending
>> them) about how OpenStack should look at Pets vs. Cattle, saying
>> that the scheduler should be out of Nova?
>
> There have been no concrete benefits discussed to having the scheduler
> outside of Nova.
>
> I don't really care how many people say that the scheduler should be out
> of Nova unless those same people come to the table with concrete reasons
> why. Just saying something is a benefit does not make it a benefit, and
> I think I've outlined some of the very real dangers -- in terms of code
> and payload complexity -- of breaking the scheduler out of Nova until
> the interfaces are cleaned up and the scheduler actually owns the
> resources upon which it exercises placement decisions.
>
>> From an operator perspective, people have waited a long time for a
>> scheduler that does "scheduling" and not only "resource placement".
>
> Could you elaborate a bit here? What operators are begging for the
> scheduler to do more than resource placement? And if they are begging
> for this, what use cases are they trying to address?
>
> I'm genuinely curious, so looking forward to your reply here! :)
>
> snip...
>
>>> As for the idea that things will get *easier* once scheduler code
>>> is broken out of Nova, I go back to my original statement that I
>>> don't really see the benefit of the split at this point, and I
>>> would just bring up the fact that Neutron/nova-network is a shining
>>> example of how things can easily backfire when splitting of code is
>>> done too early, before interfaces are cleaned up and
>>> responsibilities between internal components are clearly agreed
>>> upon.
>>
>> Please, please, don't mix the rationale for extensible Resource
>> Tracker and the current efforts for moving out the Scheduler. Both of
>> them try to have an agnostic and heterogeneous scheduler, but both
>> efforts are independent.
>>
>> The ResourceTracker is something pure Nova. Saying to Gantt "I want
>> to store this data" and "I want you to select a destination" is
>> agnostic enough that it does not require porting the
>> ResourceTracker to the Scheduler.
>
> Sorry, I'm not following you. Who is saying to Gantt "I want to store
> this data"?
>
> All I am saying is that the thing that places a resource on some
> provider of that resource should be the thing that owns the process of a
> requester *claiming* the resources on that provider, and in order to
> properly track resources in a race-free way in such a system, then the
> system needs to contain the resource tracker.
>
>> While I approve of defining the interfaces now, there is no reason
>> to say we would have to change anything in how Nova is doing that.
>> The role of Gantt is to define the interfaces, draw the line between
>> Scheduler and Nova, and forklift the Scheduler into a single project.
>> No big bang is needed here.
>
> Yeah, I just don't see the need to split the scheduler at this point,
> sorry. :(
>
> Best,
> -jay



-- 
-Debo~


