[openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft
Sylvain Bauza
sylvain.bauza at bull.net
Fri Oct 11 12:19:40 UTC 2013
Long-story short, sounds like we do have the same concerns here in Climate.
I'll be present at the Summit; any chance to do an unconference meeting
among all interested parties?
Thanks,
-Sylvain
On 11/10/2013 08:25, Mike Spreitzer wrote:
> Regarding Alex's question of which component does holistic
> infrastructure scheduling, I hesitate to simply answer "heat". Heat
> is about orchestration, and infrastructure scheduling is another
> matter. I have attempted to draw pictures to sort this out, see
> https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U and
> https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g.
> In those you will see that I identify holistic infrastructure
> scheduling as separate functionality from infrastructure orchestration
> (the main job of today's heat engine) and also separate from software
> orchestration concerns. However, I also see a close relationship
> between holistic infrastructure scheduling and heat, as should be
> evident in those pictures too.
>
> Alex made a remark about the needed inputs, and I agree but would like
> to expand a little on the topic. One thing any scheduler needs is
> knowledge of the amount, structure, and capacity of the hosting
> thingies (I wish I could say "resources", but that would be confusing)
> onto which the workload is to be scheduled. Scheduling decisions are
> made against available capacity. I think the most practical way to
> determine available capacity is to separately track raw capacity and
> current (plus already planned!) allocations from that capacity,
> finally subtracting the latter from the former.
>
> In Nova, for example, sensing raw capacity is handled by the various
> nova-compute agents reporting that information. I think a holistic
> infrastructure scheduler should get that information from the various
> individual services (Nova, Cinder, etc) that it is concerned with
> (presumably they have it anyway).
>
> A holistic infrastructure scheduler can keep track of the allocations
> it has planned (regardless of whether they have been executed yet).
> However, there may also be allocations that did not originate in the
> holistic infrastructure scheduler. The individual underlying services
> should be able to report (to the holistic infrastructure scheduler,
> even if lowly users are not so authorized) all the allocations
> currently in effect. An accurate union of the current and planned
> allocations is what we want to subtract from raw capacity to get
> available capacity.
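> A minimal sketch of the accounting described above, in Python. All
> names here (Allocation, available_capacity, the dict shapes) are
> hypothetical illustrations, not existing Nova or scheduler APIs. The
> key point is taking the union of current and planned allocations by
> resource ID, so an allocation that has been both planned and executed
> is not subtracted twice:

```python
# Hypothetical capacity accounting: available = raw - (current U planned).
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    resource_id: str  # e.g. a VM instance UUID
    host: str
    vcpus: int
    ram_mb: int


def available_capacity(raw, current, planned):
    """Subtract the union of current and planned allocations from raw capacity.

    raw: dict mapping host -> {"vcpus": int, "ram_mb": int}
    current, planned: iterables of Allocation. They may overlap: a
    planned allocation that has since been executed appears in both.
    """
    # Union keyed by resource_id, so overlapping entries count once.
    union = {a.resource_id: a for a in list(current) + list(planned)}
    avail = {host: dict(cap) for host, cap in raw.items()}
    for a in union.values():
        avail[a.host]["vcpus"] -= a.vcpus
        avail[a.host]["ram_mb"] -= a.ram_mb
    return avail
```

> So if host h1 has 8 VCPUs raw, vm-1 (2 VCPUs) is both current and
> planned, and vm-2 (1 VCPU) is only planned, h1 shows 5 VCPUs
> available, not 3.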
>
> If there is a long delay between planning and executing an allocation,
> there can be nasty surprises from competitors --- if there are any
> competitors. Actually, there can be nasty surprises anyway. Any
> scheduler should be prepared for nasty surprises, and react by some
> sensible retrying. If nasty surprises are rare, we are pretty much
> done. If nasty surprises due to the presence of competing managers
> are common, we may be able to combat the problem by changing the long
> delay to a short one --- by moving the allocation execution earlier
> into a stage that is only about locking in allocations, leaving all
> the other work involved in creating virtual resources to later
> (perhaps Climate will be good for this). If the delay between
> planning and executing an allocation is short and there are many nasty
> surprises due to competing managers, then you have too much
> competition between managers --- don't do that.
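> The "react by some sensible retrying" behavior can be sketched as a
> simple plan/execute loop. This is an assumed shape, not code from any
> scheduler: plan_fn, execute_fn, and CapacityConflict are hypothetical
> names standing in for planning against believed-available capacity,
> executing the placement, and the nasty surprise where a competitor
> consumed the capacity first.

```python
# Illustrative retry loop: re-plan whenever execution hits a conflict.
class CapacityConflict(Exception):
    """Raised when a placement no longer fits at execution time."""


class SchedulingFailed(Exception):
    """Raised when all retry attempts are exhausted."""


def schedule_with_retry(plan_fn, execute_fn, max_attempts=3):
    """Plan, execute, and re-plan on capacity conflicts.

    plan_fn() returns a placement computed from the scheduler's current
    view; execute_fn(placement) raises CapacityConflict if a competing
    manager invalidated that view between planning and execution.
    """
    for _ in range(max_attempts):
        placement = plan_fn()
        try:
            return execute_fn(placement)
        except CapacityConflict:
            # Surprise: refresh our view (inside plan_fn) and try again.
            continue
    raise SchedulingFailed("gave up after %d attempts" % max_attempts)
```

> If conflicts are rare, the loop almost always succeeds on the first
> pass; if they are common, that is the signal (per the paragraph above)
> to shorten the plan-to-execute delay or reduce manager competition.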
>
> Debo wants a simpler nova-centric story. OK, how about the following.
> This is for the first step in the roadmap, where scheduling decisions
> are still made independently for each VM instance. For the
> client/service interface, I think we can do this with a simple clean
> two-phase interface when traditional software orchestration is in
> play, a one-phase interface when slick new software orchestration is
> used. Let me outline the two-phase flow. We extend the Nova API with
> CRUD operations on VRTs (top-level groups). For example, the CREATE
> operation takes a definition of a top-level group and all its nested
> groups, definitions (excepting stuff like userdata) of all the
> resources (only VM instances, for now) contained in those groups, all
> the relationships among those groups/resources, and all the
> applications of policy to those groups, resources, and relationships.
> This is a REST-style interface; the CREATE operation takes a
> definition of the thing (a top-level group and all that it contains)
> being created; the UPDATE operation takes a revised definition of the
> whole thing. Nova records the presented information; the familiar
> stuff is stored essentially as it is today (but marked as being in
> some new sort of tentative state), and the grouping, relationship, and
> policy stuff is stored according to a model like the one Debo&Yathi
> wrote. The CREATE operation returns a UUID for the newly created
> top-level group. The invocation of the top-level group CRUD is a
> single operation and it is the first of the two phases. In the second
> phase of a CREATE flow, the client creates individual resources with
> the same calls as are used today, except that each VM instance create
> call is augmented with a pointer into the policy information. That
> pointer consists of (1) the UUID of the relevant top-level group and
> (2) the name used within that group to identify the resource now being
> created. (Obviously we would need resources to be named uniquely
> among all the things ultimately contained anywhere in the same
> top-level group. That could be done, e.g., with path names and a
> requirement only that siblings have distinct names. Or we could
> simply require that names be unique without mandating any particular
> structure. We could call them IDs rather than names.) The way Nova
> handles a VM-create call can now be enhanced to reference and use the
> policy information that is associated with the newly passed policy
> pointer.
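> To make the two phases concrete, here is a hedged sketch in Python.
> FakeNova, create_group, create_server, and the payload shapes are
> inventions for illustration only, not a proposed or existing Nova
> extension; the point is the shape of the flow: one call records the
> whole tentative group/policy tree and returns a UUID, then each
> ordinary VM create carries the (group UUID, resource name) pointer.

```python
# Hypothetical two-phase flow: phase 1 registers the top-level group,
# phase 2 creates resources that point back into it.
import uuid


class FakeNova:
    """Stand-in for a Nova API extended with top-level group CRUD."""

    def __init__(self):
        self.groups = {}   # group UUID -> tentative definition
        self.servers = []  # recorded VM-create calls

    def create_group(self, definition):
        """Phase 1: one call records the whole group/policy tree,
        marked tentative, and returns the new top-level group UUID."""
        gid = str(uuid.uuid4())
        self.groups[gid] = {"state": "tentative", "definition": definition}
        return gid

    def create_server(self, flavor, image, group_id, resource_name):
        """Phase 2: an ordinary VM create, augmented with the policy
        pointer (top-level group UUID + unique name within the group)."""
        policy = self.groups[group_id]["definition"]["resources"][resource_name]
        self.servers.append({"flavor": flavor, "image": image,
                             "group": group_id, "name": resource_name,
                             "policy": policy})
        return len(self.servers) - 1


nova = FakeNova()
gid = nova.create_group({
    "resources": {
        "web-0": {"anti_affinity_with": ["web-1"]},
        "web-1": {"anti_affinity_with": ["web-0"]},
    },
})
nova.create_server("m1.small", "cirros", gid, "web-0")
```

> The UPDATE flow would follow the same pattern: revise the stored
> definition under the same UUID, then update individual resources.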
>
> The UPDATE flow is similar: first UPDATE the top-level group, then
> update individual resources.
>
> For the definition of a top-level group and all that it contains we
> need some language. I think the obvious answer is an extended version
> of the HOT language. Which is why I have proposed such an extension.
> It is not because I am confused about what the heat engine should do,
> it is because I want something else (the policy-informed scheduler) to
> have an input language with sufficient content. This is the role
> played by "HOT+" in the first of my two pictures cited above. The
> same sort of language is needed in the first step of the roadmap,
> where it is only Nova that is policy-informed and scheduling is not
> yet joint --- but at this early step of the roadmap the
> resources+policy language is input to Nova rather than to a separate
> holistic infrastructure scheduler.
>
> Regards,
> Mike
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev