[openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft
Sylvain Bauza
sylvain.bauza at bull.net
Fri Oct 11 12:19:40 UTC 2013
Long-story short, sounds like we do have the same concerns here in Climate.
I'll be present at the Summit; any chance to do an unconference meeting
among all interested parties?
Thanks,
-Sylvain
On 11/10/2013 08:25, Mike Spreitzer wrote:
> Regarding Alex's question of which component does holistic
> infrastructure scheduling, I hesitate to simply answer "heat". Heat
> is about orchestration, and infrastructure scheduling is another
> matter. I have attempted to draw pictures to sort this out, see
> https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U and
> https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g.
> In those you will see that I identify holistic infrastructure
> scheduling as separate functionality from infrastructure orchestration
> (the main job of today's heat engine) and also separate from software
> orchestration concerns. However, I also see a close relationship
> between holistic infrastructure scheduling and heat, as should be
> evident in those pictures too.
>
> Alex made a remark about the needed inputs, and I agree but would like
> to expand a little on the topic. One thing any scheduler needs is
> knowledge of the amount, structure, and capacity of the hosting
> thingies (I wish I could say "resources", but that would be confusing)
> onto which the workload is to be scheduled. Scheduling decisions are
> made against available capacity. I think the most practical way to
> determine available capacity is to separately track raw capacity and
> current (plus already planned!) allocations from that capacity,
> finally subtracting the latter from the former.
>
> In Nova, for example, sensing raw capacity is handled by the various
> nova-compute agents reporting that information. I think a holistic
> infrastructure scheduler should get that information from the various
> individual services (Nova, Cinder, etc) that it is concerned with
> (presumably they have it anyway).
>
> A holistic infrastructure scheduler can keep track of the allocations
> it has planned (regardless of whether they have been executed yet).
> However, there may also be allocations that did not originate in the
> holistic infrastructure scheduler. The individual underlying services
> should be able to report (to the holistic infrastructure scheduler,
> even if lowly users are not so authorized) all the allocations
> currently in effect. An accurate union of the current and planned
> allocations is what we want to subtract from raw capacity to get
> available capacity.
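> A minimal sketch of the accounting described above, in Python. All
> names here (Allocation, available_capacity, the dict shapes) are
> hypothetical illustrations, not existing Nova or scheduler APIs. The
> key point is taking the union of current and planned allocations by
> resource ID, so an allocation that has been both planned and executed
> is not subtracted twice:

```python
# Hypothetical capacity accounting: available = raw - (current U planned).
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    resource_id: str  # e.g. a VM instance UUID
    host: str
    vcpus: int
    ram_mb: int


def available_capacity(raw, current, planned):
    """Subtract the union of current and planned allocations from raw capacity.

    raw: dict mapping host -> {"vcpus": int, "ram_mb": int}
    current, planned: iterables of Allocation. They may overlap: a
    planned allocation that has since been executed appears in both.
    """
    # Union keyed by resource_id, so overlapping entries count once.
    union = {a.resource_id: a for a in list(current) + list(planned)}
    avail = {host: dict(cap) for host, cap in raw.items()}
    for a in union.values():
        avail[a.host]["vcpus"] -= a.vcpus
        avail[a.host]["ram_mb"] -= a.ram_mb
    return avail
```

> So if host h1 has 8 VCPUs raw, vm-1 (2 VCPUs) is both current and
> planned, and vm-2 (1 VCPU) is only planned, h1 shows 5 VCPUs
> available, not 3.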
>
> If there is a long delay between planning and executing an allocation,
> there can be nasty surprises from competitors --- if there are any
> competitors. Actually, there can be nasty surprises anyway. Any
> scheduler should be prepared for nasty surprises, and react by some
> sensible retrying. If nasty surprises are rare, we are pretty much
> done. If nasty surprises due to the presence of competing managers
> are common, we may be able to combat the problem by changing the long
> delay to a short one --- by moving the allocation execution earlier
> into a stage that is only about locking in allocations, leaving all
> the other work involved in creating virtual resources to later
> (perhaps Climate will be good for this). If the delay between
> planning and executing an allocation is short and there are many nasty
> surprises due to competing managers, then you have too much
> competition between managers --- don't do that.
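> The "react by some sensible retrying" behavior can be sketched as a
> simple plan/execute loop. This is an assumed shape, not code from any
> scheduler: plan_fn, execute_fn, and CapacityConflict are hypothetical
> names standing in for planning against believed-available capacity,
> executing the placement, and the nasty surprise where a competitor
> consumed the capacity first.

```python
# Illustrative retry loop: re-plan whenever execution hits a conflict.
class CapacityConflict(Exception):
    """Raised when a placement no longer fits at execution time."""


class SchedulingFailed(Exception):
    """Raised when all retry attempts are exhausted."""


def schedule_with_retry(plan_fn, execute_fn, max_attempts=3):
    """Plan, execute, and re-plan on capacity conflicts.

    plan_fn() returns a placement computed from the scheduler's current
    view; execute_fn(placement) raises CapacityConflict if a competing
    manager invalidated that view between planning and execution.
    """
    for _ in range(max_attempts):
        placement = plan_fn()
        try:
            return execute_fn(placement)
        except CapacityConflict:
            # Surprise: refresh our view (inside plan_fn) and try again.
            continue
    raise SchedulingFailed("gave up after %d attempts" % max_attempts)
```

> If conflicts are rare, the loop almost always succeeds on the first
> pass; if they are common, that is the signal (per the paragraph above)
> to shorten the plan-to-execute delay or reduce manager competition.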
>
> Debo wants a simpler nova-centric story. OK, how about the following.
> This is for the first step in the roadmap, where scheduling decisions
> are still made independently for each VM instance. For the
> client/service interface, I think we can do this with a simple clean
> two-phase interface when traditional software orchestration is in
> play, a one-phase interface when slick new software orchestration is
> used. Let me outline the two-phase flow. We extend the Nova API with
> CRUD operations on VRTs (top-level groups). For example, the CREATE
> operation takes a definition of a top-level group and all its nested
> groups, definitions (excepting stuff like userdata) of all the
> resources (only VM instances, for now) contained in those groups, all
> the relationships among those groups/resources, and all the
> applications of policy to those groups, resources, and relationships.
> This is a REST-style interface; the CREATE operation takes a
> definition of the thing (a top-level group and all that it contains)
> being created; the UPDATE operation takes a revised definition of the
> whole thing. Nova records the presented information; the familiar
> stuff is stored essentially as it is today (but marked as being in
> some new sort of tentative state), and the grouping, relationship, and
> policy stuff is stored according to a model like the one Debo&Yathi
> wrote. The CREATE operation returns a UUID for the newly created
> top-level group. The invocation of the top-level group CRUD is a
> single operation and it is the first of the two phases. In the second
> phase of a CREATE flow, the client creates individual resources with
> the same calls as are used today, except that each VM instance create
> call is augmented with a pointer into the policy information. That
> pointer consists of (1) the UUID of the relevant top-level group and
> (2) the name used within that group to identify the resource now being
> created. (Obviously we would need resources to be named uniquely
> among all the things ultimately contained anywhere in the same
> top-level group. That could be done, e.g., with path names and a
> requirement only that siblings have distinct names. Or we could
> simply require that names be unique without mandating any particular
> structure. We could call them IDs rather than names.) The way Nova
> handles a VM-create call can now be enhanced to reference and use the
> policy information that is associated with the newly passed policy
> pointer.
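> To make the two phases concrete, here is a hedged sketch in Python.
> FakeNova, create_group, create_server, and the payload shapes are
> inventions for illustration only, not a proposed or existing Nova
> extension; the point is the shape of the flow: one call records the
> whole tentative group/policy tree and returns a UUID, then each
> ordinary VM create carries the (group UUID, resource name) pointer.

```python
# Hypothetical two-phase flow: phase 1 registers the top-level group,
# phase 2 creates resources that point back into it.
import uuid


class FakeNova:
    """Stand-in for a Nova API extended with top-level group CRUD."""

    def __init__(self):
        self.groups = {}   # group UUID -> tentative definition
        self.servers = []  # recorded VM-create calls

    def create_group(self, definition):
        """Phase 1: one call records the whole group/policy tree,
        marked tentative, and returns the new top-level group UUID."""
        gid = str(uuid.uuid4())
        self.groups[gid] = {"state": "tentative", "definition": definition}
        return gid

    def create_server(self, flavor, image, group_id, resource_name):
        """Phase 2: an ordinary VM create, augmented with the policy
        pointer (top-level group UUID + unique name within the group)."""
        policy = self.groups[group_id]["definition"]["resources"][resource_name]
        self.servers.append({"flavor": flavor, "image": image,
                             "group": group_id, "name": resource_name,
                             "policy": policy})
        return len(self.servers) - 1


nova = FakeNova()
gid = nova.create_group({
    "resources": {
        "web-0": {"anti_affinity_with": ["web-1"]},
        "web-1": {"anti_affinity_with": ["web-0"]},
    },
})
nova.create_server("m1.small", "cirros", gid, "web-0")
```

> The UPDATE flow would follow the same pattern: revise the stored
> definition under the same UUID, then update individual resources.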
>
> The UPDATE flow is similar: first UPDATE the top-level group, then
> update individual resources.
>
> For the definition of a top-level group and all that it contains we
> need some language. I think the obvious answer is an extended version
> of the HOT language. Which is why I have proposed such an extension.
> It is not because I am confused about what the heat engine should do,
> it is because I want something else (the policy-informed scheduler) to
> have an input language with sufficient content. This is the role
> played by "HOT+" in the first of my two pictures cited above. The
> same sort of language is needed in the first step of the roadmap,
> where it is only Nova that is policy-informed and scheduling is not
> yet joint --- but at this early step of the roadmap the
> resources+policy language is input to Nova rather than to a separate
> holistic infrastructure scheduler.
>
> Regards,
> Mike
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev