<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">Long story short, it sounds like we
      have the same concerns here in Climate.<br>

      <br>

      I'll be present at the Summit. Any chance of holding an
      unconference meeting among all parties?<br>

      <br>

      Thanks,<br>

      -Sylvain<br>

      <br>

      On 11/10/2013 08:25, Mike Spreitzer wrote:<br>

    </div>

    <blockquote

cite="mid:OF62780C6D.71AA2394-ON85257C01.001CA787-85257C01.00234B8E@us.ibm.com"

      type="cite">

      <meta http-equiv="Content-Type" content="text/html;

        charset=ISO-8859-1">

      <font face="sans-serif" size="2">Regarding Alex's question of
        which component does holistic infrastructure scheduling, I
        hesitate to simply answer "Heat".  Heat is about orchestration,
        and infrastructure scheduling is another matter.  I have
        attempted to draw pictures to sort this out; see

      </font><a moz-do-not-send="true"

href="https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U"><font

          face="sans-serif" size="2">https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U</font></a><font

        face="sans-serif" size="2">

        and </font><a moz-do-not-send="true"

href="https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g"><font

          face="sans-serif" size="2">https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g</font></a><font

        face="sans-serif" size="2">

        .  In those you will see that I identify holistic infrastructure
        scheduling as functionality separate from infrastructure
        orchestration (the main job of today's Heat engine) and also
        separate from software orchestration concerns.  However, I also
        see a close relationship between holistic infrastructure
        scheduling and Heat, as should be evident in those pictures too.</font>

      <br>

      <br>

      <font face="sans-serif" size="2">Alex made a remark about the

        needed

        inputs, and I agree but would like to expand a little on the

        topic.  One

        thing any scheduler needs is knowledge of the amount, structure,

        and capacity

        of the hosting thingies (I wish I could say "resources", but

        that would be confusing) onto which the workload is to be

        scheduled.  Scheduling

        decisions are made against available capacity.  I think the most

        practical

        way to determine available capacity is to separately track raw

        capacity

        and current (plus already planned!) allocations from that

        capacity, finally

        subtracting the latter from the former.</font>

      <br>

      <br>

      <font face="sans-serif" size="2">In Nova, for example, sensing raw

        capacity

        is handled by the various nova-compute agents reporting that

        information.

         I think a holistic infrastructure scheduler should get that

        information

        from the various individual services (Nova, Cinder, etc.) that it

        is concerned

        with (presumably they have it anyway).</font>

      <br>

      <br>

      <font face="sans-serif" size="2">A holistic infrastructure

        scheduler

        can keep track of the allocations it has planned (regardless of

        whether

        they have been executed yet).  However, there may also be

        allocations

        that did not originate in the holistic infrastructure scheduler.

         The

        individual underlying services should be able to report (to the

        holistic

        infrastructure scheduler, even if lowly users are not so

        authorized) all

        the allocations currently in effect.  An accurate union of the

        current

        and planned allocations is what we want to subtract from raw

        capacity to

        get available capacity.</font>
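      <br>
      <br>
      <font face="sans-serif" size="2">As a rough illustration of the
        subtraction described above, here is a minimal Python sketch.
        The Allocation record, the available_capacity function, and the
        choice of VCPUs and RAM as the tracked quantities are all
        assumptions made for this example; they are not part of any
        OpenStack API.  The point it tries to show is that the union is
        taken by allocation identity, so that a planned allocation that
        has already been executed is not subtracted twice.</font>
      <br>
      <pre wrap="">
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    """One allocation against a host (all names here are hypothetical)."""
    alloc_id: str   # identity used to de-duplicate planned vs. current
    host: str
    vcpus: int
    ram_mb: int


def available_capacity(raw, current, planned):
    """Return {host: (vcpus, ram_mb)} left after current and planned allocations.

    raw:     {host: (vcpus, ram_mb)} as reported by the underlying services
    current: allocations in effect, whoever created them
    planned: allocations this scheduler has planned, executed or not
    """
    # Union by alloc_id so a planned allocation that has already been
    # executed (and so also shows up as current) is counted only once.
    union = {a.alloc_id: a for a in list(current) + list(planned)}
    free = dict(raw)
    for a in union.values():
        vcpus, ram_mb = free[a.host]
        free[a.host] = (vcpus - a.vcpus, ram_mb - a.ram_mb)
    return free


raw = {"host1": (16, 65536), "host2": (8, 32768)}
current = [
    Allocation("vm-a", "host1", 4, 8192),    # created by some other manager
    Allocation("vm-b", "host2", 2, 4096),    # planned here, already executed
]
planned = [
    Allocation("vm-b", "host2", 2, 4096),    # same allocation as above
    Allocation("vm-c", "host1", 8, 16384),   # planned, not yet executed
]
print(available_capacity(raw, current, planned))
```
</pre>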

      <br>

      <br>

      <font face="sans-serif" size="2">If there is a long delay between

        planning

        and executing an allocation, there can be nasty surprises from

        competitors

        --- if there are any competitors.  Actually, there can be nasty

        surprises

        anyway.  Any scheduler should be prepared for nasty surprises,

        and

        react by some sensible retrying.  If nasty surprises are rare,

        we

        are pretty much done.  If nasty surprises due to the presence of

        competing

        managers are common, we may be able to combat the problem by

        changing the

        long delay to a short one --- by moving the allocation execution

        earlier

        into a stage that is only about locking in allocations, leaving

        all the

        other work involved in creating virtual resources to later

        (perhaps Climate

        will be good for this).  If the delay between planning and

        executing

        an allocation is short and there are many nasty surprises due to

        competing

        managers, then you have too much competition between managers

        --- don't

        do that.</font>
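      <br>
      <br>
      <font face="sans-serif" size="2">The plan, lock in, and retry
        behaviour described above might be sketched as follows.  This is
        only an illustration under stated assumptions: FakeService, its
        reserve method, CapacityConflict, and the single VCPU count per
        host are hypothetical stand-ins, not real Nova or Climate
        interfaces.  The reserve step only claims capacity and does none
        of the slow creation work, which is what keeps the delay between
        planning and executing an allocation short.</font>
      <br>
      <pre wrap="">
```python
class CapacityConflict(Exception):
    """Raised when a host no longer has the capacity the plan assumed."""


class FakeService:
    """Hypothetical underlying service; tracks free VCPUs per host."""

    def __init__(self, free):
        self.free = dict(free)

    def reserve(self, host, vcpus):
        # Lock-in step only: claim capacity, do none of the creation work.
        if vcpus > self.free.get(host, 0):
            raise CapacityConflict(host)
        self.free[host] -= vcpus


def plan(view, vcpus):
    """Pick the emptiest host according to the scheduler's own view."""
    candidates = [h for h, free in view.items() if free >= vcpus]
    if not candidates:
        return None
    return max(candidates, key=lambda h: view[h])


def schedule(service, view, vcpus, max_attempts=3):
    """Plan, try to lock in, and on a surprise re-sense and retry."""
    for _ in range(max_attempts):
        host = plan(view, vcpus)
        if host is None:
            return None
        try:
            service.reserve(host, vcpus)
        except CapacityConflict:
            # Nasty surprise: a competitor got there first.
            # Refresh the view from the service and plan again.
            view.update(service.free)
            continue
        view[host] -= vcpus
        return host
    return None


service = FakeService({"h1": 2, "h2": 6})   # what is really free
stale_view = {"h1": 8, "h2": 6}             # scheduler missed a competitor on h1
host = schedule(service, stale_view, 4)
print(host, service.free)
```
</pre>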

      <br>

      <br>

      <font face="sans-serif" size="2">Debo wants a simpler nova-centric

        story.

         OK, how about the following.  This is for the first step in

        the roadmap, where scheduling decisions are still made

        independently for

        each VM instance.  For the client/service interface, I think we
        can do this with a simple, clean two-phase interface when
        traditional software orchestration is in play, and a one-phase
        interface when slick new software orchestration is used.  Let me
        outline the two-phase flow.  We

        extend the Nova API with CRUD operations on VRTs (top-level

        groups).  For

        example, the CREATE operation takes a definition of a top-level

        group and

        all its nested groups, definitions (excepting stuff like

        userdata) of all

        the resources (only VM instances, for now) contained in those

        groups, all

        the relationships among those groups/resources, and all the

        applications

        of policy to those groups, resources, and relationships.  This
        is a REST-style interface: the CREATE operation takes a definition

        of the

        thing (a top-level group and all that it contains) being

        created; the UPDATE

        operation takes a revised definition of the whole thing.  Nova

        records

        the presented information; the familiar stuff is stored

        essentially as

        it is today (but marked as being in some new sort of tentative

        state),

        and the grouping, relationship, and policy stuff is stored

        according to

        a model like the one Debo &amp; Yathi wrote.  The CREATE operation

        returns

        a UUID for the newly created top-level group.  The invocation of

        the

        top-level group CRUD is a single operation and it is the first

        of the two

        phases.  In the second phase of a CREATE flow, the client

        creates

        individual resources with the same calls as are used today,

        except that

        each VM instance create call is augmented with a pointer into

        the policy

        information.  That pointer consists of (1) the UUID of the

        relevant

        top-level group and (2) the name used within that group to

        identify the

        resource now being created.  (Obviously we would need resources

        to

        be named uniquely among all the things ultimately contained

        anywhere in

        the same top-level group.  That could be done, e.g., with path

        names

        and a requirement only that siblings have distinct names.  Or we

        could

        simply require that names be unique without mandating any

        particular structure.

         We could call them IDs rather than names.)  The way Nova

        handles

        a VM-create call can now be enhanced to reference and use the

        policy information

        that is associated with the newly passed policy pointer.</font>
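      <br>
      <br>
      <font face="sans-serif" size="2">To make the two-phase CREATE flow
        above concrete, here is a small in-memory Python sketch.
        Everything in it is hypothetical: FakeNova, create_group,
        create_instance, and the dictionary layout of the group
        definition are invented for illustration and do not exist in the
        real Nova API.  Phase one records the whole top-level group in a
        tentative state and returns its UUID; phase two is an ordinary
        instance create augmented with a pointer made of that UUID and
        the resource's name within the group.</font>
      <br>
      <pre wrap="">
```python
import uuid


class FakeNova:
    """In-memory stand-in; none of these methods exist in the real Nova API."""

    def __init__(self):
        self.groups = {}      # group UUID -> recorded definition and state
        self.instances = {}   # (group UUID, resource name) -> instance record

    def create_group(self, definition):
        # Phase 1: record the whole top-level group in a tentative state.
        group_id = str(uuid.uuid4())
        self.groups[group_id] = {"definition": definition, "state": "tentative"}
        return group_id

    def create_instance(self, flavor, image, policy_pointer):
        # Phase 2: an ordinary create call plus the policy pointer.
        group_id, name = policy_pointer
        definition = self.groups[group_id]["definition"]
        if name not in definition["resources"]:
            raise KeyError(name)
        # Look up the policies that apply to this named resource.
        policies = [p for p in definition["policies"] if name in p["applies_to"]]
        record = {"flavor": flavor, "image": image, "policies": policies}
        self.instances[(group_id, name)] = record
        return record


nova = FakeNova()
group_id = nova.create_group({
    "resources": {"web1": {}, "web2": {}},
    "policies": [{"type": "anti-affinity", "applies_to": ["web1", "web2"]}],
})
vm = nova.create_instance("m1.small", "some-image", (group_id, "web1"))
print(vm["policies"])
```
</pre>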

      <br>

      <br>

      <font face="sans-serif" size="2">The UPDATE flow is similar: first

        UPDATE

        the top-level group, then update individual resources.</font>

      <br>

      <br>

      <font face="sans-serif" size="2">For the definition of a top-level

        group

        and all that it contains we need some language.  I think the
        obvious answer is an extended version of the HOT language, which
        is why I have proposed such an extension.  It is not because I
        am confused about what the Heat engine should do; it is because
        I want something else
        (the policy-informed scheduler) to have an input language with

        sufficient

        content.  This is the role played by "HOT+" in the first

        of my two pictures cited above.  The same sort of language is

        needed

        in the first step of the roadmap, where it is only Nova that is

        policy-informed

        and scheduling is not yet joint --- but at this early step of

        the roadmap

        the resources+policy language is input to Nova rather than to a

        separate

        holistic infrastructure scheduler.</font>

      <br>

      <br>

      <font face="sans-serif" size="2">Regards,</font>

      <br>

      <font face="sans-serif" size="2">Mike</font>

      <br>


      <br>

      <pre wrap="">_______________________________________________

OpenStack-dev mailing list

<a class="moz-txt-link-abbreviated" href="mailto:OpenStack-dev@lists.openstack.org">OpenStack-dev@lists.openstack.org</a>

<a class="moz-txt-link-freetext" href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a>

</pre>

    </blockquote>

    <br>

  </body>

</html>