<font size=2 face="sans-serif">I'll be at the summit too, and available
Nov 4 if we want to do some prep then. It will be my first summit,
so I am not sure how overbooked my summit time will be.</font>
<br>
<br><font size=2 face="sans-serif">Regards,</font>
<br><font size=2 face="sans-serif">Mike</font>
<br>
<br>
<br>
<br><font size=1 color=#5f5f5f face="sans-serif">From:
</font><font size=1 face="sans-serif">Sylvain Bauza <sylvain.bauza@bull.net></font>
<br><font size=1 color=#5f5f5f face="sans-serif">To:
</font><font size=1 face="sans-serif">OpenStack Development
Mailing List <openstack-dev@lists.openstack.org>, </font>
<br><font size=1 color=#5f5f5f face="sans-serif">Cc:
</font><font size=1 face="sans-serif">Mike Spreitzer/Watson/IBM@IBMUS</font>
<br><font size=1 color=#5f5f5f face="sans-serif">Date:
</font><font size=1 face="sans-serif">10/11/2013 08:19 AM</font>
<br><font size=1 color=#5f5f5f face="sans-serif">Subject:
</font><font size=1 face="sans-serif">Re: [openstack-dev]
[scheduler] APIs for Smart Resource Placement - Updated Instance Group
Model and API extension model - WIP Draft</font>
<br>
<hr noshade>
<br>
<br>
<br><font size=3>Long story short: it sounds like we have the same concerns
here in Climate.<br>
<br>
I'll be present at the Summit; any chance of an unconference meeting
between all parties?<br>
<br>
Thanks,<br>
-Sylvain<br>
<br>
On 11/10/2013 08:25, Mike Spreitzer wrote:</font>
<br><font size=2 face="sans-serif">Regarding Alex's question of which component
does holistic infrastructure scheduling, I hesitate to simply answer "Heat".
Heat is about orchestration, and infrastructure scheduling is another
matter. I have attempted to draw pictures to sort this out, see </font><a href=https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U><font size=2 color=blue face="sans-serif"><u>https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U</u></font></a><font size=2 face="sans-serif">
and </font><a href="https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g"><font size=2 color=blue face="sans-serif"><u>https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g</u></font></a><font size=2 face="sans-serif">
. In those you will see that I identify holistic infrastructure scheduling
as separate functionality from infrastructure orchestration (the main job
of today's Heat engine) and also separate from software orchestration concerns.
However, I also see a close relationship between holistic infrastructure
scheduling and heat, as should be evident in those pictures too.</font><font size=3>
<br>
</font><font size=2 face="sans-serif"><br>
Alex made a remark about the needed inputs, and I agree but would like
to expand a little on the topic. One thing any scheduler needs is
knowledge of the amount, structure, and capacity of the hosting thingies
(I wish I could say "resources", but that would be confusing)
onto which the workload is to be scheduled. Scheduling decisions
are made against available capacity. I think the most practical way
to determine available capacity is to separately track raw capacity and
current (plus already planned!) allocations from that capacity, finally
subtracting the latter from the former.</font><font size=3> <br>
</font><font size=2 face="sans-serif"><br>
In Nova, for example, sensing raw capacity is handled by the various nova-compute
agents reporting that information. I think a holistic infrastructure
scheduler should get that information from the various individual services
(Nova, Cinder, etc.) that it is concerned with (presumably they have it
anyway).</font><font size=3> <br>
</font><font size=2 face="sans-serif"><br>
A holistic infrastructure scheduler can keep track of the allocations it
has planned (regardless of whether they have been executed yet). However,
there may also be allocations that did not originate in the holistic infrastructure
scheduler. The individual underlying services should be able to report
(to the holistic infrastructure scheduler, even if lowly users are not
so authorized) all the allocations currently in effect. An accurate
union of the current and planned allocations is what we want to subtract
from raw capacity to get available capacity.</font><font size=3> <br>
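<br>
The bookkeeping above can be sketched in a few lines. This is only an illustrative model (all names here are made up, not any real OpenStack API): allocations are keyed by a unique ID so that taking the union deduplicates a planned allocation that has since been executed and is now also reported as current.<br>

```python
def available_capacity(raw, current, planned):
    """raw: {host: capacity}; current/planned: {alloc_id: (host, amount)}.

    Available capacity = raw capacity minus the union of current and
    planned allocations; the dict merge deduplicates by allocation ID.
    """
    merged = {**current, **planned}  # union, keyed by allocation ID
    avail = dict(raw)
    for host, amount in merged.values():
        avail[host] = avail.get(host, 0) - amount
    return avail

raw = {"host1": 16, "host2": 8}
current = {"a1": ("host1", 4)}
planned = {"a1": ("host1", 4),   # same allocation, already executed
           "a2": ("host2", 2)}
print(available_capacity(raw, current, planned))
# {'host1': 12, 'host2': 6}
```

Note that without the deduplicating union, allocation "a1" would be double-counted and host1 would appear to have only 8 units free.<br>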
</font><font size=2 face="sans-serif"><br>
If there is a long delay between planning and executing an allocation,
there can be nasty surprises from competitors --- if there are any competitors.
Actually, there can be nasty surprises anyway. Any scheduler
should be prepared for nasty surprises, and react by some sensible retrying.
If nasty surprises are rare, we are pretty much done. If nasty
surprises due to the presence of competing managers are common, we may
be able to combat the problem by changing the long delay to a short one
--- by moving the allocation execution earlier into a stage that is only
about locking in allocations, leaving all the other work involved in creating
virtual resources to later (perhaps Climate will be good for this). If
the delay between planning and executing an allocation is short and there
are many nasty surprises due to competing managers, then you have too much
competition between managers --- don't do that.</font><font size=3> <br>
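<br>
The "sensible retrying" above might look like the following sketch (every name here is hypothetical): on a conflict, meaning a competitor took the capacity first, re-plan against refreshed capacity data, up to some bound.<br>

```python
class AllocationConflict(Exception):
    """Raised when the chosen capacity vanished before we locked it in."""
    pass

def place_with_retry(plan, execute, max_attempts=3):
    """plan() picks a host from freshly read capacity data;
    execute(host) locks in the allocation or raises AllocationConflict."""
    for attempt in range(max_attempts):
        host = plan()             # re-read available capacity each time
        try:
            return execute(host)  # lock in the allocation
        except AllocationConflict:
            continue              # nasty surprise: re-plan and retry
    raise RuntimeError("too much competition between managers")
```

If the loop routinely exhausts its attempts, that is the "too much competition" case: the fix is architectural (fewer competing managers, or earlier lock-in), not a larger retry count.<br>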
</font><font size=2 face="sans-serif"><br>
Debo wants a simpler Nova-centric story. OK, how about the following.
This is for the first step in the roadmap, where scheduling decisions
are still made independently for each VM instance. For the client/service
interface, I think we can do this with a simple, clean two-phase interface
when traditional software orchestration is in play, and a one-phase interface
when slick new software orchestration is used. Let me outline the
two-phase flow. We extend the Nova API with CRUD operations on VRTs
(top-level groups). For example, the CREATE operation takes a definition
of a top-level group and all its nested groups, definitions (excepting
stuff like userdata) of all the resources (only VM instances, for now)
contained in those groups, all the relationships among those groups/resources,
and all the applications of policy to those groups, resources, and relationships.
This is a REST-style interface; the CREATE operation takes a definition
of the thing (a top-level group and all that it contains) being created;
the UPDATE operation takes a revised definition of the whole thing. Nova
records the presented information; the familiar stuff is stored essentially
as it is today (but marked as being in some new sort of tentative state),
and the grouping, relationship, and policy stuff is stored according to
a model like the one Debo & Yathi wrote. The CREATE operation returns
a UUID for the newly created top-level group. The invocation of the
top-level group CRUD is a single operation and it is the first of the two
phases. In the second phase of a CREATE flow, the client creates
individual resources with the same calls as are used today, except that
each VM instance create call is augmented with a pointer into the policy
information. That pointer consists of (1) the UUID of the relevant
top-level group and (2) the name used within that group to identify the
resource now being created. (Obviously we would need resources to
be named uniquely among all the things ultimately contained anywhere in
the same top-level group. That could be done, e.g., with path names
and a requirement only that siblings have distinct names. Or we could
simply require that names be unique without mandating any particular structure.
We could call them IDs rather than names.) The way Nova handles
a VM-create call can now be enhanced to reference and use the policy information
that is associated with the newly passed policy pointer.</font><font size=3>
<br>
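The two-phase flow above can be sketched as follows. Everything here is hypothetical, a stand-in for the proposed extension rather than any existing Nova API: the class, method names, and fields are illustrative only. Member names use the path-name style suggested above ("web/lb/vm1"), which only requires siblings to have distinct names.<br>

```python
import uuid

class FakeNova:
    """Stand-in for a Nova extended with CRUD on top-level groups."""
    def __init__(self):
        self.groups = {}
        self.servers = []

    # Phase 1: one CREATE call registers the whole definition (nested
    # groups, member definitions, relationships, policies), stored in a
    # tentative state, and returns the new top-level group's UUID.
    def create_group(self, definition):
        gid = str(uuid.uuid4())
        self.groups[gid] = definition
        return gid

    # Phase 2: the usual per-instance create call, augmented with a
    # policy pointer: (top-level group UUID, member name in that group).
    def create_server(self, flavor, image, group_id, member_name):
        policy = self.groups[group_id]["members"][member_name]
        self.servers.append((flavor, image, policy))
        return policy

nova = FakeNova()
gid = nova.create_group({
    "name": "app",
    "members": {"web/lb/vm1": {"policy": "anti-affinity"}},
})
print(nova.create_server("m1.small", "cirros", gid, "web/lb/vm1"))
# {'policy': 'anti-affinity'}
```

The point of the sketch is the shape of the interface: phase 1 is a single call that hands the scheduler the full policy picture; phase 2 reuses today's per-instance calls, each carrying only a pointer back into that picture.<br>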
</font><font size=2 face="sans-serif"><br>
The UPDATE flow is similar: first UPDATE the top-level group, then update
individual resources.</font><font size=3> <br>
</font><font size=2 face="sans-serif"><br>
For the definition of a top-level group and all that it contains we need
some language. I think the obvious answer is an extended version
of the HOT language, which is why I have proposed such an extension.
It is not because I am confused about what the Heat engine should
do; it is because I want something else (the policy-informed scheduler)
to have an input language with sufficient content. This is the role
played by "HOT+" in the first of my two pictures cited above.
The same sort of language is needed in the first step of the roadmap,
where it is only Nova that is policy-informed and scheduling is not yet
joint --- but at this early step of the roadmap the resources+policy language
is input to Nova rather than to a separate holistic infrastructure scheduler.</font><font size=3>
<br>
</font><font size=2 face="sans-serif"><br>
Regards,</font><font size=3> </font><font size=2 face="sans-serif"><br>
Mike</font><font size=3> <br>
</font>
<br><tt><font size=3>_______________________________________________<br>
OpenStack-dev mailing list<br>
</font></tt><a href="mailto:OpenStack-dev@lists.openstack.org"><tt><font size=3 color=blue><u>OpenStack-dev@lists.openstack.org</u></font></tt></a><tt><font size=3><br>
</font></tt><a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev"><tt><font size=3 color=blue><u>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</u></font></tt></a><tt><font size=3><br>
</font></tt>
<br>
<br>