<font size=2 face="sans-serif">Following is my reaction to the last few

hours of discussion.</font>

<br>

<br><font size=2 face="sans-serif">Russell Bryant wrote "</font><tt><font size=2>Nova

calling heat to orchestrate Nova seems fundamentally wrong".</font></tt><font size=2 face="sans-serif">

 I am not totally happy about this either, but would you be OK with

Nova orchestrating Nova?  To me, that seems worse --- duplicating

functionality we already have in Heat.  The way I see it, we have

to decide how to cope with the inescapable fact that orchestration is downstream

from joint decision making.  I see no better choices than: (1) a 1-stage

API in which the client presents the whole top-level group and is done,

or (2) a 2-stage API in which the client first presents the whole top-level

group and second proceeds to orchestrate the creations of the resources

in that group.  BTW, when we go holistic, (1) will look less offensive:

there will be a holistic infrastructure scheduler doing the joint decision

making first, not one of the individual services, and that is followed

by orchestration of the individual resources.  If we took Alex Glikson's

suggestion and started holistic, we would not be so upset on this issue.</font>

<br>

<br><font size=2 face="sans-serif">Alex also wrote:</font>

<br><tt><font size=2>``I wonder whether it is possible to find an approach

that takes into account cross-resource placement considerations (VM-to-VM

communicating over the application network, or VM-to-volume communicating

over storage network), but does not require delivering all the intimate

details of the entire environment to a single place -- which probably can

not be either of Nova/Cinder/Neutron/etc.. but can we still use the individual

schedulers in each of them with partial view of the environment to drive

a placement decision which is consistently better than random?''</font></tt>

<br>

<br><font size=2 face="sans-serif">I think you could create a cross-scheduler

protocol that would accomplish joint placement decision making --- but

would not want to.  It would involve a lot of communication, and the

subject matter of that communication would be most of what you need in

a centralized placement solver anyway.  You do not need "all

the intimate details", just the bits that are essential to making

the placement decision.</font>

<br>

<br><font size=2 face="sans-serif">Reacting to Andrew Laski's note, Chris

Friesen noted:</font>

<br><font size=2 face="sans-serif">``</font><tt><font size=2>As soon as

we start trying to do placement logic outside of Nova it <br>

becomes trickier to deal with race conditions when competing against <br>

other API users trying to acquire resources at the same time.</font></tt><font size=2 face="sans-serif">''</font>

<br>

<br><font size=2 face="sans-serif">I have two reactions.  The simpler

one is: we can avoid this problem if we simply route all placement problems

(either all placement problems for Compute, or all placement problems for

a larger set of services) through one thing that decides and commits allocations.

 My other reaction is: we will probably want multi-engine.  That

is, the option to run several placement solvers concurrently --- with optimistic

concurrency control.  That presents essentially the same problem as

Chris noted.  As Yathi noted in one of his responses, this can be

handled by appropriate implementation structure.  In the spring I

worked out a multi-engine design for my group's old code.  The conclusion

I reached is that after a placement engine finds a solution, you want an

essentially ACID transaction that (1) checks that the solution is still

valid and, if so, (2) makes the allocations in that solution.</font>
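<br>
<br><font size=2 face="sans-serif">To make the check-then-commit step concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not code from Nova or from my group's design: the class name AllocationStore, its methods, and the single abstract "units" resource per host are all hypothetical, and an in-process lock stands in for a real transaction.</font>

```python
import threading


class AllocationStore:
    """Hypothetical single authority that validates and commits placements."""

    def __init__(self, capacity):
        # capacity: host name -> free units (one abstract resource, for brevity)
        self._free = dict(capacity)
        self._lock = threading.Lock()

    def snapshot(self):
        """Return a copy of the free capacity for a solver to plan against."""
        with self._lock:
            return dict(self._free)

    def commit(self, placement):
        """Atomically (1) re-validate the placement and (2) apply it.

        placement: list of (host, units) pairs. Returns True on success and
        False if the capacity is no longer available (for example because
        another engine committed first); in that case nothing is changed.
        """
        with self._lock:
            needed = {}
            for host, units in placement:
                needed[host] = needed.get(host, 0) + units
            # Step 1: the solution must still fit the current state.
            if any(units > self._free.get(host, 0)
                   for host, units in needed.items()):
                return False
            # Step 2: make every allocation in the solution.
            for host, units in needed.items():
                self._free[host] -= units
            return True


# Two engines planned against the same snapshot; only the first commit wins.
store = AllocationStore({"host-a": 4, "host-b": 4})
first = store.commit([("host-a", 3)])
second = store.commit([("host-a", 2)])  # stale plan: must re-solve
```

<br><font size=2 face="sans-serif">An engine whose commit is rejected would take a fresh snapshot and solve again, which is the usual shape of optimistic concurrency control.</font>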

<br>

<br><font size=2 face="sans-serif">Yathi wrote that the 2-stage API creates

race conditions, but I do not see that.  As we are starting with Nova

only, in the first of the two stages Nova can both decide and commit the

allocations in one transaction; the second stage just picks up and uses

the allocations made in the first stage.</font>
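<br>
<br><font size=2 face="sans-serif">A small Python sketch of that 2-stage shape follows. Everything in it is assumed for illustration: GroupScheduler, create_group, and boot_member are hypothetical names rather than Nova APIs, hosts offer only abstract "slots", and the placement policy is deliberately naive. The point is only that stage 1 decides and commits together, while stage 2 makes no decision of its own.</font>

```python
import uuid


class GroupScheduler:
    """Hypothetical Nova-only sketch of the 2-stage API."""

    def __init__(self, free_slots):
        self._free = dict(free_slots)  # host name -> free instance slots
        self._reserved = {}            # group id -> {member name: host}

    def create_group(self, members):
        """Stage 1: decide and commit placement for the whole group at once.

        Returns a group id, or None if the group does not fit (in which
        case no allocation is made).
        """
        free = dict(self._free)  # work on a copy so failure changes nothing
        placement = {}
        for name in members:
            # Naive policy for illustration: the host with the most free slots.
            host = max(free, key=free.get, default=None)
            if host is None or free[host] == 0:
                return None
            free[host] -= 1
            placement[name] = host
        self._free = free  # commit the allocations
        group_id = str(uuid.uuid4())
        self._reserved[group_id] = placement
        return group_id

    def boot_member(self, group_id, name):
        """Stage 2: pick up the allocation made in stage 1; no new decision."""
        return self._reserved[group_id].pop(name)
```

<br><font size=2 face="sans-serif">Because other API users compete only against committed state, a client that orchestrates the member creations later cannot lose its hosts in between.</font>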

<br>

<br><font size=2 face="sans-serif">Alex Glikson asked why not go directly

to holistic if there is no value in doing Nova-only.  Yathi replied

to that concern, and let me add some notes.  I think there *are* scenarios

in which doing Nova-only joint policy-based scheduling is advantageous.

 For example, if the storage is in SAN or NAS then you do not have

a strong interaction between scheduling compute and storage so you do not

need holistic scheduling to get good availability.  I know some organizations

build their datacenters that way, with full cross-sectional bandwidth between

the compute and storage, because (among other things) it makes that simplification.

 Another thing that can be done with joint policy-based scheduling

is minimize license costs for certain IBM software.  That software

is licensed based on how many cores the software has access to, so in a

situation with hyperthreading or overcommitment the license cost can depend

on how the VM instances are arranged among hosts.</font>
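<br>
<br><font size=2 face="sans-serif">As a rough worked example of the licensing point, assume (purely as a simplification for illustration, not as a statement of the actual license terms) that on each host the chargeable cores are the smaller of the host's physical core count and the total vCPUs of the licensed VMs placed there. The function and variable names below are hypothetical.</font>

```python
def licensed_cores(placement, host_cores, vm_vcpus):
    """Total chargeable cores under the assumed per-host capping rule.

    placement:  VM name -> host name
    host_cores: host name -> physical cores on that host
    vm_vcpus:   VM name -> vCPUs given to that VM
    """
    vcpus_per_host = {}
    for vm, host in placement.items():
        vcpus_per_host[host] = vcpus_per_host.get(host, 0) + vm_vcpus[vm]
    # Assumed rule: a host never counts for more than its physical cores.
    return sum(min(host_cores[host], vcpus)
               for host, vcpus in vcpus_per_host.items())


host_cores = {"h1": 8, "h2": 8}
vm_vcpus = {"vm1": 4, "vm2": 4, "vm3": 4, "vm4": 4}

# All four VMs overcommitted onto one host versus two VMs on each host.
packed = licensed_cores(dict.fromkeys(vm_vcpus, "h1"), host_cores, vm_vcpus)
spread = licensed_cores(
    {"vm1": "h1", "vm2": "h1", "vm3": "h2", "vm4": "h2"}, host_cores, vm_vcpus)
```

<br><font size=2 face="sans-serif">Under that assumed rule the packed arrangement counts 8 cores and the spread arrangement counts 16, so the same four VMs cost twice as much in the second layout.</font>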

<br>

<br><font size=2 face="sans-serif">Yathi replied to Khanh-Toan's remark

about edge policies, but I suspect there was a misunderstanding.  I

think the critique concerns this part of the input:</font>

<br>

<br><font size=2 face="sans-serif">  "policies" : [ {</font>

<br><font size=2 face="sans-serif">    "edge" : "http-app-edge-1",</font>

<br><font size=2 face="sans-serif">    "policy_uuid"

: "some-policy-uuid-2",</font>

<br><font size=2 face="sans-serif">    "type" : "edge",</font>

<br><font size=2 face="sans-serif">    "policy_id"

: 33333</font>

<br><font size=2 face="sans-serif">  } ],</font>

<br><font size=2 face="sans-serif">  "edges" : [ {</font>

<br><font size=2 face="sans-serif">    "r_member" :

"app-server-group-1",</font>

<br><font size=2 face="sans-serif">    "l_member" :

"http-server-group-1",</font>

<br><font size=2 face="sans-serif">    "name" : "http-app-edge-1"</font>

<br><font size=2 face="sans-serif">  } ],</font>

<br>

<br><font size=2 face="sans-serif">That is, the top-level group contains

a "policies" section that refers to one of the edges, while the

edges are defined in a different section.  I, and I think Khanh, would

find it more natural for the edge definition to inline its references to

policies (yes, we understand these are only references).  In the example,

it might look like this:</font>

<br>

<br><font size=2 face="sans-serif">  "edges" : [ {</font>

<br><font size=2 face="sans-serif">    "r_member" :

"app-server-group-1",</font>

<br><font size=2 face="sans-serif">    "l_member" :

"http-server-group-1",</font>

<br><font size=2 face="sans-serif">    "name" : "http-app-edge-1",</font>

<br><font size=2 face="sans-serif">    "policies" : [ {</font>

<br><font size=2 face="sans-serif">      "policy_uuid"

: "some-policy-uuid-2",</font>

<br><font size=2 face="sans-serif">      "policy_id"

: 33333</font>

<br><font size=2 face="sans-serif">    } ]</font>

<br><font size=2 face="sans-serif">  } ],</font>

<br>

<br><font size=2 face="sans-serif">By writing the policy references right

where they apply, it is easier to understand and it is shorter (we do not

need the "type" and "edge" fields to identify the context).</font>
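<br>
<br><font size=2 face="sans-serif">The two forms carry the same information, which a short Python sketch can show by converting one into the other. The function name inline_edge_policies is hypothetical, and the sketch assumes only the field names that appear in the example above ("policies", "edges", "type", "edge", "name").</font>

```python
def inline_edge_policies(group):
    """Return a copy of `group` with edge policy references moved into edges."""
    # Copy each edge and give it its own (initially empty) policies list.
    edges = [dict(edge, policies=[]) for edge in group.get("edges", [])]
    by_name = {edge["name"]: edge for edge in edges}
    remaining = []
    for policy in group.get("policies", []):
        if policy.get("type") == "edge" and policy.get("edge") in by_name:
            # "type" and "edge" only identified the context; drop them.
            ref = {key: value for key, value in policy.items()
                   if key not in ("type", "edge")}
            by_name[policy["edge"]]["policies"].append(ref)
        else:
            remaining.append(policy)  # non-edge policies stay at top level
    result = dict(group, edges=edges)
    if remaining:
        result["policies"] = remaining
    else:
        result.pop("policies", None)
    return result


separate = {
    "policies": [{
        "edge": "http-app-edge-1",
        "policy_uuid": "some-policy-uuid-2",
        "type": "edge",
        "policy_id": 33333,
    }],
    "edges": [{
        "r_member": "app-server-group-1",
        "l_member": "http-server-group-1",
        "name": "http-app-edge-1",
    }],
}
inlined = inline_edge_policies(separate)
```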

<br>

<br><font size=2 face="sans-serif">BTW, do we really want to ask the client

to supply both the policy_id and the policy_uuid in a policy reference?

 Isn't one of those sufficient?</font>

<br>

<br><font size=2 face="sans-serif">Regards,</font>

<br><font size=2 face="sans-serif">Mike</font>