<font size=2 face="sans-serif">Following is my reaction to the last few
hours of discussion.</font>
<br>
<br><font size=2 face="sans-serif">Russell Bryant wrote "</font><tt><font size=2>Nova
calling heat to orchestrate Nova seems fundamentally wrong".</font></tt><font size=2 face="sans-serif">
I am not totally happy about this either, but would you be OK with
Nova orchestrating Nova? To me, that seems worse: it would duplicate
functionality we already have in Heat. The way I see it, we have
to decide how to cope with the inescapable fact that orchestration is downstream
from joint decision making. I see no better choices than: (1) a 1-stage
API in which the client presents the whole top-level group and is done,
or (2) a 2-stage API in which the client first presents the whole top-level
group and then orchestrates the creation of the resources
in that group. BTW, when we go holistic, (1) will look less offensive:
there will be a holistic infrastructure scheduler doing the joint decision
making first, not one of the individual services, and that is followed
by orchestration of the individual resources. If we took Alex Glikson's
suggestion and started holistic, we would not be so upset on this issue.</font>
<br>
<br><font size=2 face="sans-serif">Alex also wrote:</font>
<br><tt><font size=2>``I wonder whether it is possible to find an approach
that takes into account cross-resource placement considerations (VM-to-VM
communicating over the application network, or VM-to-volume communicating
over storage network), but does not require delivering all the intimate
details of the entire environment to a single place -- which probably can
not be either of Nova/Cinder/Neutron/etc.. but can we still use the individual
schedulers in each of them with partial view of the environment to drive
a placement decision which is consistently better than random?''</font></tt>
<br>
<br><font size=2 face="sans-serif">I think you could create a cross-scheduler
protocol that would accomplish joint placement decision making, but I
would not want to. It would involve a lot of communication, and the
subject matter of that communication would be most of what you need in
a centralized placement solver anyway. You do not need "all
the intimate details", just the bits that are essential to making
the placement decision.</font>
<br>
<br><font size=2 face="sans-serif">Reacting to Andrew Lasky's note, Chris
Friesen noted:</font>
<br><font size=2 face="sans-serif">``</font><tt><font size=2>As soon as
we start trying to do placement logic outside of Nova it
becomes trickier to deal with race conditions when competing against
other API users trying to acquire resources at the same time.</font></tt><font size=2 face="sans-serif">''</font>
<br>
<br><font size=2 face="sans-serif">I have two reactions. The simpler
one is: we can avoid this problem if we simply route all placement problems
(either all placement problems for Compute, or all placement problems for
a larger set of services) through one thing that decides and commits allocations.
My other reaction is: we will probably want multi-engine, that
is, the option to run several placement solvers concurrently with optimistic
concurrency control. That presents essentially the same problem as
Chris noted. As Yathi noted in one of his responses, this can be
handled by appropriate implementation structure. In the spring I
worked out a multi-engine design for my group's old code. The conclusion
I reached is that after a placement engine finds a solution, you want an
essentially ACID transaction that (1) checks that the solution is still
valid and, if so, (2) makes the allocations in that solution.</font>
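<br>
<br><font size=2 face="sans-serif">The check-then-commit step can be sketched in Python. This is a minimal illustration under stated assumptions, not the multi-engine design itself: the class name AllocationStore is hypothetical, capacity is reduced to integer units per host, and a process-local lock stands in for the database transaction a real service would use.</font>

```python
import threading


class AllocationStore:
    """Hypothetical shared capacity store. A lock stands in for a database
    transaction so that the validity check and the commit happen atomically."""

    def __init__(self, free):
        self.free = dict(free)          # host -> free capacity units
        self._lock = threading.Lock()

    def commit(self, solution):
        """Commit a solver's solution (host -> units) if it is still valid.

        Returns True if every allocation was made, or False if the solution
        went stale because another engine committed first; in that case
        nothing is allocated and the caller may re-solve.
        """
        with self._lock:
            # (1) Re-check the whole solution against the current state.
            if any(self.free.get(host, 0) < units for host, units in solution.items()):
                return False
            # (2) Make every allocation in the solution.
            for host, units in solution.items():
                self.free[host] -= units
            return True


# Two engines solved against the same snapshot {"host-a": 4, "host-b": 2}.
store = AllocationStore({"host-a": 4, "host-b": 2})
print(store.commit({"host-a": 3}))               # True
print(store.commit({"host-a": 2, "host-b": 1}))  # False: only 1 unit left on host-a
print(store.free)                                # {'host-a': 1, 'host-b': 2}
```

<br><font size=2 face="sans-serif">The losing engine's solution is rejected as a whole rather than partially applied, which is what makes optimistic concurrency among several solvers safe.</font>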
<br>
<br><font size=2 face="sans-serif">Yathi wrote that the 2-stage API creates
race conditions, but I do not see that. As we are starting with Nova
only, in the first of the two stages Nova can both decide and commit the
allocations in one transaction; the second stage just picks up and uses
the allocations made in the first stage.</font>
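<br>
<br><font size=2 face="sans-serif">A minimal sketch of the two stages, assuming Nova-only placement. The class and method names (TwoStagePlacement, create_group, boot_member) are hypothetical, the "solver" is a trivial most-free-host rule standing in for policy-based joint placement, and a lock stands in for a database transaction. The point it illustrates is that stage 2 makes no new decision, so it has nothing to race on.</font>

```python
import itertools
import threading


class TwoStagePlacement:
    """Hypothetical sketch: stage 1 decides and commits; stage 2 only consumes."""

    def __init__(self, free):
        self.free = dict(free)      # host -> free instance slots
        self.reservations = {}      # group id -> {member name: host}
        self._ids = itertools.count(1)
        self._lock = threading.Lock()

    def create_group(self, members):
        """Stage 1: place every member and commit all allocations together."""
        with self._lock:
            free = dict(self.free)  # work on a copy until the whole group fits
            placement = {}
            for name in members:
                # Trivial stand-in for a joint solver: the host with most free slots.
                host = max(free, key=free.get)
                if free[host] == 0:
                    raise RuntimeError("no capacity for group")  # nothing committed
                free[host] -= 1
                placement[name] = host
            self.free = free        # commit all allocations at once
            gid = next(self._ids)
            self.reservations[gid] = placement
            return gid, dict(placement)

    def boot_member(self, gid, name):
        """Stage 2: no new decision; use the host reserved in stage 1."""
        return self.reservations[gid].pop(name)


svc = TwoStagePlacement({"h1": 1, "h2": 1})
gid, plan = svc.create_group(["web-1", "web-2"])  # stage 1: decide + commit
print(plan)                                       # {'web-1': 'h1', 'web-2': 'h2'}
print(svc.boot_member(gid, "web-1"))              # stage 2: h1
```

<br><font size=2 face="sans-serif">If stage 1 cannot place every member, it raises before committing, so no partial allocations are left behind for stage 2 to trip over.</font>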
<br>
<br><font size=2 face="sans-serif">Alex Glikson asked why not go directly
to holistic if there is no value in doing Nova-only. Yathi replied
to that concern, and let me add some notes. I think there *are* scenarios
in which doing Nova-only joint policy-based scheduling is advantageous.
For example, if the storage is in a SAN or NAS, then there is no
strong interaction between scheduling compute and scheduling storage, so you do not
need holistic scheduling to get good availability. I know some organizations
build their datacenters that way, with full cross-sectional bandwidth between
the compute and storage, because (among other things) it enables that simplification.
Another thing that can be done with joint policy-based scheduling
is to minimize license costs for certain IBM software. That software
is licensed based on how many cores the software has access to, so in a
situation with hyperthreading or overcommitment the license cost can depend
on how the VM instances are arranged among hosts.</font>
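<br>
<br><font size=2 face="sans-serif">The arrangement effect can be illustrated with a simplified, hypothetical licensing rule (an assumption for illustration, not the actual license terms): on each host, the cores counted are the vCPUs given to VMs running the licensed software, capped at that host's physical cores. The function name licensed_cores and all numbers are made up for the example.</font>

```python
from collections import defaultdict


def licensed_cores(placement, vcpus, host_cores):
    """Count licensed cores under an ASSUMED rule: per host, sum the vCPUs of
    the licensed VMs placed there, capped at the host's physical cores.

    placement: vm -> host; vcpus: vm -> vCPU count; host_cores: host -> cores.
    """
    per_host = defaultdict(int)
    for vm, host in placement.items():
        per_host[host] += vcpus[vm]
    return sum(min(n, host_cores[h]) for h, n in per_host.items())


vcpus = {"vm1": 8, "vm2": 8, "vm3": 8}
host_cores = {"h1": 16, "h2": 16, "h3": 16}
spread = {"vm1": "h1", "vm2": "h2", "vm3": "h3"}
packed = {"vm1": "h1", "vm2": "h1", "vm3": "h1"}  # 24 vCPUs on 16 cores: overcommitted

print(licensed_cores(spread, vcpus, host_cores))  # 24
print(licensed_cores(packed, vcpus, host_cores))  # 16
```

<br><font size=2 face="sans-serif">Under this rule the same three VMs cost fewer licensed cores when a joint scheduler co-locates them, which is a decision no instance-at-a-time scheduler would make on its own.</font>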
<br>
<br><font size=2 face="sans-serif">Yathi replied to Khanh-Toan's remark
about edge policies, but I suspect there was a misunderstanding. I
think the critique concerns this part of the input:</font>
<br>
<br><font size=2 face="sans-serif"> "policies" : [ {</font>
<br><font size=2 face="sans-serif"> "edge" : "http-app-edge-1",</font>
<br><font size=2 face="sans-serif"> "policy_uuid"
: "some-policy-uuid-2",</font>
<br><font size=2 face="sans-serif"> "type" : "edge",</font>
<br><font size=2 face="sans-serif"> "policy_id"
: 33333</font>
<br><font size=2 face="sans-serif"> } ],</font>
<br><font size=2 face="sans-serif"> "edges" : [ {</font>
<br><font size=2 face="sans-serif"> "r_member" :
"app-server-group-1",</font>
<br><font size=2 face="sans-serif"> "l_member" :
"http-server-group-1",</font>
<br><font size=2 face="sans-serif"> "name" : "http-app-edge-1"</font>
<br><font size=2 face="sans-serif"> } ],</font>
<br>
<br><font size=2 face="sans-serif">That is, the top-level group contains
a "policies" section that refers to one of the edges, while the
edges are defined in a different section. I, and I think Khanh, would
find it more natural for the edge definition to inline its references to
policies (yes, we understand these are only references). In the example,
it might look like this:</font>
<br>
<br><font size=2 face="sans-serif"> "edges" : [ {</font>
<br><font size=2 face="sans-serif"> "r_member" :
"app-server-group-1",</font>
<br><font size=2 face="sans-serif"> "l_member" :
"http-server-group-1",</font>
<br><font size=2 face="sans-serif"> "name" : "http-app-edge-1",</font>
<br><font size=2 face="sans-serif"> "policies" : [ {</font>
<br><font size=2 face="sans-serif"> "policy_uuid"
: "some-policy-uuid-2",</font>
<br><font size=2 face="sans-serif"> "policy_id"
: 33333</font>
<br><font size=2 face="sans-serif"> } ]</font>
<br><font size=2 face="sans-serif"> } ],</font>
<br>
<br><font size=2 face="sans-serif">Writing the policy references right
where they apply makes the input easier to understand and shorter (we do not
need the "type" and "edge" fields to identify the context).</font>
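<br>
<br><font size=2 face="sans-serif">To make the proposed restructuring concrete, here is a Python sketch that converts the current form (a top-level "policies" list whose entries name an edge) into the inlined form. The function name inline_edge_policies is hypothetical, and it assumes every edge-scoped policy carries "type" : "edge" and an "edge" field naming an entry in "edges", as in the example; policies of other types are left where they are.</font>

```python
import copy


def inline_edge_policies(group):
    """Return a copy of `group` in which each edge-scoped policy reference is
    moved into the "policies" list of the edge it names. The "type" and "edge"
    fields are dropped because the enclosing edge now supplies that context."""
    result = copy.deepcopy(group)          # leave the caller's input untouched
    edges = {e["name"]: e for e in result.get("edges", [])}
    remaining = []
    for pol in result.get("policies", []):
        if pol.get("type") == "edge" and pol.get("edge") in edges:
            ref = {k: v for k, v in pol.items() if k not in ("type", "edge")}
            edges[pol["edge"]].setdefault("policies", []).append(ref)
        else:
            remaining.append(pol)          # not an edge policy: keep it top-level
    if remaining:
        result["policies"] = remaining
    else:
        result.pop("policies", None)
    return result


group = {
    "policies": [{"edge": "http-app-edge-1", "policy_uuid": "some-policy-uuid-2",
                  "type": "edge", "policy_id": 33333}],
    "edges": [{"r_member": "app-server-group-1", "l_member": "http-server-group-1",
               "name": "http-app-edge-1"}],
}
print(inline_edge_policies(group)["edges"][0]["policies"])
# [{'policy_uuid': 'some-policy-uuid-2', 'policy_id': 33333}]
```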
<br>
<br><font size=2 face="sans-serif">BTW, do we really want to ask the client
to supply both the policy_id and the policy_uuid in a policy reference?
Isn't one of those sufficient?</font>
<br>
<br><font size=2 face="sans-serif">Regards,</font>
<br><font size=2 face="sans-serif">Mike</font>