[openstack-dev] [nova][scheduler] Instance Group Model and APIs - Updated document with an example request payload

Mike Spreitzer mspreitz at us.ibm.com
Wed Oct 30 04:11:04 UTC 2013


Following is my reaction to the last few hours of discussion.

Russell Bryant wrote "Nova calling heat to orchestrate Nova seems 
fundamentally wrong".  I am not totally happy about this either, but would 
you be OK with Nova orchestrating Nova?  To me, that seems worse --- 
duplicating functionality we already have in Heat.  The way I see it, we 
have to decide how to cope with the inescapable fact that orchestration is 
downstream from joint decision making.  I see no better choices than: (1) 
a 1-stage API in which the client presents the whole top-level group and 
is done, or (2) a 2-stage API in which the client first presents the whole 
top-level group and second proceeds to orchestrate the creations of the 
resources in that group.  BTW, when we go holistic, (1) will look less 
offensive: there will be a holistic infrastructure scheduler doing the 
joint decision making first, not one of the individual services, and that 
is followed by orchestration of the individual resources.  If we took Alex 
Glikson's suggestion and started holistic, we would not be so upset on 
this issue.
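
To make the contrast concrete, here is a rough sketch of the two request 
shapes as I read them.  The "client" object, paths, and payload fields below 
are purely hypothetical illustrations, not the proposed API:

# A purely illustrative sketch of the two API shapes; the "client" object,
# paths, and payload fields are hypothetical, not the proposed Nova API.

def one_stage(client, group):
    # Option (1): present the whole top-level group once and be done; the
    # service makes the joint placement decision and creates the resources.
    return client.post("/instance-groups",
                       {"group": group, "create_resources": True})

def two_stage(client, group):
    # Option (2): stage one presents the whole group so the joint placement
    # decision (and allocation commit) can happen up front; stage two
    # orchestrates creation of each member, referring back to the group.
    created = client.post("/instance-groups", {"group": group})
    servers = []
    for member in group["members"]:
        servers.append(client.post(
            "/servers",
            {"server": {"name": member["name"],
                        "group_hint": created["id"]}}))
    return servers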

Alex also wrote:
``I wonder whether it is possible to find an approach that takes into 
account cross-resource placement considerations (VM-to-VM communicating 
over the application network, or VM-to-volume communicating over storage 
network), but does not require delivering all the intimate details of the 
entire environment to a single place -- which probably can not be either 
of Nova/Cinder/Neutron/etc.. but can we still use the individual 
schedulers in each of them with partial view of the environment to drive a 
placement decision which is consistently better than random?''

I think you could create a cross-scheduler protocol that would accomplish 
joint placement decision making --- but I would not want to.  It would 
involve a lot of communication, and the subject matter of that 
communication would be most of what you need in a centralized placement 
solver anyway.  You do not need "all the intimate details", just the bits 
that are essential to making the placement decision.

Reacting to Andrew Lasky's note, Chris Friesen noted:
``As soon as we start trying to do placement logic outside of Nova it 
becomes trickier to deal with race conditions when competing against 
other API users trying to acquire resources at the same time.''

I have two reactions.  The simpler one is: we can avoid this problem if we 
simply route all placement problems (either all placement problems for 
Compute, or all placement problems for a larger set of services) through 
one thing that decides and commits allocations.  My other reaction is: we 
will probably want multi-engine.  That is, the option to run several 
placement solvers concurrently --- with optimistic concurrency control. 
That presents essentially the same problem as Chris noted.  As Yathi noted 
in one of his responses, this can be handled by appropriate implementation 
structure.  In the spring I worked out a multi-engine design for my 
group's old code.  The conclusion I reached is that after a placement 
engine finds a solution, you want an essentially ACID transaction that (1) 
checks that the solution is still valid and, if so, (2) makes the 
allocations in that solution.
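
Just to illustrate what I mean (the names below are invented, not from our 
old code), the commit step could look roughly like this: each engine solves 
against a snapshot, and the transaction re-validates before allocating:

import threading

class AllocationStore:
    """Toy shared store of per-host free vCPUs.  Several placement engines
    solve concurrently against snapshots (optimistic concurrency) and then
    try to commit; the commit step is the check-and-allocate transaction."""

    def __init__(self, free_vcpus):
        self._free = dict(free_vcpus)     # host -> free vCPUs
        self._lock = threading.Lock()     # stands in for a real DB transaction

    def snapshot(self):
        """State a placement engine reads before running its solver."""
        with self._lock:
            return dict(self._free)

    def commit(self, solution):
        """(1) check the solution is still valid against the current state
        (another engine may have committed since the snapshot was taken),
        and if so (2) make the allocations.  Returns False on conflict."""
        with self._lock:
            if any(self._free.get(h, 0) < need for h, need in solution.items()):
                return False              # stale or infeasible -> re-solve
            for host, need in solution.items():
                self._free[host] -= need
            return True

# One engine's loop: snapshot, solve, try to commit, re-solve on conflict.
store = AllocationStore({"host-a": 8, "host-b": 8})
free = store.snapshot()
solution = {"host-a": 4, "host-b": 2}     # produced by some placement solver
if not store.commit(solution):
    pass                                  # take a fresh snapshot and solve again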

Yathi wrote that the 2-stage API creates race conditions, but I do not see 
that.  As we are starting with Nova only, in the first of the two stages 
Nova can both decide and commit the allocations in one transaction; the 
second stage just picks up and uses the allocations made in the first 
stage.

Alex Glikson asked why not go directly to holistic if there is no value in 
doing Nova-only.  Yathi replied to that concern, and let me add some 
notes.  I think there *are* scenarios in which doing Nova-only joint 
policy-based scheduling is advantageous.  For example, if the storage is 
in SAN or NAS then you do not have a strong interaction between scheduling 
compute and storage so you do not need holistic scheduling to get good 
availability.  I know some organizations build their datacenters that way, 
with full cross-sectional bandwidth between the compute and storage, 
because (among other things) it makes that simplification.  Another thing 
that can be done with joint policy-based scheduling is to minimize license 
costs for certain IBM software.  That software is licensed based on how 
many cores the software has access to, so in a situation with 
hyperthreading or overcommitment the license cost can depend on how the VM 
instances are arranged among hosts.
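
To illustrate the licensing point with invented numbers (the real licensing 
terms are more involved), suppose the charge is the full physical core count 
of every host that runs at least one licensed VM:

# Assumed rule, for illustration only: a host is charged for all of its
# physical cores whenever at least one licensed VM runs on it.
HOST_CORES = 16

def licensed_cores(placement):
    # placement: host name -> number of licensed VM instances on that host
    return sum(HOST_CORES for vms in placement.values() if vms > 0)

# Four 4-vCPU licensed VMs, with overcommitment/hyperthreading allowing
# either arrangement:
spread = {"host-1": 1, "host-2": 1, "host-3": 1, "host-4": 1}
packed = {"host-1": 4, "host-2": 0, "host-3": 0, "host-4": 0}

print(licensed_cores(spread))   # 64 cores licensed
print(licensed_cores(packed))   # 16 cores licensed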

Yathi replied to Khanh-Toan's remark about edge policies, but I suspect 
there was a misunderstanding.  I think the critique concerns this part of 
the input:

  "policies" : [ {
    "edge" : "http-app-edge-1",
    "policy_uuid" : "some-policy-uuid-2",
    "type" : "edge",
    "policy_id" : 33333
  } ],
  "edges" : [ {
    "r_member" : "app-server-group-1",
    "l_member" : "http-server-group-1",
    "name" : "http-app-edge-1"
  } ],

That is, the top-level group contains a "policies" section that refers to 
one of the edges, while the edges are defined in a different section.  I, 
and I think Khanh, would find it more natural for the edge definition to 
inline its references to policies (yes, we understand these are only 
references).  In the example, it might look like this:

  "edges" : [ {
    "r_member" : "app-server-group-1",
    "l_member" : "http-server-group-1",
    "name" : "http-app-edge-1",
    "policies : [ {
      "policy_uuid" : "some-policy-uuid-2",
      "policy_id" : 33333
    } ]
  } ],

Writing the policy references right where they apply makes the payload easier 
to understand and shorter (the "type" and "edge" fields are no longer needed 
to identify the context).

BTW, do we really want to ask the client to supply both the policy_id and 
the policy_uuid in a policy reference?  Isn't one of those sufficient?

Regards,
Mike