<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Long story short, it sounds like we
have the same concerns here in Climate.<br>
<br>
I'll be at the Summit. Any chance of an unconference
meeting with all parties?<br>
<br>
Thanks,<br>
-Sylvain<br>
<br>
On 11/10/2013 08:25, Mike Spreitzer wrote:<br>
</div>
<blockquote
cite="mid:OF62780C6D.71AA2394-ON85257C01.001CA787-85257C01.00234B8E@us.ibm.com"
type="cite">
<font face="sans-serif" size="2">Regarding Alex's question of
which component does holistic infrastructure scheduling, I hesitate
to simply answer "Heat". Heat is about orchestration, and
infrastructure scheduling is another matter. I have attempted to
draw pictures to sort this out; see
</font><a moz-do-not-send="true"
href="https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U"><font
face="sans-serif" size="2">https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U</font></a><font
face="sans-serif" size="2">
and </font><a moz-do-not-send="true"
href="https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g"><font
face="sans-serif" size="2">https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g</font></a><font
face="sans-serif" size="2">
. In those you will see that I identify holistic infrastructure
scheduling as separate functionality from infrastructure
orchestration (the main job of today's Heat engine) and also
separate from software orchestration concerns. However, I also see
a close relationship between holistic infrastructure scheduling and
Heat, as should be evident in those pictures too.</font>
<br>
<br>
<font face="sans-serif" size="2">Alex made a remark about the
needed
inputs, and I agree but would like to expand a little on the
topic. One
thing any scheduler needs is knowledge of the amount, structure,
and capacity
of the hosting thingies (I wish I could say "resources", but
that would be confusing) onto which the workload is to be
scheduled. Scheduling
decisions are made against available capacity. I think the most
practical
way to determine available capacity is to separately track raw
capacity
and current (plus already planned!) allocations from that
capacity, finally
subtracting the latter from the former.</font>
<br>
<br>
<font face="sans-serif" size="2">In Nova, for example, sensing raw
capacity
is handled by the various nova-compute agents reporting that
information.
I think a holistic infrastructure scheduler should get that
information
from the various individual services (Nova, Cinder, etc.) that it
is concerned
with (presumably they have it anyway).</font>
<br>
<br>
<font face="sans-serif" size="2">A holistic infrastructure
scheduler
can keep track of the allocations it has planned (regardless of
whether
they have been executed yet). However, there may also be
allocations
that did not originate in the holistic infrastructure scheduler.
The
individual underlying services should be able to report (to the
holistic
infrastructure scheduler, even if lowly users are not so
authorized) all
the allocations currently in effect. An accurate union of the
current
and planned allocations is what we want to subtract from raw
capacity to
get available capacity.</font>
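<br>
<br>
<font face="sans-serif" size="2">As a rough sketch of that
subtraction, in Python: the function name, the argument names, and
the data shapes below are made up for illustration and are not any
existing OpenStack API. The point it shows is that current and
planned allocations are merged as a union keyed by allocation ID, so
an allocation that is both planned and already in effect is counted
only once.</font>
<pre>
```python
from collections import Counter


def available_capacity(raw, current, planned):
    """Return raw capacity minus the union of current and planned allocations.

    raw:     {host: capacity units}, as reported by the underlying services
    current: {allocation_id: (host, units)}, allocations now in effect
    planned: {allocation_id: (host, units)}, tracked by the scheduler
    """
    # Union keyed by allocation ID, so nothing is subtracted twice.
    union = {**current, **planned}
    used = Counter()
    for host, units in union.values():
        used[host] += units
    return {host: cap - used[host] for host, cap in raw.items()}


raw = {"host-a": 16, "host-b": 8}
current = {"vm-1": ("host-a", 4), "vm-2": ("host-b", 2)}
# vm-2 was planned here and has since been executed; vm-3 is still pending.
planned = {"vm-2": ("host-b", 2), "vm-3": ("host-a", 8)}
free = available_capacity(raw, current, planned)
print(free)  # {'host-a': 4, 'host-b': 6}
```
</pre>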
<br>
<br>
<font face="sans-serif" size="2">If there is a long delay between
planning
and executing an allocation, there can be nasty surprises from
competitors
--- if there are any competitors. Actually, there can be nasty
surprises
anyway. Any scheduler should be prepared for nasty surprises
and react with sensible retries. If nasty surprises are rare,
we
are pretty much done. If nasty surprises due to the presence of
competing
managers are common, we may be able to combat the problem by
changing the
long delay to a short one --- by moving the allocation execution
earlier
into a stage that is only about locking in allocations, leaving
all the
other work involved in creating virtual resources to later
(perhaps Climate
will be good for this). If the delay between planning and
executing
an allocation is short and there are many nasty surprises due to
competing
managers, then you have too much competition between managers
--- don't
do that.</font>
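<br>
<br>
<font face="sans-serif" size="2">The "sensible retrying" could look
something like the following Python sketch. Everything named here
(AllocationConflict, schedule_with_retry, the plan and execute
callables) is hypothetical and only meant to illustrate the shape of
the loop: plan, try to lock in the allocation, and replan while
avoiding the host that produced the surprise.</font>
<pre>
```python
class AllocationConflict(Exception):
    """Raised when a planned allocation can no longer be executed."""


def schedule_with_retry(plan, execute, max_attempts=3):
    """Plan, try to lock in the allocation, and replan on a conflict.

    plan(excluded) returns a host name, or None if nothing fits.
    execute(host) locks in the allocation or raises AllocationConflict.
    """
    excluded = set()
    for _ in range(max_attempts):
        host = plan(excluded)
        if host is None:
            return None
        try:
            execute(host)
            return host
        except AllocationConflict:
            # A competitor got there first; avoid this host and replan.
            excluded.add(host)
    return None


# Toy demonstration: host-a has already been taken by a competing manager.
taken = {"host-a"}


def plan(excluded):
    for host in ["host-a", "host-b"]:
        if host not in excluded:
            return host
    return None


def execute(host):
    if host in taken:
        raise AllocationConflict(host)
    taken.add(host)


chosen = schedule_with_retry(plan, execute)
print(chosen)  # host-b
```
</pre>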
<br>
<br>
<font face="sans-serif" size="2">Debo wants a simpler nova-centric
story.
OK, how about the following. This is for the first step in
the roadmap, where scheduling decisions are still made
independently for
each VM instance. For the client/service interface, I think we
can do this with a simple, clean two-phase interface when
traditional software orchestration is in play, and a one-phase
interface when slick new software orchestration is used. Let me
outline the two-phase flow. We
extend the Nova API with CRUD operations on VRTs (top-level
groups). For
example, the CREATE operation takes a definition of a top-level
group and
all its nested groups, definitions (excepting stuff like
userdata) of all
the resources (only VM instances, for now) contained in those
groups, all
the relationships among those groups/resources, and all the
applications
of policy to those groups, resources, and relationships. This
is
a REST-style interface; the CREATE operation takes a definition
of the
thing (a top-level group and all that it contains) being
created; the UPDATE
operation takes a revised definition of the whole thing. Nova
records
the presented information; the familiar stuff is stored
essentially as
it is today (but marked as being in some new sort of tentative
state),
and the grouping, relationship, and policy stuff is stored
according to
a model like the one Debo &amp; Yathi wrote. The CREATE operation
returns
a UUID for the newly created top-level group. The invocation of
the
top-level group CRUD is a single operation and it is the first
of the two
phases. In the second phase of a CREATE flow, the client
creates
individual resources with the same calls as are used today,
except that
each VM instance create call is augmented with a pointer into
the policy
information. That pointer consists of (1) the UUID of the
relevant
top-level group and (2) the name used within that group to
identify the
resource now being created. (Obviously we would need resources
to
be named uniquely among all the things ultimately contained
anywhere in
the same top-level group. That could be done, e.g., with path
names
and a requirement only that siblings have distinct names. Or we
could
simply require that names be unique without mandating any
particular structure.
We could call them IDs rather than names.) The way Nova
handles
a VM-create call can now be enhanced to reference and use the
policy information
that is associated with the newly passed policy pointer.</font>
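<br>
<br>
<font face="sans-serif" size="2">To make the two-phase CREATE flow
concrete, here is a small in-memory Python sketch. It is not Nova
code and does not use any real Nova API: the FakeNova class, its
method names, the dictionary layout of the group definition, and the
"anti-affinity" policy type are all assumptions made up for
illustration. It shows phase one returning a UUID for the top-level
group, with resources recorded in a tentative state, and phase two
creating a VM with a pointer made of that UUID plus the resource's
name within the group.</font>
<pre>
```python
import uuid


class FakeNova:
    """In-memory stand-in for the proposed interface; hypothetical."""

    def __init__(self):
        self.groups = {}     # group UUID -> recorded definition
        self.instances = {}  # (group UUID, resource name) -> state

    def create_group(self, definition):
        """Phase 1: record the whole top-level group and return its UUID."""
        names = [r["name"] for r in definition["resources"]]
        if len(names) != len(set(names)):
            raise ValueError("resource names must be unique in the group")
        group_id = str(uuid.uuid4())
        self.groups[group_id] = definition
        for name in names:
            self.instances[(group_id, name)] = "tentative"
        return group_id

    def create_server(self, flavor, policy_pointer):
        """Phase 2: the usual VM create plus a (group UUID, name) pointer."""
        group_id, name = policy_pointer
        if self.instances.get((group_id, name)) != "tentative":
            raise KeyError("unknown or already created resource")
        # Look up the policy information recorded in phase 1.
        policies = [p for p in self.groups[group_id]["policies"]
                    if name in p["applies_to"]]
        self.instances[(group_id, name)] = "created"
        return {"name": name, "flavor": flavor, "policies": policies}


nova = FakeNova()
group_id = nova.create_group({
    # Path-style names; only siblings need to differ.
    "resources": [{"name": "web/vm1"}, {"name": "web/vm2"}],
    "policies": [{"type": "anti-affinity",
                  "applies_to": ["web/vm1", "web/vm2"]}],
})
server = nova.create_server("m1.small", (group_id, "web/vm1"))
print(server["policies"][0]["type"])  # anti-affinity
```
</pre>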
<br>
<br>
<font face="sans-serif" size="2">The UPDATE flow is similar: first
UPDATE
the top-level group, then update individual resources.</font>
<br>
<br>
<font face="sans-serif" size="2">For the definition of a top-level
group
and all that it contains we need some language. I think the
obvious answer is an extended version of the HOT language, which is
why I have proposed such an extension. It is not because I am
confused about what the Heat engine should do; it is because I want
something else
(the policy-informed scheduler) to have an input language with
sufficient
content. This is the role played by "HOT+" in the first
of my two pictures cited above. The same sort of language is
needed
in the first step of the roadmap, where it is only Nova that is
policy-informed
and scheduling is not yet joint --- but at this early step of
the roadmap
the resources+policy language is input to Nova rather than to a
separate
holistic infrastructure scheduler.</font>
<br>
<br>
<font face="sans-serif" size="2">Regards,</font>
<br>
<font face="sans-serif" size="2">Mike</font>
<br>
<br>
<pre wrap="">_______________________________________________
OpenStack-dev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:OpenStack-dev@lists.openstack.org">OpenStack-dev@lists.openstack.org</a>
<a class="moz-txt-link-freetext" href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a>
</pre>
</blockquote>
<br>
</body>
</html>