<tt><font size=2>Jay Pipes <jaypipes@gmail.com> wrote on 04/25/2014
06:28:38 PM:<br>
<br>
> On Fri, 2014-04-25 at 22:00 +0000, Day, Phil wrote:<br>
> > Hi Jay,<br>
> > <br>
> > I'm going to disagree with you on this one, because:<br>
> <br>
> No worries, Phil, I expected some dissention and I completely appreciate<br>
> your feedback and perspective :)</font></tt>
<br>
<br><tt><font size=2>I myself sit between the two camps on this one. I
share Jay's unhappiness with server groups, as they are today. However,
I see an evolutionary path forward from today's server groups to something
that makes much more sense to me and my colleagues. I do not see
as clear a path forward from Jay's proposal, but am willing to think more
about that. I will start by outlining where I want to go, and then
address the specific points that have been raised in this email thread
so far.</font></tt>
<br>
<br><tt><font size=2>I would like to see the OpenStack architecture have
a place for what I have been calling holistic scheduling. That is
making a simultaneous scheduling decision about a whole collection of virtual
resources of various types (Nova VMs, Cinder storage volumes, network bandwidth,
...), taking into account a rich composite policy statement. This
is not just a pipe dream; my group has been doing this for years. What
we are struggling with is finding an evolutionary path to a place where
it can be done in an OpenStack context. One part of the struggle
is due to the fact that in our previous work the part that is analogous
to Heat is not optional, while in OpenStack Heat is most definitely optional.
Some of the things I have written in the past did not clearly separate
scheduling from Heat or leave Heat optional, but please rest assured that
I am making no proposal now to violate either principle. I see scheduling
and orchestration as distinct functions; the potential for confusion arises
because (a) holistic scheduling needs input that has some similarity to
what you see in a Heat template today and (b) making the scheduling simultaneous
requires moving it from its current place (downstream from orchestration)
to an earlier place (upstream from orchestration).</font></tt>
<br>
<br><tt><font size=2>The OpenStack community has historically used the
word "scheduling" to refer to placement problems, always in the
time-invariant now-and-foreseeable future, and I am following that usage
here. Other communities consider "scheduling" to also include
interesting variation over time, but I am not trying to bring that into
this debate. (Nor am I denying its interest and value; I am just
trying to keep this discussion focused.)</font></tt>
<br>
<br><tt><font size=2>The discussion in this email thread has recognized
that scheduler hints are applied only at creation time today, but it has
already been noted (e.g., in </font></tt><a href=http://summit.openstack.org/cfp/details/99><tt><font size=2>http://summit.openstack.org/cfp/details/99</font></tt></a><tt><font size=2>)
that scheduling policy statements should be retained for the lifetime of
the virtual resource. That is true regardless of whether the policy
statements come in through today's server groups, the alternate proposal
from Jay Pipes, or some other alternative or evolution.</font></tt>
<br>
<br><tt><font size=2>I agree with Jay that groups have no inherent connection
to scheduling. My colleagues and I have found grouping to be a useful
technique to make APIs and documents more concise, and we find a top-level
group to be the natural scope for a simultaneous decision. We have
been working example problems with a non-trivial size and amount of structure;
when you get beyond small simple examples you see the usefulness of grouping
more clearly. For a couple of examples, see a 3-tier web application
in </font></tt><a href=https://docs.google.com/drawings/d/1nridrUUwNaDrHQoGwSJ_KXYC7ik09wUuV3vXw1MyvlY><tt><font size=2>https://docs.google.com/drawings/d/1nridrUUwNaDrHQoGwSJ_KXYC7ik09wUuV3vXw1MyvlY</font></tt></a><tt><font size=2>
and a deployment of an IBM product called "Connections" in </font></tt><a href=https://docs.google.com/file/d/0BypF9OutGsW3ZUYwYkNjZGJFejQ><tt><font size=2>https://docs.google.com/file/d/0BypF9OutGsW3ZUYwYkNjZGJFejQ</font></tt></a><tt><font size=2>
(this latter example has been shorn of its networking policies, and is
a literal abstract of something we did using software that could not cope
with policies applied directly to virtual resources, so some of its groups
are not well motivated --- but others *are*). The groups are handy
for making it possible to draw pictures without too many lines, and write
documents that are readably concise. But everything said with groups
could be said without groups, if we allowed policy statements to be placed
on virtual resources and on pairs of virtual resources --- it would just
take a heck of a lot more policy statements.</font></tt>
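<br>
<br><tt><font size=2>As a rough illustration of that last point, here is
a minimal sketch of what expanding one group-level policy into pairwise
statements looks like. The function name, the tuple layout, and the member
names are all hypothetical, chosen only for this example; they are not part
of any existing API.</font></tt>

```python
from itertools import combinations


def expand_group_policy(policy, members):
    """Rewrite one group-level policy as one statement per unordered pair.

    Hypothetical helper for illustration only.
    """
    return [(policy, a, b) for a, b in combinations(members, 2)]


# One statement on a four-member group ...
web_tier = ["web1", "web2", "web3", "web4"]

# ... becomes six pairwise statements once the group is removed.
pairwise = expand_group_policy("anti-co-location", web_tier)
for statement in pairwise:
    print(statement)
```

<br><tt><font size=2>The count grows quadratically: a group of n members
needs n*(n-1)/2 pairwise statements to say what one group-level statement
says.</font></tt>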
<br>
<br><tt><font size=2>If you want to make a simultaneous decision about
several virtual resources, you need a description of all those virtual
resources up-front. So even in a totally Heat-free environment you
find yourself wanting something that looks like a document or data structure
describing multiple virtual resources --- and the policies that apply to
them, and thus also the groups that allow for concise applications of policies;
note also that the whole set of virtual resources involved is a group.</font></tt>
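<br>
<br><tt><font size=2>One way to see why the up-front description matters
is a toy capacity example. The sketch below is only an illustration under
assumed names and numbers (two hosts, two VMs, a first-fit sequential placer
and a brute-force simultaneous one); it is not how the Nova scheduler or
my group's solver works.</font></tt>

```python
from itertools import product


def place_sequentially(sizes, capacity):
    """First-fit, one resource at a time, in the given order.

    Returns a placement dict, or None if some resource cannot be placed.
    """
    free = dict(capacity)
    placement = {}
    for vm, size in sizes.items():
        host = next((h for h in free if free[h] >= size), None)
        if host is None:
            return None
        free[host] -= size
        placement[vm] = host
    return placement


def place_simultaneously(sizes, capacity):
    """Brute force over all assignments, seeing every resource up-front."""
    vms = list(sizes)
    for hosts in product(capacity, repeat=len(vms)):
        used = {h: 0 for h in capacity}
        for vm, h in zip(vms, hosts):
            used[h] += sizes[vm]
        if all(capacity[h] >= used[h] for h in capacity):
            return dict(zip(vms, hosts))
    return None


# Hypothetical sizes and capacities, in arbitrary units.
sizes = {"a": 2, "b": 4}
capacity = {"h1": 4, "h2": 2}

print(place_sequentially(sizes, capacity))
print(place_simultaneously(sizes, capacity))
```

<br><tt><font size=2>With these numbers the sequential placer puts "a"
on the larger host and then has nowhere to put "b", while the simultaneous
placer finds the assignment that fits both.</font></tt>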
<br>
<br><tt><font size=2>When you have an example of non-trivial size and structure,
you generally do not want to make a change by a collection of atomic edits,
each individually scheduled. Rather you want to state the new set
of virtual resources and policies that you want to move to, allowing a
simultaneous decision about the new placement solution.</font></tt>
<br>
<br><tt><font size=2>I can get where I want to go by evolutionary
steps forward from today's server groups. There are four fairly independent
dimensions in which the evolution can proceed. One is to go from
today's sequential decision-making to simultaneous decision-making; I am
drafting a blueprint on that now (</font></tt><a href="https://blueprints.launchpad.net/nova/+spec/simultaneous-server-group"><tt><font size=2>https://blueprints.launchpad.net/nova/+spec/simultaneous-server-group</font></tt></a><tt><font size=2>).
Another is to expand beyond Nova. Another is to allow nested
groups. Another dimension is to expand and refine the catalog of
policy types. In older documents (in the particulars in </font></tt><a href=https://wiki.openstack.org/wiki/Heat/PolicyExtension><tt><font size=2>https://wiki.openstack.org/wiki/Heat/PolicyExtension</font></tt></a><tt><font size=2>
--- do not be distracted by the Heat context; the policy catalog is a scheduling
issue --- and in the generalities in </font></tt><a href="https://docs.google.com/document/d/17OIiBoIavih-1y4zzK0oXyI66529f-7JTCVj-BcXURA"><tt><font size=2>https://docs.google.com/document/d/17OIiBoIavih-1y4zzK0oXyI66529f-7JTCVj-BcXURA</font></tt></a><tt><font size=2>)
you see more kinds of policies and the idea that policies may have parameters.
For example, co-location (a more precise formulation of "affinity":
it clearly says we are talking about placement rather than some other sort
of affinity --- such as networking, which has its own policy types) takes
a parameter indicating the level of the physical hierarchy at which the
placement should be the same. You also see the idea that a policy
statement can be shaded as either a hard requirement or a soft preference.</font></tt>
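<br>
<br><tt><font size=2>To make the idea of a parameterized policy statement
concrete, here is a minimal sketch. The class, its field names, the level
names, and the toy topology are assumptions made for this example; they do
not come from the linked documents or from any OpenStack API.</font></tt>

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class PolicyStatement:
    """Hypothetical policy statement: a type, its targets, and parameters."""
    policy_type: str           # "co-location" or "anti-co-location"
    targets: Tuple[str, ...]   # names of virtual resources
    level: str                 # level of the physical hierarchy, e.g. "rack"
    hard: bool = True          # True: requirement; False: soft preference


# Toy physical hierarchy: host -> its location at each level.
TOPOLOGY: Dict[str, Dict[str, str]] = {
    "host1": {"rack": "rackA", "host": "host1"},
    "host2": {"rack": "rackA", "host": "host2"},
    "host3": {"rack": "rackB", "host": "host3"},
}


def is_satisfied(stmt: PolicyStatement, placement: Dict[str, str]) -> bool:
    """Check one statement against a placement (resource name -> host)."""
    locations = [TOPOLOGY[placement[t]][stmt.level] for t in stmt.targets]
    if stmt.policy_type == "co-location":
        return len(set(locations)) == 1
    if stmt.policy_type == "anti-co-location":
        return len(set(locations)) == len(locations)
    raise ValueError("unknown policy type: " + stmt.policy_type)


placement = {"vm1": "host1", "vm2": "host2"}
same_rack = PolicyStatement("co-location", ("vm1", "vm2"), level="rack")
diff_host = PolicyStatement("anti-co-location", ("vm1", "vm2"),
                            level="host", hard=False)
print(is_satisfied(same_rack, placement), is_satisfied(diff_host, placement))
```

<br><tt><font size=2>A solver would treat a violated hard statement as
infeasible and a violated soft statement as a cost; this sketch only shows
the check itself.</font></tt>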
<br>
<br><tt><font size=2>In my own group's work, and in the joint proposal
with folks from Cisco and VMware, we stipulated that the groups nest into
a tree. From my own group's perspective, that is merely a matter
of conservatism --- we think it might make some implementations easier,
and it has been an acceptable restriction for the examples we have worked.
I am not strongly wedded to that restriction. The placement
technology that we have in mind (transforming to a constrained optimization
problem) does not require that restriction. If we lifted that restriction,
defining a group by an arbitrary predicate (tag match, or whatever) would
be an acceptable way of expressing grouping. We could even keep the
restriction to a tree of groups while defining groups by predicates ---
the restriction would be a restriction on the predicates.</font></tt>
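<br>
<br><tt><font size=2>As a small sketch of what that could mean, the code
below defines groups by tag predicates and checks whether the resulting
groups nest into a tree, meaning any two groups are either disjoint or one
contains the other. The resource names, tags, and helper functions are
hypothetical, and the check is made against one concrete set of resources
rather than against the predicates in general.</font></tt>

```python
from itertools import combinations


def members(resources, predicate):
    """Return the set of resource names whose tags satisfy the predicate."""
    return frozenset(name for name, tags in resources.items()
                     if predicate(tags))


def nests_into_tree(groups):
    """True if every pair of groups is disjoint or nested."""
    return all(a.isdisjoint(b) or a.issubset(b) or b.issubset(a)
               for a, b in combinations(groups, 2))


# Hypothetical resources with key=value tags.
resources = {
    "web1": {"tier": "web", "app": "shop"},
    "web2": {"tier": "web", "app": "shop"},
    "db1": {"tier": "db", "app": "shop"},
    "db2": {"tier": "db", "app": "blog"},
}

shop = members(resources, lambda t: t["app"] == "shop")
web = members(resources, lambda t: t["tier"] == "web")
db = members(resources, lambda t: t["tier"] == "db")

print(nests_into_tree([shop, web]))      # web is inside shop
print(nests_into_tree([shop, web, db]))  # db straddles shop's boundary
```

<br><tt><font size=2>Here the first set of predicates respects the tree
restriction and the second does not, because the "db" group overlaps the
"shop" group without being contained in it.</font></tt>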
<br>
<br><tt><font size=2>If we were to start instead from the proposal of Jay
Pipes, the same four dimensions of evolution apply. To get from sequential
to simultaneous decision-making requires an up-front statement of scope
(i.e., member descriptions) and policy (for which grouping would help),
rather than the at-creation-time statements. As mentioned above, the grouping
could continue to be expressed by predicates rather than explicit statements
of membership. If we recognize that we are embarking on a general
program of expanding and refining the policy language then particular command
line syntax like "--not-near-tag $TAG" would probably change
to something generic like "--constraint anti-co-location:level=rack
--between rsc_or_grp_1 --and rsc_or_grp_2" (and the API has a corresponding
issue). The expansion beyond Nova has issues that seem to me to be
pretty orthogonal to this debate over the evolutionary starting point.</font></tt>
<br><tt><font size=2><br>
> > i) This is a feature that was discussed in at least one if not
two<br>
> Design Summits and went through a long review period, it wasn't one
<br>
> of those changes that merged in 24 hours before people could take
a <br>
> good look at it.<br>
> <br>
> Completely understood. That still doesn't mean we can't propose to
get<br>
> rid of it early instead of letting it sit around when an alternate<br>
> implementation would be better for the user of OpenStack.</font></tt>
<br>
<br><tt><font size=2>I also tend to favor looking ahead to validate that
we are headed in a good direction. That can conflict with the focus
on quickly making incremental improvements --- if we allow ourselves to
suffer analysis paralysis. I hope a limited discussion can lead to
some consensus on direction without seriously delaying the first
small steps. However, in the case of scheduling, we are already
queued up behind the scheduler forklift and no-db-scheduler, so there is
no danger of imminent progress on my evolutionary program.</font></tt>
<br><tt><font size=2><br>
> > Whatever you feel about the implementation, it is
now in the <br>
> API and we should assume that people have started coding against it.</font></tt>
<br>
<br><tt><font size=2>Yes, we should support backwards compatibility in
general. And in this particular case, there may be no immediate conflict.
If we decide we prefer Jay's way of expressing these concepts, we
can retain support for the old way of expressing grouping and policy too
(hopefully with a unified representation underneath). That only leaves
us with another general problem in interface evolution: when and how to
delete old stuff that everybody should eventually stop using.</font></tt>
<br><tt><font size=2><br>
> ...<br>
> > I don't think it gives any credibility to Openstack as
a <br>
> platform if we yank features back out just after they've landed.<br>
> <br>
> Perhaps not, though I think we have less credibility if we don't<br>
> recognize when a feature isn't implemented with users in mind and
leave<br>
> it in the code base to the detriment and confusion of users. We<br>
> absolutely must, IMO, as a community, be able to say "this isn't
right"<br>
> and have a path for changing or removing something.<br>
> <br>
> If that path is deprecation vs outright removal, so be it, I'd be
cool<br>
> with that. I'd just like to nip this anti-feature in the bud early
so<br>
> that it doesn't become the next "feature" like file-injection
to persist<br>
> in Nova well after its time has come and passed.</font></tt>
<br>
<br><tt><font size=2>I am no mind reader, but I suspect the designers of
server groups had users in mind. But just having them in mind is
not really adequate; a serious approach would be to involve actual users
in evaluating a design proposal before it proceeds. Remember the
calls for OpenStack to be more user-driven?</font></tt>
<br><tt><font size=2><br>
> > ii) Sever Group - It's a way of defining a group of servers,
and <br>
> the initial thing (only thing right now) you can define for such a
<br>
> group is the affinity or anti-affinity for scheduling.<br>
> <br>
> We already had ways of defining groups of servers. This new "feature"<br>
> doesn't actually define a group of servers. It defines a policy, which<br>
> is not particularly useful, as it's something that is better specified<br>
> at the time of launching.</font></tt>
<br>
<br><tt><font size=2>As I mentioned above, I want to do stuff (i.e., schedule)
with groups before the members are actually created/updated.<br>
<br>
> > Maybe in time we'll add other group properties or operations
- <br>
> like "delete all the servers in a group" (I know some QA
folks that <br>
> would love to have that feature).<br>
> <br>
> We already have the ability to define a group of servers using key=value<br>
> tags. Deleting all servers in a group is a three-line bash script
that<br>
> loops over the results of a nova list command and calls nova delete.<br>
> Trust me, I've done group deletes in this way many times.<br>
> <br>
> > I don't see why it shouldn't be possible to have a server
group <br>
> that doesn't have a scheduling policy associated to it.<br>
> <br>
> I don't think the grouping of servers should have *anything* to do
with<br>
> scheduling :) That's the point of my proposal. Servers can and should
be<br>
> grouped using simple tags or key=value pair tags.<br>
> <br>
> The grouping of servers together doesn't have anything of substance
to<br>
> do with scheduling policies.</font></tt>
<br>
<br><tt><font size=2>Right, it just allows more concise statements of policies
AND is a kind of scope where you can apply simultaneous decision-making.</font></tt>
<br><tt><font size=2><br>
> <br>
> > I don't see any Cognitive dissonance here
- I think your just <br>
> assuming that the only reason for being able to group servers is for<br>
> scheduling.<br>
> <br>
> Again, I don't think scheduling and grouping of servers has anything
to<br>
> do with each other, thus my proposal to remove the relationship between<br>
> groups of servers and scheduling policies, which is what the existing<br>
> server group API and implementation does.<br>
> <br>
> > iii) If the issue is that you can't add or remove servers from
a <br>
> group, then why don't we add those operations to the API (you could
<br>
> add a server to a group providing doing so doesn't break any
policy<br>
> that might be associated with the group). </font></tt>
<br>
<br><tt><font size=2>In fact you see some of this already in the blueprint
for server groups (</font></tt><a href="https://blueprints.launchpad.net/nova/+spec/instance-group-api-extension"><tt><font size=2>https://blueprints.launchpad.net/nova/+spec/instance-group-api-extension</font></tt></a><tt><font size=2>)
--- look all the way through its whiteboard.<br>
<br>
> We already have this ability today, thus my proposal to get rid of<br>
> server groups.<br>
> <br>
> > Seems like a useful addition to me.<br>
> <br>
> It's an addition that isn't needed, as we already have this today.<br>
> <br>
> > iv) Since the user created the group, and chose a name for it
that<br>
> is presumably meaningful, then I don't understand why you think "--<br>
> group XXX" isn't going to be meaningful to that same user ?<br>
> <br>
> See point above about removing the unnecessary relationship between<br>
> grouping of servers and scheduling policies.<br>
> <br>
> > So I think there are a bunch of API operations missing, but I
<br>
> don't see any advantage in throwing away what's now in place and <br>
> replacing it with a tag mechanism that basically says "everything
<br>
> with this tag is in a sort of group".<br>
> <br>
> We already have the tag group mechanism in place, that's kind of what<br>
> I've been saying...<br>
</font></tt>
<br><tt><font size=2>Regards,</font></tt>
<br><tt><font size=2>Mike</font></tt>