[openstack-dev] [nova] [savanna] Host information for non admin users - from a holistic scheduler
Mike Spreitzer
mspreitz at us.ibm.com
Sun Sep 15 05:14:10 UTC 2013
Alex, my understanding is that the motivation for rack-awareness in Hadoop
is optimizing availability rather than networking. The good news, for
those of us who favor a holistic scheduler, is that it can take both sorts
of concerns into account when and where desired.
Yes, the case of a public cloud is more difficult than the case of a
private cloud. My understanding of Amazon's attitude, for example, is
that they do not want to give out any bits of information about placement
--- even though there are known techniques to reverse-engineer it, Amazon
does not want to help that along at all. Giving out obscured information
--- some bits but not all --- is still disfavored. Let me give a little
background on how my group deals with placement for availability, then
discuss options for the public cloud.
Our holistic scheduler takes as input something we call a virtual resource
topology (VRT); other people use words like pattern, template,
application, and cluster for such a thing. It is an assembly of virtual
resources that one tenant wants to instantiate. In a VRT the resources
are arranged into a tree of groups, of which the VRT itself is the root. We use
the groups for concise statements of various sorts, which I will omit here
for the sake of simplicity. As far as direct location constraints are
concerned, there is just one primitive thing: it is a relationship between
two virtual resources and it is parameterized by a sense (positive or
negative) and a level in the physical hierarchy (e.g., physical machine
(PM), chassis, rack). Thus: a negative relationship between VM1 and VM2
at the rack level means that VM1 and VM2 must go on different racks; a
positive relationship between VM3 and VM4 at the PM level means those two
VMs must be on the same host. Additionally, each constraint can be hard
or soft: a hard constraint must be satisfied while a soft constraint is a
preference.
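To make that concrete, here is a rough sketch in Python of what such a
location-constraint primitive might look like. The names and schema are
illustrative only, not our actual interfaces:

    from dataclasses import dataclass, field
    from enum import Enum

    class Level(Enum):
        # levels of the physical hierarchy mentioned above
        PM = "physical_machine"
        CHASSIS = "chassis"
        RACK = "rack"

    @dataclass
    class LocationConstraint:
        vm_a: str          # first virtual resource
        vm_b: str          # second virtual resource
        level: Level       # level of the physical hierarchy
        colocate: bool     # True = positive (same), False = negative (different)
        hard: bool = True  # hard constraint vs. soft preference

    @dataclass
    class VRT:
        name: str
        vms: list = field(default_factory=list)
        constraints: list = field(default_factory=list)

    # VM1 and VM2 must go on different racks; VM3 and VM4 must share a host.
    vrt = VRT("example",
              vms=["VM1", "VM2", "VM3", "VM4"],
              constraints=[
                  LocationConstraint("VM1", "VM2", Level.RACK, colocate=False),
                  LocationConstraint("VM3", "VM4", Level.PM, colocate=True),
              ])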
Consider the example of six interchangeable VMs (say VM1, VM2, ... VM6)
that should be spread across at least two racks with no more than half the
VMs on any one rack. How to say that with a collection of location
primitives? What we do is establish three rack-level anti-co-location
constraints: one between VM1 and VM2, one between VM3 and VM4, and one
between VM5 and VM6. That is not the most obvious representation. You
might have expected this: nine rack-level anti-co-location constraints,
one for every pair in the outer product between {VM1, VM2, VM3} and {VM4,
VM5, VM6}. Now consider what happens if the physical system has three
racks and room for only two additional VMs on each rack. With the latter
set of constraints there is no acceptable placement: the two triples would
have to occupy disjoint sets of racks, and each triple needs at least two
racks, which is more racks than the system has. With the sparser set that
we use, there are allowed placements. In short, an obvious set of
constraints may rule out otherwise acceptable placements.
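If you want to check that claim mechanically, a brute-force enumeration
under the assumed capacities (three racks, two free slots each) shows the
difference. This is just an illustration, not part of our scheduler:

    from itertools import product

    vms = ["VM1", "VM2", "VM3", "VM4", "VM5", "VM6"]
    racks = ["r1", "r2", "r3"]   # three racks
    capacity = 2                 # room for only two additional VMs per rack

    # rack-level anti-co-location: each listed pair must be on different racks
    sparse = [("VM1", "VM2"), ("VM3", "VM4"), ("VM5", "VM6")]
    dense = [(a, b) for a in ("VM1", "VM2", "VM3") for b in ("VM4", "VM5", "VM6")]

    def feasible(constraints):
        for assignment in product(racks, repeat=len(vms)):
            placement = dict(zip(vms, assignment))
            if any(assignment.count(r) > capacity for r in racks):
                continue  # violates rack capacity
            if all(placement[a] != placement[b] for a, b in constraints):
                return True
        return False

    print(feasible(sparse))  # True:  e.g. r1={VM1,VM3} r2={VM2,VM5} r3={VM4,VM6}
    print(feasible(dense))   # False: the two triples would need disjoint racks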
I see two ways to give guaranteed-accurate rack awareness to Hadoop:
constrain the placement so tightly that you know enough to configure
Hadoop before the placement decision is made, or extract placement
information after the placement decision is made. The public cloud
setting rules out the latter, leaving only the former. This can be done,
at the cost of pattern rejections that would not occur if you did not have
to over-constrain the placement.
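As a sketch of the former approach (all names here are made up): the tenant
pins each VM to a logical rack with hard constraints, so the rack layout is
known before the scheduler runs, and a host-to-rack table of the sort a
Hadoop topology script (or an equivalent static mapping) could serve back
can be written from the same prescription:

    # Hypothetical prescription: each VM is pinned to a logical rack by hard
    # constraints in the request, so the mapping is known up front.
    prescription = {
        "VM1": "/rack-a", "VM3": "/rack-a",
        "VM2": "/rack-b", "VM5": "/rack-b",
        "VM4": "/rack-c", "VM6": "/rack-c",
    }

    # Emit a simple host-to-rack table for Hadoop's rack awareness to consume.
    with open("topology.data", "w") as f:
        for host, rack in sorted(prescription.items()):
            f.write("%s %s\n" % (host, rack))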
One more option is to give up on guaranteed accuracy: prescribe a
placement with sufficient precision to configure Hadoop, configure Hadoop
from that prescription, but make the prescription a preference rather than
a hard constraint. If the actual placement does not fully meet all the
preferences, Hadoop is not informed of the differences and so will suffer
in "non-functional" ways but still get the job done (modulo all those
non-functional considerations, like tolerating a rack crash). When your
preferences are not met, it is because the system is very loaded and your
only choice is between operating in some degraded way or not at all ---
you might as well take the degraded operation.
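In the hypothetical schema from the first sketch, this last option just
amounts to expressing the same prescription with hard=False, so the
scheduler treats it as a preference while Hadoop is still configured from
the prescription:

    # The rack prescription expressed as soft preferences: co-locate VMs
    # prescribed to the same rack, separate VMs prescribed to different
    # racks, but let the scheduler override either when the system is too
    # loaded to honor them.
    preferences = [
        LocationConstraint("VM1", "VM3", Level.RACK, colocate=True,  hard=False),
        LocationConstraint("VM2", "VM5", Level.RACK, colocate=True,  hard=False),
        LocationConstraint("VM4", "VM6", Level.RACK, colocate=True,  hard=False),
        LocationConstraint("VM1", "VM2", Level.RACK, colocate=False, hard=False),
        LocationConstraint("VM1", "VM4", Level.RACK, colocate=False, hard=False),
        LocationConstraint("VM2", "VM4", Level.RACK, colocate=False, hard=False),
    ]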
Regards,
Mike