<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; "><div>Hi,</div><div>Can you please join us at the upcoming scheduler meeting? That will give you a chance to bring up your ideas and discuss them with a larger audience.</div><div><a href="https://wiki.openstack.org/wiki/Meetings#Scheduler_Sub-group_meeting">https://wiki.openstack.org/wiki/Meetings#Scheduler_Sub-group_meeting</a></div><div>For the summit, I think it would also be a good idea to have at least one session with the Heat folks to see how we can combine efforts.</div><div>Thanks</div><div>Gary</div><div><br></div><span id="OLK_SRC_BODY_SECTION"><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span> Mike Spreitzer &lt;<a href="mailto:mspreitz@us.ibm.com">mspreitz@us.ibm.com</a>&gt;<br><span style="font-weight:bold">Reply-To: </span> OpenStack Development Mailing List &lt;<a href="mailto:openstack-dev@lists.openstack.org">openstack-dev@lists.openstack.org</a>&gt;<br><span style="font-weight:bold">Date: </span> Sunday, September 15, 2013 10:19 AM<br><span style="font-weight:bold">To: </span> OpenStack Development Mailing List &lt;<a href="mailto:openstack-dev@lists.openstack.org">openstack-dev@lists.openstack.org</a>&gt;<br><span style="font-weight:bold">Subject: </span> [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse<br></div><div><br></div><div><div><font size="2" face="sans-serif">I've read up on recent goings-on in the scheduler subgroup and have some thoughts to contribute.</font><br><br><font size="2"
face="sans-serif">But first I must admit that I am still a newbie to OpenStack and am still missing some important clues. One thing that mystifies me is this: I see essentially the same thing, which I have generally taken to calling holistic
scheduling, discussed in two mostly separate contexts: (1) the Nova scheduler context and (2) the ambitions for Heat. What am I missing?</font><br><br><font size="2" face="sans-serif">I have read the Unified Resource Placement Module document (at
</font><a href="https://docs.google.com/document/d/1cR3Fw9QPDVnqp4pMSusMwqNuB_6t-t_neFqgXA98-Ls/edit?pli=1#"><font size="2" face="sans-serif">https://docs.google.com/document/d/1cR3Fw9QPDVnqp4pMSusMwqNuB_6t-t_neFqgXA98-Ls/edit?pli=1#</font></a><font size="2" face="sans-serif">)
and the NovaSchedulerPerspective document (at </font><a href="https://docs.google.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit?pli=1#heading=h.6ixj0ctv4rwu"><font size="2" face="sans-serif">https://docs.google.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit?pli=1#heading=h.6ixj0ctv4rwu</font></a><font size="2" face="sans-serif">).
My group already has running code along these lines, as well as ideas for future improvements, so I'll mention some of its salient characteristics. I have also read the etherpad at
</font><a href="https://etherpad.openstack.org/IceHouse-Nova-Scheduler-Sessions"><font size="2" face="sans-serif">https://etherpad.openstack.org/IceHouse-Nova-Scheduler-Sessions</font></a><font size="2" face="sans-serif">, and I hope my remarks will help fit
these topics together.</font> <br><br><font size="2" face="sans-serif">Our current code uses one long-lived process to make placement decisions. The information it needs for this job is proactively maintained in its memory. We plan to try replacing this one process with a set of equivalent
processes; we are not sure how well that will work out (we are a research group).</font> <br><br><font size="2" face="sans-serif">We distinguish between desired state, target state, and observed state. The desired state comes in through REST requests, each giving a full virtual resource topology (VRT). A VRT includes constraints that affect placement,
but does not include actual placement decisions. Those are made by what we call the placement agent. Yes, it is separate from orchestration (even in the first architecture figure in the u-rpm document, orchestration is separate; the enclosing box does
not change that essential separateness). In our architecture, orchestration is downstream from placement (as in u-rpm). The placement agent produces target state, which is essentially desired state augmented with placement decisions. Observed state is what
comes from the lower layers (Software Defined Compute, Storage, and Network). We mainly use OpenStack APIs for the lower layers and have added a few local extensions to make the whole story work.</font><br><br><font size="2" face="sans-serif">The placement agent judges available capacity by subtracting current allocations from raw capacity. To do this, it maintains in its memory a derived structure we call effective state; the allocations in effective state are
the union of the allocations in target state and the allocations in observed state. Because orchestration is downstream, some of the planned allocations are not yet in observed state. Because other actors can use the underlying cloud, and other weird sh*t
happens, not all the allocations are in target state. That's why placement is done against the union of the two. This is somewhat conservative, but the alternatives are worse.</font><br><br><font size="2" face="sans-serif">Note that placement is concerned with allocations rather than current usage. Current usage fluctuates much faster than you would want placement decisions to change; placement needs to be done with a long-term perspective. Of course, that
perspective can be informed by usage information (as well as other sources), but it remains a distinct thing.</font><br><br><font size="2" face="sans-serif">We consider all our copies of observed state to be soft: they can be lost and reconstructed at any time, because the true source is the underlying cloud. That is not to say that reconstructing a copy is cheap. We prefer
making incremental updates as needed rather than re-reading the whole thing. One of our local extensions adds a mechanism by which a client can register to be notified of changes in the Software Defined Compute area.</font><br><br><font size="2" face="sans-serif">The target state, on the other hand, is stored authoritatively by the placement agent in a database.</font><br><br><font size="2" face="sans-serif">We pose placement as a constrained optimization problem with a non-linear objective. We approximate its solution with a very generic algorithm, which makes it easy to add new kinds of constraints and new contributions to the objective.</font><br><br><font size="2" face="sans-serif">The core placement problem is about packing virtual resources into physical containers (e.g., VMs into hosts, volumes into Cinder backends). A virtual resource has a demand vector, and a container of the corresponding kind has a capacity
vector of the same length. For a given container, the sum of the demand vectors of the virtual resources placed in it cannot exceed the container's capacity vector in any dimension. We can add dimensions as needed to handle the relevant host/guest
characteristics.</font> <br><br><font size="2" face="sans-serif">We are just now working on an example where a Cinder volume can be required to be the only one hosted on whatever Cinder backend hosts it. This is exactly analogous to requiring that a VM (bare metal or otherwise) be the only
one hosted by whatever physical machine (PM) hosts it.</font> <br><br><font size="2" face="sans-serif">We favor a fairly expressive language for stating desired policies and relationships in VRTs; we think this is necessary once you move beyond simple examples to more realistic ones. We would rather not chop the cloud up into
little pieces to work around a lack of expressiveness in the VRT language.</font> <br><br><font size="2" face="sans-serif">Regards,</font> <br><font size="2" face="sans-serif">Mike</font></div></div></span></body></html>