<font size=2 face="sans-serif">I've read up on recent goings-on in the
scheduler subgroup, and have some thoughts to contribute.</font>
<br>
<br><font size=2 face="sans-serif">But first I must admit that I am still
a newbie to OpenStack, and am still missing some important clues. One
thing that mystifies me is this: I see essentially the same thing, which
I have generally taken to calling holistic scheduling, discussed in two
mostly separate contexts: (1) the (nova) scheduler context, and (2) the
ambitions for heat. What am I missing?</font>
<br>
<br><font size=2 face="sans-serif">I have read the Unified Resource Placement
Module document (at </font><a href="https://docs.google.com/document/d/1cR3Fw9QPDVnqp4pMSusMwqNuB_6t-t_neFqgXA98-Ls/edit?pli=1#"><font size=2 face="sans-serif">https://docs.google.com/document/d/1cR3Fw9QPDVnqp4pMSusMwqNuB_6t-t_neFqgXA98-Ls/edit?pli=1#</font></a><font size=2 face="sans-serif">)
and the NovaSchedulerPerspective document (at </font><a href="https://docs.google.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit?pli=1#heading=h.6ixj0ctv4rwu"><font size=2 face="sans-serif">https://docs.google.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit?pli=1#heading=h.6ixj0ctv4rwu</font></a><font size=2 face="sans-serif">).
My group already has running code along these lines, and thoughts
for future improvements, so I'll mention some salient characteristics.
I have read the etherpad at </font><a href="https://etherpad.openstack.org/IceHouse-Nova-Scheduler-Sessions"><font size=2 face="sans-serif">https://etherpad.openstack.org/IceHouse-Nova-Scheduler-Sessions</font></a><font size=2 face="sans-serif">
- and I hope my remarks will help fit these topics together.</font>
<br>
<br><font size=2 face="sans-serif">Our current code uses one long-lived
process to make placement decisions. The information it needs to
do this job is proactively maintained in its memory. We are planning
to try replacing this one process with a set of equivalent processes; we are
not sure how well that will work out (we are a research group).</font>
<br>
<br><font size=2 face="sans-serif">We make a distinction between desired
state, target state, and observed state. The desired state comes
in through REST requests, each giving a full virtual resource topology
(VRT). A VRT includes constraints that affect placement, but does
not include actual placement decisions. Those are made by what we
call the placement agent. Yes, it is separate from orchestration
(even in the first architecture figure in the u-rpm document the orchestration
is separate; the enclosing box does not remove the essential separation).
In our architecture, orchestration is downstream from placement (as
in u-rpm). The placement agent produces target state, which is essentially
desired state augmented by placement decisions. Observed state is
what comes from the lower layers (Software Defined Compute, Storage, and
Network). We mainly use OpenStack APIs for the lower layers, and
have added a few local extensions to make the whole story work.</font>
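<br>
<br><font size=2 face="sans-serif">To make the three kinds of state concrete, here is a minimal Python sketch. It is not our actual code: the class names, the fields, and the <tt>place</tt> helper are all hypothetical, chosen only to show that target state is desired state plus placement decisions, while observed state arrives independently from the lower layers.</font>

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class VirtualResource:
    """One element of a VRT (hypothetical shape, not a real schema)."""
    name: str
    kind: str                                   # e.g. "vm" or "volume"
    constraints: List[str] = field(default_factory=list)


@dataclass
class DesiredState:
    """What a REST request carries: topology plus constraints, no placement."""
    resources: Dict[str, VirtualResource]


@dataclass
class TargetState:
    """Desired state augmented with the placement agent's decisions."""
    desired: DesiredState
    placement: Dict[str, str]                   # resource name -> container name


@dataclass
class ObservedState:
    """What the lower layers report; may include resources we never placed."""
    placement: Dict[str, str]


def place(desired: DesiredState,
          choose: Callable[[VirtualResource], str]) -> TargetState:
    """Build target state by asking `choose` (a stand-in policy) per resource."""
    decisions = {name: choose(res) for name, res in desired.resources.items()}
    return TargetState(desired=desired, placement=decisions)


vrt = DesiredState({
    "web": VirtualResource("web", "vm"),
    "data": VirtualResource("data", "volume", ["exclusive"]),
})
# A trivial stand-in policy: VMs go to one host, volumes to one backend.
target = place(vrt, lambda r: "host-1" if r.kind == "vm" else "backend-1")
```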
<br>
<br><font size=2 face="sans-serif">The placement agent judges available
capacity by subtracting current allocations from raw capacity. The
placement agent maintains in its memory a derived thing we call effective
state; the allocations in effective state are the union of the allocations
in target state and the allocations in observed state. Since the
orchestration is downstream, some of the planned allocations are not in
observed state yet. Since other actors can use the underlying cloud,
and other unexpected things happen, not all the allocations are in target state.
That's why placement is done against the union of the allocations.
This is somewhat conservative, but the alternatives are worse.</font>
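<br>
<br><font size=2 face="sans-serif">A small Python sketch of that bookkeeping follows. The data layout and function names are made up for illustration, and one detail is an assumption of the sketch rather than something stated above: when target and observed state disagree about the size of the same allocation, it keeps the element-wise larger demand.</font>

```python
from typing import Dict, Tuple

# (resource name, container name) -> demand vector; the layout is illustrative.
Allocations = Dict[Tuple[str, str], Tuple[float, ...]]


def effective_allocations(target: Allocations, observed: Allocations) -> Allocations:
    """Union of target and observed allocations.

    An allocation present in both is counted once. Assumption: if the two
    disagree on size, keep the element-wise larger demand (conservative).
    """
    merged = dict(observed)
    for key, demand in target.items():
        if key in merged:
            merged[key] = tuple(max(a, b) for a, b in zip(merged[key], demand))
        else:
            merged[key] = demand
    return merged


def available(raw: Tuple[float, ...], container: str,
              allocs: Allocations) -> Tuple[float, ...]:
    """Raw capacity minus every effective allocation on `container`."""
    used = [0.0] * len(raw)
    for (_, where), demand in allocs.items():
        if where == container:
            used = [u + d for u, d in zip(used, demand)]
    return tuple(r - u for r, u in zip(raw, used))


target = {("vm-a", "host-1"): (2.0, 4.0),      # planned, not yet created
          ("vm-b", "host-1"): (1.0, 2.0)}      # planned and already running
observed = {("vm-b", "host-1"): (1.0, 2.0),
            ("vm-x", "host-1"): (4.0, 8.0)}    # created by some other actor

effective = effective_allocations(target, observed)
free = available((16.0, 32.0), "host-1", effective)
```

<font size=2 face="sans-serif">Here <tt>vm-a</tt> is only in target state and <tt>vm-x</tt> is only in observed state, yet both reduce what is considered free on <tt>host-1</tt>.</font>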
<br>
<br><font size=2 face="sans-serif">Note that placement is concerned with
allocations rather than current usage. Current usage fluctuates much
faster than you would want placement to change. Placement needs to be done
with a long-term perspective. Of course, that perspective can be
informed by usage information (as well as other sources) --- but it remains
a distinct thing.</font>
<br>
<br><font size=2 face="sans-serif">We consider all our copies of observed
state to be soft --- they can be lost and reconstructed at any time, because
the true source is the underlying cloud. That is not to say that
reconstructing a copy is cheap. We prefer making incremental updates
as needed, rather than re-reading the whole thing. One of our local
extensions adds a mechanism by which a client can register to be notified
of changes in the Software Defined Compute area.</font>
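<br>
<br><font size=2 face="sans-serif">The idea of a soft copy kept fresh by notifications can be sketched as below. This is only an illustration: <tt>ObservedCache</tt>, its callback signature, and the notification payload are hypothetical and do not describe our extension's real interface.</font>

```python
from typing import Callable, Dict, Optional, Tuple

Demand = Tuple[float, ...]
Instance = Tuple[str, Demand]                  # (container name, demand vector)


class ObservedCache:
    """Soft copy of observed state: safe to drop, but costly to rebuild."""

    def __init__(self, full_read: Callable[[], Dict[str, Instance]]):
        self._full_read = full_read            # expensive: re-reads everything
        self.instances: Dict[str, Instance] = {}
        self.full_reads = 0

    def rebuild(self) -> None:
        """Reconstruct the copy from the true source (the underlying cloud)."""
        self.instances = dict(self._full_read())
        self.full_reads += 1

    def on_change(self, name: str, state: Optional[Instance]) -> None:
        """Apply one change notification; None means the instance went away."""
        if state is None:
            self.instances.pop(name, None)
        else:
            self.instances[name] = state


cloud = {"vm-b": ("host-1", (1.0, 2.0))}       # stand-in for the real cloud
cache = ObservedCache(lambda: cloud)
cache.rebuild()                                     # one full read at start-up
cache.on_change("vm-x", ("host-2", (4.0, 8.0)))     # created by another actor
cache.on_change("vm-b", None)                       # deleted
```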
<br>
<br><font size=2 face="sans-serif">The target state, on the other hand,
is stored authoritatively by the placement agent in a database.</font>
<br>
<br><font size=2 face="sans-serif">We pose placement as a constrained optimization
problem, with a non-linear objective. We approximate its solution
with a very generic algorithm; it is easy to add new kinds of constraints
and new contributions to the objective.</font>
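<br>
<br><font size=2 face="sans-serif">To show what "easy to add" can look like, here is a Python sketch in which constraints and objective terms are plain functions kept in lists. Note the substitution: the search below is a simple greedy heuristic chosen for brevity, not the algorithm we actually use, and every name in it is hypothetical.</font>

```python
from typing import Callable, Dict, List, Optional

Placement = Dict[str, str]                       # resource -> container
Constraint = Callable[[Placement], bool]         # True when the placement is legal
Term = Callable[[Placement], float]              # contribution to cost; lower is better


def greedy_place(resources: List[str], containers: List[str],
                 constraints: List[Constraint],
                 terms: List[Term]) -> Optional[Placement]:
    """Place resources one at a time, keeping the cheapest legal choice."""
    placement: Placement = {}
    for res in resources:
        best, best_cost = None, float("inf")
        for c in containers:
            trial = {**placement, res: c}
            if not all(ok(trial) for ok in constraints):
                continue
            cost = sum(t(trial) for t in terms)
            if cost < best_cost:
                best, best_cost = c, cost
        if best is None:
            return None          # greedy dead end, not a proof of infeasibility
        placement[res] = best
    return placement


def at_most_two_per_host(p: Placement) -> bool:
    """Example constraint: no container holds more than two resources."""
    hosts = list(p.values())
    return all(hosts.count(c) <= 2 for c in set(hosts))


def hosts_used_squared(p: Placement) -> float:
    """Example non-linear term: penalise spreading across many hosts."""
    return float(len(set(p.values())) ** 2)


result = greedy_place(["a", "b", "c"], ["h1", "h2"],
                      [at_most_two_per_host], [hosts_used_squared])
```

<font size=2 face="sans-serif">Adding a new kind of constraint or a new contribution to the objective then amounts to appending one more function to the relevant list.</font>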
<br>
<br><font size=2 face="sans-serif">The core placement problem is about
packing virtual resources into physical containers (e.g., VMs into hosts,
volumes into Cinder backends). A virtual resource has a demand vector,
and a corresponding container has a capacity vector of the same length.
For a given container, the sum of the demand vectors of the virtual
resources in that container cannot exceed the container's capacity vector
in any dimension. We can add dimensions as needed to handle the relevant
host/guest characteristics.</font>
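<br>
<br><font size=2 face="sans-serif">The capacity rule can be written as a short check. The sketch below is illustrative only; the function name and the choice of dimensions (vCPUs, RAM, local disk) are assumptions made for the example.</font>

```python
from typing import Iterable, Sequence


def fits(capacity: Sequence[float], demands: Iterable[Sequence[float]]) -> bool:
    """True when the per-dimension sum of demands stays within capacity."""
    totals = [0.0] * len(capacity)
    for demand in demands:
        if len(demand) != len(capacity):
            raise ValueError("demand and capacity vectors must have the same length")
        totals = [t + d for t, d in zip(totals, demand)]
    return all(t <= c for t, c in zip(totals, capacity))


# Dimensions here are (vCPUs, RAM in GiB, local disk in GiB); illustrative only.
host = (16, 64, 500)
vms = [(4, 16, 100), (8, 32, 200)]

ok_now = fits(host, vms)                       # sums are 12, 48, 300
ok_after = fits(host, vms + [(2, 24, 50)])     # RAM would reach 72, above 64
```

<font size=2 face="sans-serif">Handling another host/guest characteristic means adding one more entry to every vector; the check itself does not change.</font>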
<br>
<br><font size=2 face="sans-serif">We are just now working on an example where
a Cinder volume can be required to be the only one hosted on whatever Cinder
backend hosts it. This is exactly analogous to requiring that a VM
(bare metal or otherwise) be the only one hosted by whatever physical machine (PM) hosts it.</font>
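<br>
<br><font size=2 face="sans-serif">The analogy shows up directly if exclusivity is written as one predicate over a placement map, with no mention of what kind of container is involved. This is a hypothetical sketch, not our implementation; it assumes each map holds resources of a single kind.</font>

```python
from typing import Dict, Set


def exclusivity_ok(placement: Dict[str, str], exclusive: Set[str]) -> bool:
    """True when every exclusive resource is alone in its container."""
    occupants: Dict[str, int] = {}
    for container in placement.values():
        occupants[container] = occupants.get(container, 0) + 1
    return all(occupants[placement[r]] == 1 for r in exclusive if r in placement)


# The same check serves volumes on Cinder backends and VMs on PMs.
volumes = {"vol-db": "backend-1", "vol-logs": "backend-2", "vol-tmp": "backend-2"}
vms = {"vm-secure": "pm-1", "vm-web": "pm-1"}

volumes_ok = exclusivity_ok(volumes, {"vol-db"})     # vol-db is alone on backend-1
vms_ok = exclusivity_ok(vms, {"vm-secure"})          # vm-secure shares pm-1
```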
<br>
<br><font size=2 face="sans-serif">We favor a fairly expressive language
for stating desired policies and relationships in VRTs. We think
this is necessary when you move beyond simple examples to more realistic
ones. We do not favor chopping the cloud up into little pieces due
to inexpressiveness in the VRT language.</font>
<br>
<br><font size=2 face="sans-serif">Regards,</font>
<br><font size=2 face="sans-serif">Mike</font>