<font size=2 face="sans-serif">I've read up on recent goings-on in the

scheduler subgroup, and have some thoughts to contribute.</font>

<br>

<br><font size=2 face="sans-serif">But first I must admit that I am still

a newbie to OpenStack and am still missing some important clues.  One

thing that mystifies me is this: I see essentially the same thing, which

I have generally taken to calling holistic scheduling, discussed in two

mostly separate contexts: (1) the (nova) scheduler context, and (2) the

ambitions for heat.  What am I missing?</font>

<br>

<br><font size=2 face="sans-serif">I have read the Unified Resource Placement

Module document (at </font><a href="https://docs.google.com/document/d/1cR3Fw9QPDVnqp4pMSusMwqNuB_6t-t_neFqgXA98-Ls/edit?pli=1#"><font size=2 face="sans-serif">https://docs.google.com/document/d/1cR3Fw9QPDVnqp4pMSusMwqNuB_6t-t_neFqgXA98-Ls/edit?pli=1#</font></a><font size=2 face="sans-serif">)

and NovaSchedulerPerspective document (at </font><a href="https://docs.google.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit?pli=1#heading=h.6ixj0ctv4rwu"><font size=2 face="sans-serif">https://docs.google.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit?pli=1#heading=h.6ixj0ctv4rwu</font></a><font size=2 face="sans-serif">).

 My group already has running code along these lines, and thoughts

for future improvements, so I'll mention some salient characteristics.

 I have read the etherpad at </font><a href="https://etherpad.openstack.org/IceHouse-Nova-Scheduler-Sessions"><font size=2 face="sans-serif">https://etherpad.openstack.org/IceHouse-Nova-Scheduler-Sessions</font></a><font size=2 face="sans-serif">

and I hope my remarks will help fit these topics together.</font>

<br>

<br><font size=2 face="sans-serif">Our current code uses one long-lived

process to make placement decisions.  The information it needs to

do this job is pro-actively maintained in its memory.  We are planning

to try replacing this one process with a set of equivalent processes, though

we are not sure how well that will work out (we are a research group).</font>

<br>

<br><font size=2 face="sans-serif">We make a distinction between desired

state, target state, and observed state.  The desired state comes

in through REST requests, each giving a full virtual resource topology

(VRT).  A VRT includes constraints that affect placement, but does

not include actual placement decisions.  Those are made by what we

call the placement agent.  Yes, it is separate from orchestration

(even in the first architecture figure in the u-rpm document, orchestration

is separate; the enclosing box does not change that essential separateness).

 In our architecture, orchestration is downstream from placement (as

in u-rpm).  The placement agent produces target state, which is essentially

desired state augmented by placement decisions.  Observed state is

what comes from the lower layers (Software Defined Compute, Storage, and

Network).  We mainly use OpenStack APIs for the lower layers, and

have added a few local extensions to make the whole story work.</font>
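<br>

<br><font size=2 face="sans-serif">To make the relationship between desired state and target state concrete, here is a minimal Python sketch. It is only an illustration, not our actual code: the names VirtualResource, DesiredState, TargetState and make_target are hypothetical, and a real VRT carries far more detail than shown here.</font>

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VirtualResource:
    """One element of a VRT (hypothetical shape)."""
    name: str
    kind: str  # e.g. "vm" or "volume"


@dataclass
class DesiredState:
    """A VRT as submitted: resources plus constraints, no placements."""
    resources: list
    constraints: list = field(default_factory=list)


@dataclass
class TargetState:
    """Desired state augmented with placement decisions."""
    desired: DesiredState
    placement: dict  # resource name -> container name


def make_target(desired, decisions):
    """Combine a desired state with decisions made by a placement agent."""
    missing = [r.name for r in desired.resources if r.name not in decisions]
    if missing:
        raise ValueError(f"unplaced resources: {missing}")
    return TargetState(desired=desired, placement=dict(decisions))
```

<br><font size=2 face="sans-serif">In this sketch the target state keeps a reference to the unmodified desired state, so the placement decisions stay an addition to the request rather than a rewrite of it.</font>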

<br>

<br><font size=2 face="sans-serif">The placement agent judges available

capacity by subtracting current allocations from raw capacity.  The

placement agent maintains in its memory a derived thing we call effective

state; the allocations in effective state are the union of the allocations

in target state and the allocations in observed state.  Since the

orchestration is downstream, some of the planned allocations are not in

observed state yet.  Since other actors can use the underlying cloud,

and other weird sh*t happens, not all the allocations are in target state.

 That's why placement is done against the union of the allocations.

 This is somewhat conservative, but the alternatives are worse.</font>
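<br>

<br><font size=2 face="sans-serif">The arithmetic above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not our implementation: allocations are keyed by a hypothetical allocation id, demand is a single number (say, vCPUs) rather than a vector, and when an id appears in both maps the target-state entry is used. The function names are made up for this example.</font>

```python
def effective_allocations(target, observed):
    """Union of two allocation maps: allocation id -> (container, demand).

    An allocation present in both maps is counted once; the target-state
    entry wins (an assumption made for this sketch).
    """
    merged = dict(observed)
    merged.update(target)
    return merged


def available_capacity(raw_capacity, allocations):
    """Subtract every allocation from the raw capacity of its container."""
    free = dict(raw_capacity)
    for container, demand in allocations.values():
        free[container] -= demand
    return free


raw = {"host-a": 16, "host-b": 16}
# vm2 is planned but not yet visible in the underlying cloud.
target = {"vm1": ("host-a", 4), "vm2": ("host-b", 8)}
# vmX was created by some other actor, so it is not in target state.
observed = {"vm1": ("host-a", 4), "vmX": ("host-a", 2)}

free = available_capacity(raw, effective_allocations(target, observed))
```

<br><font size=2 face="sans-serif">Placing against target state alone would overlook vmX, and placing against observed state alone would overlook vm2; the union accounts for both.</font>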

<br>

<br><font size=2 face="sans-serif">Note that placement is concerned with

allocations rather than current usage.  Current usage fluctuates much

faster than you would want placement to.  Placement needs to be done

with a long-term perspective.  Of course, that perspective can be

informed by usage information (as well as other sources) --- but it remains

a distinct thing.</font>

<br>

<br><font size=2 face="sans-serif">We consider all our copies of observed

state to be soft --- they can be lost and reconstructed at any time, because

the true source is the underlying cloud.  That is not to say that

reconstructing a copy is cheap.  We prefer making incremental updates

as needed, rather than re-reading the whole thing.  One of our local

extensions adds a mechanism by which a client can register to be notified

of changes in the Software Defined Compute area.</font>

<br>

<br><font size=2 face="sans-serif">The target state, on the other hand,

is stored authoritatively by the placement agent in a database.</font>

<br>

<br><font size=2 face="sans-serif">We pose placement as a constrained optimization

problem, with a non-linear objective.  We approximate its solution

with a very generic algorithm; it is easy to add new kinds of constraints

and new contributions to the objective.</font>
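<br>

<br><font size=2 face="sans-serif">As an illustration of how constraints and objective contributions can be kept pluggable, here is a small Python sketch. It uses a plain greedy heuristic as a stand-in; it is not the algorithm we actually use, and every name in it (place_greedy, max_two_per_container, load_squared) is hypothetical. Constraints are predicates and objective terms are cost functions, so adding a new kind of either means adding one function to a list.</font>

```python
def place_greedy(resources, containers, constraints, objective_terms):
    """Place resources one at a time on the cheapest feasible container.

    constraints:      functions (resource, container, placement) -> bool
    objective_terms:  functions (resource, container, placement) -> number
    """
    placement = {}
    for res in resources:
        feasible = [c for c in containers
                    if all(ok(res, c, placement) for ok in constraints)]
        if not feasible:
            raise RuntimeError(f"no feasible container for {res}")
        placement[res] = min(
            feasible,
            key=lambda c: sum(term(res, c, placement)
                              for term in objective_terms))
    return placement


def max_two_per_container(res, container, placement):
    """Example constraint: at most two resources per container."""
    return sum(1 for c in placement.values() if c == container) < 2


def load_squared(res, container, placement):
    """Example non-linear objective term that favors spreading load."""
    load = sum(1 for c in placement.values() if c == container) + 1
    return load * load


result = place_greedy(["a", "b", "c", "d"], ["h1", "h2"],
                      [max_two_per_container], [load_squared])
```

<br><font size=2 face="sans-serif">A greedy pass like this only approximates the optimum and can fail where a smarter search would succeed; it is shown here purely for the extensibility pattern.</font>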

<br>

<br><font size=2 face="sans-serif">The core placement problem is about

packing virtual resources into physical containers (e.g., VMs into hosts,

volumes into Cinder backends).  A virtual resource has a demand vector,

and a corresponding container has a capacity vector of the same length.

 For a given container, the sum of the demand vectors of the virtual

resources in that container cannot exceed the container's capacity vector

in any dimension.  We can add dimensions as needed to handle the relevant

host/guest characteristics.</font>
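<br>

<br><font size=2 face="sans-serif">The capacity rule can be written down directly. The following Python sketch is illustrative only; the function name fits and the choice of three dimensions (vCPUs, RAM in GB, disk in GB) are assumptions made for the example.</font>

```python
def fits(demands, capacity):
    """True if the per-dimension sum of demands stays within capacity."""
    for demand in demands:
        if len(demand) != len(capacity):
            raise ValueError("demand and capacity vectors differ in length")
    totals = [sum(d[i] for d in demands) for i in range(len(capacity))]
    return all(total <= cap for total, cap in zip(totals, capacity))


# Dimensions assumed for this example: (vCPUs, RAM in GB, disk in GB).
host_capacity = (16, 64, 500)
guests = [(4, 16, 100), (8, 32, 200)]

ok_now = fits(guests, host_capacity)
ok_with_one_more = fits(guests + [(8, 8, 50)], host_capacity)
```

<br><font size=2 face="sans-serif">Adding a dimension means appending one more entry to every demand and capacity vector; the check itself does not change.</font>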

<br>

<br><font size=2 face="sans-serif">We are just now working on an example where

a Cinder volume can be required to be the only one hosted on whatever Cinder

backend hosts it.  This is exactly analogous to requiring that a VM

(bare metal or otherwise) be the only one hosted by whatever physical machine (PM) hosts it.</font>
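<br>

<br><font size=2 face="sans-serif">One way such an exclusivity requirement could be checked is sketched below in Python. This is an illustration, not a description of our code; the function exclusivity_ok and its arguments are hypothetical. The point is that the same predicate works for volumes on backends and for VMs on physical machines, because it never looks at the kind of resource or container.</font>

```python
def exclusivity_ok(candidate, container, placement, exclusive):
    """Check whether candidate may be placed on container.

    placement: resource name -> container name (decisions so far)
    exclusive: set of resource names that must be alone on their container
    """
    residents = [r for r, c in placement.items()
                 if c == container and r != candidate]
    if not residents:
        return True   # an empty container never violates exclusivity
    if candidate in exclusive:
        return False  # candidate must be alone, but the container is occupied
    # candidate is ordinary: it may share only with non-exclusive residents
    return not any(r in exclusive for r in residents)


# Volumes on Cinder backends ...
volumes = {"vol1": "backend-a"}
vol_shared = exclusivity_ok("vol2", "backend-a", volumes, {"vol1"})
vol_elsewhere = exclusivity_ok("vol2", "backend-b", volumes, {"vol1"})

# ... and VMs on physical machines use the very same check.
vms = {"vm1": "pm-1"}
vm_exclusive_joining = exclusivity_ok("vm2", "pm-1", vms, {"vm2"})
```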

<br>

<br><font size=2 face="sans-serif">We favor a fairly expressive language

for stating desired policies and relationships in VRTs.  We think

this is necessary when you move beyond simple examples to more realistic

ones.  We do not favor chopping the cloud up into little pieces due

to inexpressiveness in the VRT language.</font>

<br>

<br><font size=2 face="sans-serif">Regards,</font>

<br><font size=2 face="sans-serif">Mike</font>