[openstack-dev] [heat] [scheduler] Bringing things together for Icehouse

Gary Kotton gkotton at vmware.com
Sun Sep 15 12:02:15 UTC 2013


Hi,
Can you please join us at the upcoming scheduler meeting? That will give you a chance to bring up the ideas and discuss them with a larger audience.
https://wiki.openstack.org/wiki/Meetings#Scheduler_Sub-group_meeting
I think that for the summit it would be a good idea if we could also have at least one session with the Heat folks to see how we can combine efforts.
Thanks
Gary

From: Mike Spreitzer <mspreitz at us.ibm.com>
Reply-To: OpenStack Development Mailing List <openstack-dev at lists.openstack.org>
Date: Sunday, September 15, 2013 10:19 AM
To: OpenStack Development Mailing List <openstack-dev at lists.openstack.org>
Subject: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse

I've read up on recent goings-on in the scheduler subgroup, and have some thoughts to contribute.

But first I must admit that I am still a newbie to OpenStack, and still am missing some important clues.  One thing that mystifies me is this: I see essentially the same thing, which I have generally taken to calling holistic scheduling, discussed in two mostly separate contexts: (1) the (nova) scheduler context, and (2) the ambitions for heat.  What am I missing?

I have read the Unified Resource Placement Module document (at https://docs.google.com/document/d/1cR3Fw9QPDVnqp4pMSusMwqNuB_6t-t_neFqgXA98-Ls/edit?pli=1#) and NovaSchedulerPerspective document (at https://docs.google.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit?pli=1#heading=h.6ixj0ctv4rwu).  My group already has running code along these lines, and thoughts for future improvements, so I'll mention some salient characteristics.  I have read the etherpad at https://etherpad.openstack.org/IceHouse-Nova-Scheduler-Sessions - and I hope my remarks will help fit these topics together.

Our current code uses one long-lived process to make placement decisions.  The information it needs to do this job is pro-actively maintained in its memory.  We are planning to try replacing this one process with a set of equivalent processes; we are not sure yet how well that will work out (we are a research group).

We make a distinction between desired state, target state, and observed state.  The desired state comes in through REST requests, each giving a full virtual resource topology (VRT).  A VRT includes constraints that affect placement, but does not include actual placement decisions.  Those are made by what we call the placement agent.  Yes, it is separate from orchestration (even in the first architecture figure in the u-rpm document the orchestration is separate --- the enclosing box does not abate the essential separateness).  In our architecture, orchestration is downstream from placement (as in u-rpm).  The placement agent produces target state, which is essentially desired state augmented by placement decisions.  Observed state is what comes from the lower layers (Software Defined Compute, Storage, and Network).  We mainly use OpenStack APIs for the lower layers, and have added a few local extensions to make the whole story work.
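To make the three kinds of state concrete, here is a minimal sketch in Python (the names and fields are my own, for illustration only, and are not taken from our code or the documents above): desired state carries demands and constraints but no placement, target state adds the placement decision, and observed state reflects what the cloud reports.

    from dataclasses import dataclass, field
    from typing import Dict, List, Optional

    @dataclass
    class DesiredResource:
        """Desired state: one virtual resource as requested in a VRT, no placement."""
        name: str
        demand: Dict[str, float]                  # e.g. {"vcpus": 2, "ram_mb": 4096}
        constraints: List[str] = field(default_factory=list)

    @dataclass
    class TargetResource(DesiredResource):
        """Target state: desired state augmented by a placement decision."""
        host: Optional[str] = None                # chosen by the placement agent

    @dataclass
    class ObservedResource(DesiredResource):
        """Observed state: what the underlying cloud reports actually exists."""
        host: Optional[str] = None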

The placement agent judges available capacity by subtracting current allocations from raw capacity.  The placement agent maintains in its memory a derived thing we call effective state; the allocations in effective state are the union of the allocations in target state and the allocations in observed state.  Since the orchestration is downstream, some of the planned allocations are not in observed state yet.  Since other actors can use the underlying cloud, and other weird sh*t happens, not all the allocations are in target state.  That's why placement is done against the union of the allocations.  This is somewhat conservative, but the alternatives are worse.
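In sketch form, the effective-state computation described above might look like this (the data shapes are hypothetical: each state maps a host to allocations keyed by resource name, and each allocation is a demand vector):

    def effective_allocations(target, observed):
        """Union the allocations in target and observed state, keyed by resource name."""
        effective = {}
        for state in (target, observed):
            for host, allocs in state.items():
                effective.setdefault(host, {}).update(allocs)
        return effective

    def free_capacity(raw_capacity, effective, host):
        """Available capacity = raw capacity minus every allocation in effective state."""
        free = dict(raw_capacity[host])
        for demand in effective.get(host, {}).values():
            for dim, amount in demand.items():
                free[dim] = free.get(dim, 0) - amount
        return free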

Note that placement is concerned with allocations rather than current usage.  Current usage fluctuates much faster than you would want placement to.  Placement needs to be done with a long-term perspective.  Of course, that perspective can be informed by usage information (as well as other sources) --- but it remains a distinct thing.

We consider all our copies of observed state to be soft --- they can be lost and reconstructed at any time, because the true source is the underlying cloud.  Which is not to say that reconstructing a copy is cheap.  We prefer making incremental updates as needed, rather than re-reading the whole thing.  One of our local extensions adds a mechanism by which a client can register to be notified of changes in the Software Defined Compute area.
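A hedged sketch of that soft-copy idea (the interfaces here are invented for illustration and are not the actual local extension): a cache that is kept fresh by applying change notifications, and that can always be rebuilt by a full re-read if the copy is lost.

    class ObservedStateCache:
        def __init__(self, full_read):
            self._full_read = full_read           # callable that re-reads everything
            self._state = full_read()             # {resource_id: resource record}

        def on_change(self, event):
            """Apply one change notification: event = (kind, resource_id, record)."""
            kind, resource_id, record = event
            if kind == "deleted":
                self._state.pop(resource_id, None)
            else:                                 # "created" or "updated"
                self._state[resource_id] = record

        def rebuild(self):
            """Cheap to lose, not so cheap to rebuild: re-read from the cloud."""
            self._state = self._full_read()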

The target state, on the other hand, is stored authoritatively by the placement agent in a database.

We pose placement as a constrained optimization problem, with a non-linear objective.  We approximate its solution with a very generic algorithm; it is easy to add new kinds of constraints and new contributions to the objective.
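A rough illustration of what we mean by it being easy to add new constraints and objective contributions (hypothetical interfaces, not our actual code): constraints are predicates over a candidate placement, objective terms contribute costs, and the generic search only needs a way to score candidates.

    class Constraint:
        def satisfied(self, placement):           # placement: {resource_name: host}
            raise NotImplementedError

    class ObjectiveTerm:
        def cost(self, placement):
            raise NotImplementedError

    def score(placement, constraints, objective_terms):
        if not all(c.satisfied(placement) for c in constraints):
            return float("inf")                   # infeasible candidate
        return sum(t.cost(placement) for t in objective_terms)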

The core placement problem is about packing virtual resources into physical containers (e.g., VMs into hosts, volumes into Cinder backends).  A virtual resource has a demand vector, and a corresponding container has a capacity vector of the same length.  For a given container, the sum of the demand vectors of the virtual resources in that container cannot exceed the container's capacity vector in any dimension.  We can add dimensions as needed to handle the relevant host/guest characteristics.
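The per-container feasibility test, in sketch form (the dimension names are illustrative):

    def fits(container_capacity, placed_demands, new_demand):
        """All arguments are {dimension: amount} dicts; True if the new demand still fits."""
        for dim, cap in container_capacity.items():
            used = sum(d.get(dim, 0) for d in placed_demands)
            if used + new_demand.get(dim, 0) > cap:
                return False
        return True

    # e.g. fits({"vcpus": 16, "ram_mb": 65536},
    #           [{"vcpus": 4, "ram_mb": 8192}],
    #           {"vcpus": 8, "ram_mb": 16384})   -> True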

We are just now working on an example where a Cinder volume can be required to be the only one hosted on whatever Cinder backend hosts it.  This is exactly analogous to requiring that a VM (bare metal or otherwise) be the only one hosted by whatever PM hosts it.
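Using the hypothetical Constraint interface sketched above, that exclusivity requirement could be expressed as just another constraint, which is what makes the Cinder and Nova cases analogous:

    class ExclusiveHost(Constraint):
        """The named resource must be alone on whatever container hosts it."""
        def __init__(self, resource_name):
            self.resource_name = resource_name

        def satisfied(self, placement):
            my_host = placement.get(self.resource_name)
            if my_host is None:
                return True                       # not placed yet, nothing to check
            return all(host != my_host
                       for name, host in placement.items()
                       if name != self.resource_name)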

We favor a fairly expressive language for stating desired policies and relationships in VRTs.  We think this is necessary when you move beyond simple examples to more realistic ones.  We do not favor chopping the cloud up into little pieces due to inexpressiveness in the VRT language.

Regards,
Mike