[openstack-dev] Scheduler proposal

Joshua Harlow harlowja at fastmail.com
Fri Oct 9 21:52:46 UTC 2015


Gregory Haynes wrote:
> Excerpts from Joshua Harlow's message of 2015-10-08 15:24:18 +0000:
>> On this point, and just thinking out loud: if we consider saving
>> compute_node information into, say, a node in said DLM backend (for
>> example a znode in zookeeper[1]), this information would be updated
>> periodically by that compute_node *itself* (it would, say, contain
>> information about what VMs are running on it, what their utilization
>> is, and so on).
>>
>> For example the following layout could be used:
>>
>> /nova/compute_nodes/<hypervisor-hostname>
>>
>> <hypervisor-hostname> data could be:
>>
>> {
>>      vms: [],
>>      memory_free: XYZ,
>>      cpu_usage: ABC,
>>      memory_used: MNO,
>>      ...
>> }
>>
>> Now if we imagine each/all schedulers having watches
>> on /nova/compute_nodes/ ([2]; consul and etcd have equivalent concepts
>> afaik), then when a compute_node updates that information a push
>> notification (the watch being triggered) will be sent to the
>> scheduler(s), and the scheduler(s) could then update a local in-memory
>> cache of the data about all the hypervisors that can be selected from
>> for scheduling. This avoids any reading of a large set of data in the
>> first place (besides an initial read-once on startup to read the
>> initial list + set up the watches); in a way it's similar to push
>> notifications. Then when scheduling a VM -> hypervisor there isn't any
>> need to query anything but the local in-memory representation that the
>> scheduler is maintaining (and updating as watches are triggered)...
>>
>> So this is why I was wondering about what capabilities of cassandra are
>> being used here; because the above, I think, are unique capabilities of
>> DLM-like systems (zookeeper, consul, etcd) that could be advantageous
>> here...
>>
>> [1]
>> https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#sc_zkDataModel_znodes
>>
>> [2]
>> https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkWatches
>
> I wonder if we would even need to make something so specialized to get
> this kind of local caching. I don't know what the current ZK tools are,
> but the original Chubby paper described clients always keeping a
> write-through cache for nodes, with subscriptions set up in order to
> invalidate the cache.

Perhaps not; make it as simple as we want, as long as people agree that
the concept is useful. My idea is that it would look something like this
(simplified, obviously):

http://paste.openstack.org/show/475938/
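
Roughly, the watcher side of that program does something like the
following (a sketch only, not the actual paste; it assumes kazoo's
ChildrenWatch/DataWatch APIs, and the cache/watched names are mine):

import json

from kazoo.client import KazooClient

PATH = "/nova/compute_nodes"

cache = {}       # hypervisor-hostname -> latest reported data
watched = set()  # hostnames we already have a DataWatch on

c = KazooClient()
c.start()
c.ensure_path(PATH)

def watch_node(hostname):
    # DataWatch re-fires on every write to the child znode, so the
    # cache entry is refreshed whenever that compute node reports in.
    @c.DataWatch("%s/%s" % (PATH, hostname))
    def updated(data, stat):
        if data is None:
            cache.pop(hostname, None)  # znode was deleted
        else:
            cache[hostname] = json.loads(data)

@c.ChildrenWatch(PATH)
def members_changed(children):
    # Fires once at startup (the initial read) and again whenever
    # compute nodes come or go.
    for hostname in children:
        if hostname not in watched:
            watched.add(hostname)
            watch_node(hostname)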

Then resources (in this example compute_nodes) would register themselves 
via a call like:

 >>> from kazoo import client
 >>> import json
 >>> c = client.KazooClient()
 >>> c.start()
 >>> n = "/nova/compute_nodes"
 >>> c.ensure_path(n)
 >>> c.create("%s/h1.hypervisor.yahoo.com" % n, json.dumps({}))

^^^ the dictionary above would be whatever data should end up in the
receivers' caches...
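
And since each compute node owns its znode, it would just keep rewriting
it as its utilization changes; continuing the session above (the field
names/values here are placeholders, not a settled schema):

 >>> me = "%s/h1.hypervisor.yahoo.com" % n
 >>> c.set(me, json.dumps({
 ...     "vms": ["instance-0001"],
 ...     "memory_free": 2048,
 ...     "memory_used": 6144,
 ...     "cpu_usage": 0.75,
 ... }))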

Then in the pasted program (running in a different shell/computer/...)
the cache gets updated, and a user of that cache can consult it to find
resources to schedule things onto...
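
Just to make "uses the cache" concrete, the selection step could be as
dumb as this (pick_hypervisor is a made-up name; real filtering/weighing
would obviously be fancier):

def pick_hypervisor(cache):
    # Purely local decision, no DB/ZK round-trip at scheduling time:
    # take the hypervisor currently reporting the most free memory.
    if not cache:
        raise RuntimeError("no compute nodes registered (yet)")
    return max(cache, key=lambda host: cache[host].get("memory_free", 0))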

The example should work; just get zookeeper set up
(http://packages.ubuntu.com/precise/zookeeperd should do all of that) and
then try it out...

>
> Also, re: etcd - The last time I checked their subscription API was
> woefully inadequate for performing this type of thing without
> thundering-herd issues.

Any idea on the consul watch capabilities?

Similar API(s) appear to exist (though I don't know how they work, or if
they work at all): https://www.consul.io/docs/agent/watches.html
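
If they do work, the same pattern should map over via consul's blocking
queries (which, as far as I can tell, are the primitive its watches are
built on); a completely untested sketch using the python-consul library:

import consul

c = consul.Consul()
index = None
while True:
    # Blocking query: parks until something under the key prefix
    # changes (or a timeout elapses), then returns the new index
    # plus the entries, so the loop can refresh a local cache.
    index, entries = c.kv.get("nova/compute_nodes/",
                              index=index, recurse=True)
    for entry in entries or []:
        print(entry["Key"], entry["Value"])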
