[openstack-dev] Scheduler proposal

Clint Byrum clint at fewbar.com
Thu Oct 8 15:54:41 UTC 2015


Excerpts from Joshua Harlow's message of 2015-10-08 08:38:57 -0700:
> Joshua Harlow wrote:
> > On Thu, 8 Oct 2015 10:43:01 -0400
> > Monty Taylor <mordred at inaugust.com> wrote:
> >
> >> On 10/08/2015 09:01 AM, Thierry Carrez wrote:
> >>> Maish Saidel-Keesing wrote:
> >>>> Operational overhead has a cost - maintaining 3 different database
> >>>> tools, backing them up, providing HA, etc. all carries operational
> >>>> cost.
> >>>>
> >>>> This is not to say that this cannot be overcome, but it should be
> >>>> taken into consideration.
> >>>>
> >>>> And *if* they can be consolidated into an agreed solution across
> >>>> the whole of OpenStack - that would be highly beneficial (IMHO).
> >>> Agreed, and that ties into the similar discussion we recently had
> >>> about picking a common DLM. Ideally we'd only add *one* general
> >>> dependency and use it for locks / leader election / syncing status
> >>> around.
> >>>
> >> ++
> >>
> >> All of the proposed DLM tools can fill this space successfully. There
> >> is definitely not a need for multiple.
> >
> > On this point, and just thinking out loud: if we consider saving
> > compute_node information into, say, a node in said DLM backend (for
> > example a znode in zookeeper[1]), this information would be updated
> > periodically by that compute_node *itself* (it would, say, contain
> > information about what VMs are running on it, what their utilization
> > is, and so on).
> >
> > For example the following layout could be used:
> >
> > /nova/compute_nodes/<hypervisor-hostname>
> >
> > <hypervisor-hostname> data could be:
> >
> > {
> >      vms: [],
> >      memory_free: XYZ,
> >      cpu_usage: ABC,
> >      memory_used: MNO,
> >      ...
> > }
> >
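
To make that concrete, here's a rough sketch of the publishing side
using kazoo (the Python ZooKeeper client); the path layout is Josh's
from above, while the payload numbers and the ZooKeeper hosts are just
illustrative:

    import json
    import socket

    from kazoo.client import KazooClient

    client = KazooClient(hosts='zk1:2181,zk2:2181,zk3:2181')
    client.start()

    path = '/nova/compute_nodes/%s' % socket.gethostname()
    state = {
        'vms': [],
        'memory_free': 2048,   # illustrative numbers
        'cpu_usage': 0.15,
        'memory_used': 1024,
    }
    payload = json.dumps(state).encode('utf-8')

    # Ephemeral: ZooKeeper deletes the znode automatically when this
    # nova-compute's session goes away (which ties into the liveness
    # idea further down).
    if client.exists(path):
        client.set(path, payload)
    else:
        client.create(path, payload, ephemeral=True, makepath=True)

The compute node would just re-run the set() on its normal periodic
update interval; everything else falls out of the watches.
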
> > Now if we imagine each/all schedulers having watches
> > on /nova/compute_nodes/ ([2] consul and etcd have equivalent concepts
> > afaik), then when a compute_node updates that information, a push
> > notification (the watch being triggered) will be sent to the
> > scheduler(s), and the scheduler(s) could then update a local in-memory
> > cache of the data about all the hypervisors that can be selected from
> > for scheduling. This avoids any reading of a large set of data in the
> > first place (besides an initial read-once on startup to fetch the
> > initial list and set up the watches); in a way it's similar to push
> > notifications. Then when scheduling a VM -> hypervisor there isn't any
> > need to query anything but the local in-memory representation that the
> > scheduler is maintaining (and updating as watches are triggered)...
> >
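
The scheduler side could look something like this with kazoo's watch
recipes (again only a sketch; the variable names are made up):

    import json

    from kazoo.client import KazooClient

    client = KazooClient(hosts='zk1:2181,zk2:2181,zk3:2181')
    client.start()
    client.ensure_path('/nova/compute_nodes')

    nodes = {}       # hostname -> latest reported state
    watched = set()  # hostnames we already have a DataWatch on

    def _watch(hostname):
        @client.DataWatch('/nova/compute_nodes/' + hostname)
        def updated(data, stat):
            if data is None:
                nodes.pop(hostname, None)  # znode gone -> forget it
            else:
                nodes[hostname] = json.loads(data.decode('utf-8'))

    @client.ChildrenWatch('/nova/compute_nodes')
    def membership_changed(children):
        for hostname in children:
            if hostname not in watched:
                watched.add(hostname)
                _watch(hostname)

Scheduling decisions then only filter/weigh the local 'nodes' dict;
there's no database round-trip on the hot path.
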
> > So this is why I was wondering about what capabilities of cassandra
> > are being used here; because the above, I think, are unique
> > capabilities of DLM-like systems (zookeeper, consul, etcd) that could
> > be advantageous here...
> >
> > [1]
> > https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#sc_zkDataModel_znodes
> >
> > [2]
> > https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkWatches
> >
> >
> 
> And here's a final super-awesomeness,
> 
> Use the existence of that znode + its information (perhaps using
> ephemeral znodes or equivalent) to determine if a hypervisor is 'alive'
> or 'dead', thus removing the need to do queries and periodic writes to
> the nova database to determine if a hypervisor's nova-compute service
> is alive or dead (with reads via
> https://github.com/openstack/nova/blob/master/nova/servicegroup/drivers/db.py#L33
> and other similar code scattered in nova)...
> 

^^ THIS is the kind of architectural thinking I'd like to see us do more
of.

This isn't "hey, I have a better database"; it is "I have a way to
reduce the most common operations to O(1) complexity".
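
For example, the "is this compute node alive?" check becomes a single
existence lookup instead of a heartbeat-timestamp comparison in the DB.
A minimal sketch, again assuming kazoo and the layout above (the
hostname is made up):

    from kazoo.client import KazooClient

    def is_up(client, hostname):
        # 'Alive' iff the ephemeral znode still exists; ZooKeeper drops
        # it automatically when the compute node's session expires, so
        # no periodic DB writes (or reads) are needed at all.
        return client.exists('/nova/compute_nodes/' + hostname) is not None

    client = KazooClient(hosts='zk1:2181,zk2:2181,zk3:2181')
    client.start()
    print(is_up(client, 'compute-01'))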

Ed, for all of the promise of your experiment, I'd actually rather see
time spent on Josh's idea above. In fact, I might spend time on Josh's
idea above. :)


