[openstack-dev] Compute node stats sent to the scheduler

Chris Behrens cbehrens at codestud.com
Mon Jun 17 20:50:52 UTC 2013


On Jun 17, 2013, at 7:49 AM, Russell Bryant <rbryant at redhat.com> wrote:

> On 06/16/2013 11:25 PM, Dugger, Donald D wrote:
>> Looking into the scheduler a bit there's an issue of duplicated effort that is a little puzzling.  The database table `compute_nodes' is being updated periodically with data about capabilities and resources used (memory, vcpus, ...) while at the same time a periodic RPC call is being made to the scheduler sending pretty much the same data.
>> 
>> Does anyone know why we are updating the same data in two different places using two different mechanisms?  Also, assuming we were to remove one of these updates, which one should go?  (I thought at one point in time there was a goal to create a database-free compute node, which would imply we should remove the DB update.)
> 
> Have you looked around to see if any code is using the data from the db?
> 
> Having schedulers hit the db for the current state of all compute nodes
> all of the time would be a large additional db burden that I think we
> should avoid.  So, it makes sense to keep the rpc fanout_cast of current
> stats to schedulers.

The DB is actually what the scheduler uses. :)  The fanout messages are too infrequent and can lag too far behind.  So the scheduler was moved to reading from the DB a long, long time ago… but at first that was very inefficient, because it looped through all instances.  So we added the data we needed to the compute_node and compute_node_stats tables, so that we only had to look at the hosts.  You have to pull the hosts anyway, so we pull the stats at the same time.
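
To make the "pull the hosts and their stats together" point concrete, here is a minimal sketch against an in-memory SQLite database. The table and column names (`compute_nodes`, `compute_node_stats`, `free_ram_mb`, `vcpus_used`, `stat_key`, `stat_value`) are simplified assumptions for illustration, not Nova's actual schema or query code. The idea it shows is that one joined query per scheduling pass replaces a walk over every instance.

```python
import sqlite3


def build_schema(conn):
    # Hypothetical, simplified versions of the two tables discussed above.
    conn.executescript(
        """
        CREATE TABLE compute_nodes (
            id INTEGER PRIMARY KEY,
            host TEXT NOT NULL,
            free_ram_mb INTEGER NOT NULL,
            vcpus_used INTEGER NOT NULL
        );
        CREATE TABLE compute_node_stats (
            compute_node_id INTEGER NOT NULL REFERENCES compute_nodes(id),
            stat_key TEXT NOT NULL,
            stat_value TEXT NOT NULL
        );
        """
    )


def load_host_states(conn):
    """Return {host: state} built from a single LEFT JOIN of hosts and stats.

    The host rows have to be read anyway, so the per-host stats come back
    in the same round trip instead of being derived from all instances.
    """
    rows = conn.execute(
        """
        SELECT n.host, n.free_ram_mb, n.vcpus_used, s.stat_key, s.stat_value
        FROM compute_nodes AS n
        LEFT JOIN compute_node_stats AS s ON s.compute_node_id = n.id
        ORDER BY n.host
        """
    )
    states = {}
    for host, free_ram_mb, vcpus_used, key, value in rows:
        state = states.setdefault(
            host,
            {"free_ram_mb": free_ram_mb, "vcpus_used": vcpus_used, "stats": {}},
        )
        if key is not None:  # a host with no stats rows yields NULLs from the LEFT JOIN
            state["stats"][key] = value
    return states


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    conn.executemany(
        "INSERT INTO compute_nodes VALUES (?, ?, ?, ?)",
        [(1, "node1", 2048, 4), (2, "node2", 8192, 1)],
    )
    conn.executemany(
        "INSERT INTO compute_node_stats VALUES (?, ?, ?)",
        [(1, "num_instances", "3"), (1, "io_workload", "1")],
    )
    print(load_host_states(conn))
```

The LEFT JOIN keeps hosts that have no stats rows yet, so a freshly registered node still shows up for scheduling.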

The problem is that when we stopped using certain data from the fanout messages, we never removed it from them.  We should AT LEAST do that.  But… (see below).

> 
> The scheduler also does a fanout_cast to all compute nodes when it
> starts up to trigger the compute nodes to populate the cache in the
> scheduler.  It would be nice to never fanout_cast to all compute nodes
> (given that there may be a *lot* of them).  We could replace this with
> having the scheduler populate its cache from the database.

I think we should audit whatever the scheduler still uses from these messages and move it to the DB.  I believe it's limited to the hypervisor capabilities, which are compared against aggregates or some such.  These are things that change very rarely… so an alternative would be to send fanout messages only when capabilities change!  We could always do that as a first step.
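
The "only send fanout messages when capabilities change" idea could look roughly like the sketch below. `CapabilityPublisher` and its `send` callback are hypothetical names: in a real compute node `send` would wrap whatever RPC fanout the service uses, and the capability dict would come from the hypervisor driver. The publisher remembers the last payload it sent and skips the cast when the new one is equal.

```python
import copy
from typing import Any, Callable, Dict, Optional

Capabilities = Dict[str, Any]


class CapabilityPublisher:
    """Hypothetical helper: fan out capabilities only when they change."""

    def __init__(self, send: Callable[[Capabilities], None]) -> None:
        self._send = send  # stand-in for an RPC fanout to the schedulers
        self._last_sent: Optional[Capabilities] = None

    def report(self, capabilities: Capabilities, force: bool = False) -> bool:
        """Send if the capabilities differ from the last sent copy.

        `force` covers cases such as a scheduler asking every node to
        repopulate its cache. Returns True when a message was sent.
        """
        if not force and capabilities == self._last_sent:
            return False
        # Deep copy so later in-place edits by the caller (even nested ones)
        # are seen as changes on the next report.
        snapshot = copy.deepcopy(capabilities)
        self._send(snapshot)
        # Record only after send() returns, so a failed send is retried.
        self._last_sent = snapshot
        return True


if __name__ == "__main__":
    sent = []
    pub = CapabilityPublisher(sent.append)
    caps = {"hypervisor_type": "QEMU", "cpu_arch": "x86_64"}
    pub.report(caps)        # first report: sent
    pub.report(dict(caps))  # unchanged: skipped
    caps["hypervisor_version"] = 1002000
    pub.report(caps)        # changed: sent
    print(len(sent))        # prints 2
```

Since capabilities rarely change, most periodic ticks would send nothing under this scheme.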

> 
> Removing the db usage completely would be nice if nothing is actually
> using it, but we'd have to look into an alternative solution for
> removing the scheduler fanout_cast to compute.

Relying on anything but the DB for current free memory, etc., is just too laggy… so we need to stick with the DB, IMO.

- Chris



