[openstack-dev] Compute node stats sent to the scheduler

Wang, Shane shane.wang at intel.com
Tue Jun 18 12:13:25 UTC 2013


Hi,

I am new to this area, and I have an idea but don't know whether it would work.
fanout_cast is expensive, and the DB could become a burden. Could we keep the
stats data on the compute nodes, and have the scheduler proactively ask the
nodes for their stats only when it actually needs to schedule something?
The assumption is that scheduling happens infrequently compared with the
fanout_cast interval.
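
Roughly what I'm imagining (just a sketch with made-up names, not real
nova code):

    def get_host_stats(context, hosts, rpc_client, timeout=2):
        """Ask candidate hosts for their stats at scheduling time."""
        stats = {}
        for host in hosts:
            try:
                # One targeted call per host, made only when we actually
                # have a request to place; replaces the periodic fanout.
                stats[host] = rpc_client.call(
                    context, 'compute.%s' % host,
                    {'method': 'get_stats'}, timeout=timeout)
            except Exception:
                continue  # skip hosts that don't answer in time
        return stats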

Best Regards.
--
Shane

Brian Elliott wrote on 2013-06-18:

> 
> On Jun 17, 2013, at 3:50 PM, Chris Behrens <cbehrens at codestud.com> wrote:
> 
>> 
>> On Jun 17, 2013, at 7:49 AM, Russell Bryant <rbryant at redhat.com> wrote:
>> 
>>> On 06/16/2013 11:25 PM, Dugger, Donald D wrote:
>>>> Looking into the scheduler a bit, there's an issue of duplicated effort
>>>> that is a little puzzling.  The database table `compute_nodes' is being
>>>> updated periodically with data about capabilities and resources used
>>>> (memory, vcpus, ...) while at the same time a periodic RPC call is being
>>>> made to the scheduler sending pretty much the same data.
>>>> 
>>>> Does anyone know why we are updating the same data in two different
>>>> places using two different mechanisms?  Also, assuming we were to remove
>>>> one of these updates, which one should go?  (I thought at one point in
>>>> time there was a goal to create a database-free compute node, which
>>>> would imply we should remove the DB update.)
>>> 
>>> Have you looked around to see if any code is using the data from the db?
>>> 
>>> Having schedulers hit the db for the current state of all compute nodes
>>> all of the time would be a large additional db burden that I think we
>>> should avoid.  So, it makes sense to keep the rpc fanout_cast of current
>>> stats to schedulers.
>> 
>> This is actually what the scheduler uses. :)  The fanout messages are too
>> infrequent and can be too laggy, so the scheduler was moved to using the
>> DB a long, long time ago.  It was very inefficient at first, because it
>> looped through all instances, so we added the things we needed to
>> compute_node and compute_node_stats so that we only had to look at the
>> hosts.  You have to pull the hosts anyway, so we pull the stats at the
>> same time.
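>> 
>> Roughly, the pattern looks like this (illustrative models and names, not
>> the actual nova code):
>> 
>>     from sqlalchemy import Column, ForeignKey, Integer, String
>>     from sqlalchemy.orm import declarative_base, joinedload, relationship
>> 
>>     Base = declarative_base()
>> 
>>     class ComputeNode(Base):
>>         __tablename__ = 'compute_nodes'
>>         id = Column(Integer, primary_key=True)
>>         hypervisor_hostname = Column(String(255))
>>         memory_mb = Column(Integer)
>>         vcpus = Column(Integer)
>>         stats = relationship('ComputeNodeStat')
>> 
>>     class ComputeNodeStat(Base):
>>         __tablename__ = 'compute_node_stats'
>>         id = Column(Integer, primary_key=True)
>>         compute_node_id = Column(Integer, ForeignKey('compute_nodes.id'))
>>         key = Column(String(255))
>>         value = Column(String(255))
>> 
>>     def compute_node_get_all(session):
>>         # One round trip gets every host row plus its stats, so the
>>         # scheduler never has to loop over individual instances.
>>         return (session.query(ComputeNode)
>>                        .options(joinedload(ComputeNode.stats))
>>                        .all())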
>> 
>> The problem is that when we stopped using certain data from the fanout
>> messages, we never removed it.  We should AT LEAST do this.  But (see
>> below)...
>> 
>>> 
>>> The scheduler also does a fanout_cast to all compute nodes when it
>>> starts up to trigger the compute nodes to populate the cache in the
>>> scheduler.  It would be nice to never fanout_cast to all compute nodes
>>> (given that there may be a *lot* of them).  We could replace this with
>>> having the scheduler populate its cache from the database.
>> 
>> I think we should audit the remaining things that the scheduler uses from
>> these messages and move them to the DB.  I believe it's limited to the
>> hypervisor capabilities that are compared against aggregates or some
>> such, things that change very rarely, so an alternative could be to only
>> send fanout messages when capabilities change!  We could always do that
>> as a first step.
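>> 
>> Something like this on the compute side (again, just a sketch, not the
>> real code):
>> 
>>     class CapabilityReporter(object):
>>         """Fanout capabilities to schedulers only when they change."""
>> 
>>         def __init__(self, rpc, host):
>>             self.rpc = rpc
>>             self.host = host
>>             self._last_sent = None
>> 
>>         def publish(self, context, capabilities):
>>             if capabilities == self._last_sent:
>>                 return  # unchanged since last period: no fanout at all
>>             self.rpc.fanout_cast(
>>                 context, 'scheduler',
>>                 {'method': 'update_service_capabilities',
>>                  'args': {'host': self.host,
>>                           'capabilities': capabilities}})
>>             self._last_sent = dict(capabilities)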
>> 
>>> 
>>> Removing the db usage completely would be nice if nothing is actually
>>> using it, but we'd have to look into an alternative solution for
>>> removing the scheduler fanout_cast to compute.
>> 
>> Relying on anything but the DB for current free memory, etc., is just
>> too laggy, so we need to stick with it, IMO.
>> 
>> - Chris
> 
> As Chris said, the reason it ended up this way, using the DB, is to get
> up-to-date host usage to the scheduler quickly.  I certainly understand the
> point that it's a whole lot of increased load on the DB, but the RPC data
> was quite stale.  If there is interest in moving away from the DB updates,
> I think we have to either:
> 
> 1) Send RPC updates to the scheduler on essentially every state change
> during a build.
> 
> or
> 
> 2) Change the scheduler architecture so there is some "memory" of
> resources consumed between requests.  The scheduler would have to
> remember which hosts recent builds were assigned to.  This could be a
> bit of a data synchronization problem if you're talking about using
> multiple scheduler instances.
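> 
> A rough sketch of what I mean by (2) (illustrative names, not real nova
> code):
> 
>     class HostState(object):
>         """Scheduler's cached view of one host's free resources."""
>         def __init__(self, host, free_ram_mb, free_vcpus):
>             self.host = host
>             self.free_ram_mb = free_ram_mb
>             self.free_vcpus = free_vcpus
> 
>         def consume(self, ram_mb, vcpus):
>             # Charge the build against the cached view immediately, so
>             # back-to-back requests don't all land on the same host
>             # while the DB/RPC picture is still stale.
>             self.free_ram_mb -= ram_mb
>             self.free_vcpus -= vcpus
> 
>     def schedule(host_states, ram_mb, vcpus):
>         for hs in sorted(host_states, key=lambda h: -h.free_ram_mb):
>             if hs.free_ram_mb >= ram_mb and hs.free_vcpus >= vcpus:
>                 hs.consume(ram_mb, vcpus)  # the "memory" between requests
>                 return hs.host
>         return None  # no host fits; multiple schedulers would still
>                      # need to synchronize these cached views somehow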
> 
> Brian