Open Stack

Tue Feb 26 04:15:27 UTC 2013

On 26/02/2013, at 2:15 PM, Chris Behrens <cbehrens at codestud.com> wrote:

> 
> On Feb 25, 2013, at 6:39 PM, Joe Gordon <jogo at cloudscaling.com> wrote:
> 
>> 
>> It looks like the scheduler issues are related to the rabbitmq issues.   "host 'qh2-rcc77' ... is disabled or has not been heard from in a while"
>> 
>> What does 'nova host-list' say? Also, are the clocks all synced up?
> 
> Good things to check.  It feels like something is spinning way too much within this filter, though.  This can also cause the above message.  The scheduler pulls all of the records before it starts filtering… and if there's a huge delay somewhere, it can start seeing a bunch of hosts as disabled.
> 
> The filter doesn't look like a problem.. unless there's a large amount of aggregate metadata… and/or a large amount of key/values for the instance_type's extra specs.   There *is* a DB call in the filter.  If that's blocking for an extended period of time, the whole process is blocked…  But I suspect by the '100% cpu' comment, that this is not the case…  So the only thing I can think of is that it returns a tremendous amount of metadata.
> 
> Adding some extra logging in the filter could be useful.
> 
> - Chris

Thanks Chris, I have 2 aggregates and 2 keys defined and each of the 80 hosts has either one or the other. At the moment every flavour has either one or the other too so I don't think it's too much data. 

I've tracked it down to this call:

metadata = db.aggregate_metadata_get_by_host(context, host_state.host)

It's taking forever to complete. I'm having a look into that code to see why. There is a nested for loop in there, so my guess is that it has something to do with that, although there is hardly any data in our aggregates tables, so I can't see why it would take that long.
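
For anyone wanting to confirm a slow call like this with the extra logging Chris suggested, a rough sketch of a timing wrapper follows. It assumes a modern Python 3 interpreter. The names log_duration and fake_aggregate_metadata_get_by_host are made up for illustration; the fake function only stands in for the real DB call and does not reproduce its behaviour or return format.

```python
import functools
import logging
import time

LOG = logging.getLogger(__name__)


def log_duration(func):
    """Log how long each call to func takes (hypothetical helper)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether the call returns or raises.
            elapsed = time.monotonic() - start
            LOG.warning("%s took %.3fs", func.__name__, elapsed)
    return wrapper


# Hypothetical stand-in for the DB call; the real one lives in nova's db API.
@log_duration
def fake_aggregate_metadata_get_by_host(context, host):
    time.sleep(0.01)  # pretend to do some work
    return {"example_key": {"example_value"}}
```

Wrapping the real call the same way inside the filter would log one duration per host, which should make it obvious whether the time is spent in the DB layer or elsewhere.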

Cheers,
Sam


[Openstack] AggregateInstanceExtraSpecs very slow?
