[Openstack] Folsom nova-scheduler race condition?

Day, Phil philip.day at hp.com
Tue Oct 9 16:51:34 UTC 2012


Hi Jon,

I believe the retry is meant to occur not just if the spawn fails, but also if a host receives a request which it can't honour because it already has too many VMs running or in the process of being launched.

Maybe try reducing your filters a bit ("standard_filters" means all filters, I think) in case there is some odd interaction within that full set?
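
For example, dropping back to just the resource filters (these names are taken from your existing scheduler_default_filters, so adjust to taste):

scheduler_default_filters=RamFilter,CoreFilter,ComputeFilter

and then adding the others back one at a time if the behaviour changes.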

Phil


-----Original Message-----
From: openstack-bounces+philip.day=hp.com at lists.launchpad.net [mailto:openstack-bounces+philip.day=hp.com at lists.launchpad.net] On Behalf Of Jonathan Proulx
Sent: 09 October 2012 15:53
To: openstack at lists.launchpad.net
Subject: [Openstack] Folsom nova-scheduler race condition?

Hi All,

Looking for a sanity check before I file a bug.  I very recently upgraded my install to Folsom (on top of Ubuntu 12.04/kvm).  My scheduler settings in nova.conf are:

scheduler_available_filters=nova.scheduler.filters.standard_filters
scheduler_default_filters=AvailabilityZoneFilter,RamFilter,CoreFilter,ComputeFilter
least_cost_functions=nova.scheduler.least_cost.compute_fill_first_cost_fn
compute_fill_first_cost_fn_weight=1.0
cpu_allocation_ratio=1.0

With Essex this had been working to fill systems based on available RAM and to not exceed a 1:1 allocation ratio of CPU resources.  With Folsom, if I specify a moderately large number of instances to boot, or spin up single instances in a tight shell loop, they all get scheduled on the same compute node, well in excess of the number of available vCPUs.  If I start them one at a time (using --poll in a shell loop so each instance is started before the next launches) then I get the expected allocation behaviour.
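
For what it's worth, the two cases look roughly like this (the flavor and image names are just placeholders for whatever you use):

# all land on the same host, well past its vCPU count:
for i in $(seq 1 20); do nova boot --flavor m1.small --image precise-server test-$i; done

# spreads as expected, since each boot finishes before the next starts:
for i in $(seq 1 20); do nova boot --flavor m1.small --image precise-server --poll test-$i; done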

I see https://bugs.launchpad.net/nova/+bug/1011852, which seems to attempt to address this issue, but as I read it that "fix" is based on retrying failures.  Since KVM is capable of overcommitting both CPU and memory, I don't seem to get a retryable failure, just really bad performance.
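(If I'm reading the Folsom code right, the retry count is governed by a scheduler_max_attempts option in nova.conf, e.g. "scheduler_max_attempts=3", though I haven't confirmed that's the right knob; in any case a retry only helps if something actually fails.)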

Am I missing something with this fix, is there a reported bug I didn't find in my search, or is this really a bug no one has reported?

Thanks,
-Jon
