[Openstack] Folsom nova-scheduler race condition?

Huang Zhiteng winston.d at gmail.com
Wed Oct 10 08:33:27 UTC 2012


On Wed, Oct 10, 2012 at 3:44 PM, Day, Phil <philip.day at hp.com> wrote:
>
>> Per my understanding, this shouldn't happen no matter how fast you create instances, since the requests are
>> queued and the scheduler updates resource information after it processes each request.  The only possibility I can
>> think of that may cause the problem you hit is that more than one scheduler is doing the scheduling.
>
> I think the new retry logic is meant to be safe even if there is more than one scheduler, as the requests are effectively serialised when they get to the compute manager, which can then reject any that break its actual resource limits?
>
Yes, but it seems Jonathan's filter list doesn't include RetryFilter,
so it's possible that he ran into exactly the race condition that
RetryFilter was designed to solve.
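
For reference, here is roughly what I mean -- a sketch of the filter list
with RetryFilter added in front, plus scheduler_max_attempts shown with what
I believe is the Folsom default.  Treat it as illustrative rather than a
tested configuration:

  # RetryFilter skips hosts that already failed a claim for this request,
  # so a retried request lands on a different compute node
  scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,ComputeFilter
  # how many scheduling attempts are made before the request fails (default 3)
  scheduler_max_attempts=3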

> -----Original Message-----
> From: openstack-bounces+philip.day=hp.com at lists.launchpad.net [mailto:openstack-bounces+philip.day=hp.com at lists.launchpad.net] On Behalf Of Huang Zhiteng
> Sent: 10 October 2012 04:28
> To: Jonathan Proulx
> Cc: openstack at lists.launchpad.net
> Subject: Re: [Openstack] Folsom nova-scheduler race condition?
>
> On Tue, Oct 9, 2012 at 10:52 PM, Jonathan Proulx <jon at jonproulx.com> wrote:
>> Hi All,
>>
>> Looking for a sanity test before I file a bug.  I very recently
>> upgraded my install to Folsom (on top of Ubuntu 12.04/kvm).  My
>> scheduler settings in nova.conf are:
>>
>> scheduler_available_filters=nova.scheduler.filters.standard_filters
>> scheduler_default_filters=AvailabilityZoneFilter,RamFilter,CoreFilter,ComputeFilter
>> least_cost_functions=nova.scheduler.least_cost.compute_fill_first_cost_fn
>> compute_fill_first_cost_fn_weight=1.0
>> cpu_allocation_ratio=1.0
>>
>> This had been working in Essex to fill systems based on available RAM
>> and to not exceed a 1:1 allocation ratio of CPU resources.  With
>> Folsom, if I specify a moderately large number of instances to boot, or
>> spin up single instances in a tight shell loop, they all get
>> scheduled on the same compute node, well in excess of the number of
>> available vCPUs.  If I start them one at a time (using --poll in a
>> shell loop so each instance is started before the next launches) then
>> I get the expected allocation behaviour.
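>>
>> (Roughly what I mean, with made-up image/flavor/instance names just for
>> illustration -- the first loop bursts the requests, the second
>> serialises them:)
>>
>>   # tight loop: nova boot returns as soon as the API accepts the request,
>>   # so all ten requests reach the scheduler almost at once
>>   for i in $(seq 1 10); do nova boot --flavor m1.small --image precise test-$i; done
>>
>>   # --poll makes the client wait for each instance to finish building,
>>   # so the requests are effectively serialised
>>   for i in $(seq 1 10); do nova boot --poll --flavor m1.small --image precise test-$i; done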
>>
> Per my understanding, this shouldn't happen no matter how fast you create instances, since the requests are queued and the scheduler updates resource information after it processes each request.  The only possibility I can think of that may cause the problem you hit is that more than one scheduler is doing the scheduling.
>> I see https://bugs.launchpad.net/nova/+bug/1011852 which seems to
>> attempt to address this issue, but as I read it that "fix" is based on
>> retrying failures.  Since KVM is capable of overcommitting both CPU
>> and memory, I don't seem to get a retryable failure, just really bad
>> performance.
>>
>> Am I missing something with this fix, or perhaps there's a reported bug
>> I didn't find in my search, or is this really a bug no one has
>> reported?
>>
>> Thanks,
>> -Jon
>>
>
>
>
> --
> Regards
> Huang Zhiteng
>



-- 
Regards
Huang Zhiteng



