[openstack-dev] [nova] Better tests for nova scheduler (esp. race conditions)?

Nikola Đipanov ndipanov at redhat.com
Mon Dec 14 15:11:20 UTC 2015


On 12/14/2015 08:20 AM, Cheng, Yingxin wrote:
> Hi All,
> 
> When I was looking at bugs related to race conditions in the scheduler
> [1-3], it felt like the nova scheduler lacks sanity checks of its
> scheduling decisions across different situations. We cannot even make
> sure that some fixes successfully mitigate race conditions to an
> acceptable degree. For example, there is no easy way to test whether
> server-group race conditions still exist after a fix for bug [1], to make
> sure that scheduling never violates the allocation ratios as reported in
> bug [2], or to test that the retry rate is acceptable in the various
> corner cases raised in bug [3]. And this list could go on.
> 
> So I'm asking whether there is a plan to add such tests in the future,
> or whether a design exists to simplify writing and executing these kinds
> of tests. I'm thinking of using fake databases and fake interfaces to
> isolate the entire scheduler service, so that we can easily build a
> disposable environment with all kinds of fake resources and fake compute
> nodes to test scheduler behavior. It would even be a good way to test
> whether the scheduler is capable of scaling to 10k nodes without setting
> up 10k real compute nodes.
>
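
A disposable environment of this kind can be sketched in a few dozen lines. This is a minimal illustration assuming a toy scheduler: `FakeNode`, `snapshot`, `pick_host`, `run_batch` and `violations` are hypothetical names, not Nova APIs, and the `stale_for` parameter is an assumed stand-in for several scheduler workers deciding from the same outdated host state. It checks one invariant in the spirit of bug [2]: no node may end up above `vcpus * cpu_allocation_ratio`.

```python
from dataclasses import dataclass


@dataclass
class FakeNode:
    """Hypothetical fake compute node; not a Nova class."""
    name: str
    vcpus: int
    cpu_allocation_ratio: float = 16.0
    vcpus_used: int = 0

    def capacity(self) -> float:
        # Schedulable vCPUs once the allocation ratio is applied.
        return self.vcpus * self.cpu_allocation_ratio


def snapshot(nodes):
    # The toy scheduler decides from a copy of host state, like a cached view.
    return {n.name: n.vcpus_used for n in nodes}


def pick_host(nodes, view, vcpus):
    # Toy filter + weigher: keep nodes that fit according to the (possibly
    # stale) view, then prefer the one with the most free capacity in it.
    fits = [n for n in nodes if view[n.name] + vcpus <= n.capacity()]
    if not fits:
        return None
    return max(fits, key=lambda n: n.capacity() - view[n.name])


def run_batch(nodes, requests, stale_for):
    """Place each request, refreshing the view only every `stale_for`
    decisions to mimic workers acting on the same outdated snapshot."""
    view = snapshot(nodes)
    for i, vcpus in enumerate(requests):
        if i % stale_for == 0:
            view = snapshot(nodes)
        host = pick_host(nodes, view, vcpus)
        if host is not None:
            # No compute-side claim here, so a stale view can overcommit.
            host.vcpus_used += vcpus


def violations(nodes):
    # Invariant under test: no node exceeds vcpus * allocation ratio.
    return [n.name for n in nodes if n.vcpus_used > n.capacity()]
```

For example, with two 2-vCPU nodes at an allocation ratio of 1.0 and four 1-vCPU requests, refreshing the view before every decision leaves both nodes at 2 vCPUs used, while a single shared snapshot sends every request to `n1`, which `violations()` then reports. Building 10k `FakeNode` objects costs nothing, but the toy model deliberately omits compute-side claims.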

This would be a useful effort; however, do not assume that it is going
to be an easy task. Even in the paragraph above, you fail to take into
account that in order to test scheduling you also need to run all the
compute services, since claims work like a kind of two-phase commit: a
scheduling decision gets checked on the destination compute host
(through the Claims logic), which involves locking in each compute process.
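
The two-phase behaviour described above can be illustrated with a toy model, assuming greatly simplified semantics: the scheduler picks a host from a possibly stale view (phase 1), and the destination host re-checks the request under its own lock before accepting it (phase 2). A rejected claim leads to a retry on another host. `FakeComputeHost`, `ClaimRejected` and `schedule_with_retries` are hypothetical names, not Nova's actual Claims code; the returned retry count is the kind of number a test for bug [3] could track.

```python
import threading
from dataclasses import dataclass, field


class ClaimRejected(Exception):
    """Raised by the fake compute host when a claim does not fit."""


@dataclass
class FakeComputeHost:
    """Hypothetical compute host; the lock models per-process claim locking."""
    name: str
    capacity: int
    used: int = 0
    _lock: threading.Lock = field(default_factory=threading.Lock,
                                  repr=False, compare=False)

    def claim(self, amount: int) -> None:
        # Phase 2: re-check the decision under this host's lock, so two
        # concurrent claims cannot both succeed past capacity.
        with self._lock:
            if self.used + amount > self.capacity:
                raise ClaimRejected(self.name)
            self.used += amount


def schedule_with_retries(hosts, view, amount, max_attempts=3):
    """Phase 1: pick the least-used host according to `view` (possibly
    stale), then ask it to claim. On rejection, exclude that host and retry.
    Returns (host name or None, number of retries)."""
    tried = set()
    for attempt in range(max_attempts):
        candidates = [h for h in hosts
                      if h.name not in tried
                      and view[h.name] + amount <= h.capacity]
        if not candidates:
            return None, attempt
        host = min(candidates, key=lambda h: view[h.name])
        try:
            host.claim(amount)
            return host.name, attempt
        except ClaimRejected:
            tried.add(host.name)
    return None, max_attempts
```

With two hosts of capacity 1 and two concurrent requests sharing a stale view in which both hosts look empty, both requests first pick `h1`; one claim succeeds, and the other is rejected and lands on `h2` after one retry. Without phase 2, the same scenario would silently overcommit `h1`, which is why an isolated scheduler test still has to model the compute side.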

> 
> I'm also interested in bp [4] to reduce scheduler race conditions at the
> green-thread level. I think it is a good starting point for solving the
> huge race-condition problem of the nova scheduler, and I really wish I
> could help with that.
> 
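
The blueprint [4] is only named in this thread, so the following is an assumption about the general idea rather than its actual design: give each per-host scheduler state its own lock, so that checking and updating its free resources happen atomically and concurrent green threads in one scheduler process cannot both consume the same last slot. `HostStateSketch`, `try_consume` and `place` are hypothetical names, not Nova's `HostState`, and ordinary `threading` threads stand in for eventlet green threads.

```python
import threading


class HostStateSketch:
    """Hypothetical per-host scheduler state guarded by its own lock."""

    def __init__(self, name, free_ram_mb):
        self.name = name
        self.free_ram_mb = free_ram_mb
        self._lock = threading.Lock()

    def try_consume(self, ram_mb):
        # Check and update under one lock, so two concurrent threads in the
        # same scheduler process cannot both take the last free RAM.
        with self._lock:
            if self.free_ram_mb < ram_mb:
                return False
            self.free_ram_mb -= ram_mb
            return True


def place(hosts, ram_mb):
    # Order hosts by free RAM; this unlocked read may be stale, but
    # try_consume re-checks atomically before committing.
    for host in sorted(hosts, key=lambda h: h.free_ram_mb, reverse=True):
        if host.try_consume(ram_mb):
            return host.name
    return None
```

Because free RAM only ever decreases and each check-and-update is atomic, eight concurrent 512 MB requests against hosts with 2048 MB and 1024 MB free yield exactly six placements and leave both hosts at 0 MB, whatever the interleaving. The lock only serializes decisions within one scheduler process; it does nothing for races between separate scheduler processes or with compute claims.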

I proposed said blueprint but am very unlikely to have any time to work
on it this cycle, so feel free to take a stab at it. I'd be more than
happy to prioritize any reviews related to the above BP.

Thanks for your interest in this

N.

> 
> [1] https://bugs.launchpad.net/nova/+bug/1423648
> 
> [2] https://bugs.launchpad.net/nova/+bug/1370207
> 
> [3] https://bugs.launchpad.net/nova/+bug/1341420
> 
> [4] https://blueprints.launchpad.net/nova/+spec/host-state-level-locking
> 
> Regards,
> 
> -Yingxin
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 



