[openstack-dev] [nova] Better tests for nova scheduler(esp. race conditions)?
yingxin.cheng at intel.com
Mon Dec 14 08:20:14 UTC 2015
While looking at bugs related to scheduler race conditions [1-3], I got the impression that the nova scheduler lacks sanity checks of its scheduling decisions under different situations. We cannot even be sure that a given fix mitigates a race condition to an acceptable degree. For example, there is no easy way to test whether server-group race conditions still exist after a fix for bug, or to make sure that after scheduling there will be no violations of allocation ratios reported by bug, or to test that the retry rate is acceptable in the various corner cases proposed by bug. And there will be many more items in this list.
So I'm asking whether there is a plan to add those tests in the future, or whether a design already exists to simplify writing and executing those kinds of tests. I'm thinking of using fake databases and fake interfaces to isolate the entire scheduler service, so that we can easily build up a disposable environment with all kinds of fake resources and fake compute nodes to test scheduler behaviors. It would even be a good way to test whether the scheduler is capable of scaling to 10k nodes without setting up 10k real compute nodes.
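To make the idea concrete, here is a rough sketch in plain Python of what such a disposable environment could look like. It does not use any real nova code: FakeHost, pick_host, simulate and make_hosts are hypothetical names made up for illustration, and the "race" is modelled simply by letting a batch of requests decide on the same stale snapshot of host state before any claim lands. A real test harness would plug in the actual scheduler and fake out the database and RPC layers instead.

```python
from dataclasses import dataclass


@dataclass
class FakeHost:
    """Hypothetical stand-in for a compute node; not a nova class."""
    name: str
    total_ram_mb: int
    ram_allocation_ratio: float = 1.0
    used_ram_mb: int = 0

    @property
    def limit_mb(self):
        return int(self.total_ram_mb * self.ram_allocation_ratio)

    def claim(self, ram_mb):
        # Authoritative check on the fake node; False means the request
        # lost the race and has to be rescheduled.
        if self.used_ram_mb + ram_mb > self.limit_mb:
            return False
        self.used_ram_mb += ram_mb
        return True


def pick_host(snapshot, ram_mb):
    """Choose the host with the most free RAM in a (possibly stale) view."""
    candidates = [(limit - used, name)
                  for name, (used, limit) in snapshot.items()
                  if limit - used >= ram_mb]
    return max(candidates)[1] if candidates else None


def simulate(hosts, requests, concurrency):
    """Schedule requests in batches that all share one stale snapshot."""
    by_name = {h.name: h for h in hosts}
    pending = list(requests)
    stats = {"placed": 0, "retries": 0, "no_valid_host": 0}
    while pending:
        batch, pending = pending[:concurrency], pending[concurrency:]
        # Every request in the batch decides on the same view of the hosts.
        snapshot = {h.name: (h.used_ram_mb, h.limit_mb) for h in hosts}
        for ram_mb in batch:
            choice = pick_host(snapshot, ram_mb)
            if choice is None:
                stats["no_valid_host"] += 1
            elif by_name[choice].claim(ram_mb):
                stats["placed"] += 1
            else:
                stats["retries"] += 1
                pending.append(ram_mb)  # retry later with a fresh snapshot
    return stats


def make_hosts(count, ram_mb=4096):
    return [FakeHost(name="host%d" % i, total_ram_mb=ram_mb)
            for i in range(count)]


if __name__ == "__main__":
    for concurrency in (1, 4):
        hosts = make_hosts(4)
        stats = simulate(hosts, [2048] * 8, concurrency)
        # Sanity check: no host may exceed its allocation limit.
        assert all(h.used_ram_mb <= h.limit_mb for h in hosts)
        print(concurrency, stats)
```

With such a setup, a test could assert an upper bound on the retry count for a given level of concurrency, and the number of fake hosts could be raised to 10k without any real hardware.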
I'm also interested in the bp to reduce scheduler race conditions at the green-thread level. I think it is a good starting point for solving the huge racing problem of the nova scheduler, and I would really like to help with that.