[openstack-dev] [nova][scheduler] A simple solution for better scheduler performance

John Garbutt john at johngarbutt.com
Fri Jul 15 09:19:31 UTC 2016


On 15 July 2016 at 09:26, Cheng, Yingxin <yingxin.cheng at intel.com> wrote:
> It is easy to understand that scheduling in the nova-scheduler service consists of 2 major phases:
> A. Cache refresh, in code [1].
> B. Filtering and weighing, in code [2].
>
> A couple of previous experiments [3][4] show that “cache-refresh” is the major bottleneck of the nova scheduler. For example, page 15 of presentation [3] shows that “cache-refresh” takes 98.5% of the time of the entire `_schedule` function [6] with 200-1000 nodes and 50+ concurrent requests. The latest experiments [5] in China Mobile’s 1000-node environment confirm the same conclusion, and the share even reaches 99.7% with 40+ concurrent requests.
>
> Here’re some existing solutions for the “cache-refresh” bottleneck:
> I. Caching scheduler.
> II. Scheduler filters in DB [7].
> III. Eventually consistent scheduler host state [8].
>
> I can discuss their merits and drawbacks in a separate thread, but here I want to show the simplest solution, based on my findings during the experiments [5]. I wrapped the expensive function [1] to observe the behavior of cache-refresh under pressure. Interestingly, a single cache-refresh only costs about 0.3 seconds, but when there are concurrent cache-refresh operations this cost can suddenly increase to 8 seconds, and I’ve even seen it reach 60 seconds for a single cache-refresh under higher pressure. See the section below for details.
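
As an aside, wrapping that call only takes a few lines. A decorator
roughly like the one below (hypothetical helper names, not the actual
instrumentation behind [5]) is enough to expose that queueing:

    import functools
    import logging
    import time

    LOG = logging.getLogger(__name__)

    def timed(func):
        """Log the wall-clock time of every call to the wrapped function."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.time()
            try:
                return func(*args, **kwargs)
            finally:
                LOG.info('%s took %.3fs', func.__name__, time.time() - start)
        return wrapper

    # e.g. wrap the expensive call from [1]:
    # host_manager.get_all_host_states = timed(host_manager.get_all_host_states)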

I am curious which DB driver you are using?
Using PyMySQL should remove a lot of those issues.
This is the driver we use in the gate now, but it didn't use to be the default.

If you use the C-based MySQL driver, you will find it blocks the whole
process while making a DB call; eventlet then schedules the next DB
call, and so on, before it finally loops back and lets the Python code
process the result of the first DB call. In extreme cases the code
processing the DB result considers some of the hosts to be down,
because so much time has passed since the DB call returned.
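
You can reproduce that starvation pattern outside nova with a toy
example, using time.sleep as a stand-in for the blocking C-driver call
and eventlet.sleep as a stand-in for a cooperative PyMySQL call
(illustrative only, not nova code):

    import time
    import eventlet

    def blocking_call():
        time.sleep(1)        # like the C driver: blocks the whole eventlet hub

    def cooperative_call():
        eventlet.sleep(1)    # like PyMySQL: yields to other greenthreads on I/O

    start = time.time()
    pool = eventlet.GreenPool()
    for _ in range(5):
        pool.spawn(blocking_call)       # finishes in ~5s: the calls serialize
        # pool.spawn(cooperative_call)  # finishes in ~1s: the calls overlap
    pool.waitall()
    print('elapsed %.1fs' % (time.time() - start))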

Switching the driver should dramatically increase the performance of (II).
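
Concretely, it is a one-line change in nova.conf; the URL below is
illustrative, so adjust user, password and host for your deployment:

    [database]
    # C-based MySQLdb driver (blocks the process on every query):
    # connection = mysql://nova:secret@db-host/nova
    # Pure-Python PyMySQL driver (cooperates with eventlet):
    connection = mysql+pymysql://nova:secret@db-host/nova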

> It raises a question about the current implementation: do we really need a cache-refresh operation [1] for *every* request? If those concurrent operations are replaced by one database query, the scheduler still gets the latest resource view from the database. The scheduler is even happier because those expensive cache-refresh operations are minimized and much faster (0.3 seconds). I believe it is the simplest optimization of scheduler performance, because it doesn’t change the filter scheduler at all; minor improvements inside the host manager are enough.

So it depends on the usage patterns in your cloud.

The caching scheduler is one way to avoid the cache-refresh operation
on every request. It has an upper limit on throughput as you are
forced into having a single active nova-scheduler process.

But that caching means you can only have a single nova-scheduler
process, whereas (II) allows you to have multiple nova-scheduler
workers to increase the concurrency.
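
On the coalescing idea itself: what you describe is essentially a
single-flight pattern around the expensive refresh. A minimal sketch,
with hypothetical names rather than an actual patch against the host
manager, could look like this:

    import sys
    from eventlet import event

    class CoalescingRefresher(object):
        """Let concurrent requests share one in-flight cache refresh.

        No lock is needed around the check below because eventlet
        greenthreads only switch at I/O or explicit yield points.
        """

        def __init__(self, refresh_func):
            self._refresh_func = refresh_func   # the one expensive DB query
            self._in_flight = None

        def refresh(self):
            if self._in_flight is not None:
                # A refresh is already running: wait for its result instead
                # of issuing another identical query against the database.
                return self._in_flight.wait()
            self._in_flight = event.Event()
            try:
                result = self._refresh_func()
                self._in_flight.send(result)
                return result
            except Exception:
                self._in_flight.send_exception(*sys.exc_info())
                raise
            finally:
                self._in_flight = None

With 50 requests arriving while one 0.3 second refresh is in flight,
all 50 would be answered by that single query instead of queueing 50
identical queries behind each other.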

> [1] https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L104
> [2] https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L112-L123
> [3] https://www.openstack.org/assets/presentation-media/7129-Dive-into-nova-scheduler-performance-summit.pdf
> [4] http://lists.openstack.org/pipermail/openstack-dev/2016-June/098202.html
> [5] Please refer to Barcelona summit session ID 15334 later: “A tool to test and tune your OpenStack Cloud? Sharing our 1000 node China Mobile experience.”
> [6] https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L53
> [7] https://review.openstack.org/#/c/300178/
> [8] https://review.openstack.org/#/c/306844/
>
>
> ****** Here is the discovery from latest experiments [5] ******
> https://docs.google.com/document/d/1N_ZENg-jmFabyE0kLMBgIjBGXfL517QftX3DW7RVCzU/edit?usp=sharing
>
> Figure 1 illustrates the concurrent cache-refresh operations in a nova-scheduler service. There are at most 23 requests waiting for cache-refresh operations at time 43s.
>
> Figure 2 illustrates the time cost of every request in the same experiment. It shows that the cost increases with the growth of concurrency, demonstrating the vicious circle: a request waits longer for the database when there are more waiting requests.
>
> Figures 3 and 4 illustrate a worse case, in which the cache-refresh cost reaches 60 seconds because of excessive concurrent cache-refresh operations.

Sorry, it's not clear to me if this was using I, II, or III? It seems
like it's just using the default system?

This looks like the problem I have seen when you don't use PyMySQL
as your DB driver.

Thanks,
John


