[openstack-dev] [nova][scheduler] A simple solution for better scheduler performance

Cheng, Yingxin yingxin.cheng at intel.com
Fri Jul 15 15:22:51 UTC 2016


Hi John,

Thanks for the reply.

There’re two rounds of experiments:
Experiment A [3] is deployed with devstack: 1000 compute services with the fake virt driver, the devstack-default PyMySQL DB driver, and the default filter scheduler.
Experiment B [4] is a real production environment from China Mobile with about 600 active compute nodes. The DB driver is the SQLAlchemy default, i.e. the C-based MySQL-Python, and the scheduler is also the filter scheduler.

And in the analysis https://docs.google.com/document/d/1N_ZENg-jmFabyE0kLMBgIjBGXfL517QftX3DW7RVCzU/edit?usp=sharing figures 1/2 are from experiment B and figures 3/4 are from experiment A, so both kinds of DB drivers are covered.

My point is simple: when the host manager is querying host states for request A and another request B comes in, the host manager won’t launch a second cache-refresh; instead, it reuses the first one and returns the same result to both A and B. In this way, we can reduce the expensive cache-refresh queries to a minimum while keeping the scheduler host states fresh. It becomes more effective when there are more compute nodes and heavier request pressure. A sketch of the idea is below.
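
Here is a minimal sketch of that idea. The names are illustrative, not the actual HostManager API: concurrent callers wait on one shared in-flight refresh instead of each issuing its own expensive query.

import threading

class SharedRefreshHostManager(object):
    """Illustrative only: coalesce concurrent cache-refresh calls."""

    def __init__(self, db_query):
        self._db_query = db_query   # e.g. objects.ComputeNodeList.get_all
        self._lock = threading.Lock()
        self._inflight = None       # Event set when the running refresh finishes
        self._host_states = None

    def get_all_host_states(self, context):
        with self._lock:
            if self._inflight is None:
                # First caller (request A) launches the refresh ...
                self._inflight = threading.Event()
                event, refresher = self._inflight, True
            else:
                # ... concurrent callers (request B) reuse it instead of
                # launching a second cache-refresh.
                event, refresher = self._inflight, False

        if refresher:
            try:
                self._host_states = self._db_query(context)
            finally:
                with self._lock:
                    self._inflight = None
                event.set()
        else:
            event.wait()
        return self._host_states

(Under eventlet monkey-patching, which the scheduler service uses, the threading primitives above become cooperative green-thread primitives, so the same pattern applies inside nova-scheduler.)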

I also have runnable code that can better explain my idea: https://github.com/cyx1231st/making-food 

-- 
Regards
Yingxin

On 7/15/16, 17:19, "John Garbutt" <john at johngarbutt.com> wrote:

    On 15 July 2016 at 09:26, Cheng, Yingxin <yingxin.cheng at intel.com> wrote:
    > It is easy to understand that scheduling in the nova-scheduler service consists of 2 major phases:
    > A. Cache refresh, in code [1].
    > B. Filtering and weighing, in code [2].
    >
    > A couple of previous experiments [3] [4] show that “cache-refresh” is the major bottleneck of the nova scheduler. For example, the 15th page of presentation [3] says the time cost of “cache-refresh” takes 98.5% of the time of the entire `_schedule` function [6] when there are 200-1000 nodes and 50+ concurrent requests. The latest experiments [5] in China Mobile’s 1000-node environment prove the same conclusion, and it’s even 99.7% when there are 40+ concurrent requests.
    >
    > Here’re some existing solutions for the “cache-refresh” bottleneck:
    > I. Caching scheduler.
    > II. Scheduler filters in DB [7].
    > III. Eventually consistent scheduler host state [8].
    >
    > I can discuss their merits and drawbacks in a separate thread, but here I want to show the simplest solution based on my findings during the experiments [5]. I wrapped the expensive function [1] and watched the behavior of cache-refresh under pressure. It is very interesting to see that a single cache-refresh only costs about 0.3 seconds, yet when there are concurrent cache-refresh operations this cost can suddenly increase to 8 seconds. I’ve even seen it reach 60 seconds for one cache-refresh under higher pressure. See the below section for details.
    
    I am curious which DB driver you are using?
    Using PyMySQL should remove a lot of those issues.
    This is the driver we use in the gate now, but it didn't use to be the default.
    
    If you use the C based MySQL driver, you will find it locks the whole
    process when making a DB call, then eventlet schedules the next DB
    call, etc, etc, and then it loops back and allows the Python code to
    process the first DB call, and so on. In extreme cases you will find
    the code processing the DB query considers some of the hosts to be
    down, since it's so long since the DB call was returned.
    
    Switching the driver should dramatically increase the performance of (II).
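
    (For reference, the driver is selected by the SQLAlchemy connection URL in
    nova.conf; the snippet below is illustrative and the credentials are
    placeholders. A mysql+pymysql:// URL picks the pure-Python PyMySQL driver,
    while a plain mysql:// URL falls back to the C based MySQL-Python driver.)

        [database]
        # pure-Python PyMySQL driver, cooperates with eventlet
        connection = mysql+pymysql://nova:NOVA_DBPASS@controller/nova
        # the C based driver would be selected by:
        # connection = mysql://nova:NOVA_DBPASS@controller/nova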
    
    > It raises a question about the current implementation: do we really need a cache-refresh operation [1] for *every* request? If those concurrent operations are replaced by one database query, the scheduler is still happy with the latest resource view from the database. The scheduler is even happier because those expensive cache-refresh operations are minimized and much faster (0.3 seconds). I believe it is the simplest optimization to scheduler performance, and it doesn’t make any changes to the filter scheduler. Minor improvements inside the host manager are enough.
    
    So it depends on the usage patterns in your cloud.
    
    The caching scheduler is one way to avoid the cache-refresh operation
    on every request. It has an upper limit on throughput as you are
    forced into having a single active nova-scheduler process.
    
    But the caching means you can only have a single nova-scheduler
    process, whereas (II) allows you to have multiple nova-scheduler
    workers to increase the concurrency.
    
    > [1] https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L104
    > [2] https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L112-L123
    > [3] https://www.openstack.org/assets/presentation-media/7129-Dive-into-nova-scheduler-performance-summit.pdf
    > [4] http://lists.openstack.org/pipermail/openstack-dev/2016-June/098202.html
    > [5] Please refer to Barcelona summit session ID 15334 later: “A tool to test and tune your OpenStack Cloud? Sharing our 1000 node China Mobile experience.”
    > [6] https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L53
    > [7] https://review.openstack.org/#/c/300178/
    > [8] https://review.openstack.org/#/c/306844/
    >
    >
    > ****** Here is the discovery from latest experiments [5] ******
    > https://docs.google.com/document/d/1N_ZENg-jmFabyE0kLMBgIjBGXfL517QftX3DW7RVCzU/edit?usp=sharing
    >
    > Figure 1 illustrates the concurrent cache-refresh operations in a nova-scheduler service. There are at most 23 requests waiting for cache-refresh at time 43s.
    >
    > Figure 2 illustrates the time cost of every request in the same experiment. It shows that the cost increases with the growth of concurrency, which proves the vicious circle: a request waits longer for the database when there are more waiting requests.
    >
    > Figures 3/4 illustrate a worse case, where a cache-refresh operation costs up to 60 seconds because of excessive concurrent cache-refresh operations.
    
    Sorry, it's not clear to me if this was using I, II, or III? It seems
    like it's just using the default system?
    
    This looks like the problems I have seen when you don't use PyMySQL
    for your DB driver.
    
    Thanks,
    John


