[openstack-dev] [Nova] A multi-cell instance-list performance test

Yikun Jiang yikunkero at gmail.com
Thu Aug 16 08:53:09 UTC 2018


Some more information:
*1. How did we record the time when listing?*
You can see all of our changes here:
http://paste.openstack.org/show/728162/

Total cost:                  L26
Construct view:              L43
Data gather per cell cost:   L152
Data gather all cells cost:  L174
Merge sort cost:             L198
(the L numbers above refer to lines in the paste)
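
For reference, the overall shape of the listing path and where those five timers
sit is roughly the following. This is only a self-contained sketch with fake data
and illustrative names (stdlib threads, heapq.merge); the real per-cell gathering
uses eventlet greenthreads and real DB queries, and the exact hook points are in
the paste.

import heapq
import time
import uuid
from concurrent import futures

# Ten fake cells with 1000 fake instances each (stand-in for the cell DBs).
CELLS = {'cell%d' % i: [{'uuid': str(uuid.uuid4())} for _ in range(1000)]
         for i in range(1, 11)}

def timed(label, func, *args):
    start = time.monotonic()
    result = func(*args)
    print('%-30s %.4f s' % (label + ':', time.monotonic() - start))
    return result

def gather(cell, limit, sort_key):
    # Stands in for the per-cell DB query ("Data gather per cell cost").
    return sorted(CELLS[cell], key=lambda inst: inst[sort_key])[:limit]

def list_instances(limit=1000, sort_key='uuid'):
    with futures.ThreadPoolExecutor(max_workers=len(CELLS)) as pool:
        results = timed(
            'Data gather all cells cost',
            lambda: list(pool.map(
                lambda cell: timed('Data gather per cell cost',
                                   gather, cell, limit, sort_key),
                CELLS)))
    merged = timed(
        'Merge sort cost',
        lambda: list(heapq.merge(*results,
                                 key=lambda inst: inst[sort_key]))[:limit])
    # "Construct view cost" would wrap building the API response from `merged`.
    return merged

timed('Total cost', list_instances)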

*2. Why was it not parallel in the first result?*
The root cause of the data gathering in the first table not being parallel is that
we don't enable eventlet.monkey_patch (in particular, the time flag is not True)
under the uwsgi mode.

As a result, oslo_db's thread yield [1] doesn't work, and every DB data-gathering
thread is blocked until it has fetched all of its data from the DB.

So the gathering process is effectively executed in serial; we fix this in [2].

But even after the fix [2] there is not as much improvement as we expected; the
threads still seem to influence each other, so we would like your ideas. : )
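
To illustrate the time-flag point with a tiny standalone example (this is not the
Nova/uwsgi code, just a sketch of the behaviour that oslo_db's thread yield relies
on; the pool and worker names are made up):

import eventlet
eventlet.monkey_patch(time=True)  # roughly what the fix [2] makes sure of under uwsgi

import time

def worker(name):
    for _ in range(3):
        # oslo_db's thread yield is essentially a time.sleep(0) around the DB
        # connection [1]. With time patched this becomes eventlet.sleep(0) and
        # switches to the hub, so the workers below interleave; without the
        # patch it is the native sleep, nothing yields, and each worker runs
        # to completion before the next one starts.
        time.sleep(0)
        print('%s got a turn' % name)

pool = eventlet.GreenPool()
for name in ('cell1', 'cell2', 'cell3'):
    pool.spawn(worker, name)
pool.waitall()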

[1]
https://github.com/openstack/oslo.db/blob/256ebc3/oslo_db/sqlalchemy/engines.py#L51
[2] https://review.openstack.org/#/c/592285/

Regards,
Yikun
----------------------------------------
Jiang Yikun(Kero)
Mail: yikunkero at gmail.com


Zhenyu Zheng <zhengzhenyulixi at gmail.com> wrote on Thu, Aug 16, 2018 at 3:54 PM:

> Hi, Nova
>
> As the Cells v2 architecture is getting mature, and CERN is using it and it seems
> to work well, *Huawei* is also willing to consider using it in our Public Cloud
> deployments. As we still have concerns about the performance of multi-cell
> listing, *Yikun Jiang* and I recently did a performance test of ``instance list``
> across a multi-cell deployment, and we would like to share our test results and
> findings.
>
> First, a note on our testing environment: we (Yikun and I) are doing this as a
> proof of concept (to show the ratio between the time spent querying data from the
> DB, sorting, and so on), so we ran it on our own machine. The machine has 16 CPUs
> and 80 GB RAM; it is old, so the disk might be slow. Therefore we will not judge
> the absolute time consumption itself, but rather the overall logic and the ratios
> between the different steps. Everything runs in a devstack deployment on this
> single machine.
>
> Next, our test plan: we set up 10 cells (cell1~cell10) and generate 10000 instance
> records in those cells (at 20 instances per host, that is about 500 hosts, which
> seems a good size for a cell). cell0 is kept empty, as the number of errored
> instances should be very small and it doesn't really matter here.
> We test the time consumption of listing instances across 1, 2, 5 and 10 cells
> (cell0 is always queried, so it is actually 2, 3, 6 and 11 cells) with limits of
> 100, 200, 500 and 1000, as the default maximum limit is 1000. To get more general
> results, we tested the listing with the default sort key and direction, sorted by
> instance uuid, and sorted by uuid & name.
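>
> To make the setup concrete, such listings can be driven against the regular
> servers API roughly as below. This is only an illustrative sketch: the endpoint,
> the token handling and the exact sort key names ('uuid', 'display_name') are
> assumptions, and the timings in the table were recorded inside nova-api rather
> than on the client side.
>
> import os
>
> import requests
>
> # Assumed environment: a devstack compute endpoint and a pre-issued token.
> COMPUTE = os.environ.get('COMPUTE_URL', 'http://127.0.0.1/compute/v2.1')
> HEADERS = {'X-Auth-Token': os.environ['OS_TOKEN'],
>            'Accept': 'application/json'}
>
> def list_servers(limit, sort_keys=()):
>     # sort_keys=() for the default order, ('uuid',), or ('uuid', 'display_name')
>     params = [('limit', limit), ('all_tenants', 1)]
>     for key in sort_keys:
>         params.append(('sort_key', key))
>         params.append(('sort_dir', 'asc'))
>     resp = requests.get(COMPUTE + '/servers/detail',
>                         headers=HEADERS, params=params)
>     resp.raise_for_status()
>     return resp.json()['servers']
>
> for limit in (100, 200, 500, 1000):
>     print(limit, len(list_servers(limit, sort_keys=('uuid',))))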
>
> This is what we got (the time unit is seconds):
>
> Cells = number of cells queried (cell0 excluded); per sort mode the columns are
> Total = total cost, Gather = data gather cost, Merge = merge sort cost,
> View = construct view cost.
>
> Cells  Limit |          Default sort           |            uuid sort            |         uuid+name sort
>              |   Total  Gather   Merge    View |   Total  Gather   Merge    View |   Total  Gather   Merge    View
>    10    100 |  2.3313  2.1306  0.1145  0.0672 |  2.3693  2.1343  0.1148  0.1016 |  2.3284  2.1264  0.1145  0.0679
>    10    200 |  3.5979  3.2137  0.2287  0.1265 |  3.5316  3.1509  0.2265  0.1255 |   3.481   3.054  0.2697  0.1284
>    10    500 |  7.1952  6.2597  0.5704  0.3029 |  7.5057  6.4761  0.6263   0.341 |  7.4885  6.4623  0.6239  0.3404
>    10   1000 | 13.5745 11.7012  1.1511  0.5966 | 13.8408 11.9007  1.2268  0.5939 | 13.8813  11.913  1.2301  0.6187
>     5    100 |  1.3142  1.1003  0.1163  0.0706 |  1.2458  1.0498  0.1163  0.0665 |  1.2528  1.0579  0.1161   0.066
>     5    200 |  2.0151  1.6063  0.2645  0.1255 |  1.9866  1.5386  0.2668  0.1615 |  2.0352  1.6246  0.2646  0.1262
>     5    500 |  4.2109  3.1358  0.7033  0.3343 |  4.1605  3.0893  0.6951  0.3384 |  4.1972  3.2461  0.6104  0.3028
>     5   1000 |   7.841  5.8881  1.2027  0.6802 |  7.7135  5.9121  1.1363  0.5969 |  7.8377  5.9385  1.1936  0.6376
>     2    100 |  0.6736  0.4727  0.1113  0.0822 |   0.605  0.4192  0.1105  0.0656 |   0.688  0.4613  0.1126  0.0682
>     2    200 |  1.1226  0.7229  0.2577  0.1255 |  1.0268  0.6671  0.2255  0.1254 |  1.2805  0.8171  0.2222  0.1258
>     2    500 |  2.2358  1.3506  0.5595  0.3026 |  2.3307  1.2748  0.6581  0.3362 |   2.741  1.6023   0.633  0.3365
>     2   1000 |  4.2079  2.3367  1.2053  0.5986 |  4.2384  2.4071  1.2017   0.633 |  4.3437  2.4136   1.217  0.6394
>     1    100 |  0.4857  0.2869  0.1097   0.069 |  0.4205   0.233  0.1131  0.0672 |  0.6372  0.3305   0.196  0.0681
>     1    200 |  0.6835  0.3236  0.2212  0.1256 |  0.7777  0.3754   0.261    0.13 |  0.9245  0.4527   0.227   0.129
>     1    500 |  1.5848  0.6415  0.6251  0.3043 |  1.6472  0.6554  0.6292  0.3053 |  1.9455  0.8201       -       -
>
> (The last two values of the 1-cell/500 row and the whole 1-cell/1000 row are
> missing because the original HTML table was truncated when the attachment was
> scrubbed from the list archive.)