[openstack-dev] [Nova] A multi-cell instance-list performance test

Zhenyu Zheng zhengzhenyulixi at gmail.com
Fri Aug 17 09:12:44 UTC 2018


Hi

We have tried out the patch:
https://review.openstack.org/#/c/592698/
and we also applied https://review.openstack.org/#/c/592285/

It turns out that we are able to halve the overall time consumption. We
also tried different sort keys and sort directions and the results are
similar; we haven't tried paging yet:

[image: image.png]

BR,

Kevin Zheng

On Fri, Aug 17, 2018 at 10:55 AM Zhenyu Zheng <zhengzhenyulixi at gmail.com>
wrote:

> Hi,
>
> Thanks a lot for the reply. Regarding your question #2, we tested two
> kinds of deployment: 1. a single DB server holding all 10 cells (plus
> cell0), on the same server as the API; 2. five of the DBs moved to
> another machine in the same rack, to see whether that matters. It turns
> out there is no big difference.
>
> For question #3, we ran a test with limit = 1000 and 10 cells. As we can
> see, CPU load from both the API process and the MySQL queries is high
> during the first 3 seconds, but from the 4th second on only the API
> process is using CPU, and memory consumption is low compared to CPU
> consumption. This was tested with the patch posted in the previous mail.
>
> [image: image.png]
>
> [image: image.png]
>
> BR,
>
> Kevin
>
> On Fri, Aug 17, 2018 at 2:45 AM Dan Smith <dms at danplanet.com> wrote:
>
>> >  Yes, the DB queries were serial. After some investigation, it seems
>> >  that we are unable to perform eventlet.monkey_patch in uWSGI mode, so
>> >  Yikun made this fix:
>> >
>> >  https://review.openstack.org/#/c/592285/
>>
>> Cool, good catch :)
>>
>> >
>> >  After making this change, we tested again and got this data:
>> >
>> >                       total    collect  sort    view
>> >  before monkey_patch  13.5745  11.7012  1.1511  0.5966
>> >  after monkey_patch   12.8367  10.5471  1.5642  0.6041
>> >
>> >  The performance improved a little, and from the log we can see:
>>
>> Since these all took ~1s when done in series, but now take ~10s in
>> parallel, I think you must be hitting some performance bottleneck in
>> either case, which is why the overall time barely changes. Some ideas:
>>
>> 1. In the real world, I think you really need to have 10x database
>>    servers or at least a DB server with plenty of cores loading from a
>>    very fast (or separate) disk in order to really ensure you're getting
>>    full parallelism of the DB work. However, because these queries all
>>    took ~1s in your serialized case, I expect this is not your problem.
>>
>> 2. What does the network look like between the api machine and the DB?
>>
>> 3. What do the memory and CPU usage of the api process look like while
>>    this is happening?
>>
>> Related to #3, even though we issue the requests to the DB in parallel,
>> we still process the result of those calls in series in a single python
>> thread on the API. That means all the work of reading the data from the
>> socket, constructing the SQLA objects, turning those into nova objects,
>> etc., all happens serially. It could be that the DB query is really a
>> small part of the overall time and our serialized python handling of the
>> result is the slow part. If you see the api process pegging a single
>> core at 100% for ten seconds, I think that's likely what is happening.
>>
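To illustrate the point above (parallel queries, serial handling of the results), here is a minimal Python sketch. It uses stdlib threads from concurrent.futures in place of the eventlet greenthreads discussed in this thread, and query_cell, process_rows and list_instances are hypothetical stand-ins, not Nova code. The sleep models waiting on the database, which does overlap across threads, while the per-row Python work runs one cell after another in the calling thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def query_cell(cell_id, limit, db_latency=0.05):
    # Hypothetical stand-in for one cell's DB query. The sleep models time
    # spent waiting on the database, during which other threads can run.
    time.sleep(db_latency)
    return [{"cell": cell_id, "id": i} for i in range(limit)]


def process_rows(rows):
    # Hypothetical stand-in for the per-row Python work (reading rows,
    # building objects, ...). This is CPU-bound Python code.
    return [dict(row, name="instance-%d-%d" % (row["cell"], row["id"]))
            for row in rows]


def list_instances(cell_ids, limit):
    cell_ids = list(cell_ids)
    # Fan out: query every cell concurrently and wait for all results.
    with ThreadPoolExecutor(max_workers=max(1, len(cell_ids))) as pool:
        results = list(pool.map(lambda cell: query_cell(cell, limit), cell_ids))
    # Fan in: handle each cell's rows one after another in this single
    # thread, so this part grows with (number of cells) * limit.
    processed = []
    for rows in results:
        processed.extend(process_rows(rows))
    return processed
```

Moving process_rows into the worker threads would not help much in CPython, because pure-Python CPU work is serialized by the GIL; that is consistent with the API process pegging a single core while the DB sits idle.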
>> >  So now the queries run in parallel, but the whole thing still seems
>> >  serial.
>>
>> In your table, you show the time for "1 cell, 1000 instances" as ~3s and
>> "10 cells, 1000 instances" as 10s. The problem with comparing those
>> directly is that in the latter, you're actually pulling 10,000 records
>> over the network, into memory, processing them, and then just returning
>> the first 1000 from the sort. A closer comparison would be the "10
>> cells, 100 instances" with "1 cell, 1000 instances". In both of those
>> cases, you pull 1000 instances total from the db, into memory, and
>> return 1000 from the sort. In that case, the multi-cell situation is
>> faster (~2.3s vs. ~3.1s). You could also compare the "10 cells, 1000
>> instances" case to "1 cell, 10,000 instances" just to confirm at the
>> larger scale that it's better or at least the same.
>>
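The comparison above comes down to simple arithmetic. A small sketch, assuming the instance counts in the table are per cell and the request uses limit = 1000 (both are assumptions about how the table was labeled): each cell returns up to limit rows, so the API loads the sum of min(instances, limit) over all cells, but returns only limit of them.

```python
def rows_loaded(instances_per_cell, limit):
    # Each cell returns at most `limit` rows: the first `limit` rows of the
    # global sort might all come from that one cell.
    return sum(min(count, limit) for count in instances_per_cell)


def rows_returned(instances_per_cell, limit):
    # After merge-sorting, the API returns at most `limit` rows in total.
    return min(sum(instances_per_cell), limit)


# Assumed reading of the cases discussed: counts are per cell, limit = 1000.
LIMIT = 1000
cases = {
    "1 cell, 1000 instances": [1000],
    "10 cells, 100 instances": [100] * 10,
    "10 cells, 1000 instances": [1000] * 10,
}
for label, counts in cases.items():
    print(f"{label}: load {rows_loaded(counts, LIMIT)}, "
          f"return {rows_returned(counts, LIMIT)}")
```

Under these assumptions the three cases load 1000, 1000 and 10,000 rows respectively, and each returns 1000.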
>> We _have_ to pull $limit instances from each cell, in case (according to
>> the sort key) the first $limit instances are all in one cell. We _could_
>> try to batch the results from each cell to avoid loading so many that we
>> don't need, but we punted this as an optimization to be done later. I'm
>> not sure it's really worth the complexity at this point, but it's
>> something we could investigate.
>>
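The reasoning above can be sketched as a k-way merge. This hypothetical Python example (not Nova's actual implementation) uses heapq.merge over per-cell sorted results and stops after limit rows. A slice of a sorted list stands in for each cell's DB query, and the optional batch_size parameter is an assumption illustrating the batching idea: each cell's rows are loaded lazily in chunks instead of all limit rows up front.

```python
import heapq
from itertools import islice


def cell_stream(rows, batch_size, counter):
    # Yield one cell's already-sorted rows, loading them `batch_size` at a
    # time. Each slice stands in for one (hypothetical) DB round trip, and
    # counter["loaded"] records how many rows were actually pulled.
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        counter["loaded"] += len(batch)
        yield from batch


def merged_list(cells, limit, sort_key, batch_size=None):
    counter = {"loaded": 0}
    # Without batching, each cell hands over all of its `limit` rows at once.
    size = batch_size or max(1, limit)
    streams = [cell_stream(sorted(rows, key=sort_key)[:limit], size, counter)
               for rows in cells]
    # k-way merge of the per-cell streams, stopping after `limit` rows.
    result = list(islice(heapq.merge(*streams, key=sort_key), limit))
    return result, counter["loaded"]
```

In a skewed case where the first 10 rows all live in one cell (three cells holding 0..9, 100..109 and 200..209, limit = 10), the unbatched version loads 30 rows while batch_size=2 loads 14, and both return the same 10 rows. The price is more round trips per cell, which is part of the added complexity mentioned above.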
>> --Dan
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 30600 bytes
Desc: not available
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20180817/2472bca1/attachment-0003.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 28172 bytes
Desc: not available
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20180817/2472bca1/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 194499 bytes
Desc: not available
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20180817/2472bca1/attachment-0005.png>