[openstack-dev] [Nova] A multi-cell instance-list performance test

Dan Smith dms at danplanet.com
Thu Aug 16 18:44:44 UTC 2018


>  yes, the DB query was in serial, after some investigation, it seems that we are unable to perform eventlet.monkey_patch in uWSGI mode, so
>  Yikun made this fix:
>
>  https://review.openstack.org/#/c/592285/

Cool, good catch :)

>
>  After making this change, we test again, and we got this kind of data:
>
>                         total    collect  sort    view
>    before monkey_patch  13.5745  11.7012  1.1511  0.5966
>    after monkey_patch   12.8367  10.5471  1.5642  0.6041
>
>  The performance improved a little, and from the log we can see:

Since these all took ~1s when done in series, but now take ~10s in
parallel, I think you must be hitting some performance bottleneck in
either case, which is why the overall time barely changes. Some ideas:

1. In the real world, I think you really need to have 10x database
   servers or at least a DB server with plenty of cores loading from a
   very fast (or separate) disk in order to really ensure you're getting
   full parallelism of the DB work. However, because these queries all
   took ~1s in your serialized case, I expect this is not your problem.

2. What does the network look like between the api machine and the DB?

3. What do the memory and CPU usage of the api process look like while
   this is happening?

Related to #3, even though we issue the requests to the DB in parallel,
we still process the result of those calls in series in a single python
thread on the API. That means all the work of reading the data from the
socket, constructing the SQLA objects, turning those into nova objects,
etc, all happens serially. It could be that the DB query is really a
small part of the overall time and our serialized python handling of the
result is the slow part. If you see the api process pegging a single
core at 100% for ten seconds, I think that's likely what is happening.
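
One rough way to tell the two cases apart is to compare wall-clock time
with CPU time for the same call. The sketch below is only an
illustration, not Nova code: timed(), fake_db_wait() and
fake_object_build() are hypothetical helpers. A call that mostly waits
on the DB or network shows CPU time far below wall time, while a call
that is bound by python object construction shows CPU time close to
wall time.

```python
import time


def timed(fn, *args):
    """Call fn(*args) and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time used by this process
    result = fn(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def fake_db_wait(seconds):
    """Hypothetical stand-in for waiting on a DB query (no CPU work)."""
    time.sleep(seconds)
    return seconds


def fake_object_build(count):
    """Hypothetical stand-in for turning rows into objects (CPU work)."""
    return [{"id": i, "name": "inst-%d" % i} for i in range(count)]


if __name__ == "__main__":
    for label, fn, arg in (("wait", fake_db_wait, 0.2),
                           ("build", fake_object_build, 200000)):
        _, wall, cpu = timed(fn, arg)
        print("%-5s wall=%.3fs cpu=%.3fs" % (label, wall, cpu))
```

If the real instance-list call looks like the "build" case, adding DB
capacity will not help much.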

>  so, now the queries are in parallel, but the whole thing still seems
>  serial.

In your table, you show the time for "1 cell, 1000 instances" as ~3s and
"10 cells, 1000 instances" as 10s. The problem with comparing those
directly is that in the latter, you're actually pulling 10,000 records
over the network, into memory, processing them, and then just returning
the first 1000 from the sort. A closer comparison would be the "10
cells, 100 instances" with "1 cell, 1000 instances". In both of those
cases, you pull 1000 instances total from the db, into memory, and
return 1000 from the sort. In that case, the multi-cell situation is
faster (~2.3s vs. ~3.1s). You could also compare the "10 cells, 1000
instances" case to "1 cell, 10,000 instances" just to confirm at the
larger scale that it's better or at least the same.

We _have_ to pull $limit instances from each cell, in case (according to
the sort key) the first $limit instances are all in one cell. We _could_
try to batch the results from each cell to avoid loading so many that we
don't need, but we punted this as an optimization to be done later. I'm
not sure it's really worth the complexity at this point, but it's
something we could investigate.
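
To illustrate why each cell has to contribute up to $limit records, here
is a minimal sketch of the merge step. It is an assumption-laden toy,
not the actual Nova implementation: list_instances() is a hypothetical
name, and each "cell" is just an in-memory list standing in for a cell
database. It also counts how many records were loaded in total, which is
the number that grows with the cell count.

```python
import heapq
from itertools import islice


def list_instances(cells, sort_key, limit):
    """Return (first `limit` records across all cells, records loaded).

    Each cell returns at most `limit` records, already sorted, because
    the global first `limit` records might all live in a single cell.
    """
    per_cell = [sorted(cell, key=sort_key)[:limit] for cell in cells]
    loaded = sum(len(records) for records in per_cell)
    # heapq.merge lazily merges the already-sorted per-cell results.
    merged = heapq.merge(*per_cell, key=sort_key)
    return list(islice(merged, limit)), loaded


if __name__ == "__main__":
    # 10 hypothetical cells with 1000 records each, limit of 1000.
    cells = [[{"id": c + 10 * i} for i in range(1000)] for c in range(10)]
    page, loaded = list_instances(cells, lambda r: r["id"], 1000)
    print(len(page), loaded)
```

With 10 cells and a limit of 1000, this loads 10,000 records to return
1000, which matches the "10 cells, 1000 instances" case above.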

--Dan


