<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">2018-08-17 2:44 GMT+08:00 Dan Smith <span dir="ltr"><<a href="mailto:dms@danplanet.com" target="_blank">dms@danplanet.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">> yes, the DB query was in serial, after some investigation, it seems that we are unable to perform eventlet.mockey_patch in uWSGI mode, so<br>
> Yikun made this fix:<br>
><br>
> <a href="https://review.openstack.org/#/c/592285/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/592285/</a><br>
<br>
Cool, good catch :)<br>
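</blockquote><div><br></div><div>(For context, the general shape of that kind of fix, illustrative only and not the actual patch in the review above: the module uWSGI imports has to apply the eventlet monkey patching itself, before anything that touches sockets or threads is imported, otherwise the greenthreaded DB calls still end up blocking one another.)</div><pre>
# Illustrative sketch only -- not the contents of the review linked above.
# Under uWSGI the usual console-script entry point is not run, so the WSGI
# module itself has to monkey patch before importing anything else.
import eventlet

eventlet.monkey_patch()  # patch socket/thread/time so greenthreads don't block

# Only now import the rest of the application (placeholder names below).
from myservice.api import build_wsgi_app

# The callable that uWSGI's "module"/"callable" options point at.
application = build_wsgi_app()
</pre><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">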
<br>
><br>
> After making this change, we tested again and got the following data:<br>
><br>
> total collect sort view (times in seconds)<br>
> before monkey_patch 13.5745 11.7012 1.1511 0.5966 <br>
> after monkey_patch 12.8367 10.5471 1.5642 0.6041 <br>
><br>
> The performance improved a little, and from the logs we can see:<br>
<br>
Since these all took ~1s when done in series, but now take ~10s in<br>
parallel, I think you must be hitting some performance bottleneck in<br>
either case, which is why the overall time barely changes. Some ideas:<br>
<br>
1. In the real world, I think you really need to have 10x database<br>
servers or at least a DB server with plenty of cores loading from a<br>
very fast (or separate) disk in order to really ensure you're getting<br>
full parallelism of the DB work. However, because these queries all<br>
took ~1s in your serialized case, I expect this is not your problem.<br>
<br>
2. What does the network look like between the api machine and the DB?<br>
<br>
3. What do the memory and CPU usage of the api process look like while<br>
this is happening?<br>
<br>
Related to #3, even though we issue the requests to the DB in parallel,<br>
we still process the result of those calls in series in a single python<br>
thread on the API. That means all the work of reading the data from the<br>
socket, constructing the SQLA objects, turning those into nova objects,<br>
etc, all happens serially. It could be that the DB query is really a<br>
small part of the overall time and our serialized python handling of the<br>
result is the slow part. If you see the api process pegging a single<br>
core at 100% for ten seconds, I think that's likely what is happening.<br></blockquote><div><br></div><div>I remember doing a test with SQLAlchemy: constructing the SQLAlchemy objects was much slower than fetching the data from the remote server.</div><div>Maybe you can try profiling it to figure out how much time is spent on the wire and how much is spent constructing the objects.</div><div><a href="http://docs.sqlalchemy.org/en/latest/faq/performance.html">http://docs.sqlalchemy.org/en/latest/faq/performance.html</a><br></div>
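<div><br></div><div>A rough sketch of how to split those two apart (illustrative only; the engine URL and the Instance model below are placeholders, and this times statement execution in the DBAPI separately from the row fetch and ORM construction done in Python):</div><pre>
# Illustrative only, not nova code: roughly separate time spent executing the
# statement in the DBAPI from time spent fetching rows and building ORM
# objects, along the lines of the SQLAlchemy performance FAQ.
import time

from sqlalchemy import Column, Integer, String, create_engine, event
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class Instance(Base):               # stand-in for the real instances table
    __tablename__ = 'instances'
    id = Column(Integer, primary_key=True)
    hostname = Column(String(255))


engine = create_engine('mysql+pymysql://user:secret@dbhost/cell1')  # placeholder
Session = sessionmaker(bind=engine)

cursor_seconds = [0.0]              # accumulated DBAPI statement execution time


@event.listens_for(engine, 'before_cursor_execute')
def _start_timer(conn, cursor, statement, parameters, context, executemany):
    conn.info.setdefault('query_start_time', []).append(time.time())


@event.listens_for(engine, 'after_cursor_execute')
def _stop_timer(conn, cursor, statement, parameters, context, executemany):
    cursor_seconds[0] += time.time() - conn.info['query_start_time'].pop(-1)


start = time.time()
instances = Session().query(Instance).order_by(Instance.id).limit(1000).all()
total = time.time() - start

print('total %.3fs, statement execution %.3fs, fetch + ORM construction %.3fs'
      % (total, cursor_seconds[0], total - cursor_seconds[0]))
</pre><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">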
<br>
> so, now the queries are in parallel, but the whole thing still seems<br>
> serial.<br>
<br>
In your table, you show the time for "1 cell, 1000 instances" as ~3s and<br>
"10 cells, 1000 instances" as 10s. The problem with comparing those<br>
directly is that in the latter, you're actually pulling 10,000 records<br>
over the network, into memory, processing them, and then just returning<br>
the first 1000 from the sort. A closer comparison would be the "10<br>
cells, 100 instances" with "1 cell, 1000 instances". In both of those<br>
cases, you pull 1000 instances total from the db, into memory, and<br>
return 1000 from the sort. In that case, the multi-cell situation is<br>
faster (~2.3s vs. ~3.1s). You could also compare the "10 cells, 1000<br>
instances" case to "1 cell, 10,000 instances" just to confirm at the<br>
larger scale that it's better or at least the same.<br>
<br>
We _have_ to pull $limit instances from each cell, in case (according to<br>
the sort key) the first $limit instances are all in one cell. We _could_<br>
try to batch the results from each cell to avoid loading so many that we<br>
don't need, but we punted this as an optimization to be done later. I'm<br>
not sure it's really worth the complexity at this point, but it's<br>
something we could investigate.<br>
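</blockquote><div><br></div><div>To illustrate why each cell has to return up to $limit rows: the API-side merge only keeps the first $limit records overall, and in the worst case they could all come from a single cell. A rough sketch of that merge (not the actual nova code; the per-cell lists are assumed to be already sorted by the sort key):</div><pre>
# Illustrative sketch, not the nova implementation: merge per-cell result lists
# that are each sorted by sort_key and keep only the first `limit` records.
import heapq
import itertools
from operator import itemgetter


def merged_top(per_cell_results, sort_key, limit):
    merged = heapq.merge(*per_cell_results, key=itemgetter(sort_key))
    return list(itertools.islice(merged, limit))


cells = [
    [{'uuid': 'a', 'created_at': 3}, {'uuid': 'b', 'created_at': 7}],
    [{'uuid': 'c', 'created_at': 1}, {'uuid': 'd', 'created_at': 2}],
]
# Both of the top two records come from the second cell, which is why every
# cell has to be asked for up to `limit` rows.
print(merged_top(cells, 'created_at', 2))
</pre><div>Batching would amount to asking each cell for smaller pages and only fetching another page from a cell once the merge has consumed the previous one.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">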
<br>
--Dan<br>
<br>
</blockquote></div><br></div></div>