<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">2018-08-17 2:44 GMT+08:00 Dan Smith <span dir="ltr"><<a href="mailto:dms@danplanet.com" target="_blank">dms@danplanet.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">> yes, the DB query was in serial, after some investigation, it seems that we are unable to perform eventlet.mockey_patch in uWSGI mode, so<br>
> Yikun made this fix:<br>
><br>
> <a href="https://review.openstack.org/#/c/592285/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/592285/</a><br>
<br>
Cool, good catch :)<br>
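</blockquote><div><br></div><div>(For context, the general shape of that kind of fix, illustrative only and not the actual patch in the review above: the module uWSGI imports has to apply the eventlet monkey patching itself, before anything that touches sockets or threads is imported, otherwise the greenthreaded DB calls still end up blocking one another.)</div><pre>
# Illustrative sketch only -- not the contents of the review linked above.
# Under uWSGI the usual console-script entry point is not run, so the WSGI
# module itself has to monkey patch before importing anything else.
import eventlet

eventlet.monkey_patch()  # patch socket/thread/time so greenthreads don't block

# Only now import the rest of the application (placeholder names below).
from myservice.api import build_wsgi_app

# The callable that uWSGI's "module"/"callable" options point at.
application = build_wsgi_app()
</pre><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">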
<br>
><br>
> After making this change, we tested again and got the following data:<br>
><br>
> total collect sort view (times in seconds)<br>
> before monkey_patch 13.5745 11.7012 1.1511 0.5966 <br>
> after monkey_patch 12.8367 10.5471 1.5642 0.6041 <br>
><br>
> The performance improved a little, and from the logs we can see:<br>
<br>
Since these all took ~1s when done in series, but now take ~10s in<br>
parallel, I think you must be hitting some performance bottleneck in<br>
either case, which is why the overall time barely changes. Some ideas:<br>
<br>
1. In the real world, I think you really need to have 10x database<br>
servers or at least a DB server with plenty of cores loading from a<br>
very fast (or separate) disk in order to really ensure you're getting<br>
full parallelism of the DB work. However, because these queries all<br>
took ~1s in your serialized case, I expect this is not your problem.<br>
<br>
2. What does the network look like between the api machine and the DB?<br>
<br>
3. What do the memory and CPU usage of the api process look like while<br>
this is happening?<br>
<br>
Related to #3, even though we issue the requests to the DB in parallel,<br>
we still process the result of those calls in series in a single python<br>
thread on the API. That means all the work of reading the data from the<br>
socket, constructing the SQLA objects, turning those into nova objects,<br>
etc, all happens serially. It could be that the DB query is really a<br>
small part of the overall time and our serialized python handling of the<br>
result is the slow part. If you see the api process pegging a single<br>
core at 100% for ten seconds, I think that's likely what is happening.<br></blockquote><div><br></div><div>I remember doing a test with SQLAlchemy: constructing the SQLAlchemy objects was much slower than fetching the data from the remote server.</div><div>Maybe you can try profiling it to figure out how much time is spent on the wire and how much is spent constructing the objects.</div><div><a href="http://docs.sqlalchemy.org/en/latest/faq/performance.html">http://docs.sqlalchemy.org/en/latest/faq/performance.html</a><br></div>
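<div><br></div><div>A rough sketch of how to split those two apart (illustrative only; the engine URL and the Instance model below are placeholders, and this times statement execution in the DBAPI separately from the row fetch and ORM construction done in Python):</div><pre>
# Illustrative only, not nova code: roughly separate time spent executing the
# statement in the DBAPI from time spent fetching rows and building ORM
# objects, along the lines of the SQLAlchemy performance FAQ.
import time

from sqlalchemy import Column, Integer, String, create_engine, event
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class Instance(Base):               # stand-in for the real instances table
    __tablename__ = 'instances'
    id = Column(Integer, primary_key=True)
    hostname = Column(String(255))


engine = create_engine('mysql+pymysql://user:secret@dbhost/cell1')  # placeholder
Session = sessionmaker(bind=engine)

cursor_seconds = [0.0]              # accumulated DBAPI statement execution time


@event.listens_for(engine, 'before_cursor_execute')
def _start_timer(conn, cursor, statement, parameters, context, executemany):
    conn.info.setdefault('query_start_time', []).append(time.time())


@event.listens_for(engine, 'after_cursor_execute')
def _stop_timer(conn, cursor, statement, parameters, context, executemany):
    cursor_seconds[0] += time.time() - conn.info['query_start_time'].pop(-1)


start = time.time()
instances = Session().query(Instance).order_by(Instance.id).limit(1000).all()
total = time.time() - start

print('total %.3fs, statement execution %.3fs, fetch + ORM construction %.3fs'
      % (total, cursor_seconds[0], total - cursor_seconds[0]))
</pre><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">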
<br>
> so, now the queries are in parallel, but the whole thing still seems<br>
> serial.<br>
<br>
In your table, you show the time for "1 cell, 1000 instances" as ~3s and<br>
"10 cells, 1000 instances" as 10s. The problem with comparing those<br>
directly is that in the latter, you're actually pulling 10,000 records<br>
over the network, into memory, processing them, and then just returning<br>
the first 1000 from the sort. A closer comparison would be the "10<br>
cells, 100 instances" with "1 cell, 1000 instances". In both of those<br>
cases, you pull 1000 instances total from the db, into memory, and<br>
return 1000 from the sort. In that case, the multi-cell situation is<br>
faster (~2.3s vs. ~3.1s). You could also compare the "10 cells, 1000<br>
instances" case to "1 cell, 10,000 instances" just to confirm at the<br>
larger scale that it's better or at least the same.<br>
<br>
We _have_ to pull $limit instances from each cell, in case (according to<br>
the sort key) the first $limit instances are all in one cell. We _could_<br>
try to batch the results from each cell to avoid loading so many that we<br>
don't need, but we punted this as an optimization to be done later. I'm<br>
not sure it's really worth the complexity at this point, but it's<br>
something we could investigate.<br>
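</blockquote><div><br></div><div>To illustrate why each cell has to return up to $limit rows: the API-side merge only keeps the first $limit records overall, and in the worst case they could all come from a single cell. A rough sketch of that merge (not the actual nova code; the per-cell lists are assumed to be already sorted by the sort key):</div><pre>
# Illustrative sketch, not the nova implementation: merge per-cell result lists
# that are each sorted by sort_key and keep only the first `limit` records.
import heapq
import itertools
from operator import itemgetter


def merged_top(per_cell_results, sort_key, limit):
    merged = heapq.merge(*per_cell_results, key=itemgetter(sort_key))
    return list(itertools.islice(merged, limit))


cells = [
    [{'uuid': 'a', 'created_at': 3}, {'uuid': 'b', 'created_at': 7}],
    [{'uuid': 'c', 'created_at': 1}, {'uuid': 'd', 'created_at': 2}],
]
# Both of the top two records come from the second cell, which is why every
# cell has to be asked for up to `limit` rows.
print(merged_top(cells, 'created_at', 2))
</pre><div>Batching would amount to asking each cell for smaller pages and only fetching another page from a cell once the merge has consumed the previous one.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">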
<br>
--Dan<br>
<br>
</blockquote></div><br></div></div>