[openstack-dev] [Openstack][Nova] Slow Nova API Calls from periodic-tempest-devstack-vm-check-hourly
jogo at cloudscaling.com
Thu Jan 10 02:56:55 UTC 2013
I have started to go through everyone's suggestions, but I thought I would
share some more preliminary findings.
Now that millisecond logging is enabled, the logs show a clearer picture of
what is going on.
Here is stack trace from devstack for the command DisassociateAddress, (the
slowest REST call triggered by the openstack exercises):
http://paste.openstack.org/show/29103/ (from local build)
http://paste.openstack.org/show/29104/ (from logs.openstack.org)
Note: neither pastebin have entire RPC message contents in them.
Total time: 4.8428030 seconds
* the 3 RPC calls take: 65ms, 50ms and 107ms
* RPC msg 79bbafe8079841e694640a7435f5d52f takes 1.174 seconds
(7.080-5.906) to perform network_api.get_instance_id_by_floating_address,
which consists of 3 just 3 DB calls.
* RPC msg 4c9055aa7a034d6f85e99df7dc0a8fcc takes 3.109 seconds
(10.191-7.082) to perform network_api.disassociate_floating_ip
* RPC msg 45841e95f05d4003a00bc059fade10b0 takes 0.3359 (10.527-10.191) to
* in RPC msg 4c9055aa7a034d6f85e99df7dc0a8fcc, n-net waits 0.604 seconds
(8.410-7.806) Attempting to grab file lock "iptables", but n-net log shows
nothing else is running (bad lock, or lack of logging?) [Note: I added in
logging when release a file lock, and nothing is releases the file lock
between attempting and got.]
* iptables commands take 2.52 seconds (10.93-8.41)
* Why is network_api.get_instance_id_by_floating_address so slow? Why do we
even need it, if its just doing DB calls?
* Why are all three calls synchronous, can we do the iptables
* How much time is spent in nova.db.api vs what sql itself reports (aka
Patches for better logging/debugging:
* https://review.openstack.org/#/c/19354/ - properly format milliseconds
* https://review.openstack.org/#/c/19076/ - enable mysql slow log in
* https://review.openstack.org/#/c/19358/ - log when release file lock
On Mon, Jan 7, 2013 at 12:18 AM, Monty Taylor <mordred at inaugust.com> wrote:
> On 01/07/2013 07:55 AM, Monty Taylor wrote:
> > On 01/07/2013 06:46 AM, Mark McLoughlin wrote:
> >> On Fri, 2013-01-04 at 11:58 -0800, Joe Gordon wrote:
> >>> Conclusion
> >>> =========
> >>> Nova REST calls take way long, even in small scale devstack/tempest
> >>> testing. We should spend resources on fixing this for Grizzly. I
> >>> like to start a dialog on how we can fix this and see if anyone wants
> >>> volunteer help. Below are a few possible ideas of things that will
> help us
> >>> make the REST calls faster.
> >>> * Continuously track REST API times of devstack/tempest jobs, to
> >>> slow API calls and watch for trends. (logstash + statsd?)
> >>> * Identify the underlying cause of the slow performance. It looks like
> >>> nova.db.api may be partially responsible, but not the only issue? What
> >>> else is a bottleneck?
> >>> * Enable millisecond level time logging
> >>> https://review.openstack.org/#/c/18933/
> >>> * Add an option in devstack to enable python profiling? Whats best
> way to
> >>> do this?
> >>> * Make more REST API related code asynchronous (via RPC)
> >> Summary is we're randomly seeing a multi-second delay in nova-api
> >> between the time a HTTP request is received and we send an AMQP message?
> >> Suspecting the DB calls is a good first guess. The ability to log slow
> >> DB calls would be nice. Perhaps MySQL has something we can use here,
> >> assuming we log some sort of client ID on the nova-api side that we can
> >> use to correlate with the MySQL logs.
> > There is, you'll be happy to know, a MySQL feature called the slow query
> > log. You can enable it by adding:
> > log_slow_queries = /var/log/mysql/mysql-slow.log
> > To the my.cnf. Additionally, you can put in the following:
> > long_query_time = 1
> > log-queries-not-using-indexes
> > The first is a threshold, in seconds, of how long a query needs to last
> > before it's considered slow. The second will cause all queries that did
> > not use indexes as part of their run to be logged.
> > Getting client ids passed to the db log for correlation will be
> > trickier. A lot of the big web properties will do tricks in their db
> > layer where they will add embedded comments into the sql passed in that
> > include information like client ids. I'm not sure how hard that sort of
> > tracing would be to do on our end -but turning on the slow query log
> > while doing these sorts of things is probably a great starting place.
> >> External processes we run is also another place to look. Again, logging
> >> those processes that take longer than a certain threshold to run might
> >> be good.
> >> I'd also investigate whether it's some other greenthread blocking the
> >> API request thread from running. What would be really nice would be some
> >> way of logging when a particular greenthread ran for a long time while
> >> other greenthreads are pending.
> >> It would also be good to see some system level stats and see if it's
> >> just that the system is generally CPU or I/O bound while these slow
> >> calls are happening. Maybe logging CPU, memory, disk and network usage
> >> along with the top few processes every tenth of a second.
> > Running dstat in a window while doing the other thing is fairly easy.
> > Perhaps we should add a dstat screen to devstack?
> > _______________________________________________
> > OpenStack-dev mailing list
> > OpenStack-dev at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the OpenStack-dev