[openstack-dev] [Openstack][Nova] Slow Nova API Calls from periodic-tempest-devstack-vm-check-hourly

Monty Taylor mordred at inaugust.com
Mon Jan 7 07:55:10 UTC 2013

On 01/07/2013 06:46 AM, Mark McLoughlin wrote:
> On Fri, 2013-01-04 at 11:58 -0800, Joe Gordon wrote:
>> Conclusion
>> =========
>> Nova REST calls take way long, even in small scale devstack/tempest
>> testing.  We should spend resources on fixing this for Grizzly.  I would
>> like to start a dialog on how we can fix this and see if anyone wants to
>> volunteer help.  Below are a few possible ideas of things that will help us
>> make the REST calls faster.
>> * Continuously track REST API times of devstack/tempest jobs, to identify
>> slow API calls and watch for trends.  (logstash + statsd?)
>> * Identify the underlying cause of the slow performance.  It looks like
>> nova.db.api may be partially responsible, but not the only issue?  What
>> else is a bottleneck?
>> * Enable millisecond level time logging
>> https://review.openstack.org/#/c/18933/
>> * Add an option in devstack to enable python profiling?  Whats best way to
>> do this?
>> * Make more REST API related code asynchronous (via RPC)
> Summary is we're randomly seeing a multi-second delay in nova-api
> between the time a HTTP request is received and we send an AMQP message?
> Suspecting the DB calls is a good first guess. The ability to log slow
> DB calls would be nice. Perhaps MySQL has something we can use here,
> assuming we log some sort of client ID on the nova-api side that we can
> use to correlate with the MySQL logs.

There is, you'll be happy to know, a MySQL feature called the slow query
log. You can enable it by adding:

log_slow_queries       = /var/log/mysql/mysql-slow.log

To the my.cnf. Additionally, you can put in the following:

long_query_time = 1

The first is a threshold, in seconds, of how long a query needs to last
before it's considered slow. The second will cause all queries that did
not use indexes as part of their run to be logged.

Getting client ids passed to the db log for correlation will be
trickier. A lot of the big web properties will do tricks in their db
layer where they will add embedded comments into the sql passed in that
include information like client ids. I'm not sure how hard that sort of
tracing would be to do on our end -but turning on the slow query log
while doing these sorts of things is probably a great starting place.

> External processes we run is also another place to look. Again, logging
> those processes that take longer than a certain threshold to run might
> be good.
> I'd also investigate whether it's some other greenthread blocking the
> API request thread from running. What would be really nice would be some
> way of logging when a particular greenthread ran for a long time while
> other greenthreads are pending.
> It would also be good to see some system level stats and see if it's
> just that the system is generally CPU or I/O bound while these slow
> calls are happening. Maybe logging CPU, memory, disk and network usage
> along with the top few processes every tenth of a second.

Running dstat in a window while doing the other thing is fairly easy.
Perhaps we should add a dstat screen to devstack?

More information about the OpenStack-dev mailing list