[openstack-dev] [Openstack][Nova] Slow Nova API Calls from periodic-tempest-devstack-vm-check-hourly
mordred at inaugust.com
Mon Jan 7 08:18:25 UTC 2013
On 01/07/2013 07:55 AM, Monty Taylor wrote:
> On 01/07/2013 06:46 AM, Mark McLoughlin wrote:
>> On Fri, 2013-01-04 at 11:58 -0800, Joe Gordon wrote:
>>> Nova REST calls take way long, even in small scale devstack/tempest
>>> testing. We should spend resources on fixing this for Grizzly. I would
>>> like to start a dialog on how we can fix this and see if anyone wants to
>>> volunteer help. Below are a few possible ideas of things that will help us
>>> make the REST calls faster.
>>> * Continuously track REST API times of devstack/tempest jobs, to identify
>>> slow API calls and watch for trends. (logstash + statsd?)
>>> * Identify the underlying cause of the slow performance. It looks like
>>> nova.db.api may be partially responsible, but not the only issue? What
>>> else is a bottleneck?
>>> * Enable millisecond level time logging
>>> * Add an option in devstack to enable python profiling? Whats best way to
>>> do this?
>>> * Make more REST API related code asynchronous (via RPC)
>> Summary is we're randomly seeing a multi-second delay in nova-api
>> between the time a HTTP request is received and we send an AMQP message?
>> Suspecting the DB calls is a good first guess. The ability to log slow
>> DB calls would be nice. Perhaps MySQL has something we can use here,
>> assuming we log some sort of client ID on the nova-api side that we can
>> use to correlate with the MySQL logs.
> There is, you'll be happy to know, a MySQL feature called the slow query
> log. You can enable it by adding:
> log_slow_queries = /var/log/mysql/mysql-slow.log
> To the my.cnf. Additionally, you can put in the following:
> long_query_time = 1
> The first is a threshold, in seconds, of how long a query needs to last
> before it's considered slow. The second will cause all queries that did
> not use indexes as part of their run to be logged.
> Getting client ids passed to the db log for correlation will be
> trickier. A lot of the big web properties will do tricks in their db
> layer where they will add embedded comments into the sql passed in that
> include information like client ids. I'm not sure how hard that sort of
> tracing would be to do on our end -but turning on the slow query log
> while doing these sorts of things is probably a great starting place.
>> External processes we run is also another place to look. Again, logging
>> those processes that take longer than a certain threshold to run might
>> be good.
>> I'd also investigate whether it's some other greenthread blocking the
>> API request thread from running. What would be really nice would be some
>> way of logging when a particular greenthread ran for a long time while
>> other greenthreads are pending.
>> It would also be good to see some system level stats and see if it's
>> just that the system is generally CPU or I/O bound while these slow
>> calls are happening. Maybe logging CPU, memory, disk and network usage
>> along with the top few processes every tenth of a second.
> Running dstat in a window while doing the other thing is fairly easy.
> Perhaps we should add a dstat screen to devstack?
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
More information about the OpenStack-dev