[openstack-dev] [Openstack][Nova] Slow Nova API Calls from periodic-tempest-devstack-vm-check-hourly

Joe Gordon jogo at cloudscaling.com
Fri Jan 4 19:58:33 UTC 2013


Hi All,

tl;dr Nova REST calls take more then 1 second about 5% of the time, with
some calls taking upwards of 2 seconds on average in small scale testing
(and much worse in production).  Lets fix this.



After noticing slow nova REST API performance in devstack, I dug a bit
further.

Log sources
=========
jenkins.openstack.org runs several periodic tempest tests with logs saved
to logs.openstack.org

http://logs.openstack.org/periodic/periodic-tempest-devstack-vm-check-hourly/
http://logs.openstack.org/periodic/periodic-tempest-devstack-vm-full/


Percent REST calls slower than 1 second
 ===============================
periodic-tempest-devstack-vm-check-hourly: 5.9%
periodic-tempest-devstack-vm-full: 4.4%


Average REST call time
==================
periodic-tempest-devstack-vm-check-hourl:y 0.420115 seconds
periodic-tempest-devstack-vm-full: 0.339301 seconds


Fastest call (min, average)
====================
 /2009-04-04/meta-data/*: 0.0005190  0.099357 seconds

Excluding meta-data calls the fastest seen calls take about 0.05 seconds.
 So we see the problem isn't WSGI.


Slow REST Calls by average (from 37 recent runs
of periodic-tempest-devstack-vm-check-hourly)
=======================================================================

          format: [Average-time username REST CALL]
          the username is included to provide more context about what the
rest call is actually doing by making it easy find in the log files.

0.765313 demo GET /v2/UUID/images/UUID2
0.892474 ImagesMetadataTest-user GET /v2/UUID/images/UUID2
0.939789 ListImageFiltersTest-user GET /v2/UUID/images/UUID2
0.946835 AuthorizationTest-user GET /v2/UUID/images/UUID2
1.32859 ListImageFiltersTest-user DELETE /v2/UUID/images/UUID2
1.39626 ListImageFiltersTest-user POST /v2/UUID/servers/UUID2/action
 1.5389 ImagesMetadataTest-user PUT /v2/UUID/images/UUID2/metadata
1.79531 ServerActionsTestXML-user POST /v2/UUID/servers/UUID2/action
1.80866 ServerActionsTestJSON-user POST /v2/UUID/servers/UUID2/action
2.05908 FloatingIPsTestXML-user POST /v2/UUID/servers/UUID2/action
2.11384 FloatingIPsTestJSON-user POST /v2/UUID/servers/UUID2/action
2.20237 demo POST /v2/UUID/servers
2.23645 ServersTestJSON-user POST /v2/UUID/servers
2.23917 ServersTestXML-user POST /v2/UUID/servers
2.26852 ListServersNegativeTest-user POST /v2/UUID/servers
2.35584 VolumesActionsTest-user POST /v2/UUID/servers
2.37725 ServerActionsTestJSON-user POST /v2/UUID/servers
2.39885 ServerActionsTestXML-user POST /v2/UUID/servers
2.9973 TestServerDiskConfig-user POST /v2/UUID/servers/UUID2/action

Further analysis
=============
source:
http://logs.openstack.org/periodic/periodic-tempest-devstack-vm-check-hourly/1222/logs/screen-n-api.txt.gz

Test:  TestServerDiskConfig-user POST /v2/UUID/servers/UUID2/action
action: Rebuilding instance

request IDs:
req-d9c61f7e-78ac-4412-8956-a35189a49fa6
req-6ec84648-5a9e-4a4f-b4e8-b4ebff73f8f7
req-19bac8ca-3cae-4b8f-be68-6d045fdb8aa7
req-f4e2be63-03b0-47e3-8e41-04828227547f
  nova-api  00:53:52 receive REST call
  nova-api  00:53:55 cast to compute
  nova-cpu 00:53:55 receive call
  nova-cpu 00:53:55 respond

Most of the time appears to be spent in nova.compute.api:rebuild (line 1528)


Local test:  FloatingIPsTestXML-user POST /v2/UUID/servers/UUID2/action
=======================================================

With logging of nova.db.api call times added

Total time: 3.117 seconds

Logs:
nova-api 00:31:01 REST call received
nova-api 00:31:01 nova.db.api.instance_get_by_uuid 0.12 sec
nova-api 00:31:01 nova.db.api.instance_info_cache_update 0.01 sec

nova-net 00:31:02 nova.db.api.floating_ip_get_by_address 0.00 sec
nova-net 00:31:02 nova.db.api.fixed_ip_get_by_address 0.08 sec
nova-net 00:31:02 nova.db.api.network_get 0.01 sec
nova-net 00:31:02 nova.db.api.floating_ip_fixed_ip_associate 0.08 sec
nova-net 00:31:04 nova.db.api.instance_get_by_uuid' 0.23 sec
nova-net 00:31:04 nova.db.api.virtual_interface_get_by_instance' 0.23 sec
nova-net  00:31:04 nova.db.api.network_get' 0.01 sec
nova-net  00:31:04 nova.db.api.network_get_by_uuid' 0.02 sec
nova-net  00:31:04 nova.db.api.virtual_interface_get_by_uuid' 0.00 sec
nova-net  00:31:04 nova.db.api.fixed_ips_by_virtual_interface' 0.00 sec
nova-net  00:31:04 nova.db.api.network_get_by_uuid' 0.01 sec
nova-net  00:31:04 nova.db.api.virtual_interface_get_by_uuid' 0.02 sec
nova-net  00:31:04 nova.db.api.floating_ip_get_by_fixed_address' 0.02 sec

total nova.db.api time:  0.84 seconds over 15 calls
nova.db.api takes up 26.9 % of call time

Other time appears to be in rootwrap for iptables.

Conclusion
=========

Nova REST calls take way long, even in small scale devstack/tempest
testing.  We should spend resources on fixing this for Grizzly.  I would
like to start a dialog on how we can fix this and see if anyone wants to
volunteer help.  Below are a few possible ideas of things that will help us
make the REST calls faster.

* Continuously track REST API times of devstack/tempest jobs, to identify
slow API calls and watch for trends.  (logstash + statsd?)
* Identify the underlying cause of the slow performance.  It looks like
nova.db.api may be partially responsible, but not the only issue?  What
else is a bottleneck?
* Enable millisecond level time logging
https://review.openstack.org/#/c/18933/
* Add an option in devstack to enable python profiling?  Whats best way to
do this?
* Make more REST API related code asynchronous (via RPC)


best,
Joe Gordon
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20130104/6ac72b32/attachment.html>


More information about the OpenStack-dev mailing list