The performance of the metadata query with cloud-init has been causing some people problems (it's so slow that cloud-init times out!), and has led to the suggestion that we need lots of caching. (My hypothesis is that we don't...)

By turning on SQL debugging in SQLAlchemy (for which I've proposed a patch for Essex: https://review.openstack.org/#change,5783), I was able to capture the SQL statements. I'm focusing on the SQL statements for the metadata call.

The code does this:
1) Checks the cache to see if it has the data
2) Makes a message-bus call to the network service to get the fixed_ip info from the address
3) Looks up all sorts of metadata in the database
4) Formats the reply

#1 means that the first call is slower than the others, so we need to focus on the first call.
#2 could be problematic if the message queue is overloaded or if the network service is slow to respond.
#3 could be problematic if the DB isn't working properly.
#4 is hopefully not the problem.

The relevant SQL log from the API server: http://paste.openstack.org/show/12109/
And from the network server: http://paste.openstack.org/show/12116/

I've analyzed each of the SQL statements:

API
http://paste.openstack.org/show/12110/ (Need to check that there isn't a table scan when instance_info_caches is large)
http://paste.openstack.org/show/12111/ Table scan on the services table, but this is presumably smallish
http://paste.openstack.org/show/12112/ No index => table scan on the s3_images table. This table is also MyISAM, and it seems to insert rows on the first call (not shown). Evil.
http://paste.openstack.org/show/12113/
http://paste.openstack.org/show/12114/ block_device_mapping is MyISAM.

Network
http://paste.openstack.org/show/12117/
http://paste.openstack.org/show/12118/ (Fetch virtual_interface by instance_id)
http://paste.openstack.org/show/12119/ (Fetch network by id)
http://paste.openstack.org/show/12120/ Missing index => table scan on networks. Unnecessary row re-fetch.
http://paste.openstack.org/show/12121/ Missing index => table scan on virtual_interfaces. Unnecessary row re-fetch.
http://paste.openstack.org/show/12122/ (Fetch fixed_ips on virtual interface)
http://paste.openstack.org/show/12123/ Missing index => table scan on networks. Unnecessary row re-fetch. (A double re-fetch - what does that mean?)
http://paste.openstack.org/show/12124/ Missing index => table scan on virtual_interfaces. Another re-re-fetch.
http://paste.openstack.org/show/12125/ Missing index => table scan on fixed_ips (uh-oh - I hope you didn't allocate a /8!!). We already have this row from the virtual_interface lookup; perhaps we could remove this query?
http://paste.openstack.org/show/12126/
http://paste.openstack.org/show/12127/

We still have a bunch of MyISAM tables (at least with a devstack install): http://paste.openstack.org/show/12115/

As I see it, these are the issues, in rough priority order (there's a sketch of the SQL fixes at the bottom of this mail):

*Critical*
Table scan of fixed_ips on the network service (row per IP address?)
Use of MyISAM tables, particularly for s3_images and block_device_mapping
Table scan of virtual_interfaces (row per instance?)
Verify that MySQL isn't doing a table scan on http://paste.openstack.org/show/12110/ when the number of instances is large

*Naughty* *(Mostly because the tables are small)*
Table scan of s3_images
Table scan of services
Table scan of networks

*Low importance* *(Re-fetches aren't a big deal if the queries are fast)*
Row re-fetches & re-re-fetches

My install is nowhere near big enough for any of these to actually cause a real problem, so I'd love to get timings / a log from someone who is having a problem. Even the table scan of fixed_ips should be OK if you have enough RAM.

Justin
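P.S. For anyone with a big enough install who wants to experiment, this is roughly where I'd start. It's only a sketch: I haven't run it, the column names (address, instance_id) and the 'nova' schema name are guesses from a devstack install, and the index names are made up, so please check everything against your own schema and against the WHERE clauses in the pastes before running anything.

  -- Check whether MySQL is really doing a full table scan
  -- (type = ALL in the EXPLAIN output means a full scan);
  -- substitute an address that exists in your install:
  EXPLAIN SELECT * FROM fixed_ips WHERE address = '10.0.0.2';

  -- Candidate indexes for the scans above (column names assumed):
  CREATE INDEX fixed_ips_address_idx ON fixed_ips (address);
  CREATE INDEX virtual_interfaces_instance_id_idx ON virtual_interfaces (instance_id);
  -- networks is also being scanned; the right column to index depends on the
  -- WHERE clause in http://paste.openstack.org/show/12120/, so I won't guess here.

  -- List the remaining MyISAM tables (assuming the database is called 'nova'):
  SELECT table_name FROM information_schema.tables
   WHERE table_schema = 'nova' AND engine = 'MyISAM';

  -- Convert the worst offenders to InnoDB:
  ALTER TABLE s3_images ENGINE = InnoDB;
  ALTER TABLE block_device_mapping ENGINE = InnoDB;

Converting to InnoDB also gets us row-level locking and crash safety, which we probably want for these tables regardless of the metadata path.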