<div dir="ltr">We had the same problem and found that the keystone token table had hundreds of thousands of expired tokens in it, so the SELECT that gets done during the auth phase of API operations was taking ages to return. I wrote a script to clean up expired tokens and the slowdown hasn't recurred. A quick-and-dirty way to clean it up by hand would be 'delete from token where expires &lt; NOW();' but you might want something a little safer in an automated script.</div>
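<div dir="ltr"><br>For the automated case, here is a minimal sketch of what "a little safer" might look like: delete expired rows in small batches and commit after each one, so no single statement holds locks on the table for long. This is not the script mentioned above. The <code>id</code> primary-key column, the <code>purge_expired_tokens</code> name, the batch size, and the <code>placeholder</code> parameter are all assumptions for illustration. The demo runs against an in-memory SQLite table so it works anywhere; with a MySQL driver you would typically pass <code>placeholder="%s"</code> and check the column names against your own schema first.</div>

```python
import sqlite3


def purge_expired_tokens(conn, now, batch_size=500, placeholder="?"):
    """Delete rows from `token` whose `expires` is earlier than `now`.

    Works in batches of `batch_size` and commits after each batch.
    `conn` is any DB-API connection; `now` is a 'YYYY-MM-DD HH:MM:SS' string.
    Assumes (hypothetically) that `token` has a primary-key column `id`.
    Returns the total number of rows deleted.
    """
    select_sql = "SELECT id FROM token WHERE expires < {p} LIMIT {n}".format(
        p=placeholder, n=int(batch_size)
    )
    total = 0
    while True:
        cur = conn.cursor()
        cur.execute(select_sql, (now,))
        ids = [row[0] for row in cur.fetchall()]
        if not ids:
            break  # nothing expired is left
        marks = ", ".join([placeholder] * len(ids))
        cur.execute("DELETE FROM token WHERE id IN ({})".format(marks), ids)
        conn.commit()  # keep each transaction small
        total += len(ids)
    return total


if __name__ == "__main__":
    # Throwaway demo table; the real keystone schema may differ.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE token (id TEXT PRIMARY KEY, expires TEXT, extra TEXT)")
    rows = [("old%d" % i, "2013-08-01 00:00:00", "{}") for i in range(1200)]
    rows.append(("live", "2013-09-01 00:00:00", "{}"))
    conn.executemany("INSERT INTO token VALUES (?, ?, ?)", rows)
    conn.commit()
    deleted = purge_expired_tokens(conn, "2013-08-15 10:45:00")
    print("deleted", deleted, "expired tokens")
```

<div dir="ltr">Passing the cutoff time in explicitly, rather than calling NOW() inside the statement, makes it easy to do a dry run with an old cutoff before pointing the script at everything that has expired.<br></div>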
<div class="gmail_extra"><br clear="all"><div><div dir="ltr">------------------<br>Aubrey Wells<br>Director | Network Services<br>VocalCloud<br>888.305.3850<br><a href="mailto:support@vocalcloud.com" target="_blank">support@vocalcloud.com</a><br>
<a href="http://www.vocalcloud.com" target="_blank">www.vocalcloud.com</a></div></div>
<br><br><div class="gmail_quote">On Thu, Aug 15, 2013 at 10:45 AM, Jonathan Proulx <span dir="ltr">&lt;<a href="mailto:jon@jonproulx.com" target="_blank">jon@jonproulx.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><div><div><div><div><div>Hi All,<br><br></div>I have a single-controller-node, 60-compute-node cloud on Ubuntu 12.04 / cloud archive, and after the upgrade to Grizzly everything seems painfully slow.<br><br></div>
I've had 'nova list' take on the order of one minute to return (there are 65 non-deleted instances and a total of just under 500k instances in the instances table, but that was true before the upgrade as well).<br>
<br></div>The controller node is 4x busier with this tiny load of a single user and a few VMs than it averaged in production with 1,500 VMs, dozens of users, and VMs starting every 6 seconds on average.<br><br>This has me a little worried, but the system is so over-spec'ed that I can't see it as my current problem: the previous average was 5% CPU utilization, so now I'm only at 20%. All the databases fit comfortably in memory with plenty of room for caching, so my disk I/O is virtually nothing.<br>
<br></div>Not quite sure where to start. I'd like to blame conductor for serializing database access, but I really hope any service could handle at least one rack of servers before you need to scale out. Besides the poor user experience of sluggish responses, I'm also getting timeouts if I try to start a few tens of servers at once, and the usual workflow around here often involves hundreds.<br>
<br></div><div>Has anyone had similar problems, or have suggestions on where else to look for bottlenecks?<br></div><div><br></div>-Jon<br></div>
<br>_______________________________________________<br>
OpenStack-operators mailing list<br>
<a href="mailto:OpenStack-operators@lists.openstack.org">OpenStack-operators@lists.openstack.org</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators</a><br>
<br></blockquote></div><br></div>