[Openstack-operators] Very Slow API operations after Grizzly upgrade.

Lorin Hochstein lorin at nimbisservices.com
Thu Aug 15 15:09:01 UTC 2013


Yes, it's the database token issue:

https://ask.openstack.org/en/question/1740/keystone-never-delete-expires-token-in-database/
https://bugs.launchpad.net/ubuntu/+source/keystone/+bug/1032633


If you don't need PKI tokens, you can configure Keystone to use UUID tokens with the memcache token backend instead: <http://pic.dhe.ibm.com/infocenter/tivihelp/v48r1/index.jsp?topic=%2Fcom.ibm.sco.doc_2.2%2Ft_memcached_keystone.html>
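
For reference, the relevant keystone.conf settings on Grizzly look roughly like this (section and option names from memory -- verify against the sample config shipped with your release, and point the memcache servers list at your own cache hosts):

```ini
[signing]
# Issue UUID tokens instead of PKI tokens.
token_format = UUID

[token]
# Store tokens in memcached rather than the SQL database.
driver = keystone.token.backends.memcache.Token

[memcache]
# Assumed local memcached instance; adjust host:port for your deployment.
servers = localhost:11211
```

With tokens in memcached they simply expire out of the cache, so the token table never grows in the first place.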

If you want to use PKI tokens, then you'll need to set up a cron job to clear out the old tokens from the database. A "keystone-manage token flush" command is coming in Havana so that this won't require raw SQL: <https://review.openstack.org/#/c/28133/>
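
Until that lands, a sketch of such a cron job (the hourly schedule and the assumption that credentials live in root's .my.cnf are mine; note that Keystone stores the expires column in UTC, hence UTC_TIMESTAMP() rather than NOW()):

```shell
# /etc/cron.d/keystone-token-flush (hypothetical file name)
# Purge expired Keystone tokens hourly to keep the token table small.
0 * * * * root mysql keystone -e "DELETE FROM token WHERE expires < UTC_TIMESTAMP();"
```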

You can also speed up the query by adding a database index on the "valid" column of the token table. This has been done for Havana: <https://review.openstack.org/#/c/30753/>
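
If you want it now, the equivalent one-off DDL would be something like the following (the index name is my invention, and the column name may differ by release -- check with DESCRIBE token; first):

```sql
-- Speeds up the auth-time lookup that filters on token validity.
CREATE INDEX ix_token_valid ON token (valid);
```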

Take care,

Lorin
--
Lorin Hochstein
Lead Architect - Cloud Services
Nimbis Services, Inc.
www.nimbisservices.com





On Aug 15, 2013, at 10:53 AM, Aubrey Wells <aubrey at vocalcloud.com> wrote:

> We have the same thing and found that the keystone tokens table had hundreds of thousands of expired tokens in it so the SELECT that gets done during the auth phase of API operations was taking ages to return. Wrote a script to clean up expired tokens and it hasn't recurred. A quick and dirty version to clean it up by hand would be 'delete from token where expires < NOW();' but you might want something a little safer in an automated script. 
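> 
> [Editor's sketch of the "something a little safer" idea: deleting in bounded batches keeps each transaction short and avoids locking the table against the auth path while the cleanup runs. The batch size is arbitrary; rerun until no rows are affected.]
> 
> ```sql
> -- Delete expired tokens in small batches; repeat while ROW_COUNT() > 0.
> DELETE FROM token WHERE expires < NOW() LIMIT 10000;
> ```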
> 
> ------------------
> Aubrey Wells
> Director | Network Services
> VocalCloud
> 888.305.3850
> support at vocalcloud.com
> www.vocalcloud.com
> 
> 
> On Thu, Aug 15, 2013 at 10:45 AM, Jonathan Proulx <jon at jonproulx.com> wrote:
> Hi All,
> 
> I have a single-controller, 60-compute-node cloud on Ubuntu 12.04 / Cloud Archive, and after the upgrade to Grizzly everything seems painfully slow.
> 
> I've had 'nova list' take on the order of one minute to return (there are 65 non-deleted instances and a total of just under 500k instances in the instances table, but that was true before the upgrade as well).
> 
> The controller node is 4x busier under this tiny load (a single user and a few VMs) than it averaged in production with 1,500 VMs, dozens of users, and VMs starting every 6 seconds on average.
> 
> This has me a little worried, but the system is so over-spec'ed that I can't see it as my current problem: the previous average was 5% CPU utilization, so now I'm only at 20%. All the databases fit comfortably in memory with plenty of room for caching, so my disk I/O is virtually nothing.
> 
> Not quite sure where to start. I'd like to blame conductor for serializing database access, but I'd really hope any service could handle at least one rack of servers before needing to scale out. Besides the poor user experience of sluggish response, I'm also getting timeouts if I try to start some tens of servers, and the usual workflow around here often involves hundreds.
> 
> Has anyone had similar problems and/or suggestions of where else to look for bottlenecks?
> 
> -Jon
> 
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> 
