[Openstack-operators] Very Slow API operations after Grizzly upgrade.

Jay Pipes jaypipes at gmail.com
Fri Aug 16 14:07:05 UTC 2013


We've also noticed slowdowns, and switching from conductor back to local 
DB operations (set the use_local=True flag in nova.conf in the [conductor] 
section) seems to have helped speed things up quite a bit.
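
For reference, a minimal sketch of the relevant nova.conf stanza (the
option lives in the [conductor] group; double-check the exact name
against your Grizzly packaging):

    [conductor]
    use_local = True

With use_local=True the services do their DB calls in-process instead of
routing every one through nova-conductor over RPC.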

I'd be curious to hear what performance you get with use_local=True 
compared with one conductor and compared with the 6 conductors running 
in VMs...

Best,
-jay

On 08/15/2013 04:32 PM, Jonathan Proulx wrote:
> That's good to know about, but I don't think it was my issue (yet, so
> thanks for saving me from it as I would have been on vacation when it
> was likely to hit hard...)
>
> I *only* had 58k expired tokens. After deleting those I'm still getting
> 30-60 second times for nova list, though the keystone response sped up
> to a more reasonable 4-6 sec.
>
> I do think conductor was killing me: I spun up six tiny instances running
> nova-conductor on the cloud and now nova list times are back down to 6
> seconds, which is what they were before and which I can blame on the
> instances table needing a good cleaning.
>
>
>
> On Thu, Aug 15, 2013 at 11:09 AM, Lorin Hochstein
> <lorin at nimbisservices.com> wrote:
>
>     Yes, it's the database token issue:
>
>     https://ask.openstack.org/en/question/1740/keystone-never-delete-expires-token-in-database/
>     https://bugs.launchpad.net/ubuntu/+source/keystone/+bug/1032633
>
>
>     If you don't need PKI tokens, you can configure keystone for uuid
>     tokens with the memcache backend instead:
>     http://pic.dhe.ibm.com/infocenter/tivihelp/v48r1/index.jsp?topic=/com.ibm.sco.doc_2.2/t_memcached_keystone.html
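>
>     Roughly, the keystone.conf settings for that look like the
>     following (option names as I remember them from the Grizzly-era
>     docs; double-check against your installed keystone.conf):
>
>         [signing]
>         token_format = UUID
>
>         [token]
>         driver = keystone.token.backends.memcache.Token
>
>         [memcache]
>         servers = localhost:11211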
>
>     If you want to use the PKI tokens, then you'll need to set up a cron
>     job to clear out the old tokens from the database. There's a
>     "keystone-manage token_flush" command coming in havana so this
>     won't require raw SQL: <https://review.openstack.org/#/c/28133/>
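>
>     Until then, a cron entry along these lines does the job (this
>     assumes a MySQL database named "keystone" and credentials in
>     /root/.my.cnf; adjust for your setup):
>
>         # /etc/cron.d/keystone-token-purge
>         0 * * * * root mysql keystone -e 'DELETE FROM token WHERE expires < NOW();'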
>
>     You can also speed up the query by setting a database index on the
>     "valid" column of the token table. This has been done for havana:
>       <https://review.openstack.org/#/c/30753/>
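>
>     On grizzly you can add an equivalent index by hand, e.g.:
>
>         CREATE INDEX ix_token_valid ON token (valid);
>
>     (the index name here is arbitrary; the review above adds it via a
>     migration)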
>
>     Take care,
>
>     Lorin
>     --
>     Lorin Hochstein
>     Lead Architect - Cloud Services
>     Nimbis Services, Inc.
>     www.nimbisservices.com
>
>
>
>
>
>     On Aug 15, 2013, at 10:53 AM, Aubrey Wells <aubrey at vocalcloud.com> wrote:
>
>>     We saw the same thing and found that the keystone token table
>>     had hundreds of thousands of expired tokens in it, so the SELECT
>>     that gets done during the auth phase of API operations was taking
>>     ages to return. We wrote a script to clean up expired tokens and it
>>     hasn't recurred. A quick and dirty version to clean it up by hand
>>     would be 'delete from token where expires < NOW();' but you might
>>     want something a little safer in an automated script.
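>>
>>     For something safer in a script, deleting in batches keeps the
>>     table from being locked for long stretches, e.g. (MySQL; the
>>     batch size is arbitrary):
>>
>>         -- repeat until 0 rows are affected
>>         DELETE FROM token WHERE expires < NOW() LIMIT 10000;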
>>
>>     ------------------
>>     Aubrey Wells
>>     Director | Network Services
>>     VocalCloud
>>     888.305.3850
>>     support at vocalcloud.com
>>     www.vocalcloud.com
>>
>>
>>     On Thu, Aug 15, 2013 at 10:45 AM, Jonathan Proulx
>>     <jon at jonproulx.com> wrote:
>>
>>         Hi All,
>>
>>         I have a single controller node, 60 compute node cloud on
>>         Ubuntu 12.04 / cloud archive, and after the upgrade to grizzly
>>         everything seems painfully slow.
>>
>>         I've had 'nova list' take on the order of one minute to return
>>         (there are 65 non-deleted instances and a total of just under
>>         500k instances in the instances table, but that was true before
>>         the upgrade as well).
>>
>>         The controller node is 4x busier with this tiny load of a
>>         single user and a few VMs than it averaged in production
>>         with 1,500 VMs, dozens of users, and VMs starting every 6 sec
>>         on average.
>>
>>         This has me a little worried, but the system is so over-spec'ed
>>         that I can't see it as my current problem: the previous
>>         average was 5% CPU utilization, so now I'm only at 20%.  All
>>         the databases fit comfortably in memory with plenty of room
>>         for caching, so my disk I/O is virtually nothing.
>>
>>         Not quite sure where to start.  I'd like to blame conductor
>>         for serializing database access, but I really hope any service
>>         could handle at least one rack of servers before you needed to
>>         scale out.  Besides the poor user experience of sluggish
>>         responses, I'm also getting timeouts if I try to start some
>>         tens of servers at once; the usual workflow around here
>>         often involves hundreds.
>>
>>         Has anyone had similar problems and/or suggestions of where
>>         else to look for bottlenecks?
>>
>>         -Jon
>>
>
>
>
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>



