[Openstack-operators] Very Slow API operations after Grizzly upgrade.
Jay Pipes
jaypipes at gmail.com
Fri Aug 16 14:07:05 UTC 2013
We've also noticed slowdowns, and switching from conductor to use DB
operations (set the use_local=True flag in nova.conf in the [conductor]
section) seems to have helped speed things up quite a bit.
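For reference, a minimal sketch of that change (just the flag itself, with
everything else left at defaults):

    [conductor]
    use_local = True

With that set, the nova services make their database calls locally instead
of proxying them over RPC through nova-conductor.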
I'd be curious to hear what performance you get with use_local=True
compared with one conductor and compared with the 6 conductors running
in VMs...
Best,
-jay
On 08/15/2013 04:32 PM, Jonathan Proulx wrote:
> That's good to know about, but I don't think it was my issue (yet, so
> thanks for saving me from it as I would have been on vacation when it
> was likely to hit hard...)
>
> I *only* had 58k expired tokens. After deleting those I'm still getting
> 30-60 second times for nova list, though the keystone response sped up
> to a more reasonable 4-6 sec.
>
> I do think conductor was killing me. I spun up six tiny instances running
> nova-conductor on the cloud and now nova list is back down to 6
> seconds, which is what it was before and which I can blame on the
> instances table needing a good cleaning.
>
>
>
> On Thu, Aug 15, 2013 at 11:09 AM, Lorin Hochstein
> <lorin at nimbisservices.com> wrote:
>
> Yes, it's the database token issue:
>
> https://ask.openstack.org/en/question/1740/keystone-never-delete-expires-token-in-database/
> https://bugs.launchpad.net/ubuntu/+source/keystone/+bug/1032633
>
>
> If you don't need PKI tokens, you can configure keystone for uuid
> tokens with the memcache backend
> instead: http://pic.dhe.ibm.com/infocenter/tivihelp/v48r1/index.jsp?topic=/com.ibm.sco.doc_2.2/t_memcached_keystone.html
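>
> A rough sketch of the relevant keystone.conf settings (section and driver
> names are from memory for grizzly, so double-check against your version):
>
>     [signing]
>     token_format = UUID
>
>     [token]
>     driver = keystone.token.backends.memcache.Token
>
>     [memcache]
>     servers = localhost:11211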
>
> If you want to use the PKI tokens, then you'll need to set up a cron
> job to clear out the old tokens from the database. There's a
> "keystone-manage token flush" command coming in havana so that this
> won't require raw SQL: <https://review.openstack.org/#/c/28133/>
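>
> In the meantime, a cron entry along these lines is a reasonable stopgap
> (assuming the database is named "keystone" and credentials come from a
> MySQL defaults file):
>
>     @hourly mysql keystone -e "DELETE FROM token WHERE expires < NOW();"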
>
> You can also speed up the query by setting a database index on the
> "valid" column of the token table. This has been done for havana:
> <https://review.openstack.org/#/c/30753/>
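>
> If you want the index now rather than waiting for that migration,
> something like this should do it (the index name is just illustrative):
>
>     CREATE INDEX ix_token_valid ON token (valid);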
>
> Take care,
>
> Lorin
> --
> Lorin Hochstein
> Lead Architect - Cloud Services
> Nimbis Services, Inc.
> www.nimbisservices.com
>
>
>
>
>
> On Aug 15, 2013, at 10:53 AM, Aubrey Wells <aubrey at vocalcloud.com> wrote:
>
>> We had the same thing and found that the keystone token table
>> had hundreds of thousands of expired tokens in it, so the SELECT
>> that gets done during the auth phase of API operations was taking
>> ages to return. We wrote a script to clean up expired tokens and it
>> hasn't recurred. A quick and dirty version to clean it up by hand
>> would be 'delete from token where expires < NOW();' but you might
>> want something a little safer in an automated script.
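>>
>> For example, deleting in batches keeps the token table from being
>> locked for too long on a busy keystone database; a rough sketch:
>>
>>     DELETE FROM token WHERE expires < NOW() LIMIT 10000;
>>
>> repeated in a loop until no more rows are affected.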
>>
>> ------------------
>> Aubrey Wells
>> Director | Network Services
>> VocalCloud
>> 888.305.3850
>> support at vocalcloud.com
>> www.vocalcloud.com
>>
>>
>> On Thu, Aug 15, 2013 at 10:45 AM, Jonathan Proulx <jon at jonproulx.com> wrote:
>>
>> Hi All,
>>
>> I have a single controller node, 60 compute node cloud on
>> Ubuntu 12.04 / cloud archive, and after the upgrade to grizzly
>> everything seems painfully slow.
>>
>> I've had 'nova list' take on the order of one minute to return
>> (there are 65 non-deleted instances and a total of just under
>> 500k instances in the instances table, but that was true before
>> the upgrade as well).
>>
>> The controller node is 4x busier with this tiny load of a
>> single user and a few VMs than it averaged in production
>> with 1,500 VMs, dozens of users, and VMs starting every 6 sec on
>> average.
>>
>> This has me a little worried, but the system is so over-spec'ed
>> that I can't see it as my current problem: the previous
>> average was 5% CPU utilization, so now I'm only at 20%. All
>> the databases fit comfortably in memory with plenty of room
>> for caching, so my disk I/O is virtually nothing.
>>
>> Not quite sure where to start. I'd like to blame conductor
>> for serializing database access, but I really hope any service
>> could handle at least one rack of servers before you need to
>> scale out... Besides the poor user experience of sluggish
>> response, I'm also getting timeouts if I try to start some
>> number of tens of servers; the usual workflow around here
>> often involves hundreds.
>>
>> Has anyone had similar problems and/or suggestions of where
>> else to look for bottlenecks?
>>
>> -Jon
>>
>
>
>
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>