[Openstack-operators] Very Slow API operations after Grizzly upgrade.

Aubrey Wells aubrey at vocalcloud.com
Thu Aug 15 20:37:09 UTC 2013


I've noticed in dev (where I don't have it scripted) that when I delete them
by hand it can take 5 or 10 minutes before commands start responding
properly, I assume because of queued-up requests from normal internal
operations of the cluster. YMMV of course.

------------------
Aubrey Wells
Director | Network Services
VocalCloud
888.305.3850
support at vocalcloud.com
www.vocalcloud.com


On Thu, Aug 15, 2013 at 4:32 PM, Jonathan Proulx <jon at jonproulx.com> wrote:

> That's good to know about, but I don't think it was my issue (yet, so
> thanks for saving me from it as I would have been on vacation when it was
> likely to hit hard...)
>
> I *only* had 58k expired tokens; after deleting those I'm still getting
> 30-60 second times for nova list, though the keystone response sped up to a
> more reasonable 4-6 sec.
>
> I do think conductor was killing me. I spun up six tiny instances running
> nova-conductor on the cloud and now nova lists are back down to 6 seconds,
> which is what they were before, and that I can blame on the instances table
> needing a good cleaning.
>
>
>
> On Thu, Aug 15, 2013 at 11:09 AM, Lorin Hochstein <
> lorin at nimbisservices.com> wrote:
>
>> Yes, it's the database token issue:
>>
>>
>> https://ask.openstack.org/en/question/1740/keystone-never-delete-expires-token-in-database/
>> https://bugs.launchpad.net/ubuntu/+source/keystone/+bug/1032633
>>
>>
>> If you don't need PKI tokens, you can configure keystone to use UUID tokens
>> with the memcache backend instead:
>> http://pic.dhe.ibm.com/infocenter/tivihelp/v48r1/index.jsp?topic=%2Fcom.ibm.sco.doc_2.2%2Ft_memcached_keystone.html
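>> For Grizzly that works out to something roughly like the following in
>> keystone.conf (a sketch only -- the exact driver path and the memcached
>> address are assumptions for your setup; check the guide above for your
>> release):
>>
>>     [token]
>>     driver = keystone.token.backends.memcache.Token
>>
>>     [memcache]
>>     servers = localhost:11211
>>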
>>
>> If you want to use PKI tokens, then you'll need to set up a cron job
>> to clear out the old tokens from the database. There's a "keystone-manage
>> token flush" command coming in Havana so that this won't require raw SQL:
>> <https://review.openstack.org/#/c/28133/>
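>> A crontab entry along these lines would do it (a sketch only -- the
>> "keystone" database name, the users, and the 4 AM schedule are all
>> assumptions, and the new command appears to be spelled token_flush in
>> that review):
>>
>>     # Grizzly: prune expired tokens nightly with raw SQL
>>     0 4 * * * root mysql keystone -e "DELETE FROM token WHERE expires < NOW();"
>>     # Havana and later: use the built-in command instead
>>     0 4 * * * keystone /usr/bin/keystone-manage token_flush
>>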
>>
>> You can also speed up the query by setting a database index on the
>> "valid" column of the token table. This has been done for Havana:
>> <https://review.openstack.org/#/c/30753/>
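>> On Grizzly you can add an equivalent index by hand ahead of that
>> migration, something like the following (the index name is arbitrary;
>> table and column names are taken from the default keystone SQL schema):
>>
>>     CREATE INDEX ix_token_valid ON token (valid);
>>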
>>
>>  Take care,
>>
>> Lorin
>> --
>> Lorin Hochstein
>> Lead Architect - Cloud Services
>> Nimbis Services, Inc.
>> www.nimbisservices.com
>>
>>
>>
>>
>>
>> On Aug 15, 2013, at 10:53 AM, Aubrey Wells <aubrey at vocalcloud.com> wrote:
>>
>> We have the same thing and found that the keystone tokens table had
>> hundreds of thousands of expired tokens in it so the SELECT that gets done
>> during the auth phase of API operations was taking ages to return. Wrote a
>> script to clean up expired tokens and it hasn't recurred. A quick and dirty
>> version to clean it up by hand would be 'delete from token where expires <
>> NOW();' but you might want something a little safer in an automated script.
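>> For an automated job, one slightly safer sketch is to delete in bounded
>> batches so the cleanup never holds one huge transaction while keystone is
>> still reading the table (MySQL syntax and the 10000 batch size are
>> assumptions, not what our script actually does):
>>
>>     -- repeat from the calling script until 0 rows are affected
>>     DELETE FROM token WHERE expires < NOW() LIMIT 10000;
>>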
>>
>> ------------------
>> Aubrey Wells
>> Director | Network Services
>> VocalCloud
>> 888.305.3850
>> support at vocalcloud.com
>> www.vocalcloud.com
>>
>>
>> On Thu, Aug 15, 2013 at 10:45 AM, Jonathan Proulx <jon at jonproulx.com> wrote:
>>
>>> Hi All,
>>>
>>> I have a single controller node, 60 compute node cloud on Ubuntu 12.04 /
>>> Cloud Archive, and after upgrading to Grizzly everything seems painfully slow.
>>>
>>> I've had 'nova list' take on the order of one minute to return (there are
>>> 65 non-deleted instances and a total of just under 500k instances in the
>>> instances table, but that was true before the upgrade as well).
>>>
>>> The controller node is 4x busier with this tiny load of a single user
>>> and a few VMs than it has averaged in production with 1,500 VMs, dozens of
>>> users, and VMs starting every 6 sec on average.
>>>
>>> This has me a little worried, but the system is so over-spec'ed that I
>>> can't see it as my current problem: the previous average was 5% CPU
>>> utilization, so now I'm only at 20%.  All the databases fit comfortably in
>>> memory with plenty of room for caching, so my disk I/O is virtually nothing.
>>>
>>> Not quite sure where to start.  I'd like to blame conductor for
>>> serializing database access, but I really hope any service could handle at
>>> least one rack of servers before you needed to scale out... Besides the
>>> poor user experience of sluggish responses, I'm also getting timeouts if I
>>> try to start some number of tens of servers; the usual workflow around
>>> here often involves hundreds.
>>>
>>> Has anyone had similar problems and/or suggestions of where else to
>>> look for bottlenecks?
>>>
>>> -Jon
>>>
>>>
>>>
>>
>>
>>
>

