[openstack-dev] [Keystone][Oslo] Caching tokens in auth token middleware
Adam Young
ayoung at redhat.com
Sat Mar 2 03:17:11 UTC 2013
On 03/01/2013 05:59 PM, Jay Pipes wrote:
> On 03/01/2013 01:18 PM, Vishvananda Ishaya wrote:
>> Hi Everyone,
>>
>> So I've been doing some profiling of api calls against devstack and I've discovered that a significant portion of time spent is in the auth_token middleware validating the PKI token. There is code to turn on caching of the token if memcache is enabled, but this seems like overkill in most cases. We should be caching the token in memory by default. Fortunately, nova has some nifty code that will use an in-memory cache if memcached isn't available.
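For readers unfamiliar with the pattern: the idea is to expose the memcache get/set interface, but back it with a plain dict when no memcached servers are configured. A minimal sketch (names are illustrative; nova's real fallback lives in its memorycache module and differs in detail):

```python
import time as _time


class FakeMemcache(object):
    """Dict-backed cache that mimics the python-memcached get/set API."""

    def __init__(self):
        self._cache = {}

    def set(self, key, value, time=0):
        # memcached semantics: time=0 means the entry never expires.
        expires = _time.time() + time if time else 0
        self._cache[key] = (expires, value)

    def get(self, key):
        expires, value = self._cache.get(key, (0, None))
        if expires and _time.time() > expires:
            # Lazily evict expired entries on read.
            del self._cache[key]
            return None
        return value


def get_client(memcached_servers=None):
    """Return a real memcache client if servers are configured,
    otherwise fall back to the in-process cache."""
    if memcached_servers:
        import memcache  # python-memcached; only needed when configured
        return memcache.Client(memcached_servers)
    return FakeMemcache()
```

Callers use get/set exactly as they would with memcached, so the auth_token middleware would not need to know which backend it got.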
> We gave up on PKI in Folsom after weeks of trouble with it:
It was committed late enough in the Folsom cycle that we didn't feel
comfortable making it the default. I knew that we wouldn't flush out
the bugs until it was the default, though, which is why that was our
first task in the Grizzly cycle.
>
> * Unstable -- Endpoints would stay up, but after around 24 hours
> (sometimes sooner) the endpoint would stop working properly, with the
> service user suddenly getting a 401 when trying to validate a token.
> Restarting the endpoint with a "service nova-api restart" gets rid
> of the 401 Unauthorized for a few hours, and then it happens again.
I assume there was no logging specifying what was failing. My guess,
though, is that there was some sort of glitch in fetching the token
revocation list, and that the list was only fetched at startup.
>
> * Unable to use memcache with PKI. The token was longer than the maximum
> memcache key length and resulted in errors on every request. The solution
> for this was to hash the CMS token and use the hash as the memcache key,
> but unfortunately this solution wasn't backported to Folsom Keystone --
> partly, I think, because the auth_token middleware was split out into
> keystoneclient during Grizzly.
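The workaround Jay describes follows from memcached's hard 250-byte limit on key length: a PKI (CMS) token can run to several kilobytes, so you hash it and use the digest as the key instead. A sketch of that scheme (function name and threshold handling are illustrative, not the middleware's exact code):

```python
import hashlib

# memcached rejects keys longer than 250 bytes.
MEMCACHE_MAX_KEY = 250


def safe_cache_key(token):
    """Return a memcache-safe key for a token.

    Short (UUID-style) tokens pass through unchanged; oversized PKI
    tokens are replaced by their SHA-256 hex digest, which is a
    deterministic 64-character string well under the limit.
    """
    if len(token) <= MEMCACHE_MAX_KEY:
        return token
    return hashlib.sha256(token.encode('utf-8')).hexdigest()
```

Because the hash is deterministic, repeated validations of the same token hit the same cache entry, and collisions are not a practical concern at SHA-256 strength.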
>
> In any case, the above two things make PKI unusable in Folsom.
>
> We fell back on UUID tokens -- the default in Folsom. Unfortunately,
> there are serious performance issues with this approach as well. Every
> single request to an endpoint results in multiple requests to Keystone,
> which bogs down the system.
That right there was the original reason for the PKI tokens. Any hard
performance data?
>
> In addition to the obvious roundtrip issues, with just 26 users in a
> test cloud, in 3 weeks there are over 300K records in the tokens table
> on a VERY lightly used cloud. Not good. Luckily, we use multi-master
> MySQL replication (Galera) with excellent write rates spread across four
> cluster nodes, but this scale of writes for such a small test cluster is
> worrying to say the least.
Did you consider using the memcached backend for tokens?
Memcached expires entries automatically, so tokens wouldn't pile up
the way they do in the SQL tokens table.
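For reference, switching Keystone to the memcached token backend is a configuration change along these lines (section and option names from Folsom/Grizzly-era keystone.conf; verify against your release):

```ini
[token]
driver = keystone.token.backends.memcache.Token
# Token lifetime in seconds; memcached evicts entries after this TTL.
expiration = 86400

[memcache]
servers = 127.0.0.1:11211
```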
>
> Although not related to PKI, I've also noticed that the decision to use
> a denormalized schema in the users table, with the "extra" column
> storing a JSON-encoded blob of data including the user's default tenant
> and enabled flag, is a horrible performance problem. Hope that v3
> Keystone has corrected these issues in the SQL driver.
Normalization was done roughly at the start of the G3 cycle.
>
>> 1) Shim the code into the wsgi stack using the configuration options designed for swift:
>>
>> https://review.openstack.org/23236
>>
>> This is my least favorite option since changing paste config is a pain for deployers and it doesn't help any of the other projects.
> Meh, whether you add options to a config file or a paste INI file it's
> the same pain for deployers :) But generally agree with you.
>
>> 2) Copy the code into keystoneclient:
>>
>> https://review.openstack.org/23307
>>
>> 3) Move memorycache into oslo and sync it to nova and keystoneclient:
>>
>> https://review.openstack.org/23306
>> https://review.openstack.org/23308
>> https://review.openstack.org/23309
>>
>> I think 3) is the right long term move, but I'm not sure if this appropriate considering how close we are to the grizzly release, so if we want to do 2) immediately and postpone 3) until H, that is fine with me.
> Well, I think 3) is the right thing to do in any case, and can be done
> in oslo regardless of Nova's RC status.
>
> Not sure that 2) is really all that useful. If you are in any serious
> production environment, you're going to be using memcached anyway.
>
> Best,
> -jay
I'm all for 3. I thought this was underway already. I assume the Oslo
folks have no reservations?
>
>> Thoughts?
>>
>> Vish
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>