[openstack-dev] [Keystone][Oslo] Caching tokens in auth token middleware

Russell Bryant rbryant at redhat.com
Fri Mar 1 23:32:27 UTC 2013


On 03/01/2013 05:59 PM, Jay Pipes wrote:
> On 03/01/2013 01:18 PM, Vishvananda Ishaya wrote:
>> Hi Everyone,
>>
>> So I've been doing some profiling of api calls against devstack and I've discovered that a significant portion of time spent is in the auth_token middleware validating the PKI token. There is code to turn on caching of the token if memcache is enabled, but this seems like overkill in most cases. We should be caching the token in memory by default. Fortunately, nova has some nifty code that will use an in-memory cache if memcached isn't available.
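The fallback pattern Vish is describing can be sketched roughly like this -- a minimal, illustrative version only (class and function names here are hypothetical, not nova's actual memorycache module):

```python
import time


class InMemoryCache(object):
    """Dict-backed cache with per-entry TTL, exposing memcached-style
    get/set so it can stand in when memcached is unavailable.
    (Illustrative sketch only, not nova's actual memorycache code.)"""

    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl=0):
        # ttl=0 means the entry never expires, like memcached's time=0.
        expires_at = time.time() + ttl if ttl else None
        self._store[key] = (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.time() >= expires_at:
            # Entry has expired; drop it and report a miss.
            del self._store[key]
            return None
        return value


def get_cache_client(memcached_servers=None):
    # Hypothetical selector: use real memcached when servers are
    # configured, otherwise fall back to the in-process cache.
    if memcached_servers:
        import memcache  # python-memcached; only needed in this branch
        return memcache.Client(memcached_servers)
    return InMemoryCache()
```

The point is that callers program against the memcached get/set interface either way, so enabling in-memory token caching by default needs no deployer configuration.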
> 
> We gave up on PKI in Folsom after weeks of trouble with it:
> 
> * Unstable -- Endpoints would run for around 24 hours (sometimes
> less) and then stop working properly: the service user would suddenly
> get a 401 when trying to authenticate a token. Restarting the
> endpoint with a service nova-api restart gets rid of the 401
> Unauthorized for a few hours, and then it happens again.
> 
> * Unable to use memcache with PKI. The token was longer than the
> maximum memcache key length and resulted in errors on every request.
> The solution for this was to hash the CMS token and use the hash as
> the key in memcache, but unfortunately this solution wasn't
> backported to Folsom Keystone -- partly, I think, because the
> auth_token middleware was split out into the keystoneclient during
> Grizzly.
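For reference, the workaround Jay describes -- hashing the multi-kilobyte CMS token down to something that fits under memcached's 250-byte key limit -- looks roughly like this (a sketch; the prefix and digest choice are illustrative, not the exact Grizzly auth_token code):

```python
import hashlib

MEMCACHE_MAX_KEY = 250  # memcached rejects keys longer than this


def token_cache_key(token, prefix='tokens/'):
    """Derive a memcache-safe key from a (possibly multi-KB) PKI token.

    PKI tokens are CMS blobs far longer than memcached's key limit, so
    the token is cached under a fixed-length digest of itself instead.
    (Illustrative sketch; not keystone's exact implementation.)
    """
    digest = hashlib.sha256(token.encode('utf-8')).hexdigest()
    key = prefix + digest
    # 'tokens/' + 64 hex chars is well under the 250-byte limit.
    assert len(key) <= MEMCACHE_MAX_KEY
    return key
```

Because the digest is deterministic, repeated validations of the same token hit the same cache entry regardless of the token's length.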
> 
> In any case, the above two things make PKI unusable in Folsom.
> 
> We fell back on UUID tokens -- the default in Folsom. Unfortunately,
> there are serious performance issues with this approach as well. Every
> single request to an endpoint results in multiple requests to Keystone,
> which bogs down the system.
> 
> In addition to the obvious roundtrip issues, with just 26 users in a
> test cloud, in 3 weeks there are over 300K records in the tokens table
> on a VERY lightly used cloud. Not good. Luckily, we use multi-master
> MySQL replication (Galera) with excellent write rates spread across four
> cluster nodes, but this scale of writes for such a small test cluster is
> worrying to say the least.
> 
> Although not related to PKI, I've also noticed that the decision to
> use a denormalized schema in the users table, with the "extra" column
> storing a JSON-encoded blob of data including the user's default
> tenant and enabled flag, is a horrible performance problem. I hope
> that v3 Keystone has corrected these issues in the SQL driver.
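To make the schema complaint concrete: with the enabled flag buried inside a JSON text column, the database can neither filter nor index on it, so every lookup must fetch and decode the blob in application code. A toy illustration (hypothetical rows, not keystone's actual driver code):

```python
import json

# Denormalized shape: per-user attributes hidden inside an opaque
# "extra" text column (illustrative rows, not real keystone data).
rows = [
    {'id': 'u1', 'extra': json.dumps({'enabled': True,
                                      'default_project_id': 'p1'})},
    {'id': 'u2', 'extra': json.dumps({'enabled': False,
                                      'default_project_id': 'p2'})},
]


def enabled_users(rows):
    """Filter users on the enabled flag.

    Because 'enabled' lives inside the JSON blob, the database cannot
    use an index for this query; every row has to be fetched and
    decoded here in Python. With a real 'enabled' column this would be
    a single indexed WHERE clause.
    """
    return [r['id'] for r in rows
            if json.loads(r['extra']).get('enabled')]
```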

This is really interesting feedback.  Thanks for writing it up.

>>
>> 1) Shim the code into the wsgi stack using the configuration options designed for swift:
>>
>> https://review.openstack.org/23236
>>
>> This is my least favorite option since changing paste config is a pain for deployers and it doesn't help any of the other projects.
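For context, option 1) is wired through paste config; an auth_token filter section in an API paste INI looks roughly like this (the filter_factory path is the Grizzly-era keystoneclient location, but the cache option names below are illustrative and may differ from what the review actually adds):

```ini
[filter:authtoken]
paste.filter_factory = keystoneclient.middleware.auth_token:filter_factory
# Hypothetical cache knobs in the spirit of the swift-style options;
# exact option names may vary by release.
memcached_servers = 127.0.0.1:11211
token_cache_time = 300
```

This is the "pain for deployers" Vish mentions: every service's paste INI has to be edited by hand to change the middleware pipeline.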
> 
> Meh, whether you add options to a config file or a paste INI file it's
> the same pain for deployers :) But generally agree with you.
> 
>> 2) Copy the code into keystoneclient:
>>
>> https://review.openstack.org/23307
>>
>> 3) Move memorycache into oslo and sync it to nova and keystoneclient:
>>
>> https://review.openstack.org/23306
>> https://review.openstack.org/23308
>> https://review.openstack.org/23309
>>
>> I think 3) is the right long-term move, but I'm not sure if this is appropriate considering how close we are to the grizzly release, so if we want to do 2) immediately and postpone 3) until H, that is fine with me.
> 
> Well, I think 3) is the right thing to do in any case, and can be done
> in oslo regardless of Nova's RC status.
> 
> Not sure that 2) is really all that useful. If you are in any serious
> production environment, you're going to be using memcached anyway.

+1 that 3 is ideal.  I think this should have been done with a FFE for
Oslo.  It got merged in Oslo already anyway, though ...

-- 
Russell Bryant


