[openstack-dev] [Keystone][Oslo] Caching tokens in auth token middleware

Jay Pipes jaypipes at gmail.com
Mon Mar 4 17:04:49 UTC 2013


On 03/01/2013 10:17 PM, Adam Young wrote:
> On 03/01/2013 05:59 PM, Jay Pipes wrote:
>> On 03/01/2013 01:18 PM, Vishvananda Ishaya wrote:
>>> Hi Everyone,
>>>
>>> So I've been doing some profiling of api calls against devstack and I've discovered that a significant portion of time spent is in the auth_token middleware validating the PKI token. There is code to turn on caching of the token if memcache is enabled, but this seems like overkill in most cases. We should be caching the token in memory by default. Fortunately, nova has some nifty code that will use an in-memory cache if memcached isn't available.
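As an aside, here is a rough sketch of the kind of in-process fallback
Vish is describing -- not nova's actual code, just an illustration of a
dict-backed object covering the get/set-with-timeout slice of the
memcache client interface that auth_token needs:

    import time as _time

    class InMemoryCache(object):
        """Dict-backed stand-in for the memcache client calls used by
        auth_token: get() and set() with an expiry in seconds."""

        def __init__(self):
            self._cache = {}

        def get(self, key):
            value, expires_at = self._cache.get(key, (None, 0))
            if expires_at and _time.time() > expires_at:
                del self._cache[key]
                return None
            return value

        def set(self, key, value, time=0):
            # 'time' mirrors python-memcached's parameter name: TTL in seconds.
            expires_at = _time.time() + time if time else 0
            self._cache[key] = (value, expires_at)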
>> We gave up on PKI in Folsom after weeks of trouble with it:
> 
> It was committed late enough in the Folsom cycle that we didn't feel 
> comfortable going default with it.  I knew that we wouldn't flush out 
> the bugs, though, until it was the default, which is why it was our 
> first task in the Grizzly cycle.

Understood, and I believe I was clear in my post that I was specifically
referring to Folsom :)

>> * Unstable -- Endpoints would stay up, but after around 24 hours
>> (sometimes sooner) the endpoint would stop working properly, with the
>> service user suddenly being returned a 401 when trying to authenticate a
>> token. Restarting the endpoint with a service nova-api restart gets rid
>> of the 401 Unauthorized for a few hours, and then it happens again.
> I assume there was no logging specifying what was failing.  I can make a 
> guess, though, that there was some sort of glitch in getting the token 
> revocation list, and that the list was only fetched at startup.

Indeed, no logging other than the return of the 401. I will say that it
would have been much easier to determine that it was the *service* user
that was getting a 401 and not the authenticating user if something like
that was in the debug log message! :)

>> * Unable to use memcache with PKI. The token was longer than the maximum
>> memcache key length and resulted in errors on every request. The
>> solution for this was to hash the CMS token and use the hash as the key
>> in memcache, but unfortunately this solution wasn't backported to Folsom
>> Keystone -- partly, I think, because the auth_token middleware was split
>> out into the keystoneclient during Grizzly.
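For anyone stuck on Folsom, the workaround itself is tiny and easy enough
to carry as a local patch -- roughly something like the below (the
cache_key_for name and the 'tokens/' prefix are made up for the example;
the Grizzly-era keystoneclient code may hash things differently):

    import hashlib

    # Memcached keys are capped at 250 bytes, while a CMS-signed token can
    # run to several KB, so hash the token and cache under the digest.
    def cache_key_for(token):
        return 'tokens/%s' % hashlib.md5(token).hexdigest()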
>>
>> In any case, the above two things make PKI unusable in Folsom.
>>
>> We fell back on UUID tokens -- the default in Folsom. Unfortunately,
>> there are serious performance issues with this approach as well. Every
>> single request to an endpoint results in multiple requests to Keystone,
>> which bogs down the system.
> That right there was the original reason for the PKI tokens.  Any hard 
> performance data?

Nothing hard, no. Doing things via the keystone CLI tool is noticeably
slower, but I have not had the time to do any benchmarks. Deployments
and work have a habit of getting in the way of that ;)

>> In addition to the obvious roundtrip issues, with just 26 users a VERY
>> lightly used test cloud accumulated over 300K records in the tokens
>> table in 3 weeks. Not good. Luckily, we use multi-master MySQL
>> replication (Galera) with excellent write rates spread across four
>> cluster nodes, but this scale of writes for such a small test cluster is
>> worrying to say the least.
> Did you consider using the memcached backend for Tokens?
> Memcached has an automated timeout.

We did not, no. More likely we will be on Grizzly before long, so I will
be prototyping PKI + memcache soon enough. Not sure I'll have time to
change this before we are on Grizzly (which is likely a good thing ;)
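
For reference, the automated timeout Adam mentions is just memcached's
per-key TTL. A quick illustration with python-memcached (the key layout,
TTL, and payload here are made up, not Keystone's actual scheme):

    import memcache  # python-memcached

    mc = memcache.Client(['127.0.0.1:11211'])

    token_id = 'abc123'                # placeholder, not a real token ID
    token_data = {'user_id': 'demo'}   # placeholder payload

    # The 'time' argument is a TTL in seconds; memcached evicts the entry
    # on its own, so nothing piles up the way rows do in the sql backend.
    mc.set('tokens/%s' % token_id, token_data, time=3600)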

>> Although not related to PKI, I've also noticed that the decision to use
>> a denormalized schema in the users table, with the "extra" column
>> storing a JSON-encoded blob of data (including the user's default tenant
>> and enabled flag), is a horrible performance problem. Hope that v3
>> Keystone has corrected these issues in the SQL driver.
> Normalization was done roughly at the start of the G3 cycle.

All good news.
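
To make the earlier point about the "extra" column concrete, here is
roughly what the pain looks like from the application side (key names made
up for the example): anything that wants to filter on the enabled flag has
to pull rows back and decode JSON in Python, instead of letting MySQL
filter on an indexed column.

    import json

    # Folsom-era layout (illustrative): 'enabled' and the default tenant
    # live inside the JSON 'extra' blob rather than in their own columns.
    extra_column = '{"enabled": true, "tenantId": "abc123"}'

    extra = json.loads(extra_column)
    if extra.get('enabled', True):
        # Only after decoding each row do we know which users are enabled;
        # the database can't use an index to answer this for us.
        pass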

Best,
-jay
