[openstack-dev] [keystone] token revocation woes

Dolph Mathews dolph.mathews at gmail.com
Mon Jul 27 16:12:40 UTC 2015


Adam Young shared a patch to convert the tree back to a linear list:

  https://review.openstack.org/#/c/205266/

This shouldn't be merged without benchmarking, as it's purely a
performance-oriented change.
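
A rough way to benchmark it (illustrative only; the host, tokens, and
request count below are placeholders, not from this thread):

  # Time GET /v3/auth/tokens (keystone's v3 validation call) in a loop
  # and report mean throughput; run once on master, once with the patch.
  import time
  import requests

  VALIDATE_URL = 'http://localhost:35357/v3/auth/tokens'  # placeholder host
  HEADERS = {
      'X-Auth-Token': 'ADMIN_TOKEN',       # token authorized to validate
      'X-Subject-Token': 'SUBJECT_TOKEN',  # token being validated
  }

  n = 1000
  start = time.time()
  for _ in range(n):
      requests.get(VALIDATE_URL, headers=HEADERS)
  elapsed = time.time() - start
  print('%.2f requests/sec, %.1f ms/request (mean)'
        % (n / elapsed, 1000.0 * elapsed / n))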

On Thu, Jul 23, 2015 at 11:40 AM, Matt Fischer <matt at mattfischer.com> wrote:

> Morgan asked me to post some of my numbers here. From my staging
> environment:
>
> With 0 revocations:
> Requests per second:    104.67 [#/sec] (mean)
> Time per request:       191.071 [ms] (mean)
>
> With 500 revocations:
> Requests per second:    52.48 [#/sec] (mean)
> Time per request:       381.103 [ms] (mean)
>
> I have some more numbers in my blog post about this, but those are from a
> virtual test environment and focus on throughput.
>
> Thanks for the attention on this.
>
> On Thu, Jul 23, 2015 at 8:45 AM, Lance Bragstad <lbragstad at gmail.com>
> wrote:
>
>>
>> On Wed, Jul 22, 2015 at 10:06 PM, Adam Young <ayoung at redhat.com> wrote:
>>
>>>  On 07/22/2015 05:39 PM, Adam Young wrote:
>>>
>>> On 07/22/2015 03:41 PM, Morgan Fainberg wrote:
>>>
>>> This is an indicator that the bottleneck is not strictly the db, but is
>>> also related to the way we match. This means we need to spend some
>>> serious cycles on improving both the stored record(s) for revocation
>>> events and the matching algorithm.
>>>
>>>
>>> The simplest approach to revocation checking is to do a linear search
>>> through the events.  I think the old version of the code that did that is
>>> in a code review, and I will pull it out.
>>>
>>> If we remove the tree, then the matching will have to run through each
>>> of the records and see if there is a match; the test will be linear in
>>> the number of records (slightly shorter if a token is actually revoked).
>>>
>>>
>>> This was the original, linear-search version of the code:
>>>
>>>
>>> https://review.openstack.org/#/c/55908/50/keystone/contrib/revoke/model.py,cm
>>>
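>>> Roughly, the linear approach looks like this (a simplified sketch with
>>> a dict-based event model, not keystone's actual classes):
>>>
>>>   # Compare every stored revocation event against the token's
>>>   # attributes; cost grows linearly with the number of events.
>>>   def is_revoked(token, events):
>>>       for event in events:
>>>           # an event matches if every attribute it constrains
>>>           # (user_id, project_id, ...) agrees with the token
>>>           if all(token.get(k) == v for k, v in event.items()):
>>>               return True   # revoked: can stop scanning early
>>>       return False          # valid: had to scan every event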
>>>
>>>
>> What initially landed for Revocation Events was the tree-structure,
>> right? We didn't land a linear approach prior to that and then switch to
>> the tree, did we?
>>
>>>
>>> Sent via mobile
>>>
>>> On Jul 22, 2015, at 11:51, Matt Fischer <matt at mattfischer.com> wrote:
>>>
>>>   Dolph,
>>>
>>>  Per our IRC discussion, I was unable to see any performance
>>> improvement here, although not calling DELETE so often should reduce the
>>> number of deadlocks when we're under heavy load, especially given the
>>> globally replicated DB we use.
>>>
>>> On Tue, Jul 21, 2015 at 5:26 PM, Dolph Mathews <dolph.mathews at gmail.com>
>>> wrote:
>>>
>>>> Well, you might be in luck! Morgan Fainberg actually implemented an
>>>> improvement that was apparently documented by Adam Young way back in
>>>> March:
>>>>
>>>>   https://bugs.launchpad.net/keystone/+bug/1287757
>>>>
>>>>  There's a link to the stable/kilo backport in comment #2 - I'd be
>>>> eager to hear how it performs for you!
>>>>
>>>> On Tue, Jul 21, 2015 at 5:58 PM, Matt Fischer <matt at mattfischer.com>
>>>> wrote:
>>>>
>>>>>  Dolph,
>>>>>
>>>>>  Excuse the delayed reply; I was waiting for a brilliant solution
>>>>> from someone. Without one, I'd personally prefer the cronjob, as this
>>>>> seems to be exactly the kind of thing cron was designed for (rough
>>>>> sketch below). That would be a painful change, though, since people
>>>>> now rely on the current behavior, so I don't know if it's feasible. I
>>>>> will be setting up monitoring on the revocation count, alerting if it
>>>>> crosses 500 or so. If the problem gets worse, then I think a custom
>>>>> no-op or SQL driver is the next step.
>>>>>
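>>>>>  A rough sketch of that flush job (the DSN and cutoff age are
>>>>> placeholders; the DELETE mirrors the statement keystone itself issues
>>>>> inline, as seen in the deadlock trace below):
>>>>>
>>>>>   # Cron-driven cleanup of expired revocation events, instead of
>>>>>   # letting every validation call do the DELETE inline.
>>>>>   import datetime
>>>>>   import sqlalchemy
>>>>>
>>>>>   engine = sqlalchemy.create_engine(
>>>>>       'mysql://keystone:PASSWORD@dbhost/keystone')  # placeholder DSN
>>>>>   cutoff = datetime.datetime.utcnow() - datetime.timedelta(hours=1)
>>>>>   with engine.begin() as conn:
>>>>>       conn.execute(
>>>>>           sqlalchemy.text(
>>>>>               'DELETE FROM revocation_event'
>>>>>               ' WHERE revoked_at < :cutoff'),
>>>>>           cutoff=cutoff)
>>>>>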
>>>>>  Thanks.
>>>>>
>>>>>
>>>>> On Wed, Jul 15, 2015 at 4:00 PM, Dolph Mathews
>>>>> <dolph.mathews at gmail.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 15, 2015 at 4:51 PM, Matt Fischer <matt at mattfischer.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I'm having some issues with keystone revocation events. The bottom
>>>>>>> line is that, due to the way keystone handles the clean-up of these
>>>>>>> events[1], having more than a few leads to:
>>>>>>>
>>>>>>>  - bad performance: up to 2x slower token validation with about
>>>>>>> 600 events, based on my perf measurements.
>>>>>>>  - database deadlocks, which cause API calls to fail and seem to
>>>>>>> become more likely as the number of events grows.
>>>>>>>
>>>>>>>  I am seeing this behavior in code from trunk as of June 11, using
>>>>>>> Fernet tokens, but the token backend does not seem to make a difference.
>>>>>>>
>>>>>>>  Here's what happens to the db in terms of deadlock:
>>>>>>> 2015-07-15 21:25:41.082 31800 TRACE keystone.common.wsgi DBDeadlock:
>>>>>>> (OperationalError) (1213, 'Deadlock found when trying to get lock; try
>>>>>>> restarting transaction') 'DELETE FROM revocation_event WHERE
>>>>>>> revocation_event.revoked_at < %s' (datetime.datetime(2015, 7, 15, 18, 55,
>>>>>>> 41, 55186),)
>>>>>>>
>>>>>>>  When this starts happening, I just truncate the table, but that is
>>>>>>> not ideal. If [1] is really true, then the design is not great: it
>>>>>>> sounds like keystone is doing a revocation-event clean-up on every
>>>>>>> token validation call. Reading from and deleting/locking on my db
>>>>>>> cluster is not something I want to do on every validate call.
>>>>>>>
>>>>>>
>>>>>>  Unfortunately, that's *exactly* what keystone is doing. Adam and I
>>>>>> had a conversation about this problem in Vancouver which directly resulted
>>>>>> in opening the bug referenced on the operator list:
>>>>>>
>>>>>>   https://bugs.launchpad.net/keystone/+bug/1456797
>>>>>>
>>>>>>  Neither of us remembered the actual implemented behavior, which is
>>>>>> what you've run into and what Deepti verified in the bug's comments.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>  So, can I turn off token revocation for now? I didn't see an
>>>>>>> obvious no-op driver.
>>>>>>>
>>>>>>
>>>>>>  Not sure how, other than writing your own no-op driver, or perhaps
>>>>>> an extended driver that doesn't try to clean the table on every read?
>>>>>>
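>>>>>>  The shape of such a no-op driver, roughly (a hypothetical sketch;
>>>>>> the method names are illustrative, not keystone's exact driver
>>>>>> interface):
>>>>>>
>>>>>>   # Report no revocation events and discard new ones, so validation
>>>>>>   # never scans or cleans the revocation_event table.
>>>>>>   class NoopRevokeDriver(object):
>>>>>>       def list_events(self, last_fetch=None):
>>>>>>           return []   # pretend nothing has ever been revoked
>>>>>>
>>>>>>       def revoke(self, event):
>>>>>>           pass        # drop the revocation event entirely
>>>>>>
>>>>>>  Note this disables revocation checking entirely: revoked tokens
>>>>>> would remain valid until they expire.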
>>>>>>
>>>>>>>  And in the long run, can this be fixed? I'd rather do almost
>>>>>>> anything else, including writing a cronjob, than keep what happens now.
>>>>>>>
>>>>>>
>>>>>>  If anyone has a solution that's better than the current one, and
>>>>>> also better than requiring a cron job to run something like
>>>>>> keystone-manage revocation_flush, I'd love to hear it.
>>>>>>
>>>>>>
>>>>>>>  [1] -
>>>>>>> http://lists.openstack.org/pipermail/openstack-operators/2015-June/007210.html