[openstack-dev] [Keystone] Best way to do something MySQL-specific?
Clint Byrum
clint at fewbar.com
Tue Jul 9 05:31:25 UTC 2013
On Jul 8, 2013, at 20:34, Jamie Lennox <jlennox at redhat.com> wrote:
> On Mon, 2013-07-08 at 21:55 -0400, Adam Young wrote:
>>
>> Tokens are, for the most part, immutable. Once they are written, they
>> don't change except if they get revoked. This is a fairly rare
>> occurrence, but it does happen.
>>
>> Deleting tokens based on age should be fairly straightforward, and
>> locks should not need to be held for a significant amount of time.
>>
>> My guess, however, is that the problem is SQLAlchemy:
>>
>> query = session.query(TokenModel)
>> query = query.filter(TokenModel.expires < timeutils.utcnow())
>> query.delete(synchronize_session=False)
>>
>> If it is doing a fetch and then the delete, then the rows would be
>> held for a short period of time.
>>
>> Direct SQL might be a better approach: prepare a statement:
>>
>> "delete from token where expires < $1"
>
> SQLAlchemy already generates this statement.
>
>> and then bind and execute in one command.
>>
>> However, it seems to me that the conflict detection is the problem. I
>> don't know if there is a way to state "ignore any future queries that
>> would match this criteria." It does seem to me that even doing this
>> degree of conflict detection is somewhat violating the principle of
>> Isolation.
>>
>> There might be an approach using table partitioning as well, where you
>> only write to partition one, and delete from partition 2, and then
>> swap.
>>
>> On 07/08/2013 09:13 PM, Robert Collins wrote:
>>
>>> On 9 July 2013 12:32, Adam Young <ayoung at redhat.com> wrote:
>>>
>>> * I am asking about MySQL.. presumably a "real"
>>> database.
>>> I have to admit I am a bit of a Postgresql Bigot. I don't
>>> really consider MySQL a real database, although it has
>>> improved a lot over the years. I am not up to speed
>>> on "InnoDB's gap locking behavior" but it is not something I
>>> would expect to be a problem in Postgresql.
>>>
>>> PostgreSQL has similar but different characteristics, particularly the
>>> latest iteration of isolation behaviour where locks are held on *the
>>> result of a query*, not on 'specific rows returned' - the difference
>>> being that adding a new row that matches the query for rows to
>>> delete, would encounter a conflict. You also need to delete small
>>> numbers of rows at a time, though the reason in the plumbing is
>>> different. There are some nasty interlocks you can cause with very
>>> large deletes and autovacuum too - if you trigger deadlock detection
>>> it still takes /minutes/ to detect and clean up, whereas we want
>>> sub-second liveness.
>>>
>>> once every second would be strange indeed. I would think
>>> maybe once every five minutes or so. Schedule your clean up
>>> IAW your deployment and usage.
>>>
>>> 5m intervals exacerbate the issue until it's solved. If the cleanup
>>> deletes no more than (say) 1000 rows per iteration, it could run
>>> every 5 minutes but when run keep going until the db is cleaned.
>>>
>>> Deleting a chunk of tokens in bulk would be preferable to
>>> doing client side iteration, I can't see how that would not
>>> be the case.
>>>
>>> right, so I think Clint prefers that too, the question is how to get
>>> sqlalchemy to output the appropriate sql for postgresql and mysql,
>>> which is different.
>
> I'm not experienced with large databases, but I wrote token_flush, so
> I'm interested. What am I missing that we can't just add a --limit
> parameter to the command line tool, so that "keystone-manage token_flush
> --limit=1000" deletes (as was mentioned) 1000 rows per command?
>
> SA will be able to generate a nested query like:
> "delete from token where id in (select id from token limit 1000)"
> for all databases. What sort of hit is something nested like that? Then
> you tweak the limit and the frequency but I would suggest 1000 every 5
> minutes should result in a net negative token count and wouldn't come
> close to the locks.
>
That is exactly what I tried. MySQL doesn't allow LIMIT on subqueries which feed into the IN() construct. :-(
MySQL does have an SQL extension which allows LIMIT in the DELETE statement, but this presents the challenge I originally was asking about.
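
For illustration only, here is a minimal sketch of one portable fallback: select a bounded batch of expired ids, delete exactly those rows, commit, and repeat until nothing is left. It avoids both the LIMIT-inside-IN() subquery and the MySQL-only DELETE ... LIMIT form. The sketch uses Python's stdlib sqlite3 module rather than Keystone's SQLAlchemy backend, and the function name flush_expired, the batch_size parameter, and the two-column token table are assumptions made up for this example, not Keystone code.

```python
import sqlite3


def flush_expired(conn, now, batch_size=1000):
    """Delete tokens with expires < now in bounded batches.

    Hypothetical helper: each batch is its own short transaction, so no
    single statement touches more than batch_size rows. Returns the
    total number of rows deleted.
    """
    total = 0
    while True:
        # Step 1: fetch a bounded batch of expired ids (plain SELECT ... LIMIT).
        rows = conn.execute(
            "SELECT id FROM token WHERE expires < ? LIMIT ?",
            (now, batch_size),
        ).fetchall()
        if not rows:
            return total
        # Step 2: delete exactly those rows by primary key, then commit
        # so any locks are released before the next batch starts.
        conn.executemany("DELETE FROM token WHERE id = ?", rows)
        conn.commit()
        total += len(rows)


# Toy schema (an assumption): integer expiry times stand in for timestamps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE token (id TEXT PRIMARY KEY, expires INTEGER)")
conn.executemany(
    "INSERT INTO token VALUES (?, ?)",
    [("t%d" % i, i) for i in range(2500)],
)
conn.commit()

deleted = flush_expired(conn, now=2000, batch_size=1000)
remaining = conn.execute("SELECT COUNT(*) FROM token").fetchone()[0]
print(deleted, remaining)
```

The trade-off is that this is the client-side iteration mentioned earlier in the thread: it costs an extra round trip per batch compared with a single server-side statement, in exchange for generating the same SQL on every backend.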