[openstack-dev] [Keystone] Best way to do something MySQL-specific?

Clint Byrum clint at fewbar.com
Tue Jul 9 05:31:25 UTC 2013


On Jul 8, 2013, at 20:34, Jamie Lennox <jlennox at redhat.com> wrote:

> On Mon, 2013-07-08 at 21:55 -0400, Adam Young wrote:
>> 
>> Tokens are, for the most part, immutable.  Once they are written, they
>> don't change unless they get revoked.  This is a fairly rare
>> occurrence, but it does happen.
>> 
>> Deleting tokens based on age should be fairly straightforward, and
>> locks should not need to be held for a significant amount of time.
>> 
>> My guess, however, is that the problem is SQL Alchemy:
>> 
>>        query = session.query(TokenModel)
>>        query = query.filter(TokenModel.expires < timeutils.utcnow())
>>        query.delete(synchronize_session=False)
>> 
>> If it is doing a fetch and then the delete, then the rows would be
>> held for a short period of time.
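
(FWIW, with synchronize_session=False the ORM should not be doing a fetch
first; that query compiles down to a single bulk
"DELETE FROM token WHERE token.expires < %s", so I suspect the
fetch-then-delete theory isn't it.)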
>> 
>> Direct SQL might be a better approach:  prepare a statement:
>> 
>> "delete from token where expires < $1"
> 
> SQLAlchemy already generates this statement.
> 
>> and then bind and execute in one command.
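
If we did want to bypass the ORM entirely, here's an untested sketch along
those lines, assuming engine is the usual SQLAlchemy Engine and timeutils
is the same helper used in the snippet above:

    from sqlalchemy import text

    # Bind and execute the DELETE as a single statement, no ORM involved.
    with engine.begin() as conn:
        conn.execute(
            text("DELETE FROM token WHERE expires < :cutoff"),
            cutoff=timeutils.utcnow())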
>> 
>> However, it seems to me that the conflict detection is the problem.  I
>> don't know if there is a way to state "ignore any future queries that
>> would match this criteria."   It does seem to me that even doing this
>> degree of conflict detection is somewhat violating the principle of
>> Isolation.
>> 
>> There might be an approach using table partitioning as well, where you
>> only write to one partition, delete from the other, and then swap.
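
One MySQL-flavoured variant of that idea, as an untested sketch (names and
boundaries invented, and note MySQL requires the partitioning column to be
part of every unique key, which may not hold for the token table):

    from sqlalchemy import text

    # Hypothetical DDL: partition token rows by expiry, then reclaim
    # expired rows by dropping a whole partition rather than DELETEing.
    with engine.begin() as conn:
        conn.execute(text(
            "ALTER TABLE token PARTITION BY RANGE (TO_DAYS(expires)) ("
            " PARTITION p_old VALUES LESS THAN (TO_DAYS('2013-07-09')),"
            " PARTITION p_new VALUES LESS THAN MAXVALUE)"))
        # Later, dropping p_old discards its rows without row-level locks:
        conn.execute(text("ALTER TABLE token DROP PARTITION p_old"))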
>> 
>> 
>> 
>> 
>> 
>> On 07/08/2013 09:13 PM, Robert Collins wrote:
>> 
>>> On 9 July 2013 12:32, Adam Young <ayoung at redhat.com> wrote:
>>> 
>>>                * I am asking about MySQL... presumably a "real"
>>>                database.
>>>         I have to admit I am a bit of a PostgreSQL bigot. I don't
>>>        really consider MySQL a real database, although it has
>>>        improved a lot over the years.  I am not up to speed
>>>        on "InnoDB's gap locking behavior" but it is not something I
>>>        would expect to be a problem in PostgreSQL.
>>> 
>>> PostgreSQL has similar but different characteristics, particularly the
>>> latest iteration of isolation behaviour, where locks are held on *the
>>> result of a query*, not on 'specific rows returned' - the difference
>>> being that adding a new row that matches the query for rows to
>>> delete, would encounter a conflict. You also need to delete small
>>> numbers of rows at a time, though the reason in the plumbing is
>>> different. There are some nasty interlocks you can cause with very
>>> large deletes and autovacuum too - if you trigger deadlock detection
>>> it still takes /minutes/ to detect and cleanup, whereas we want
>>> sub-second liveness.
>>> 
>>>        once every second would be strange indeed.  I would think
>>>        maybe once every five minutes or so.  Schedule your cleanup
>>>        in accordance with your deployment and usage.
>>> 
>>> 5m intervals exacerbate the issue until it's solved. If the cleanup
>>> deletes no more than (say) 1000 rows per iteration, it could run
>>> every 5 minutes but, when run, keep going until the db is cleaned.
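
An untested sketch of that loop, using MySQL's DELETE ... LIMIT extension
(which is exactly the portability question I started this thread with):

    from sqlalchemy import text

    # Keep deleting expired tokens in small batches until none remain.
    while True:
        with engine.begin() as conn:
            result = conn.execute(
                text("DELETE FROM token WHERE expires < :cutoff LIMIT 1000"),
                cutoff=timeutils.utcnow())
        # Fewer rows than a full batch means the backlog is gone; stop
        # until the next scheduled run.
        if result.rowcount < 1000:
            break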
>>> 
>>>        Deleting a chunk of tokens in bulk would be preferable to
>>>        doing client-side iteration; I can't see how that would not
>>>        be the case.
>>> 
>>> Right, so I think Clint prefers that too; the question is how to get
>>> SQLAlchemy to output the appropriate SQL for PostgreSQL and MySQL,
>>> which is different.
> 
> I'm not experienced with large databases but I wrote the token_flush so
> I'm interested. What am I missing that we can't just add a --limit
> parameter to the command-line tool, so that "keystone-manage token_flush
> --limit=1000" deletes 1000 tokens per invocation, as was mentioned?
> 
> SQLAlchemy will be able to generate a nested query like:
> "delete from token where id in (select id from token limit 1000)"
> for all databases. What sort of hit does something nested like that
> incur? Then you tweak the limit and the frequency, but I would suggest
> that 1000 every 5 minutes should result in a net negative token count
> and wouldn't come close to the lock problems.
> 
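
In SQLAlchemy expression terms, something like this untested sketch, with
token_table standing in for the token Table object:

    from sqlalchemy import select

    # Hypothetical: delete at most 1000 expired rows via a nested SELECT.
    subq = select([token_table.c.id]).where(
        token_table.c.expires < timeutils.utcnow()).limit(1000)
    stmt = token_table.delete().where(token_table.c.id.in_(subq))
    engine.execute(stmt)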

That is exactly what I tried. MySQL doesn't allow LIMIT in subqueries that feed into the IN() construct. :-(

MySQL does have an SQL extension that allows LIMIT in the DELETE statement, but this presents the challenge I was originally asking about.
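
For the archives, the usual workaround for that restriction is to wrap the
subquery in a derived table, which MySQL does accept; untested sketch, in
raw SQL since I don't know a clean way to make SQLAlchemy emit it:

    from sqlalchemy import text

    # MySQL rejects LIMIT in an IN() subquery, but allows it one level
    # deeper, inside a derived table.
    engine.execute(
        text("DELETE FROM token WHERE id IN ("
             " SELECT id FROM ("
             "  SELECT id FROM token WHERE expires < :cutoff LIMIT 1000"
             " ) AS doomed)"),
        cutoff=timeutils.utcnow())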

