[openstack-dev] [TaskFlow] TaskFlow persistence

Joshua Harlow harlowja at fastmail.com
Thu Mar 24 00:10:22 UTC 2016


On 03/23/2016 12:49 PM, pnkk wrote:
> Joshua,
>
> We are performing a few scaling tests for our solution and are seeing
> errors like the one below:
>
> Failed saving logbook 'cc6f5cbd-c2f7-4432-9ca6-fff185cf853b'\n  InternalError: (pymysql.err.InternalError) (1205, u'Lock wait timeout exceeded; try restarting transaction') [SQL: u'UPDATE logbooks SET created_at=%s, updated_at=%s, meta=%s, name=%s, uuid=%s WHERE logbooks.uuid = %s'] [parameters: (datetime.datetime(2016, 3, 18, 18, 16, 40), datetime.datetime(2016, 3, 23, 3, 3, 44, 95395), u'{}', u'test', u'cc6f5cbd-c2f7-4432-9ca6-fff185cf853b', u'cc6f5cbd-c2f7-4432-9ca6-fff185cf853b')]"
>
>
> We have about 800 flows at the moment, and each flow is updated in the same logbook from a separate eventlet thread.
>
>
> Every thread calls save_logbook() on the same logbook record. I think this function tries to update the logbook record even though my use case only needs flow details to be inserted and doesn't change any information related to the logbook.
>

Right, it's trying to update the 'updated_at' field afaik.
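If all you really need per thread is to touch the flow details, you may be 
able to avoid re-saving the shared logbook row from every thread. A rough 
sketch below; it assumes your taskflow version exposes 
Connection.update_flow_details and that the logbook/flow-detail rows were 
already created once via save_logbook (please verify against the backend 
and version you actually run):

    import contextlib

    from taskflow.persistence import backends

    backend = backends.fetch({
        'connection': 'mysql+pymysql://user:password@host/taskflow',
    })

    # Done once, up front: creates the logbook row and any flow detail
    # rows it contains ('book' is the existing LogBook object).
    with contextlib.closing(backend.get_connection()) as conn:
        conn.save_logbook(book)

    # Later, from each eventlet thread: only touch that thread's flow
    # detail row ('flow_detail' is the FlowDetail object for that flow)
    # instead of re-saving the whole shared logbook.
    with contextlib.closing(backend.get_connection()) as conn:
        conn.update_flow_details(flow_detail)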

>
> Probably one of the threads was holding the lock while updating, and the others waited for the lock and failed after the default timeout elapsed.
>
>
> I can think of a few alternatives at the moment:
>
>
> 1. Increase the number of logbooks
>
> 2. Increase the innodb_lock_wait_timeout
>
> 3. There are some suggestions to change the InnoDB transaction isolation level to "READ COMMITTED" instead of "REPEATABLE READ", but I am not very familiar with the side effects that can cause

4. Add some basic retries?

5. The following review should also help (and save less data) @ 
https://review.openstack.org/#/c/241441/

Afaik we are already using READ COMMITTED ;)

https://github.com/openstack/taskflow/blob/master/taskflow/persistence/backends/impl_sqlalchemy.py#L105
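
For reference, that setting is roughly equivalent to constructing the 
SQLAlchemy engine like this (a simplified sketch, not the exact taskflow 
code):

    from sqlalchemy import create_engine

    engine = create_engine(
        'mysql+pymysql://user:password@host/taskflow',
        # READ COMMITTED avoids most of the gap/next-key locking that
        # InnoDB's default REPEATABLE READ level performs.
        isolation_level='READ COMMITTED',
    )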

>
>
> Appreciate your thoughts on the given alternatives, or perhaps an even better one.

Do you want to try using https://pypi.python.org/pypi/retrying in a few 
strategic places so that if the above occurs, it retries?
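
Something roughly like this (an untested sketch; the wrapper function and 
the exception check are illustrative only, so adjust them to whatever your 
code path and driver actually raise, e.g. SQLAlchemy may wrap the pymysql 
error):

    from pymysql import err as pymysql_err
    from retrying import retry

    def _is_lock_wait_timeout(exc):
        # MySQL error 1205 == 'Lock wait timeout exceeded'
        return (isinstance(exc, pymysql_err.InternalError)
                and exc.args and exc.args[0] == 1205)

    @retry(retry_on_exception=_is_lock_wait_timeout,
           wait_fixed=500, stop_max_attempt_number=5)
    def save_book(conn, book):
        conn.save_logbook(book)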

>
>
> Thanks,
>
> Kanthi
>
>
>
>
>
>
> On Sun, Mar 20, 2016 at 10:00 PM, Joshua Harlow <harlowja at fastmail.com> wrote:
>
>     Lingxian Kong wrote:
>
>         Kanthi, sorry for chiming in. I suggest you take a look at
>         Mistral[1], which is workflow as a service in OpenStack
>         (or without OpenStack).
>
>
>     Out of curiosity, why? Seems the ML post was about 'TaskFlow
>     persistence' not mistral, just saying (unsure how it is relevant to
>     mention mistral in this)...
>
>     Back to getting more coffee...
>
>     -Josh
>
>
>
>