[cinder] Deadlock found when trying to get lock; try restarting transaction
Hi,

in a customer cluster (Victoria, Galera cluster on 3 control nodes) we're seeing failing pipeline deployments from time to time when cinder is instructed to create multiple volumes at once. This is the error message:

---snip---
2024-08-12 15:01:34.762 33307 WARNING oslo_db.sqlalchemy.exc_filters [req-aa5505d3-167a-4096-9311-36b10deebcc1 049f5ea05bd14c019aeab37d3cff4ffc ed22c592548e4903b9af541bb158c6fe - - -] DB exception wrapped.: sqlalchemy.exc.ResourceClosedError: This Connection is closed
...
2024-08-12 15:01:34.762 33307 ERROR oslo_db.sqlalchemy.exc_filters pymysql.err.InternalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')
...
2024-08-12 15:01:34.766 33307 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 476, in _revalidate_connection
2024-08-12 15:01:34.766 33307 ERROR oslo_messaging.rpc.server     raise exc.ResourceClosedError("This Connection is closed")
2024-08-12 15:01:34.766 33307 ERROR oslo_messaging.rpc.server sqlalchemy.exc.DBAPIError: (sqlalchemy.exc.ResourceClosedError) This Connection is closed
2024-08-12 15:01:34.766 33307 ERROR oslo_messaging.rpc.server (Background on this error at: http://sqlalche.me/e/13/dbapi)
---snip---

I found this bug [1] with a fix for Pike, so Victoria already has that fix, but the error still blocks some deployments, leaving volumes in "creating" state which have to be cleaned up manually. I can't find much else on this, am I missing something? Any pointers would be highly appreciated!

Thanks!
Eugen

[1] https://bugs.launchpad.net/cinder/+bug/1789106
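The usual remedy for MySQL error 1213 is to re-run the failed transaction, which is what oslo.db's wrap_db_retry(retry_on_deadlock=True) decorator is meant to automate. Below is a minimal, self-contained sketch of that pattern using plain pymysql (the driver in the traceback above); the connection parameters and the table/column names are placeholders, not the actual cinder code:

# Sketch only: retry a transaction that loses a deadlock (MySQL error 1213).
import time

import pymysql

ER_LOCK_DEADLOCK = 1213  # "Deadlock found when trying to get lock; try restarting transaction"

def update_volume_status(volume_id, status, max_retries=5):
    conn = pymysql.connect(host="controller-vip", user="cinder",
                           password="secret", database="cinder")
    try:
        for attempt in range(1, max_retries + 1):
            try:
                with conn.cursor() as cur:
                    cur.execute("UPDATE volumes SET status = %s WHERE id = %s",
                                (status, volume_id))
                conn.commit()
                return
            except (pymysql.err.InternalError,
                    pymysql.err.OperationalError) as exc:
                # The server rolls back the deadlock victim; re-running the
                # transaction is the documented remedy, so back off and retry.
                if exc.args[0] != ER_LOCK_DEADLOCK or attempt == max_retries:
                    raise
                conn.rollback()
                time.sleep(0.2 * attempt)
    finally:
        conn.close()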
Just one more note: I see the deadlock messages for all cinder services (cinder-api, cinder-scheduler, cinder-backup, which isn't even in use, and cinder-volume). The nova-api logs contain those deadlock messages as well, so this might be a MariaDB/Galera issue rather than something cinder-specific? I'm not sure yet, I'll try to find out more.
Hi,

Are you sending all the write requests to the same node? If not, you should; otherwise you will inevitably hit the case where two write requests land on two different Galera nodes at the same time, which is what causes these deadlock errors.

--
PS
-- Pierre-Samuel Le Stang
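"All writes to one node" is usually enforced at the load balancer rather than inside the OpenStack services themselves. A sketch of what that can look like with HAProxy in front of the Galera cluster; the addresses, server names and check user below are placeholders, and many deployment tools already manage an equivalent section for you:

# haproxy.cfg excerpt (illustrative only): controller1 takes all MySQL traffic,
# the other two cluster members only act as failover targets.
listen galera
    bind 192.0.2.10:3306
    option mysql-check user haproxy_check
    server controller1 192.0.2.11:3306 check
    server controller2 192.0.2.12:3306 check backup
    server controller3 192.0.2.13:3306 check backup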
Oh, I think you're right. I did some tests when they had severe SQL issues, but I might have forgotten to turn the load balancing off again once performance was restored. Thanks for your quick response, I'll check it later.
On Mon, 2024-08-12 at 16:58 +0000, Eugen Block wrote:
> Oh I think you're right, I did some tests when they had severe SQL issues but I might have forgotten to turn the load balancing off when the performance was restored. Thanks for your quick response, I'll check it later.

OpenStack in general cannot run against Galera in active-active mode: the replication between Galera nodes happens asynchronously, so the OpenStack services can receive stale reads, which can result in duplicate allocations or DB deadlocks. As such, we don't officially support that topology.

You might be able to mask this by setting mysql_wsrep_sync_wait = 1 (the Galera wsrep_sync_wait setting) to force a sync on every read, but that has performance implications. We considered doing this when writing our new downstream installer tool and decided we could not support the active-active topology safely in production:

https://github.com/openstack-k8s-operators/nova-operator/commit/ab95f150cbec...

There is some context there from when we were evaluating that as a workaround, but we reverted it and went with an active-passive mode instead.

If you are using Galera, please ensure that writes only go to one Galera instance, or you will fundamentally violate our implicit DB requirement that ACID is not broken. Galera breaks the C(onsistency) element when used in active-active mode, as we cannot rely on atomic transactions being consistent across reads on other cluster members. That makes it unsupported from an oslo.db/OpenStack service perspective.
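For completeness, the Galera server variable behind that suggestion is wsrep_sync_wait. A sketch of where it would be set if you did want to experiment with it; the file path is illustrative and varies by distribution, and per the above it is a read-latency trade-off, not a supported substitute for sending writes to a single node:

# /etc/my.cnf.d/galera.cnf (path is illustrative)
[mysqld]
# Bitmask; bit 1 makes reads wait until the node has applied all write-sets
# it has already received ("read your writes"), at the cost of added read latency.
wsrep_sync_wait = 1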
Yes, of course, I totally understand. I just turned it back (it was never supposed to be a persistent setting). Thanks for the details, Sean!

Thanks!
Eugen
participants (3):
- Eugen Block
- Pierre-Samuel LE STANG
- smooney@redhat.com