Re: [cinder] Deadlock found when trying to get lock; try restarting transaction

12 Aug 2024


Oh, I think you're right. I ran some tests back when they had severe SQL
issues, and I may have forgotten to turn the load balancing off once
performance was restored. Thanks for your quick response, I'll
check it later.


Quoting Pierre-Samuel LE STANG <pierre-samuel.le-stang@ovhcloud.com>:
...
Hi,
Are you sending all write requests to the same node? If not, you
should; otherwise you will inevitably hit the case where two write
requests arrive on two different nodes at the same time, which
causes deadlock issues.
--
PS
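
To illustrate the point about writes landing on different nodes: as far as I understand Galera, each node executes a transaction optimistically, and a conflict between nodes is only detected at commit time (certification), at which point the losing transaction is reported to the client as error 1213. Below is a toy Python model of that behaviour, not Galera code. `ToyCluster`, `DeadlockError` and the row key `quota:project-x` are all made up for the illustration.

```python
from dataclasses import dataclass, field


class DeadlockError(Exception):
    """Stand-in for MySQL error 1213 (hypothetical class, for illustration)."""


@dataclass
class ToyCluster:
    """Toy model of commit-time conflict detection across nodes."""
    # row key -> sequence number of the last committed write to that row
    last_write: dict = field(default_factory=dict)
    seqno: int = 0

    def begin(self, node):
        # The transaction remembers the cluster state it started from.
        return {"node": node, "snapshot": self.seqno, "rows": set()}

    def write(self, txn, row):
        # Writes are applied locally first; no cross-node locking happens here.
        txn["rows"].add(row)

    def commit(self, txn):
        # Certification: fail if any touched row was committed after our snapshot.
        for row in txn["rows"]:
            if self.last_write.get(row, 0) > txn["snapshot"]:
                raise DeadlockError(
                    "Deadlock found when trying to get lock; "
                    "try restarting transaction"
                )
        self.seqno += 1
        for row in txn["rows"]:
            self.last_write[row] = self.seqno


def concurrent_writes_on_two_nodes():
    """Two nodes update the same row at the same time; the later commit loses."""
    cluster = ToyCluster()
    a = cluster.begin("node1")
    b = cluster.begin("node2")
    cluster.write(a, "quota:project-x")
    cluster.write(b, "quota:project-x")
    cluster.commit(a)
    try:
        cluster.commit(b)
    except DeadlockError:
        return "deadlock"
    return "ok"


def serialized_writes_on_one_node():
    """Same two updates, but the second starts only after the first committed."""
    cluster = ToyCluster()
    for _ in range(2):
        txn = cluster.begin("node1")
        cluster.write(txn, "quota:project-x")
        cluster.commit(txn)
    return "ok"
```

The sequential case only approximates what a single writer node gives you: there, ordinary row locking makes the second transaction wait for the first instead of failing at commit.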
Eugen Block <eblock@nde.ag> wrote on Mon. [2024-Aug-12 14:15:47 +0000]:
...
Just one more note: I see the deadlock messages for all cinder services:
cinder-api, cinder-scheduler, cinder-backup (which isn't even in use) and
cinder-volume. nova-api logs those deadlock messages as well, so this
might be a MariaDB/Galera issue? I'm not sure yet, I'll try to find out
more.
Quoting Eugen Block <eblock@nde.ag>:
...
Hi,
in a customer cluster (Victoria, Galera cluster on 3 control nodes)
we're seeing failing pipeline deployments from time to time when cinder
is instructed to create multiple volumes at once. This is the error
message:
---snip---
2024-08-12 15:01:34.762 33307 WARNING oslo_db.sqlalchemy.exc_filters
[req-aa5505d3-167a-4096-9311-36b10deebcc1
049f5ea05bd14c019aeab37d3cff4ffc ed22c592548e4903b9af541bb158c6fe - - -]
DB exception wrapped.: sqlalchemy.exc.ResourceClosedError: This
Connection is closed
...
2024-08-12 15:01:34.762 33307 ERROR oslo_db.sqlalchemy.exc_filters
pymysql.err.InternalError: (1213, 'Deadlock found when trying to get
lock; try restarting transaction')
...
2024-08-12 15:01:34.766 33307 ERROR oslo_messaging.rpc.server   File
"/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line
476, in _revalidate_connection
2024-08-12 15:01:34.766 33307 ERROR oslo_messaging.rpc.server     raise
exc.ResourceClosedError("This Connection is closed")
2024-08-12 15:01:34.766 33307 ERROR oslo_messaging.rpc.server
sqlalchemy.exc.DBAPIError: (sqlalchemy.exc.ResourceClosedError) This
Connection is closed
2024-08-12 15:01:34.766 33307 ERROR oslo_messaging.rpc.server
(Background on this error at: http://sqlalche.me/e/13/dbapi)
---snip---
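
The error text in the log above suggests restarting the transaction. A minimal Python sketch of that idea follows; it is not cinder's actual code. `FakeDBError` and `retry_on_deadlock` are hypothetical names, and real code would catch the driver's or oslo.db's own exception types instead. If I remember correctly, oslo.db ships a decorator for this purpose (`oslo_db.api.wrap_db_retry` with `retry_on_deadlock=True`), which would be the better choice inside OpenStack services.

```python
import functools
import time

DEADLOCK_ERRNO = 1213  # MySQL/MariaDB error code seen in the log above


class FakeDBError(Exception):
    """Hypothetical stand-in for a driver exception carrying an error number."""

    def __init__(self, errno, msg):
        super().__init__(errno, msg)
        self.errno = errno


def retry_on_deadlock(max_retries=5, base_delay=0.05, sleep=time.sleep):
    """Re-run the wrapped function when it fails with a deadlock error.

    The wrapped function must contain the whole transaction, so that a
    retry starts from a clean state.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except FakeDBError as exc:
                    # Re-raise anything that is not a deadlock, and give up
                    # once the retry budget is exhausted.
                    if exc.errno != DEADLOCK_ERRNO or attempt == max_retries:
                        raise
                    # Exponential backoff before the next attempt.
                    sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator
```

Retrying only hides occasional conflicts. If every write is load-balanced across all Galera nodes, the conflicts are frequent enough that the retry budget can still run out.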
I found this bug [1] with a fix for Pike, so Victoria already has that
fix, but the error still blocks some deployments, leaving volumes in
"creating" state, which then have to be cleaned up manually. I can't find
much else on this. Am I missing something? Any pointers would be highly
appreciated!
Thanks!
Eugen
[1] https://bugs.launchpad.net/cinder/+bug/1789106
--
Pierre-Samuel Le Stang