12 Aug 2024, 5:01 p.m.
Hi,
Are you sending all write requests to the same node? If not, you should. Otherwise you will inevitably hit the case where two write requests arrive on two different nodes at the same time and touch the same rows, which Galera reports as a deadlock (error 1213).
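To illustrate why writes landing on two different nodes can fail, here is a toy Python model of certification-based (optimistic) replication, the scheme Galera uses. It is a simplified sketch, not Galera's actual algorithm: the `Cluster`/`Transaction` classes, the key names and the single global sequence number are hypothetical. Each transaction remembers how far the cluster had committed when it started; at commit it is rejected if another node has committed a write to one of its keys in the meantime, and the client sees that rejection as error 1213.

```python
"""Toy model of certification-based replication (hypothetical, simplified).

Not Galera's code: there is no networking, no local locking and a single
global sequence number. It only shows why two concurrent writes to the same
key, committed through two different nodes, cannot both succeed.
"""
from dataclasses import dataclass, field

DEADLOCK_ERRNO = 1213  # MySQL/MariaDB ER_LOCK_DEADLOCK


class CertificationFailure(Exception):
    """A transaction lost certification; clients see it as error 1213."""

    def __init__(self, key):
        super().__init__(
            DEADLOCK_ERRNO,
            f"Deadlock found when trying to get lock (conflict on {key!r})",
        )
        self.errno = DEADLOCK_ERRNO


@dataclass
class Cluster:
    seqno: int = 0  # sequence number of the last committed transaction
    last_write: dict = field(default_factory=dict)  # key -> seqno of its last write

    def begin(self, node):
        # A transaction remembers how far the cluster had committed when it started.
        return Transaction(cluster=self, node=node, start_seqno=self.seqno)


@dataclass
class Transaction:
    cluster: Cluster
    node: str
    start_seqno: int
    write_set: set = field(default_factory=set)

    def write(self, key):
        # Executed locally on self.node only; nothing is checked cluster-wide yet.
        self.write_set.add(key)

    def commit(self):
        c = self.cluster
        # Certification: reject if any key we wrote was committed by another
        # transaction after we started.
        for key in sorted(self.write_set):
            if c.last_write.get(key, 0) > self.start_seqno:
                raise CertificationFailure(key)
        c.seqno += 1
        for key in self.write_set:
            c.last_write[key] = c.seqno
        return c.seqno


if __name__ == "__main__":
    cluster = Cluster()
    t1 = cluster.begin("ctl1")  # write request sent to the first control node
    t2 = cluster.begin("ctl2")  # concurrent write request on a second node
    t1.write("volumes/row-42")
    t2.write("volumes/row-42")
    print(t1.commit())  # prints 1
    try:
        t2.commit()
    except CertificationFailure as err:
        print(err.errno)  # prints 1213
```

In this model both transactions run fine locally and only the second commit is rejected. On a single node, InnoDB row locks would typically make the second writer wait for the first instead, which is the reasoning behind sending all writes to one node.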
--
PS
Eugen Block eblock@nde.ag wrote on Mon. [2024-Aug-12 14:15:47 +0000]:
> Just one more note: I see the deadlock messages for all cinder services:
> cinder-api, cinder-scheduler, cinder-backup (which isn't even in use) and
> cinder-volume. The nova-api logs contain those deadlock messages as well,
> so this might be a MariaDB/Galera issue? I'm not sure yet; I'll try to find
> out more.
>
> Quoting Eugen Block eblock@nde.ag:
>
> > Hi,
> >
> > in a customer cluster (Victoria, Galera cluster on 3 control nodes)
> > we're seeing failing pipeline deployments from time to time when cinder
> > is instructed to create multiple volumes at once. This is the error
> > message:
> >
> > ---snip---
> > 2024-08-12 15:01:34.762 33307 WARNING oslo_db.sqlalchemy.exc_filters [req-aa5505d3-167a-4096-9311-36b10deebcc1 049f5ea05bd14c019aeab37d3cff4ffc ed22c592548e4903b9af541bb158c6fe - - -] DB exception wrapped.: sqlalchemy.exc.ResourceClosedError: This Connection is closed
> > ...
> > 2024-08-12 15:01:34.762 33307 ERROR oslo_db.sqlalchemy.exc_filters pymysql.err.InternalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')
> > ...
> > 2024-08-12 15:01:34.766 33307 ERROR oslo_messaging.rpc.server File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 476, in _revalidate_connection
> > 2024-08-12 15:01:34.766 33307 ERROR oslo_messaging.rpc.server raise exc.ResourceClosedError("This Connection is closed")
> > 2024-08-12 15:01:34.766 33307 ERROR oslo_messaging.rpc.server sqlalchemy.exc.DBAPIError: (sqlalchemy.exc.ResourceClosedError) This Connection is closed
> > 2024-08-12 15:01:34.766 33307 ERROR oslo_messaging.rpc.server (Background on this error at: http://sqlalche.me/e/13/dbapi)
> > ---snip---
> >
> > I found this bug [1] with a fix for Pike, so Victoria already has that
> > fix, but the error still breaks some deployments, leaving volumes in the
> > "creating" state, which then have to be cleaned up manually. I can't find
> > much else on this; am I missing something? Any pointers would be highly
> > appreciated!
> >
> > Thanks!
> > Eugen
> >
> > [1] https://bugs.launchpad.net/cinder/+bug/1789106
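The log's own hint, "try restarting transaction", points at a complementary client-side pattern: re-run the entire transaction when it fails with error 1213. oslo.db ships a decorator for this, `oslo_db.api.wrap_db_retry` (see its `retry_on_deadlock` option); the stdlib-only sketch below is not that implementation, only an illustration of the idea. The names `retry_on_deadlock` and `is_deadlock` and the backoff values are hypothetical, and the sketch assumes the MySQL error code is available as `exc.args[0]`, as in the pymysql error quoted in the log.

```python
"""Minimal retry-on-deadlock decorator (hypothetical sketch, stdlib only)."""
import functools
import random
import time

DEADLOCK_ERRNO = 1213  # ER_LOCK_DEADLOCK, the code seen in the log


def is_deadlock(exc):
    # Assumption: the driver error carries the MySQL error code as args[0],
    # like pymysql's "(1213, 'Deadlock found ...')".
    return bool(exc.args) and exc.args[0] == DEADLOCK_ERRNO


def retry_on_deadlock(max_retries=5, base_delay=0.05, max_delay=1.0):
    """Re-run the whole decorated function, i.e. the whole transaction, on 1213."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    if not is_deadlock(exc) or attempt >= max_retries:
                        raise  # not a deadlock, or out of retries
                    # Exponential backoff with full jitter, so the two
                    # conflicting writers are unlikely to collide again.
                    delay = min(max_delay, base_delay * 2 ** attempt)
                    time.sleep(random.uniform(0, delay))
                    attempt += 1

        return wrapper

    return decorator


if __name__ == "__main__":
    calls = {"n": 0}

    @retry_on_deadlock(max_retries=3, base_delay=0.001)
    def create_volume_record():
        # Hypothetical transaction that deadlocks twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise RuntimeError(DEADLOCK_ERRNO, "Deadlock found when trying to get lock")
        return "created"

    print(create_volume_record(), calls["n"])  # prints: created 3
```

Retrying only helps if the decorated function covers the whole transaction; re-issuing a single statement inside an already-aborted transaction would just fail again. It also does not remove the underlying contention between nodes, which is what sending all writes to one node addresses.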
--
Pierre-Samuel Le Stang