[nova] VM gets stuck in BUILD status after SQL DBConnectionError error in nova-conductor
Pavlo Bychikhin
pbychikhin at mirantis.com
Thu Aug 10 18:49:13 UTC 2023
Hi all,
Sometimes when I create a bunch of VMs (40) at once, one of them gets
stuck in BUILD status.
From the logs I found that this happens because of an SQL error in nova-conductor:
DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost
connection to MySQL server during query').
There is an attempt to schedule the VM:
....
nova-scheduler "sending reply msg_id: 593a990393d5439580870d8412ab49f3
reply queue: reply_a0992fe3f9924cb1b45bfa8b27057580 time elapsed:
1.49198414385s"
And right after that the SQL error happens:
nova-conductor "Exception during message handling" "DBConnectionError:
(pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server
during query') [SQL: u'SELECT instance_id_mappings.created_at AS
instance_id_mappings_created ......"
And after that the VM stays in BUILD status forever.
My setup is a Galera cluster behind an HAProxy load balancer.
I checked the various DB server timeouts and they look fine: the ones in
MySQL correspond to the ones in nova.conf and haproxy.cfg.
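As an illustration of what "correspond" usually means here, a common rule of thumb is that the client-side connection pool should recycle connections before the proxy or the MySQL server would drop them as idle. The sketch below only encodes that rule; the function name, the parameter names, and the numbers in the usage note are hypothetical and are not taken from the configuration discussed in this message.

```python
def check_timeouts(pool_recycle_s, proxy_idle_s, mysql_wait_s):
    """Return a list of problems; an empty list means the values are consistent.

    All three arguments are idle timeouts in seconds (hypothetical names):
      pool_recycle_s - how long the client keeps a pooled connection
      proxy_idle_s   - idle timeout enforced by the load balancer
      mysql_wait_s   - idle timeout enforced by the MySQL server
    """
    problems = []
    # The client must give up a connection before the proxy silently drops it.
    if pool_recycle_s >= proxy_idle_s:
        problems.append("pool recycle time is not below the proxy idle timeout")
    # The same holds for the database server itself.
    if pool_recycle_s >= mysql_wait_s:
        problems.append("pool recycle time is not below the MySQL idle timeout")
    return problems
```

For example, check_timeouts(3600, 5400, 28800) returns an empty list, while check_timeouts(3600, 1800, 28800) reports one problem, because the proxy would drop idle connections before the pool recycles them.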
In the MySQL logs there are messages like:
[Warning] Aborted connection 17798924 to db: 'nova' user: 'nova' host:
'xxx.xxx.xxx.xxx' (Got an error reading communication packets)
In the HAProxy logs there are messages like:
<134>2023-08-05T08:44:34.325338+00:00 dbs01 haproxy[1275]:
xxx.xxx.xxx.xxx:44880 [05/Aug/2023:08:42:44.441] mysql_cluster_nova_conn
mysql_cluster_nova_conn/dbs02 1/0/109884 11438 CD 4125/1884/1884/1884/0 0/0
"CD" means "The client unexpectedly aborted during data transfer".
So HAProxy believes the connection was closed on the client side. I
checked the network interface counters and they are also fine (no errors,
drops, etc.).
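To read such a log line mechanically, the fields can be pulled apart with a regular expression. This is a minimal sketch that assumes HAProxy's default TCP log format (client address and port, accept date, frontend, backend/server, the Tw/Tc/Tt timers in milliseconds, bytes read, termination state, connection counters, queue sizes). The function name parse_tcp_log and the field names are chosen for illustration only.

```python
import re

# Assumed layout: default HAProxy TCP log format, after the syslog prefix.
TCP_LOG_RE = re.compile(
    r"(?P<client>\S+):(?P<port>\d+) "
    r"\[(?P<accept_date>[^\]]+)\] "
    r"(?P<frontend>\S+) "
    r"(?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tt>\+?\d+) "
    r"(?P<bytes_read>\+?\d+) "
    r"(?P<term_state>\S{2}) "
    r"(?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)"
    r"/(?P<srv_conn>\d+)/(?P<retries>\+?\d+) "
    r"(?P<srv_queue>\d+)/(?P<backend_queue>\d+)"
)

# Fields converted from text to integers.
INT_FIELDS = ("port", "tw", "tc", "tt", "bytes_read", "actconn", "feconn",
              "beconn", "srv_conn", "retries", "srv_queue", "backend_queue")


def parse_tcp_log(line):
    """Return a dict of fields from one TCP log line, or None if no match."""
    match = TCP_LOG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in INT_FIELDS:
        fields[key] = int(fields[key])
    return fields


# The log line quoted above, joined back into a single line.
LINE = (
    "<134>2023-08-05T08:44:34.325338+00:00 dbs01 haproxy[1275]: "
    "xxx.xxx.xxx.xxx:44880 [05/Aug/2023:08:42:44.441] mysql_cluster_nova_conn "
    "mysql_cluster_nova_conn/dbs02 1/0/109884 11438 CD 4125/1884/1884/1884/0 0/0"
)
fields = parse_tcp_log(LINE)
```

For the line above, fields["term_state"] is "CD" and fields["tt"] is 109884, so the session had been open for about 110 seconds when it was torn down. That matches the gap between the accept date (08:42:44.441) and the syslog timestamp (08:44:34.325).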
I found some discussions on the Internet. People say that when the DB is
behind a load balancer, such 'Lost connection' events are fairly normal
and not harmful to the OpenStack components.
I would agree with them. I think Nova should be able to recover from an
accidental loss of the DB connection, or set the VM to ERROR status if
that's impossible.
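The expected behaviour described above can be sketched as follows: retry the DB call a few times when the connection is lost, and if it still fails, mark the instance as failed instead of leaving it in BUILD. This is not Nova's actual code. DBConnectionError is a local stand-in class here, and run_with_db_retry, on_give_up, and the retry parameters are hypothetical names used only for illustration.

```python
import time


class DBConnectionError(Exception):
    """Local stand-in for the DB layer's 'lost connection' exception."""


def run_with_db_retry(db_call, on_give_up, max_retries=3, delay_s=0.0):
    """Call db_call(), retrying on DBConnectionError.

    After max_retries failed retries, on_give_up() is invoked (for example to
    set the instance to ERROR) and the last exception is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return db_call()
        except DBConnectionError:
            if attempt == max_retries:
                # Out of retries: record the failure instead of hanging.
                on_give_up()
                raise
            time.sleep(delay_s)
```

oslo.db provides a decorator with a similar purpose, oslo_db.api.wrap_db_retry (with a retry_on_disconnect option). Whether it covers the code path that failed here in Queens is not something this sketch verifies.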
My Nova version is 17.0.13 (Queens).
Maybe someone has encountered this issue and knows how to fix it?
Or is this a bug in Nova, and I need to upgrade to a more recent version?
Thanks in advance
----
Best regards,
Pavlo Bychikhin