[nova] VM gets stuck in BUILD status after SQL DBConnectionError error in nova-conductor

Pavlo Bychikhin pbychikhin at mirantis.com
Thu Aug 10 18:49:13 UTC 2023


Hi all,

Sometimes when I create a bunch of VMs (40) at once, one of them gets 
stuck in BUILD status.

From the logs I found that this happens due to an SQL error in nova-conductor: 
DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost 
connection to MySQL server during query').

There is an attempt to schedule the VM:

....

nova-scheduler "sending reply msg_id: 593a990393d5439580870d8412ab49f3 
reply queue: reply_a0992fe3f9924cb1b45bfa8b27057580 time elapsed: 
1.49198414385s"

And right after that the SQL error happens:

nova-conductor "Exception during message handling" "DBConnectionError: 
(pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server 
during query') [SQL: u'SELECT instance_id_mappings.created_at AS 
instance_id_mappings_created ......"

And after that the VM stays in BUILD status forever.


My setup is a Galera cluster behind an HAProxy load balancer.

I checked the various DB server timeouts and they look fine: the ones in 
MySQL correspond to the ones in nova.conf and haproxy.cfg.
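
For illustration, the timeout comparison described above can be sketched 
in Python. This is only a sketch under assumptions: it assumes the relevant 
settings are the pool recycle time in the [database] section of nova.conf 
(connection_recycle_time, or idle_timeout in older oslo.db releases), 
"timeout client" / "timeout server" in haproxy.cfg, and wait_timeout in 
MySQL. The function name and all numeric values are hypothetical 
placeholders, not the values from this setup. It also covers idle timeouts 
only, which may not be what is happening when a connection is lost during 
a query.

```python
def find_timeout_problems(recycle_s, idle_limits_s):
    """Return the names of idle limits that are not above the pool recycle time.

    The client-side pool should recycle a connection before any other
    component (proxy or DB server) drops it as idle.
    """
    return sorted(name for name, limit in idle_limits_s.items()
                  if limit <= recycle_s)


# Placeholder values in seconds (assumptions, not taken from the real setup).
limits = {
    "haproxy timeout client": 3600,
    "haproxy timeout server": 3600,
    "mysql wait_timeout": 28800,
}

print(find_timeout_problems(1800, limits))  # []
print(find_timeout_problems(3600, limits))
```

An empty list means every idle limit is above the recycle time; any name 
returned points at a component that could close the connection first.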

In the MySQL logs there are messages like:

[Warning] Aborted connection 17798924 to db: 'nova' user: 'nova' host: 
'xxx.xxx.xxx.xxx' (Got an error reading communication packets)

In the HAProxy logs there are messages like:

<134>2023-08-05T08:44:34.325338+00:00 dbs01 haproxy[1275]: 
xxx.xxx.xxx.xxx:44880 [05/Aug/2023:08:42:44.441] mysql_cluster_nova_conn 
mysql_cluster_nova_conn/dbs02 1/0/109884 11438 CD 4125/1884/1884/1884/0 0/0

"CD" means "The client unexpectedly aborted during data transfer"

So HAProxy believes the connection was closed on the client side. I 
checked the network interface counters and they are also fine (no errors, 
drops, etc.).
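
To make the HAProxy line easier to read, here is a minimal Python sketch 
that splits it into fields. It assumes the line follows HAProxy's default 
TCP log layout (client address, accept date, frontend, backend/server, 
Tw/Tc/Tt timers in milliseconds, bytes read, termination state, connection 
counters, queue lengths). The regular expression and the parse_tcplog name 
are hypothetical and written only for this one line.

```python
import re

# Assumed layout of an HAProxy TCP log entry (after the syslog prefix):
# client:port [accept_date] frontend backend/server Tw/Tc/Tt bytes state ...
TCPLOG_RE = re.compile(
    r"(?P<client>\S+):(?P<port>\d+) "
    r"\[(?P<accept_date>[^\]]+)\] "
    r"(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tt>\+?\d+) "
    r"(?P<bytes_read>\+?\d+) (?P<state>\S{2}) "
)


def parse_tcplog(line):
    """Extract the main fields from a single HAProxy TCP log line."""
    match = TCPLOG_RE.search(line)
    if match is None:
        raise ValueError("line does not look like an HAProxy TCP log entry")
    fields = match.groupdict()
    # Convert the numeric fields; the timers are in milliseconds.
    for key in ("port", "tw", "tc", "tt", "bytes_read"):
        fields[key] = int(fields[key])
    return fields


# The log line quoted above, joined back into a single line.
line = ("<134>2023-08-05T08:44:34.325338+00:00 dbs01 haproxy[1275]: "
        "xxx.xxx.xxx.xxx:44880 [05/Aug/2023:08:42:44.441] mysql_cluster_nova_conn "
        "mysql_cluster_nova_conn/dbs02 1/0/109884 11438 CD 4125/1884/1884/1884/0 0/0")
entry = parse_tcplog(line)
print(entry["server"], entry["state"], entry["tt"] / 1000.0)  # dbs02 CD 109.884
```

Read this way, the session to dbs02 lasted about 110 seconds before the 
client-side abort: the accept time 08:42:44.441 plus 109.884 s gives 
08:44:34.325, which matches the syslog timestamp of the entry.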


I found some discussions on the Internet. People say that when the DB is 
behind a load balancer, such 'Lost connection' events are pretty normal 
and not harmful to the OpenStack components.

I would agree with them. I think Nova should be able to recover from an 
accidental loss of the DB connection, or set the VM to ERROR status if 
that's impossible.
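
The behaviour expected here can be sketched roughly as follows. This is 
not Nova's code: DBConnectionLost, run_with_retry and the callbacks are 
hypothetical names used only to illustrate "retry on a lost connection, 
and mark the VM as ERROR if the retries are exhausted".

```python
import time


class DBConnectionLost(Exception):
    """Hypothetical stand-in for a transient 'lost connection' DB error."""


def run_with_retry(operation, on_give_up, attempts=3, delay_s=0.0):
    """Call operation(); retry on DBConnectionLost.

    If every attempt fails, call on_give_up() and re-raise the last error.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except DBConnectionLost:
            if attempt == attempts:
                on_give_up()
                raise
            time.sleep(delay_s)


# Simulated query that loses the connection once, then succeeds.
calls = {"n": 0}


def flaky_query():
    calls["n"] += 1
    if calls["n"] == 1:
        raise DBConnectionLost("Lost connection to MySQL server during query")
    return "scheduled"


status = {"vm": "BUILD"}


def mark_error():
    # Stand-in for moving the instance out of BUILD when recovery fails.
    status["vm"] = "ERROR"


result = run_with_retry(flaky_query, mark_error)
print(result, status["vm"])  # scheduled BUILD
```

With a query that never succeeds, mark_error() runs after the last attempt, 
so the VM would end up in ERROR instead of staying in BUILD forever.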

My Nova version is 17.0.13 (Queens).

Maybe someone has encountered this issue and knows how to fix it?

Or is it a bug in Nova, and I need to upgrade to a more recent version?

Thanks in advance


----

Best regards,

Pavlo Bychikhin




