[nova] VM gets stuck in BUILD status after SQL DBConnectionError error in nova-conductor
Hi all,

Sometimes when I create a batch of VMs (40) at once, one of them gets stuck in BUILD status. From the logs I found that this happens because of an SQL error in nova-conductor:

DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')

First there is an attempt to schedule the VM:

.... nova-scheduler"sending reply msg_id: 593a990393d5439580870d8412ab49f3 reply queue: reply_a0992fe3f9924cb1b45bfa8b27057580 time elapsed: 1.49198414385s"

Right after that, the SQL error occurs:

nova-conductor "Exception during message handling" "DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: u'SELECT instance_id_mappings.created_at AS instance_id_mappings_created ......"

After that, the VM stays in BUILD status forever.

My setup is a Galera cluster behind an HAProxy load balancer. I checked the various DB server timeouts and they look fine: the values in MySQL match those in nova.conf and haproxy.cfg.

In the MySQL logs there are messages like:

[Warning] Aborted connection 17798924 to db: 'nova' user: 'nova' host: 'xxx.xxx.xxx.xxx' (Got an error reading communication packets)

In the HAProxy logs there are messages like:

<134>2023-08-05T08:44:34.325338+00:00 dbs01 haproxy[1275]: xxx.xxx.xxx.xxx:44880 [05/Aug/2023:08:42:44.441] mysql_cluster_nova_conn mysql_cluster_nova_conn/dbs02 1/0/109884 11438 CD 4125/1884/1884/1884/0 0/0

"CD" means "The client unexpectedly aborted during data transfer", so HAProxy believes the connection was closed on the client side. I checked the network interface counters and they are also fine (no errors, drops, etc.).

I found some discussions on the Internet. People say that when the DB is behind a load balancer, such 'Lost connection' events are fairly normal and not harmful to the OpenStack components. I would agree with them. I think Nova should be able to recover from an accidental loss of the DB connection or, if that is impossible, set the VM to ERROR status.
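To double-check my reading of the HAProxy entry above, here is a minimal Python sketch that splits the line into its fields. It assumes the frontend uses HAProxy's default TCP log layout (option tcplog); the name parse_tcplog, the dictionary keys and the short cause table are my own illustration and are not part of HAProxy or Nova.

```python
import re
from datetime import datetime, timedelta

# Assumed layout (HAProxy "option tcplog"), after the syslog prefix:
# client:port [accept_date] frontend backend/server Tw/Tc/Tt bytes state ...
TCPLOG = re.compile(
    r"(?P<client>\S+):(?P<port>\d+) "
    r"\[(?P<accept>[^\]]+)\] "
    r"(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tt>\+?\d+) "
    r"(?P<bytes>\+?\d+) (?P<state>\S\S) "
)

# First character of the termination state: which side ended the session.
# Only a few common values are listed here.
CAUSE = {
    "C": "aborted by the client",
    "S": "aborted or refused by the server",
    "c": "client-side timeout expired",
    "s": "server-side timeout expired",
    "-": "normal completion",
}


def parse_tcplog(line):
    """Return the main fields of one HAProxy TCP log line, or None."""
    m = TCPLOG.search(line)
    if m is None:
        return None
    accepted = datetime.strptime(m.group("accept"), "%d/%b/%Y:%H:%M:%S.%f")
    tt_ms = int(m.group("tt"))  # total session duration in milliseconds
    state = m.group("state")
    return {
        "server": m.group("server"),
        "state": state,
        "cause": CAUSE.get(state[0], "other"),
        "tt_ms": tt_ms,
        "accepted_at": accepted,
        "closed_at": accepted + timedelta(milliseconds=tt_ms),
    }


LINE = (
    "<134>2023-08-05T08:44:34.325338+00:00 dbs01 haproxy[1275]: "
    "xxx.xxx.xxx.xxx:44880 [05/Aug/2023:08:42:44.441] "
    "mysql_cluster_nova_conn mysql_cluster_nova_conn/dbs02 "
    "1/0/109884 11438 CD 4125/1884/1884/1884/0 0/0"
)

if __name__ == "__main__":
    for key, value in parse_tcplog(LINE).items():
        print(key, "=", value)
```

If that layout assumption holds, the third timer (109884) is the total session time in milliseconds, so this connection had been open for about 110 seconds when it was closed, and the accept time plus that duration matches the syslog timestamp of the entry.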
My Nova version is 17.0.13 (Queens). Has anyone encountered this issue and knows how to fix it? Or is it a bug in Nova, so that I need to upgrade to a more recent version?

Thanks in advance

----
Best regards,
Pavlo Bychikhin