This seems to be the bug tracking this issue: https://bugs.launchpad.net/nova/+bug/1854992 In my case, a batch of VM creations was initiated by Terraform. I assume it's all about rabbitmq. Is there a clear way to fix it once it happens: delete and recreate the bindings, delete and recreate the queues, or just restart nova-compute?
Thanks! Tony
-----Original Message----- From: Tony Liu tonyliu0592@hotmail.com Sent: Sunday, November 15, 2020 2:55 PM To: Arnaud Morin arnaud.morin@gmail.com Cc: OpenStack Discuss openstack-discuss@lists.openstack.org Subject: [nova] MessageUndeliverable from nova-conductor and MessagingTimeout from nova-compute
It's not about the connection, but about message delivery, so the title is changed. I am running rabbitmq 3.8.4 and nova 21.0.0.
When I check the rabbitmq queues for compute, I see 3 messages in compute.compute-1 and 0 messages in the other compute node queues. In the nova-conductor logs of all 3 instances, I see "oslo_messaging.exceptions.MessageUndeliverable". In the nova-compute log on compute-1, I see "Timed out waiting for nova-conductor." Actually, 4 out of 5 compute nodes have this warning. One compute node has the exception "oslo_messaging.exceptions.MessagingTimeout".
"rabbitmqctl list_bindings | grep compute-1" shows this.
exchange compute.compute-1 queue compute.compute-1 []
nova exchange compute.compute-1 queue compute.compute-1 []
=================================
Is this a known issue? How did it happen? What's the cause, and is there any way to prevent it from happening?
Thanks! Tony
-----Original Message----- From: Arnaud Morin arnaud.morin@gmail.com Sent: Saturday, November 14, 2020 2:10 AM To: Tony Liu tonyliu0592@hotmail.com Cc: OpenStack Discuss openstack-discuss@lists.openstack.org Subject: Re: [nova-compute] not reconnect to rabbitmq?
Hello,
What we noticed in our case is that nova compute is actually reconnecting, but cannot communicate with the conductor because the queue binding is either absent or not working anymore.
So, first, which version of nova are you running? Which version of rabbitmq? (some bugs related to shadow bindings were fixed after 3.7.x; I don't remember x)
Can you check if you have any queue related to your compute? Something like this: rabbitmqctl list_queues | grep mycompute
Also check the bindings, preferably using the management interface or rabbitmqadmin: rabbitmqadmin list bindings | grep mycompute
What usually fixed our issue in the past was to delete and recreate the binding (easy to do from the management interface).
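The queue and binding check described above can also be scripted. This is a minimal sketch in Python, not something taken from nova or rabbitmq. It assumes the text comes from "rabbitmqctl list_bindings" with its default, tab-separated columns (source_name, source_kind, destination_name, destination_kind, routing_key, arguments) and that the compute queues are expected to be bound to an exchange named "nova". The function names parse_bindings and queues_missing_binding are hypothetical.

```python
# Assumed default column order of `rabbitmqctl list_bindings`.
FIELDS = ("source_name", "source_kind", "destination_name",
          "destination_kind", "routing_key", "arguments")


def parse_bindings(text):
    """Parse tab-separated list_bindings output into a list of dicts."""
    bindings = []
    for line in text.splitlines():
        # Do not strip the line: bindings from the default exchange
        # have an empty source_name, so the line starts with a tab.
        parts = line.split("\t")
        if len(parts) != len(FIELDS):
            continue  # banner line such as "Listing bindings ..."
        if parts[0] == "source_name":
            continue  # header row
        bindings.append(dict(zip(FIELDS, parts)))
    return bindings


def queues_missing_binding(bindings, queues, exchange="nova"):
    """Return the queues that have no binding from the given exchange."""
    bound = {
        b["destination_name"]
        for b in bindings
        if b["source_name"] == exchange and b["destination_kind"] == "queue"
    }
    return [q for q in queues if q not in bound]
```

A queue reported by queues_missing_binding is a candidate for recreating the binding. A binding that is listed but not working would not be detected this way.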
Cheers,
-- Arnaud Morin
On 14.11.20 - 00:34, Tony Liu wrote:
Hi,
I have a deployment with Ussuri on CentOS 8. I noticed that when the connection from nova-compute to rabbitmq is broken, nova-compute doesn't reconnect. I checked nova-conductor, which seems to keep trying to reconnect to rabbitmq when the connection is broken, but nova-compute doesn't seem to do the same. I've seen it a few times: after I fix rabbitmq and bring it back, nova-conductor reconnects, but nova-compute doesn't, and I have to restart it manually. Has anyone else had similar experiences? Is there anything I am missing?
Thanks! Tony