Re: [nova][cinder][neutron] Heartbeat improvement: checking if main thread responds to rabbit ping

16 Dec 2025

Hi Arnaud,

Thanks for your reply.

On 12/16/25 10:27 AM, Arnaud Morin wrote:
> Hey,
> This sounds like what we introduced years ago with rpc_ping_enabled (see
> [1], and [2])
> Have you tried it?
> Note that, we used to have it for years in our production clusters, but
> we finally disabled it for two reasons:
> 1- it was sending a lot of RMQ messages, because we were monitoring all
>     our agents with this, not only the workers.

According to my calculation, it should be OK with our workload (maybe
we'll get 10 messages per second).

> 2- it was not catching all use cases: the way we implemented it is that
>     only one thread was waiting for ping requests. And most of the time,
>     the ping thread was working correctly, even if some other threads
>     (green threads... eventlet) were stuck / dead.

Indeed. We have experienced the heartbeat thread being alive while the
main thread was dead, and this is exactly what I'm trying to avoid: I am
trying to implement the ping reply in the *main* thread, not in the thread
doing the heartbeat, nor in a thread dedicated to replying to pings.
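
To make the distinction concrete, here is a minimal sketch using only
plain Python threads and queues. It is not oslo.messaging or Cinder code:
the Service class, the ping() helper and the ("ping", reply_queue) message
format are all made up for illustration, and the "main loop" runs in an
ordinary thread just to keep the example self-contained. The point is only
that a ping answered by the loop doing the real work times out when that
loop is stuck, while a separate heartbeat keeps ticking.

```python
import queue
import threading
import time


class Service:
    """Toy service (hypothetical): one work loop plus a heartbeat thread."""

    def __init__(self):
        self.inbox = queue.Queue()   # work items and pings for the work loop
        self.heartbeats = 0          # incremented by the heartbeat thread
        self._stop = threading.Event()

    def _heartbeat_loop(self):
        # Stands in for a heartbeat thread: it keeps ticking no matter
        # what the work loop is doing.
        while not self._stop.is_set():
            self.heartbeats += 1
            time.sleep(0.01)

    def _main_loop(self):
        # Pings are answered here, by the same loop that does the real
        # work, so a stuck loop cannot reply.
        while not self._stop.is_set():
            try:
                kind, payload = self.inbox.get(timeout=0.05)
            except queue.Empty:
                continue
            if kind == "ping":
                payload.put("pong")
            elif kind == "work":
                payload()

    def start(self):
        for target in (self._heartbeat_loop, self._main_loop):
            threading.Thread(target=target, daemon=True).start()

    def stop(self):
        self._stop.set()


def ping(service, timeout):
    """Return True if the work loop answers within `timeout` seconds."""
    reply = queue.Queue(maxsize=1)
    service.inbox.put(("ping", reply))
    try:
        return reply.get(timeout=timeout) == "pong"
    except queue.Empty:
        return False


svc = Service()
svc.start()

healthy = ping(svc, timeout=0.5)          # work loop is idle, so it replies

# Simulate a stuck work loop with a long blocking task.
svc.inbox.put(("work", lambda: time.sleep(1.0)))
time.sleep(0.1)                           # let the loop pick the task up

before = svc.heartbeats
stuck = ping(svc, timeout=0.2)            # no reply while the loop is blocked
after = svc.heartbeats                    # heartbeat kept going meanwhile

svc.stop()
print(healthy, stuck, after > before)
```

A ping handled by a dedicated thread would behave like the heartbeat
counter here: it would keep answering while the work loop is blocked.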

It looks like what I wrote somehow worked: I could see the ping/pong in
the cinder-volume logs of the OpenStack CI. However, it also looks like I
implemented it in the wrong class: I believe I should have just modified
is_working() of VolumeManager in cinder/volume/manager.py, instead of
touching cinder/manager.py and cinder/cmd/volume.py. Now everything is
broken again, and I have to fix my patch once more.

Let's see where this leads me...

Cheers,

Thomas Goirand (zigo)