On 7/29/20 12:26 AM, Dan Smith wrote:
Correct, but heartbeats didn't prove to be a reliable solution. There were WSGI & eventlet related issues [1] with running heartbeats. I can't recall what the final outcome of that discussion was or what the fix was. So relying on explicit pings sent by clients could perhaps work better.
There are two types of heartbeats in and around oslo.messaging, which is why call_monitor was used for the long-running RPC thing. The bug you're referencing is, I believe, talking about heartbeating the api->rabbit connection, and has nothing to do with service-to-service pinging, which this thread is about.
The call_monitor stuff Ken mentioned requires the *server* side to do the heartbeating, so something like nova-compute or nova-conductor. Those things aren't running under uwsgi and don't have any problems with threading to accomplish those goals.
So, if we're talking about generic ping() to provide a robust long-running RPC call, oslo.messaging already does this (if you ask for it). Otherwise, a generic service-to-service ping() doesn't, as was mentioned, really mean anything at all about the ability to do meaningful work (other than further saturate the message bus).
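To make the server-side heartbeating idea concrete, here is a minimal stdlib-only sketch. It is not oslo.messaging code: the `HEARTBEAT` marker, `serve()`, `call()` and the in-process queue are all hypothetical stand-ins, and the `call_monitor_timeout` argument only borrows its name from the behaviour described above. The client keeps waiting up to the overall timeout as long as the server keeps sending heartbeats, and gives up early once the server goes quiet.

```python
import queue
import threading
import time

HEARTBEAT = object()  # hypothetical marker message, not an oslo.messaging type


def serve(work_seconds, beat_interval, replies, send_heartbeats=True):
    """Server side: do the work while a helper thread emits heartbeats."""
    done = threading.Event()

    def beat():
        # Event.wait() returns False on timeout, so one beat goes out per interval.
        while not done.wait(beat_interval):
            replies.put(HEARTBEAT)

    beater = None
    if send_heartbeats:
        beater = threading.Thread(target=beat, daemon=True)
        beater.start()
    time.sleep(work_seconds)  # stand-in for the long-running RPC handler
    done.set()
    if beater is not None:
        beater.join()  # ensure no heartbeat is queued after the result
    replies.put("result")


def call(replies, timeout, call_monitor_timeout):
    """Client side: wait for the reply, but fail fast if the server goes quiet."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("overall timeout exceeded")
        try:
            msg = replies.get(timeout=min(remaining, call_monitor_timeout))
        except queue.Empty:
            raise TimeoutError("no reply or heartbeat within the allowed window") from None
        if msg is not HEARTBEAT:
            return msg
```

Note that the heartbeats here come from the side doing the work, which matches the point that this mechanism needs the server (something like nova-compute or nova-conductor) to run the extra thread.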
Thank you for that great information, Dan and Ken. Then please disregard that mistakenly highlighted aspect. I didn't want to derail the thread with that apparently unrelated side case. I believe the original intention for RPC ping was to have something initiated by clients, not server-side? That may be useful when running in a Kubernetes pod with liveness/readiness probes set up. While the latter may indeed not be the best fit for RPC ping, the former seems like a much better way to check liveness than just checking the TCP connection to the rabbit port?
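As a rough illustration of the difference between the two checks, here is a stdlib-only sketch. `tcp_port_open()` and `liveness()` are hypothetical helper names, and `rpc_ping` is an assumed callable that the service would have to provide (for example, a wrapper around a client-initiated RPC round trip); nothing here is an existing oslo.messaging or Kubernetes API.

```python
import socket


def tcp_port_open(host, port, timeout=1.0):
    """Shallow check: only proves that something accepts TCP connections."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def liveness(host, port, rpc_ping, timeout=1.0):
    """Deeper check: the port must be open AND a round-trip ping must succeed.

    rpc_ping is a hypothetical callable taking a timeout in seconds and
    returning a truthy value on success; any exception counts as failure.
    """
    if not tcp_port_open(host, port, timeout):
        return False
    try:
        return bool(rpc_ping(timeout))
    except Exception:
        return False
```

A probe script could call `liveness()` and exit non-zero when it returns False. As noted earlier in the thread, a successful ping still says little about whether the service can do meaningful work, so this only narrows the gap left by the plain TCP check.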
--Dan
-- Best regards, Bogdan Dobrelya, Irc #bogdando