Re: [largescale-sig][nova][neutron][oslo] RPC ping

29 Jul 2020


On 7/29/20 12:26 AM, Dan Smith wrote:
...
...
Correct, but heartbeats didn't prove to be a reliable solution. There
were WSGI & eventlet related issues [1] with running heartbeats. I
can't recall what the final outcome of that discussion was or what
the fix was. So relying on explicit pings sent by clients could perhaps
work better.
[1] https://bugs.launchpad.net/tripleo/+bug/1829062
There are two types of heartbeats in and around oslo.messaging, which is
why call_monitor was used for the long-running RPC thing. The bug you're
referencing is, I believe, talking about heartbeating the api->rabbit
connection, and has nothing to do with service-to-service pinging, which
this thread is about.
The call_monitor stuff Ken mentioned requires the *server* side to do
the heartbeating, so something like nova-compute or
nova-conductor. Those things aren't running under uwsgi and don't have
any problems with threading to accomplish those goals.
So, if we're talking about generic ping() to provide a robust
long-running RPC call, oslo.messaging already does this (if you ask for
it). Otherwise, a generic service-to-service ping() doesn't, as was
mentioned, really mean anything at all about the ability to do
meaningful work (other than further saturate the message bus).
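
To illustrate the call_monitor idea described above, here is a minimal sketch in plain Python. It is a simulation only and not oslo.messaging's implementation: the names server, monitored_call and HEARTBEAT are hypothetical, and a queue.Queue stands in for the reply queue. The client waits up to a long overall timeout, but gives up early if the server stays silent for longer than the shorter call_monitor_timeout.

```python
import queue
import threading
import time

HEARTBEAT = object()  # sentinel the simulated server sends while still working


def server(work_seconds, heartbeat_interval, replies):
    """Hypothetical server side: heartbeat periodically, then send the result."""
    deadline = time.monotonic() + work_seconds
    while time.monotonic() < deadline:
        time.sleep(heartbeat_interval)
        replies.put(HEARTBEAT)
    replies.put("done")


def monitored_call(replies, timeout, call_monitor_timeout):
    """Hypothetical client side of a monitored long-running call.

    Waits at most `timeout` seconds overall, but fails early if neither a
    heartbeat nor the final reply arrives within `call_monitor_timeout`.
    """
    hard_deadline = time.monotonic() + timeout
    while True:
        remaining = hard_deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("overall timeout exceeded")
        try:
            msg = replies.get(timeout=min(call_monitor_timeout, remaining))
        except queue.Empty:
            if time.monotonic() >= hard_deadline:
                raise TimeoutError("overall timeout exceeded")
            raise TimeoutError("server went silent: no heartbeat or reply")
        if msg is not HEARTBEAT:
            return msg  # the final reply; heartbeats just keep the wait alive


if __name__ == "__main__":
    replies = queue.Queue()
    threading.Thread(
        target=server, args=(0.3, 0.05, replies), daemon=True
    ).start()
    print(monitored_call(replies, timeout=2.0, call_monitor_timeout=0.2))
```

In this sketch a dead server is detected after call_monitor_timeout rather than after the full timeout, which is the property that makes the approach useful for long-running calls.
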
Thank you for that great information, Dan and Ken.
Then please disregard that mistakenly highlighted aspect. I didn't want to
derail the thread with that apparently unrelated side case. I believe the
original intention for RPC ping was to have something initiated by
clients, not server-side? That may be useful when running in a Kubernetes
pod with liveness/readiness probes set up. While the latter may indeed not
be the best fit for RPC ping, the former seems like a much better
way to check liveness than just checking the TCP connection to the rabbit port?
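
A rough sketch of the difference between the two checks, in plain Python. The helper names tcp_port_open and rpc_ping_ok are hypothetical, and do_ping is an assumed callable standing in for whatever client-initiated RPC ping ends up being available; nothing here is an existing oslo.messaging API. The point is only that a TCP connect can succeed while a full request/reply round trip does not complete.

```python
import socket
import threading


def tcp_port_open(host, port, timeout=1.0):
    """Shallow check: can a TCP connection to the broker port be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def rpc_ping_ok(do_ping, timeout=1.0):
    """Deeper check: did a request/reply round trip finish in time?

    `do_ping` is a hypothetical callable that performs the RPC ping and
    returns once the reply has arrived.
    """
    done = threading.Event()

    def worker():
        try:
            do_ping()
        except Exception:
            return  # a failed ping counts as not alive
        done.set()

    # Daemon thread so a hung ping cannot keep the probe process alive.
    threading.Thread(target=worker, daemon=True).start()
    return done.wait(timeout)
```

A probe script could exit non-zero when rpc_ping_ok returns False. Note that, as Dan pointed out, a successful ping still says little about the service's ability to do meaningful work.
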
...
--Dan
-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando