[largescale-sig][nova][neutron][oslo] RPC ping

Thierry Carrez thierry at openstack.org
Mon Aug 3 10:15:06 UTC 2020


Ken Giusti wrote:
> On Mon, Jul 27, 2020 at 1:18 PM Dan Smith <dms at danplanet.com> wrote:
>>     The primary concern was about something other than nova sitting on our
>>     bus making calls to our internal services. I imagine that the proposal
>>     to bake it into oslo.messaging is for the same purpose, and I'd probably
>>     have the same concern. At the time I think we agreed that if we were
>>     going to support direct-to-service health checks, they should be teensy
>>     HTTP servers with oslo healthchecks middleware. Further loading down
>>     rabbit with those pings doesn't seem like the best plan to
>>     me. Especially since Nova (compute) services already check in over RPC
>>     periodically and the success of that is discoverable en masse through
>>     the API.
> 
> While I was initially in favor of this feature, Dan's concern has me
> reconsidering this.
> 
> Now I believe that if the purpose of this feature is to check the 
> operational health of a service _using_ oslo.messaging, then I'm against 
> it. A naked ping to a generic service point in an application doesn't
> prove the operating health of that application beyond its connection to 
> rabbit. 

While I understand the need to avoid further loading down Rabbit, I like
the universality of this solution, which solves a real operational issue.

Obviously that creates a trade-off (further loading Rabbit to get more
operational insights), but nobody forces you to run those ping calls;
they would be opt-in. So the proposed code by itself does not weigh down
Rabbit, or make anything sit on the bus.

> Connectivity monitoring between an application and rabbit is 
> done using the keepalive connection heartbeat mechanism built into the 
> rabbit protocol, which O.M. supports today.

I'll let Arnaud answer, but I suspect the operational need is
code-external checking of the rabbit->agent chain, not code-internal
checking of the agent->rabbit chain. The heartbeat mechanism is used by
the agent to keep its Rabbit connection alive, which ensures the chain
works in most cases. The check described above is meant to catch the
corner cases where it still doesn't.
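
To make that distinction concrete, here is a toy sketch in plain Python.
It is not oslo.messaging or AMQP code, and every name in it (ToyAgent,
heartbeat_ok, external_ping) is hypothetical. It only assumes an agent
whose heartbeat runs independently of the loop that consumes requests,
so the heartbeat can look healthy while requests go unanswered.

```python
import queue
import threading
import time


class ToyAgent:
    """Hypothetical stand-in for an RPC agent; not real oslo.messaging code."""

    def __init__(self):
        self.requests = queue.Queue()       # stands in for the agent's RPC queue
        self.last_heartbeat = time.monotonic()
        self.stuck = threading.Event()      # set to simulate a wedged consumer
        self._stop = threading.Event()
        for target in (self._heartbeat_loop, self._consume_loop):
            threading.Thread(target=target, daemon=True).start()

    def _heartbeat_loop(self):
        # Agent -> broker direction: keeps the connection looking alive.
        while not self._stop.is_set():
            self.last_heartbeat = time.monotonic()
            time.sleep(0.01)

    def _consume_loop(self):
        # Broker -> agent direction: actually serves incoming requests.
        while not self._stop.is_set():
            try:
                reply_queue = self.requests.get(timeout=0.01)
            except queue.Empty:
                continue
            if not self.stuck.is_set():
                reply_queue.put("pong")
            # When stuck, the request is silently dropped (never answered).

    def stop(self):
        self._stop.set()


def heartbeat_ok(agent, max_age=0.5):
    """Code-internal view: was a heartbeat sent recently?"""
    return time.monotonic() - agent.last_heartbeat < max_age


def external_ping(agent, timeout=0.5):
    """Code-external view: does a request sent to the agent get a reply?"""
    reply_queue = queue.Queue()
    agent.requests.put(reply_queue)
    try:
        return reply_queue.get(timeout=timeout) == "pong"
    except queue.Empty:
        return False
```

With agent.stuck set, heartbeat_ok(agent) keeps returning True while
external_ping(agent) times out and returns False, which is the kind of
corner case an opt-in ping would be meant to surface.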

-- 
Thierry Carrez (ttx)


