Hey all,

Thanks for your replies.

About the fact that nova already implements this: I will try again on my side, but maybe it was not yet implemented in Newton (I only tried nova on the Newton version). Thank you for bringing that to my attention.

About the healthcheck already done on the nova side (and also on neutron): as far as I understand, it is done using a specific rabbit queue, which can work while other queues are not working. The purpose of adding a ping endpoint here is to be able to ping on all topics, not only those used for healthcheck reports.

Also, as mentioned by Thierry, what we need is a way to externally ping neutron agents and nova computes.

The patch itself is not going to add any load on rabbit. It really depends on how the operator uses it. On my side, I built a small external oslo.messaging script which I can use to do such pings.

Cheers,

--
Arnaud Morin

On 03.08.20 - 12:15, Thierry Carrez wrote:
Ken Giusti wrote:
On Mon, Jul 27, 2020 at 1:18 PM Dan Smith <dms@danplanet.com> wrote:
The primary concern was about something other than nova sitting on our bus making calls to our internal services. I imagine that the proposal to bake it into oslo.messaging is for the same purpose, and I'd probably have the same concern. At the time I think we agreed that if we were going to support direct-to-service health checks, they should be teensy HTTP servers with oslo healthchecks middleware. Further loading down rabbit with those pings doesn't seem like the best plan to me. Especially since Nova (compute) services already check in over RPC periodically and the success of that is discoverable en masse through the API.
While I was initially in favor of this feature, Dan's concern has me reconsidering it.
Now I believe that if the purpose of this feature is to check the operational health of a service _using_ oslo.messaging, then I'm against it. A naked ping to a generic service point in an application doesn't prove the operating health of that application beyond its connection to rabbit.
While I understand the need to further avoid loading down Rabbit, I like the universality of this solution, solving a real operational issue.
Obviously that creates a trade-off (further loading rabbit to get more operational insights), but nobody forces you to run those ping calls; they would be opt-in. So the proposed code in itself does not weigh down Rabbit, or make anything sit on the bus.
Connectivity monitoring between an application and rabbit is done using the keepalive connection heartbeat mechanism built into the rabbit protocol, which O.M. supports today.
I'll let Arnaud answer, but I suspect the operational need is code-external checking of the rabbit->agent chain, not code-internal checking of the agent->rabbit chain. The heartbeat mechanism is used by the agent to keep the Rabbit connection alive, ensuring it works in most cases. The check described above is to catch the corner cases where it still doesn't.
-- Thierry Carrez (ttx)