[largescale-sig][nova][neutron][oslo] RPC ping
Thierry Carrez
thierry at openstack.org
Mon Aug 3 10:15:06 UTC 2020
Ken Giusti wrote:
> On Mon, Jul 27, 2020 at 1:18 PM Dan Smith <dms at danplanet.com> wrote:
>> The primary concern was about something other than nova sitting on our
>> bus making calls to our internal services. I imagine that the proposal
>> to bake it into oslo.messaging is for the same purpose, and I'd probably
>> have the same concern. At the time I think we agreed that if we were
>> going to support direct-to-service health checks, they should be teensy
>> HTTP servers with oslo healthchecks middleware. Further loading down
>> rabbit with those pings doesn't seem like the best plan to
>> me. Especially since Nova (compute) services already check in over RPC
>> periodically and the success of that is discoverable en masse through
>> the API.
>
> While initially in favor of this feature Dan's concern has me
> reconsidering this.
>
> Now I believe that if the purpose of this feature is to check the
> operational health of a service _using_ oslo.messaging, then I'm against
> it. A naked ping to a generic service point in an application doesn't
> prove the operating health of that application beyond its connection to
> rabbit.
While I understand the need to avoid loading down Rabbit any further, I
like the universality of this solution, which solves a real operational issue.
Obviously that creates a trade-off (further loading rabbit to get more
operational insights), but nobody forces you to run those ping calls,
they would be opt-in. So the proposed code in itself does not weigh down
Rabbit, or make anything sit on the bus.
> Connectivity monitoring between an application and rabbit is
> done using the keepalive connection heartbeat mechanism built into the
> rabbit protocol, which O.M. supports today.
I'll let Arnaud answer, but I suspect the operational need is
code-external checking of the rabbit->agent chain, not code-internal
checking of the agent->rabbit chain. The heartbeat mechanism is used by
the agent to keep the Rabbit connection alive, which ensures it works in
most cases. The check described above is meant to catch the corner cases
where it still doesn't.
--
Thierry Carrez (ttx)