[largescale-sig][nova][neutron][oslo] RPC ping
arnaud.morin at gmail.com
Thu Aug 6 14:04:21 UTC 2020
Thanks for your replies.
Regarding the fact that nova already implements this, I will try again on my
side, but maybe it was not yet implemented in Newton (I only tried nova
on the Newton release). Thank you for pointing that out.
Regarding the healthcheck already done on the nova side (and also on the
neutron side): as far as I understand, it is done through a specific rabbit
queue, which can keep working while other queues are broken.
The purpose of adding a ping endpoint here is to be able to ping on all
topics, not only on those used for healthcheck reports.
Also, as mentioned by Thierry, what we need is a way to send pings
externally to neutron agents and nova computes.
The patch itself does not add any load on rabbit. The load depends
entirely on how the operator chooses to use it.
On my side, I built a small external oslo.messaging script which I can
use to do such pings.
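A minimal sketch of such an external ping, for illustration only. It assumes the proposed ping endpoint is enabled on the target services and answers a method named `oslo_rpc_server_ping` with the string "pong"; that method name, and the helpers `ping_all` and `oslo_call`, are assumptions made for this example. The broker-independent loop is separated from the oslo.messaging wiring (imported lazily) so it can be tried offline with a stub. The oslo.messaging part uses `get_rpc_transport`, `Target`, `prepare()` and `call()`, but has not been run against a live broker here.

```python
"""Sketch: ping RPC servers on arbitrary topics from outside the services."""
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical method name for the proposed ping endpoint; adjust to the merged patch.
PING_METHOD = "oslo_rpc_server_ping"

# A target is (topic, server); server=None lets any consumer of the topic answer.
PingTarget = Tuple[str, Optional[str]]
CallFn = Callable[[str, Optional[str]], object]


def ping_all(targets: Iterable[PingTarget], call: CallFn) -> Dict[PingTarget, bool]:
    """Ping each (topic, server) pair; True means a "pong" reply came back."""
    results: Dict[PingTarget, bool] = {}
    for topic, server in targets:
        try:
            results[(topic, server)] = call(topic, server) == "pong"
        except Exception:  # timeouts, missing queues, remote errors
            results[(topic, server)] = False
    return results


def oslo_call(transport_url: str, timeout: int = 10) -> CallFn:
    """Build a call function backed by oslo.messaging (untested sketch)."""
    from oslo_config import cfg
    import oslo_messaging

    conf = cfg.ConfigOpts()
    conf(args=[])
    transport = oslo_messaging.get_rpc_transport(conf, url=transport_url)
    # Newer releases provide get_rpc_client(); older ones use RPCClient directly.
    make_client = getattr(oslo_messaging, "get_rpc_client", oslo_messaging.RPCClient)

    def call(topic: str, server: Optional[str]) -> object:
        target = oslo_messaging.Target(topic=topic, server=server)
        client = make_client(transport, target)
        # Blocks until a reply arrives, or raises MessagingTimeout.
        return client.prepare(timeout=timeout).call({}, PING_METHOD)

    return call


if __name__ == "__main__":
    # Offline demo with a stub: one agent answers, the other never does.
    def stub(topic, server):
        if server == "compute-2":
            raise TimeoutError("no reply")
        return "pong"

    print(ping_all([("compute", "compute-1"), ("compute", "compute-2")], stub))
```

Against a real deployment, `oslo_call("rabbit://user:pass@host:5672/")` would replace the stub (the URL is a placeholder). Because every ping is an explicit call made by the operator, the extra load on rabbit is whatever the operator chooses to send.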
On 03.08.20 - 12:15, Thierry Carrez wrote:
> Ken Giusti wrote:
> > On Mon, Jul 27, 2020 at 1:18 PM Dan Smith <dms at danplanet.com> wrote:
> > > The primary concern was about something other than nova sitting on our
> > > bus making calls to our internal services. I imagine that the proposal
> > > to bake it into oslo.messaging is for the same purpose, and I'd probably
> > > have the same concern. At the time I think we agreed that if we were
> > > going to support direct-to-service health checks, they should be teensy
> > > HTTP servers with oslo healthchecks middleware. Further loading down
> > > rabbit with those pings doesn't seem like the best plan to
> > > me. Especially since Nova (compute) services already check in over RPC
> > > periodically and the success of that is discoverable en masse through
> > > the API.
> > While I was initially in favor of this feature, Dan's concern has me
> > reconsidering it.
> > Now I believe that if the purpose of this feature is to check the
> > operational health of a service _using_ oslo.messaging, then I'm against
> > it. A naked ping to a generic service point in an application doesn't
> > prove the operating health of that application beyond its connection to
> > rabbit.
> While I understand the need to further avoid loading down Rabbit, I like the
> universality of this solution, solving a real operational issue.
> Obviously that creates a trade-off (further loading rabbit to get more
> operational insights), but nobody forces you to run those ping calls, they
> would be opt-in. So the proposed code in itself does not weigh down Rabbit,
> or make anything sit on the bus.
> > Connectivity monitoring between an application and rabbit is done using
> > the keepalive connection heartbeat mechanism built into the rabbit
> > protocol, which O.M. supports today.
> I'll let Arnaud answer, but I suspect the operational need is code-external
> checking of the rabbit->agent chain, not code-internal checking of the
> agent->rabbit chain. The heartbeat mechanism is used by the agent to keep
> the Rabbit connection alive, ensuring it works in most of the cases. The
> check described above is to catch the corner cases where it still doesn't.
> Thierry Carrez (ttx)