On 7/28/20 9:25 AM, Bogdan Dobrelya wrote:
On 7/28/20 4:11 PM, Ken Giusti wrote:
On Tue, Jul 28, 2020 at 4:48 AM Bogdan Dobrelya <bdobreli@redhat.com> wrote:
On 7/27/20 7:08 PM, Dan Smith wrote:
>> Tagging with Nova and Neutron as they are mentioned and I thought some
>> people from those teams had opinions on this.
>
> Nova already implements ping() on the compute RPC interface, which we
> use to make sure compute waits to start up until conductor is available
> to do its bidding. So if a new obligatory RPC server method is actually
> added called ping(), it will break us.
>
>> Can you refresh my memory on why we dropped this before? I recall
>> talking about it in Denver, but I can't for the life of me remember
>> what the conclusion was. Did we intend to use something else for this
>> that has since fallen through?
>
> The prior conversation I recall was about helm sitting on our bus to
> (ab)use our ping method for health checks:
>
> https://opendev.org/openstack/openstack-helm/commit/baf5356a4fb61590a95f64a6...
>
> I believe that has since been reverted.
>
> The primary concern was about something other than nova sitting on our
> bus making calls to our internal services. I imagine that the proposal
> to bake it into oslo.messaging is for the same purpose, and I'd probably
> have the same concern. At the time I think we agreed that if we were
> going to support direct-to-service health checks, they should be teensy
> HTTP servers with oslo healthchecks middleware. Further loading down
> rabbit with those pings doesn't seem like the best plan to
> me. Especially since Nova (compute) services already check in over RPC
> periodically and the success of that is discoverable en masse through
> the API.
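As an illustration of the "teensy HTTP server" idea mentioned above, here is a minimal sketch using only the Python standard library. It is not the oslo.middleware healthcheck; the /healthcheck path and the names healthcheck_app, start_health_server and probe are assumptions made for this example.

```python
import http.client
import threading
from wsgiref.simple_server import WSGIRequestHandler, make_server


def healthcheck_app(environ, start_response):
    """Tiny WSGI app: 200 on /healthcheck (assumed path), 404 elsewhere."""
    if environ.get("PATH_INFO") == "/healthcheck":
        status, body = "200 OK", b"OK"
    else:
        status, body = "404 Not Found", b"Not Found"
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]


class QuietHandler(WSGIRequestHandler):
    """Request handler that suppresses per-request logging to stderr."""

    def log_message(self, format, *args):
        pass


def start_health_server(host="127.0.0.1", port=0):
    """Serve healthcheck_app in a daemon thread; port 0 picks a free port."""
    server = make_server(host, port, healthcheck_app,
                         handler_class=QuietHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(server, path="/healthcheck"):
    """Issue one GET against the server and return (status, body)."""
    host, port = server.server_address[:2]
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

A check like this answers over plain HTTP, so a monitoring system probing it puts no extra load on the message bus.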
Having an RPC ping in the common messaging library could improve liveness handling for long-running API calls, such as listing many Neutron ports or Heat objects with full details, or perhaps running a longish Mistral workflow. Of course, it should be implemented so that it does not break what already exists in Nova.
I'm not sure this is related to your concern about long-running APIs, but oslo.messaging has an optional RPC call heartbeat monitor that verifies connectivity to the server while the call is in progress. See the description of call_monitor_timeout in the RPC client docs [0].
Correct, but heartbeats did not prove to be a reliable solution. There were WSGI- and eventlet-related issues [1] with running heartbeats. I can't recall what the final outcome of that discussion was, or what the fix was. So relying on explicit pings sent by clients could perhaps work better.
How so? The client is going to do the exact same thing as oslo.messaging heartbeats - start a separate thread to send pings, then make the long-running RPC call. It would hit the same eventlet/wsgi bug that oslo.messaging does.

Also, there's a workaround for that bug in oslo.messaging:

https://github.com/openstack/oslo.messaging/commit/1541b0c7f965b9defb02b9e63...

If you re-implemented heartbeating you would have to also re-implement the workaround.

On a related note, I've added a topic to our next meeting to discuss turning that workaround on by default since it's been there for a year and no one has complained that it broke them.
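The pattern described above (a separate thread sending pings while the long-running call is in flight) can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not oslo.messaging's implementation: call_with_heartbeat, CallMonitorTimeout, and the ping and long_call callables are hypothetical names standing in for real RPC operations.

```python
import threading


class CallMonitorTimeout(Exception):
    """Raised when the peer stops answering pings during a long call."""


def call_with_heartbeat(long_call, ping, interval=0.05, max_missed=2):
    """Run long_call() while a helper thread calls ping() periodically.

    ping() must raise on failure. After max_missed consecutive failures
    the caller gives up with CallMonitorTimeout; the abandoned long_call
    keeps running in its daemon thread (a real client would cancel it).
    """
    done = threading.Event()
    peer_lost = threading.Event()
    result = {}

    def heartbeat():
        missed = 0
        # done.wait() returns False on timeout, so we ping once per interval
        # until the long call has finished.
        while not done.wait(interval):
            try:
                ping()
                missed = 0
            except Exception:
                missed += 1
                if missed >= max_missed:
                    peer_lost.set()
                    return

    def worker():
        try:
            result["value"] = long_call()
        except Exception as exc:
            result["error"] = exc
        finally:
            done.set()

    threading.Thread(target=heartbeat, daemon=True).start()
    threading.Thread(target=worker, daemon=True).start()

    # Poll for completion, bailing out early if the heartbeat gave up.
    while not done.wait(interval / 2):
        if peer_lost.is_set():
            raise CallMonitorTimeout("peer stopped answering pings")

    if "error" in result:
        raise result["error"]
    return result["value"]
```

Because the pings run in their own thread, any environment that starves or suspends background threads would stall this sketch just as it would stall a library-level heartbeat, which is the point made above.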
[1] https://bugs.launchpad.net/tripleo/+bug/1829062
[0] https://docs.openstack.org/oslo.messaging/latest/reference/rpcclient.html
> --Dan
-- Best regards, Bogdan Dobrelya, Irc #bogdando
-- Ken Giusti (kgiusti@gmail.com)