Re: [largescale-sig][nova][neutron][oslo] RPC ping

28 Jul 2020

      On 7/28/20 9:25 AM, Bogdan Dobrelya wrote:
...
On 7/28/20 4:11 PM, Ken Giusti wrote:
...
On Tue, Jul 28, 2020 at 4:48 AM Bogdan Dobrelya <bdobreli@redhat.com> wrote:
    On 7/27/20 7:08 PM, Dan Smith wrote:
     >> Tagging with Nova and Neutron as they are mentioned and I thought some
     >> people from those teams had opinions on this.
     >
     > Nova already implements ping() on the compute RPC interface, which we
     > use to make sure compute waits to start up until conductor is available
     > to do its bidding. So if a new obligatory RPC server method is actually
     > added called ping(), it will break us.
     >
     >> Can you refresh my memory on why we dropped this before? I recall
     >> talking about it in Denver, but I can't for the life of me remember
     >> what the conclusion was. Did we intend to use something else for this
     >> that has since fallen through?
     >
     > The prior conversation I recall was about helm sitting on our bus to
     > (ab)use our ping method for health checks:
     >
     >
https://opendev.org/openstack/openstack-helm/commit/baf5356a4fb61590a95f64a6...
     >
     > I believe that has since been reverted.
     >
     > The primary concern was about something other than nova sitting on our
     > bus making calls to our internal services. I imagine that the proposal
     > to bake it into oslo.messaging is for the same purpose, and I'd probably
     > have the same concern. At the time I think we agreed that if we were
     > going to support direct-to-service health checks, they should be teensy
     > HTTP servers with oslo healthchecks middleware. Further loading down
     > rabbit with those pings doesn't seem like the best plan to
     > me. Especially since Nova (compute) services already check in over RPC
     > periodically and the success of that is discoverable en masse through
     > the API.
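
A minimal sketch of the "teensy HTTP server" idea mentioned above, using only the Python standard library. It is a stand-in for illustration and does not use the oslo.middleware healthcheck middleware itself; the /healthcheck path and the names healthcheck_app and probe are assumptions, not anything from Nova or oslo.

```python
from wsgiref.util import setup_testing_defaults


def healthcheck_app(environ, start_response):
    """Tiny WSGI app: 200 on /healthcheck (hypothetical path), 404 elsewhere."""
    if environ.get("PATH_INFO") == "/healthcheck":
        status, body = "200 OK", b"OK"
    else:
        status, body = "404 Not Found", b"not found"
    start_response(status, [("Content-Type", "text/plain"),
                            ("Content-Length", str(len(body)))])
    return [body]


def probe(path):
    """Call the app in-process, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(healthcheck_app(environ, start_response))
    return captured["status"], body
```

To actually serve it, healthcheck_app could be handed to wsgiref.simple_server.make_server or any other WSGI server. The point of the design is that the check never touches the message bus.
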
    Having RPC ping in the common messaging library could improve liveness
    handling of long-running APIs, like listing multiple Neutron ports or
    Heat objects with full details, or maybe running some longish Mistral
    workflow. Of course, it should be done without breaking what already
    exists in Nova.
Not sure this is related to your concern about long-running APIs, but 
oslo.messaging has an optional RPC call heartbeat monitor that verifies 
connectivity to the server while the call is in progress. See the 
description of call_monitor_timeout in the RPC client docs [0].
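
To make the idea concrete, here is a rough stdlib-only model of a monitored call: the caller gives up early if the server goes silent for longer than a short monitor timeout, while the overall timeout still bounds the whole call. This is a sketch of the concept only, not oslo.messaging code; monitored_call, fake_server, CallTimeout, and the queue-based "transport" are all made up for illustration.

```python
import queue
import threading
import time


class CallTimeout(Exception):
    """Raised when neither a reply nor a heartbeat arrives in time."""


def monitored_call(replies, timeout, call_monitor_timeout):
    """Wait for a reply; each heartbeat restarts the short monitor timer."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise CallTimeout("overall timeout exceeded")
        try:
            kind, value = replies.get(
                timeout=min(call_monitor_timeout, remaining))
        except queue.Empty:
            if time.monotonic() >= deadline:
                raise CallTimeout("overall timeout exceeded") from None
            raise CallTimeout("server went silent") from None
        if kind == "reply":
            return value
        # kind == "heartbeat": the server is still working, keep waiting.


def fake_server(replies, work_time, heartbeat_interval):
    """Hypothetical server: heartbeats while 'working', then replies."""
    def run():
        end = time.monotonic() + work_time
        while time.monotonic() < end:
            time.sleep(heartbeat_interval)
            replies.put(("heartbeat", None))
        replies.put(("reply", "done"))

    threading.Thread(target=run, daemon=True).start()
```

With a live fake_server the call returns its reply; with nothing feeding the queue, it fails after roughly the monitor timeout instead of waiting out the full timeout.
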
Correct, but heartbeats didn't prove to be a reliable solution. There 
were WSGI & eventlet related issues [1] with running heartbeats. I can't 
recall what the final outcome of that discussion was or what the fix 
was. So relying on explicit pings sent by clients could perhaps work 
better.
How so? The client is going to do the exact same thing as oslo.messaging 
heartbeats - start a separate thread to send pings, then make the 
long-running RPC call. It would hit the same eventlet/wsgi bug that 
oslo.messaging does.
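
For illustration, the client-side pattern described above (a separate thread sending pings while the long-running call is in flight) might look roughly like this. It is a stdlib sketch under assumptions: call_with_pings, long_call, and send_ping are hypothetical names, and no real RPC is involved.

```python
import threading
import time


def call_with_pings(long_call, send_ping, interval):
    """Run long_call() while a helper thread calls send_ping() periodically."""
    stop = threading.Event()

    def pinger():
        # Event.wait() returns False on timeout and True once stop is set,
        # so this loop pings every `interval` seconds until told to stop.
        while not stop.wait(interval):
            send_ping()

    thread = threading.Thread(target=pinger, daemon=True)
    thread.start()
    try:
        return long_call()
    finally:
        stop.set()
        thread.join()
```

The pings only go out if the helper thread actually gets scheduled, which is the same dependency the heartbeat thread has.
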

Also, there's a workaround for that bug in oslo.messaging: 
https://github.com/openstack/oslo.messaging/commit/1541b0c7f965b9defb02b9e63... 
If you re-implemented heartbeating you would have to also re-implement 
the workaround.

On a related note, I've added a topic to our next meeting to discuss 
turning that workaround on by default since it's been there for a year 
and no one has complained that it broke them.
...
[1] https://bugs.launchpad.net/tripleo/+bug/1829062
...
[0] https://docs.openstack.org/oslo.messaging/latest/reference/rpcclient.html
     >
     > --Dan
     >
    -- 
    Best regards,
    Bogdan Dobrelya,
    Irc #bogdando
-- 
Ken Giusti  (kgiusti@gmail.com)

Ben Nemec