[largescale-sig][nova][neutron][oslo] RPC ping
Ben Nemec
openstack at nemebean.com
Tue Jul 28 15:09:32 UTC 2020
On 7/28/20 9:25 AM, Bogdan Dobrelya wrote:
> On 7/28/20 4:11 PM, Ken Giusti wrote:
>>
>>
>> On Tue, Jul 28, 2020 at 4:48 AM Bogdan Dobrelya <bdobreli at redhat.com> wrote:
>>
>> On 7/27/20 7:08 PM, Dan Smith wrote:
>> >> Tagging with Nova and Neutron as they are mentioned and I thought some
>> >> people from those teams had opinions on this.
>> >
>> > Nova already implements ping() on the compute RPC interface, which we
>> > use to make sure compute waits to start up until conductor is available
>> > to do its bidding. So if a new obligatory RPC server method is actually
>> > added called ping(), it will break us.
>> >
>> >> Can you refresh my memory on why we dropped this before? I recall
>> >> talking about it in Denver, but I can't for the life of me remember
>> >> what the conclusion was. Did we intend to use something else for this
>> >> that has since fallen through?
>> >
>> > The prior conversation I recall was about helm sitting on our bus to
>> > (ab)use our ping method for health checks:
>> >
>> > https://opendev.org/openstack/openstack-helm/commit/baf5356a4fb61590a95f64a63c0dcabfebb3baaa
>> >
>> > I believe that has since been reverted.
>> >
>> > The primary concern was about something other than nova sitting on our
>> > bus making calls to our internal services. I imagine that the proposal
>> > to bake it into oslo.messaging is for the same purpose, and I'd probably
>> > have the same concern. At the time I think we agreed that if we were
>> > going to support direct-to-service health checks, they should be teensy
>> > HTTP servers with oslo healthchecks middleware. Further loading down
>> > rabbit with those pings doesn't seem like the best plan to
>> > me. Especially since Nova (compute) services already check in over RPC
>> > periodically and the success of that is discoverable en masse through
>> > the API.
>>
>> Having RPC ping in the common messaging library could improve aliveness
>> handling of long-running APIs, like listing multiple Neutron ports or
>> Heat objects with full details, or running some longish Mistral workflow
>> maybe. Indeed it should be made not breaking things already existing in
>> Nova ofc.
>>
>>
>> Not sure this is related to your concern about long running API's but
>> O.M. has an optional RPC call heartbeat monitor that verifies the
>> connectivity to the server while the call is in progress. See the
>> description of call_monitor_timeout in the RPC client docs [0].
>
> Correct, but heartbeats didn't prove to be a reliable solution. There
> were WSGI & eventlet related issues [1] with running heartbeats. I can't
> recall what the final outcome of that discussion was or what the fix
> was. So relying on explicit pings sent by clients could work better
> perhaps.

How so? The client is going to do the exact same thing as oslo.messaging
heartbeats - start a separate thread to send pings, then make the
long-running RPC call. It would hit the same eventlet/wsgi bug that
oslo.messaging does.
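
To make that concrete, here is a rough stdlib-only sketch of what a
client-side ping would amount to. The names call_with_pings, long_call and
send_ping are hypothetical placeholders, not oslo.messaging API; in a real
service long_call would be the blocking RPC call and send_ping would be
whatever ping RPC ends up being proposed. My assumption is that in an
eventlet-patched process the helper thread below would be subject to the
same scheduling behaviour as the existing heartbeat thread.

```python
import threading
import time


def call_with_pings(long_call, send_ping, interval=0.05):
    """Run long_call() while a helper thread calls send_ping() periodically.

    Hypothetical illustration only: both callables are supplied by the
    caller and stand in for real RPC operations.
    """
    stop = threading.Event()

    def pinger():
        # Event.wait() returns False on timeout and True once stop is set,
        # so this loop pings every `interval` seconds until told to stop.
        while not stop.wait(interval):
            send_ping()

    helper = threading.Thread(target=pinger, daemon=True)
    helper.start()
    try:
        # The long-running call blocks the calling thread; liveness depends
        # entirely on the helper thread actually getting scheduled.
        return long_call()
    finally:
        stop.set()
        helper.join()


def demo():
    """Simulate a slow call and record when pings were sent."""
    pings = []

    def slow_call():
        time.sleep(0.3)
        return "done"

    result = call_with_pings(slow_call, lambda: pings.append(time.monotonic()))
    return result, pings


if __name__ == "__main__":
    result, pings = demo()
    print(result, "after", len(pings), "pings")
```

Structurally this is a heartbeat thread plus a blocking call, which is why I
don't see how it avoids the problem the existing heartbeats ran into.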

Also, there's a workaround for that bug in oslo.messaging:
https://github.com/openstack/oslo.messaging/commit/1541b0c7f965b9defb02b9e63975db2d29d99242
If you re-implemented heartbeating you would have to also re-implement
the workaround.

On a related note, I've added a topic to our next meeting to discuss
turning that workaround on by default since it's been there for a year
and no one has complained that it broke them.
>
> [1] https://bugs.launchpad.net/tripleo/+bug/1829062
>
>>
>> 0:
>> https://docs.openstack.org/oslo.messaging/latest/reference/rpcclient.html
>>
>>
>>
>> >
>> > --Dan
>> >
>>
>>
>> -- Best regards,
>> Bogdan Dobrelya,
>> Irc #bogdando
>>
>>
>>
>>
>> --
>> Ken Giusti (kgiusti at gmail.com)
>
>