On 8/13/20 11:07 AM, Sean Mooney wrote:
>>   I think it's probably
>> better to provide a well-defined endpoint for them to talk to rather
>> than have everyone implement their own slightly different RPC ping
>> mechanism. The docs for this feature should be very explicit that this
>> is the only thing external code should be calling.
> ya i think that is a good approach.
> i would still prefer if people used, say, middleware to add a service ping admin api endpoint
> instead of directly calling the rpc endpoint, to avoid exposing rabbitmq, but that is out of scope of this discussion.

Completely agree. In the long run I would like to see this replaced with 
better integrated healthchecking in OpenStack, but we've been talking 
about that for years and have made minimal progress.

>>> so if this does actually detect something we can't otherwise detect, and the use case involves using it within
>>> the openstack services, not from an external source, then i think that is fine, but we probably need to use another
>>> name (alive? status?) or otherwise modify nova so that there is no conflict.
>> If I understand your analysis of the bug correctly, this would have
>> caught that type of outage after all since the failure was asymmetric.
> um, i'm not sure.
> it might, yes. looking at https://review.opendev.org/#/c/735385/6
> it's not clear to me how the endpoint is invoked. is it doing a topic send or a direct send?
> to detect the failure you would need to invoke a ping on the compute service, and that ping would
> have to be enqueued on the nova topic exchange with a routing key of compute.<compute node hostname>.
> if the compute topic queue was broken, either because it was no longer bound to the correct topic or due to some other
> rabbitmq error, then you would either get a message-undeliverable error of some kind with the mandatory flag or likely a
> timeout without the mandatory flag. so if the ping would be routed using a topic to compute.<compute node hostname>
> then yes, it would find this.
> although we can also detect this ourselves and fix it using the mandatory flag, i think, by just recreating the queue when
> it exists but we get an undeliverable message. at least i think we can; rabbit is not my main area of expertise, so it
> would be nice if someone who knows more about it can weigh in on that.

I pinged Ken this morning to take a look at that. He should be able to 
tell us whether it's a good idea or crazy talk. :-)
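
For anyone following along, here is a rough toy model of the failure mode
being discussed. It is plain in-memory Python, not RabbitMQ or
oslo.messaging code: the class, the ping() helper and the "compute-1"
hostname are all made up for illustration, routing is exact-match only, and
an exception stands in for the broker returning an unroutable message. It
only shows why a ping routed to compute.<hostname> would notice a missing
binding, and how the result differs with and without the mandatory flag.

```python
class Unroutable(Exception):
    """Stand-in for the broker returning a mandatory message it could not route."""


class ToyTopicExchange:
    """Hypothetical in-memory exchange; exact-match routing keys only."""

    def __init__(self):
        self.bindings = {}  # routing key -> list of bound queues

    def bind(self, routing_key, queue):
        self.bindings.setdefault(routing_key, []).append(queue)

    def unbind(self, routing_key):
        # Simulates the queue losing its binding to the topic.
        self.bindings.pop(routing_key, None)

    def publish(self, routing_key, message, mandatory=False):
        queues = self.bindings.get(routing_key, [])
        if not queues:
            if mandatory:
                raise Unroutable(routing_key)
            return False  # silently dropped; the caller would just time out
        for queue in queues:
            queue.append(message)
        return True


def ping(exchange, host, mandatory):
    """Send a ping to compute.<host> and report what the caller would observe."""
    try:
        delivered = exchange.publish(f"compute.{host}", "ping", mandatory=mandatory)
    except Unroutable:
        return "undeliverable"
    return "ok" if delivered else "timeout"


if __name__ == "__main__":
    exchange = ToyTopicExchange()
    compute_queue = []
    exchange.bind("compute.compute-1", compute_queue)
    print(ping(exchange, "compute-1", mandatory=True))
    exchange.unbind("compute.compute-1")
    print(ping(exchange, "compute-1", mandatory=True))
    print(ping(exchange, "compute-1", mandatory=False))
```

In this sketch the healthy case delivers the ping, while the broken binding
shows up either as an immediate undeliverable result (mandatory) or as what
would be a timeout for the caller (non-mandatory). Whether recreating the
queue on an undeliverable message is safe in practice is exactly the open
question above.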

