Open Stack

Tue Jul 7 19:16:19 UTC 2015

On 07/07/2015 05:48 PM, Clint Byrum wrote:
> all of the call sites I checked _do not appear to resend_, they
> simply explode on timeout waiting for reply. This is how calling code
> should work and I'm ok with code in nova, cinder, et. al. being
> written this way, because I'd expect my messaging layer to be at
> least somewhat reliable

In my opinion, the calling code has better context for deciding
whether or not to retry. Tackling reliability issues end-to-end is also
often much more efficient.
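
To illustrate what I mean by the caller having the context, here is a
minimal sketch of a caller-side retry. Everything in it is hypothetical:
`CallTimeout`, `call_with_retry` and the `send` callable are stand-ins I
made up for illustration, not oslo.messaging API. The point is only that
the caller, not the messaging layer, knows whether a duplicate execution
would be harmless.

```python
class CallTimeout(Exception):
    """Hypothetical stand-in for a timeout raised while waiting for a reply."""


def call_with_retry(send, request, is_idempotent, max_attempts=3):
    """Resend on timeout only if the caller says a duplicate is harmless.

    `send` is any callable that performs the call and returns the reply,
    or raises CallTimeout if no reply arrives in time.
    """
    # A non-idempotent call gets exactly one attempt: after a timeout the
    # caller cannot tell whether the request was executed or not.
    attempts = max_attempts if is_idempotent else 1
    for attempt in range(1, attempts + 1):
        try:
            return send(request)
        except CallTimeout:
            if attempt == attempts:
                raise
```

A call that only reads state could pass `is_idempotent=True`; a call
that creates a resource would pass `False` and deal with the timeout in
some call-specific way.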

[...]
> I think you'll find that once you try to make oslo.messaging handle the
> retrying, that with the broker simply being ack'd all the time, you risk
> duplicating RPC calls if you retry in a loop.

Resending the request will always risk duplicating the call (unless the
caller can verify, in some call-specific way, that the previous request
was not executed). Whether or not you acknowledge the request (and
whether you do it before or after processing the request), the
response can still get lost (neither requests nor responses are
currently confirmed by the broker).

There is a message id 'cache' used to try to detect (and then ignore)
duplicates. It's not clear to me how effective that is in practice, as it
only tracks the last 16 ids for a given listener. In any case, if the
listener process is restarted, or if the call is redelivered to a
different server in a group, then the id cache would be of no use.
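
A rough sketch of why such a cache is limited. This is not the
oslo.messaging code; `RecentIdCache` is a hypothetical class, and using a
bounded deque is my assumption about how one might track "the last 16
ids". It only shows the two failure modes mentioned above.

```python
from collections import deque


class RecentIdCache:
    """Hypothetical per-listener cache remembering only the most recent ids."""

    def __init__(self, size=16):
        # A deque with maxlen silently drops the oldest id when full.
        self._ids = deque(maxlen=size)

    def is_duplicate(self, msg_id):
        """Return True if msg_id was seen recently, else remember it."""
        if msg_id in self._ids:
            return True
        self._ids.append(msg_id)
        return False


cache = RecentIdCache()
cache.is_duplicate("req-1")             # first sighting, remembered
seen_again = cache.is_duplicate("req-1")  # detected while still in the window

# After 16 other messages the original id has been evicted, so a late
# redelivery of "req-1" is no longer recognised.
for i in range(16):
    cache.is_duplicate("other-%d" % i)
missed_after_eviction = not cache.is_duplicate("req-1")

# A restarted listener, or a different server in the group, starts with
# an empty cache and cannot recognise the id either.
missed_after_restart = not RecentIdCache().is_duplicate("req-1")
```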

> The pattern is well
> established in RabbitMQ that acks should happen _AFTER_ the message has
> been consumed and thus should not be duplicated, not before.

That is the pattern for at-least-once delivery, where either processing 
is able to detect that a resent message was already processed or where 
reprocessing it is preferable to not processing it at all.

I *believe* oslo.messaging (or impl_rabbit at least) was aiming for an
at-most-once guarantee (i.e. avoiding duplication at the expense of
dropped messages). That may be why the acknowledgement is done before
processing, though since the acknowledgement is asynchronous, that only
narrows the window; it doesn't eliminate it.
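
The difference between the two orderings can be sketched with a toy
simulation. This is purely illustrative: `run` is a hypothetical
function, there is no real broker involved, and the ack is treated as
synchronous, so the residual window caused by asynchronous
acknowledgement is deliberately not modelled. It delivers one message,
lets the consumer crash once, and counts how many times the work was
done.

```python
def run(ack_first, crash_point):
    """Simulate one message, one consumer crash, then redelivery if unacked.

    crash_point is "before_work" or "after_work", relative to the side
    effect. Returns how many times the side effect was executed.
    """
    pending = True        # the toy broker still holds the message
    executions = 0
    crashed_once = False
    while pending:        # each iteration is one delivery attempt
        try:
            if ack_first:
                pending = False               # ack before processing
            if crash_point == "before_work" and not crashed_once:
                crashed_once = True
                raise RuntimeError("consumer crashed")
            executions += 1                   # the side effect of the call
            if crash_point == "after_work" and not crashed_once:
                crashed_once = True
                raise RuntimeError("consumer crashed")
            pending = False                   # ack after processing
        except RuntimeError:
            continue      # unacked messages are redelivered
    return executions


lost = run(ack_first=True, crash_point="before_work")         # at-most-once
duplicated = run(ack_first=False, crash_point="after_work")   # at-least-once
```

With ack-first the crash loses the message (`lost` is 0); with ack-after
the crash leads to redelivery and the work is done twice (`duplicated`
is 2). In the other two combinations the work happens exactly once.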

I may of course be wrong. It would be great to have someone more
qualified to comment on the intentions of the design provide some clarity.

[openstack-dev] [oslo.messaging] [mistral] Acknowledge feature of RabbitMQ in oslo.messaging
