[openstack-dev] [oslo] [messaging] 'retry' option

Gordon Sim gsim at redhat.com
Mon Jun 30 09:41:37 UTC 2014

On 06/28/2014 10:49 PM, Mark McLoughlin wrote:
> On Fri, 2014-06-27 at 17:02 +0100, Gordon Sim wrote:
>> A question about the new 'retry' option. The doc says:
>>       By default, cast() and call() will block until the
>>       message is successfully sent.
>> What does 'successfully sent' mean here?
> Unclear, ambiguous, probably driver dependent etc.
> The 'blocking' we're talking about here is establishing a connection
> with the broker. If the connection has been lost, then cast() will block
> until the connection has been re-established and the message 'sent'.

Understood, but to my mind, that is really an implementation detail.

>>   Does it mean 'written to the wire' or 'accepted by the broker'?
>> For the impl_qpid.py driver, each send is synchronous, so it means
>> accepted by the broker[1].
>> What does the impl_rabbit.py driver do? Does it just mean 'written to
>> the wire', or is it using RabbitMQ confirmations to get notified when
>> the broker accepts it (standard 0-9-1 has no way of doing this)?
> I don't know, but it would be nice if someone did take the time to
> figure it out and document it :)

Having googled around a bit, it appears that kombu v3.* has a
'confirm_publish' transport option when using the 'pyamqp' transport.
That isn't available in the 2.* versions, which appear to be what is
used in oslo.messaging, and I can't find that option specified anywhere
in the oslo.messaging codebase either.

Running a series of casts using the latest impl_rabbit.py driver and 
examining the data on the wire also shows no confirms being sent.

So for impl_rabbit, the send is not acknowledged, but the delivery to
consumers is. For impl_qpid it's the other way round: the send is
acknowledged but the delivery to consumers is not (though a prefetch of
1 is used, limiting the loss to one message).

> Seriously, some docs around the subtle ways that the drivers differ from
> one another would be helpful ... particularly if it exposed incorrect
> assumptions API users are currently making.

I'm happy to try and contribute to that.

>> If the intention is to block until accepted by the broker that has
>> obvious performance implications. On the other hand if it means block
>> until written to the wire, what is the advantage of that? Was that a
>> deliberate feature or perhaps just an accident of implementation?
>> The use case for the new parameter, as described in the git commit,
>> seems to be motivated by wanting to avoid the blocking when sending
>> notifications. I can certainly understand that desire.
>> However, notifications and casts feel like inherently asynchronous
>> things to me, and perhaps having/needing the synchronous behaviour is
>> the real issue?
> It's not so much about sync vs async, but a failure mode. By default, if
> we lose our connection with the broker, we wait until we can
> re-establish it rather than throwing exceptions (requiring the API
> caller to have its own retry logic) or quietly dropping the message.

Even when you have no failure, your calling thread has to wait until the
point the send is deemed successful before returning. So it is
synchronous with respect to whatever that success criterion is.

In the case where success is deemed to be acceptance by the broker 
(which is the case for the impl_qpid.py driver at present, whether 
intentional or not), the call is fully synchronous.

If on the other hand success is merely writing the message to the wire, 
then any failure may well cause message loss regardless of the retry 
option. The reconnect and retry in this case is only of limited value. 
It can avoid certain losses, but not others.
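
To make the distinction concrete, here is a minimal sketch using a toy
in-memory link. None of this is oslo.messaging or driver code: LossyLink,
send_unconfirmed and send_confirmed are hypothetical names, and the 'retry'
argument only loosely mirrors the new option (None meaning retry forever,
N meaning N further attempts), which is an assumption on my part.

```python
class LossyLink:
    """Toy stand-in for a broker connection (hypothetical, not a real driver)."""

    def __init__(self, drop_in_flight):
        # Attempt numbers whose message is lost after leaving the client.
        self.drop_in_flight = set(drop_in_flight)
        self.attempts = 0
        self.accepted = []  # messages the "broker" actually received

    def write(self, msg):
        """Write msg to the wire; return True only if the broker accepted it."""
        self.attempts += 1
        if self.attempts in self.drop_in_flight:
            return False  # written locally, never reached the broker
        self.accepted.append(msg)
        return True


def send_unconfirmed(link, msg):
    # Success criterion: "written to the wire". The broker's answer is ignored,
    # so there is nothing to trigger a retry when the message is lost in flight.
    link.write(msg)
    return True


def send_confirmed(link, msg, retry=None):
    # Success criterion: "accepted by the broker". Resend until confirmed,
    # or until the retry budget (number of extra attempts) is used up.
    attempts = 0
    while retry is None or attempts <= retry:
        attempts += 1
        if link.write(msg):
            return True
    return False


link = LossyLink(drop_in_flight={1})
print(send_unconfirmed(link, "m1"), link.accepted)  # reports success, message lost

link = LossyLink(drop_in_flight={1})
print(send_confirmed(link, "m1"), link.accepted)    # confirmed on second attempt
```

In the unconfirmed case the caller is told the send succeeded although the
broker never saw the message, which is the loss that no amount of
reconnecting can prevent. The confirmed case avoids it, at the cost of a
round trip per send.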

> The use case for ceilometer is to allow its RPCPublisher to have a
> publishing policy - block until the samples have been sent, queue (in an
> in-memory, fixed-length queue) if we don't have a connection to the
> broker, or drop it if we don't have a connection to the broker.
>    https://review.openstack.org/77845
> I do understand the ambiguity around what message delivery guarantees
> are implicit in cast() isn't ideal, but that's not what adding this
> 'retry' parameter was about.

Sure, I understand that. The retry option is necessitated by an
(existing) implicit behaviour. However, in my view that behaviour is
implementation-specific and of limited value in terms of the semantic
contract of the call.
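
For reference, the three publishing policies described above (block, queue,
drop) could be sketched roughly as follows. This is not the ceilometer
RPCPublisher code from the review; PolicyPublisher, its 'send' callable and
the policy names are hypothetical, and I am assuming that a failed
non-blocking send surfaces as a ConnectionError.

```python
import collections


class PolicyPublisher:
    """Hypothetical publisher illustrating the block / queue / drop policies."""

    def __init__(self, send, policy="default", max_queue_length=1024):
        # send(msg, retry) is assumed to raise ConnectionError when the
        # broker is unreachable and retry=0; retry=None blocks until sent.
        self.send = send
        self.policy = policy
        # Fixed-length in-memory queue: the oldest entry is discarded when full.
        self.local_queue = collections.deque(maxlen=max_queue_length)

    def publish(self, msg):
        if self.policy == "default":
            self.send(msg, retry=None)  # block until the message is sent
            return
        self.local_queue.append(msg)
        self.flush()

    def flush(self):
        """Try to send everything queued, without blocking on a dead link."""
        while self.local_queue:
            try:
                self.send(self.local_queue[0], retry=0)
            except ConnectionError:
                if self.policy == "drop":
                    self.local_queue.clear()  # give up on pending messages
                return  # "queue" policy: keep them for a later flush
            self.local_queue.popleft()
```

Note that all three policies still inherit whatever 'sent' means for the
underlying driver, so the queue policy only protects against losing the
connection, not against losing a message that was written but never accepted.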

