[openstack-dev] 'retry' option

Mark McLoughlin markmc at redhat.com
Sat Jun 28 21:49:14 UTC 2014


On Fri, 2014-06-27 at 17:02 +0100, Gordon Sim wrote:
> A question about the new 'retry' option. The doc says:
> 
>      By default, cast() and call() will block until the
>      message is successfully sent.
> 
> What does 'successfully sent' mean here?

Unclear, ambiguous, and probably driver-dependent.

The 'blocking' we're talking about here is establishing a connection
with the broker. If the connection has been lost, then cast() will block
until the connection has been re-established and the message 'sent'.

>  Does it mean 'written to the wire' or 'accepted by the broker'?
> 
> For the impl_qpid.py driver, each send is synchronous, so it means 
> accepted by the broker[1].
> 
> What does the impl_rabbit.py driver do? Does it just mean 'written to
> the wire', or is it using RabbitMQ confirmations to get notified when
> the broker accepts it (standard AMQP 0-9-1 has no way of doing this)?

I don't know, but it would be nice if someone did take the time to
figure it out and document it :)

Seriously, some docs around the subtle ways that the drivers differ from
one another would be helpful ... particularly if it exposed incorrect
assumptions API users are currently making.

> If the intention is to block until accepted by the broker that has 
> obvious performance implications. On the other hand if it means block 
> until written to the wire, what is the advantage of that? Was that a 
> deliberate feature or perhaps just an accident of implementation?
> 
> The use case for the new parameter, as described in the git commit, 
> seems to be motivated by wanting to avoid the blocking when sending 
> notifications. I can certainly understand that desire.
> 
> However, notifications and casts feel like inherently asynchronous 
> things to me, and perhaps having/needing the synchronous behaviour is 
> the real issue?

It's not so much about sync vs async, but a failure mode. By default, if
we lose our connection with the broker, we wait until we can
re-establish it rather than throwing exceptions (requiring the API
caller to have its own retry logic) or quietly dropping the message.
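To make that failure mode concrete, here is a minimal Python sketch of a send helper with a bounded reconnect count. The helper, BrokerUnavailable, and the exact value semantics (None or a negative value meaning reconnect forever, 0 meaning a single attempt with no reconnects, N meaning at most N reconnects) are assumptions for illustration, chosen to match the max_reconnects reading discussed further down; this is not the oslo.messaging implementation.

```python
import itertools
import time


class BrokerUnavailable(Exception):
    """Hypothetical error: the broker could not be reached within `retry` reconnects."""


def send_with_retry(connect, publish, message, retry=None, interval=0.0):
    """Hypothetical helper: send `message`, reconnecting on ConnectionError.

    Assumed semantics, for illustration only:
      retry=None or < 0 -> reconnect forever (block until the message is sent)
      retry=0           -> one attempt, no reconnects
      retry=N           -> at most N reconnects after the first attempt
    Returns the number of reconnects that were needed.
    """
    forever = retry is None or retry < 0
    attempts = itertools.count() if forever else range(retry + 1)
    last_error = None
    for attempt in attempts:
        try:
            conn = connect()           # (re-)establish the broker connection
            publish(conn, message)     # hand the message to the driver
            return attempt
        except ConnectionError as exc:
            last_error = exc
            time.sleep(interval)       # back off before the next reconnect
    # Only reachable when retry is bounded and every attempt failed.
    raise BrokerUnavailable(f"gave up after {retry} reconnect(s)") from last_error
```

With retry=None the loop is the default behaviour described above: the caller simply blocks until the broker is reachable again and never sees an exception. Any finite value hands the failure back to the caller instead.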

The use case for ceilometer is to allow its RPCPublisher to have a
publishing policy: block until the samples have been sent, queue them (in
an in-memory, fixed-length queue) if we don't have a connection to the
broker, or drop them if we don't have a connection to the broker.

  https://review.openstack.org/77845
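The three policies could look roughly like the following sketch. SketchPublisher, Policy, and the send(sample, retry) callable are hypothetical names, not Ceilometer's RPCPublisher; send is assumed to raise ConnectionError when retry=0 and the broker is unreachable. The fixed-length queue is a collections.deque with maxlen, so once it fills up the oldest samples are evicted first; that eviction order is a choice made for the sketch, not necessarily what the review above does.

```python
import collections
import enum


class Policy(enum.Enum):
    DEFAULT = "default"  # block until the samples have been sent
    QUEUE = "queue"      # buffer locally while the broker is unreachable
    DROP = "drop"        # discard samples while the broker is unreachable


class SketchPublisher:
    """Hypothetical publisher illustrating the three publishing policies."""

    def __init__(self, send, policy=Policy.DEFAULT, max_queue_length=1024):
        # `send(sample, retry)` is assumed to block forever when retry is None
        # and to raise ConnectionError when retry=0 and the broker is down.
        self.send = send
        self.policy = policy
        # Fixed-length in-memory queue; once full, the oldest samples are evicted.
        self.local_queue = collections.deque(maxlen=max_queue_length)
        self.dropped = 0

    def publish(self, samples):
        if self.policy is Policy.DEFAULT:
            for sample in samples:
                self.send(sample, retry=None)
            return
        # Retry previously queued samples first (always empty under DROP).
        pending = list(self.local_queue) + list(samples)
        self.local_queue.clear()
        for i, sample in enumerate(pending):
            try:
                self.send(sample, retry=0)  # fail fast instead of blocking
            except ConnectionError:
                unsent = pending[i:]
                if self.policy is Policy.QUEUE:
                    self.local_queue.extend(unsent)
                else:
                    self.dropped += len(unsent)
                return
```

The QUEUE and DROP policies only work if the send can fail fast (retry=0 here); with the default block-forever behaviour, publish() would never get the chance to queue or drop anything, which is why the parameter was needed.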

I do understand that the ambiguity around what message delivery
guarantees are implicit in cast() isn't ideal, but that's not what adding
this 'retry' parameter was about.

>  Calls by contrast, are inherently synchronous, but at 
> present the retry controls only the sending of the request. If the 
> server fails, the call may timeout regardless of the value of 'retry'.
> 
> Just in passing, I'd suggest that renaming the new parameter to
> max_reconnects would make its current behaviour and values clearer.
> The name 'retry' sounds like a yes/no type value, and retry=0 v. retry=1
> is the reverse of what I would intuitively expect.

Sounds reasonable. Would you like to submit a patch? Quick turnaround is
important, because if Ceilometer starts using this retry parameter
before we rename it, I'm not sure it'll be worth the hassle.
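On your earlier point that retry only covers sending the request of a call(), a hypothetical call() helper shows the separation. sketch_call, CallTimeout, and send_request(request, retry) are assumptions for illustration, and a queue.Queue stands in for the reply queue. Once the request has been handed off, only timeout bounds the wait, so a dead server yields CallTimeout whatever retry was set to.

```python
import queue


class CallTimeout(Exception):
    """Hypothetical error: no reply arrived within `timeout` seconds."""


def sketch_call(send_request, replies, request, retry=None, timeout=60.0):
    """Hypothetical call(): `retry` only affects sending the request.

    `send_request(request, retry)` is assumed to handle reconnection;
    once it returns, waiting for the reply is bounded only by `timeout`.
    """
    send_request(request, retry=retry)
    try:
        # Block on the (stand-in) reply queue for at most `timeout` seconds.
        return replies.get(timeout=timeout)
    except queue.Empty:
        raise CallTimeout(f"no reply within {timeout}s") from None
```

Even with a generous retry, a server that never replies still surfaces as a timeout, so retry and timeout answer two different questions.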

Thanks,
Mark.



