Open Stack

Tue Oct 14 17:33:46 UTC 2014

I agree that greater clarity on expectations around reliability is needed.

The drivers all differ in this regard.

As it stands today, the impl_rabbit driver only retries an RPC request 
if an exception occurs while sending it. However, messages are sent 
unconfirmed[1]. This means a message can be lost before it gets enqueued 
by the broker, without the sender of the message receiving any error or 
notification of that fact.
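
To make the difference concrete, here is a toy in-memory sketch in Python. It is not RabbitMQ or oslo.messaging code; LossyBroker, send_unconfirmed and send_confirmed are hypothetical names used only to show why an ignored confirmation hides the loss from the sender.

```python
# Toy in-memory model; not RabbitMQ or oslo.messaging code.
class LossyBroker:
    """Hypothetical broker that drops messages while 'failing' is set."""

    def __init__(self):
        self.queue = []
        self.failing = False

    def publish(self, msg):
        """Return True if the message was enqueued (the 'confirm')."""
        if self.failing:
            return False
        self.queue.append(msg)
        return True


def send_unconfirmed(broker, msg):
    # Fire and forget: the confirm is ignored, so a drop is invisible.
    broker.publish(msg)


def send_confirmed(broker, msg):
    # Surface the loss to the sender so it can retry or fail fast.
    if not broker.publish(msg):
        raise RuntimeError("message was not enqueued by the broker")


broker = LossyBroker()
broker.failing = True

# Returns normally even though the message never reached the queue.
send_unconfirmed(broker, "rpc-request")

try:
    send_confirmed(broker, "rpc-request")
    lost_detected = False
except RuntimeError:
    lost_detected = True
```

In the unconfirmed case the sender only finds out indirectly, when the reply never arrives.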

Even if the requests are durably stored and/or replicated in a clustered 
RabbitMQ configuration, the reply queues are currently always 
auto-deleted and are not durable regardless of configuration, so replies 
may be lost on broker failure even if requests are not.

So I believe that various failures may cause an RPC request to fail 
(i.e. to time out). It seems this is not universally expected, however, 
so I am not sure how many OpenStack services using oslo.messaging expect 
and handle such failures.
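
As a rough sketch of what "handling" could look like on the caller side, assuming the timeout surfaces as an exception: RpcTimeout and call_with_retry below are hypothetical stand-ins, not oslo.messaging names.

```python
import time


class RpcTimeout(Exception):
    """Hypothetical stand-in for whatever timeout error the driver raises."""


def call_with_retry(call, attempts=3, delay=0.0):
    """Invoke call() up to 'attempts' times, retrying only on RpcTimeout."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return call()
        except RpcTimeout as exc:
            last_error = exc
            time.sleep(delay)  # simple fixed pause between attempts
    raise last_error


# Usage: a fake RPC that times out twice and then succeeds.
calls = {"count": 0}


def flaky_rpc():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RpcTimeout("no reply received")
    return "ok"


result = call_with_retry(flaky_rpc, attempts=3)
```

Note that a timeout may mean the reply was lost rather than the request, so a retry like this is only safe if the request is idempotent.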

--Gordon

[1] The impl_qpid driver by contrast sends messages synchronously (i.e. 
blocking until confirmed), but on the receive side it does not use 
acknowledgements, so again message loss is possible.


[Openstack] Messaging reliability/durability expectations
