[Openstack] Messaging reliability/durability expectations
Gordon Sim
gsim at redhat.com
Tue Oct 14 17:33:46 UTC 2014
I agree that greater clarity on expectations around reliability is needed.
The drivers all differ in this regard.
As it stands today, the impl_rabbit driver only retries an RPC request
if an exception occurs while sending it. However, messages are sent
unconfirmed[1]. This means a message can be lost before it gets enqueued
by the broker, without the sender of the message receiving any error or
notification of that fact.
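
To make the gap concrete, here is a toy sketch of the difference between an unconfirmed send and a confirmed one. It is only an in-memory model: ToyBroker, send_unconfirmed and send_confirmed are hypothetical names invented for illustration, not the impl_rabbit code or any RabbitMQ client API.

```python
import random


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker; not a real client API."""

    def __init__(self, drop_rate, seed=0):
        self.queue = []
        self._drop_rate = drop_rate
        self._rng = random.Random(seed)

    def publish(self, message):
        """Return True if enqueued, False if the message was lost in transit."""
        if self._rng.random() < self._drop_rate:
            return False  # lost before enqueue; note that nothing is raised
        self.queue.append(message)
        return True


def send_unconfirmed(broker, message):
    # Fire and forget: the outcome is ignored, so a loss goes unnoticed.
    broker.publish(message)


def send_confirmed(broker, message, max_attempts=10):
    # Wait for the broker's confirmation and resend until it arrives.
    for attempt in range(1, max_attempts + 1):
        if broker.publish(message):
            return attempt
    raise RuntimeError("message not confirmed after %d attempts" % max_attempts)
```

With drop_rate=1.0, send_unconfirmed returns normally and the queue stays empty, which is the silent loss described above. The confirmed variant surfaces the failure instead, though a real sender that resends on a missing confirmation may produce duplicates, so the receiver has to tolerate them.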
Even if the requests are durably stored and/or replicated in a clustered
RabbitMQ configuration, the reply queues are currently always
auto-deleted and are not durable regardless of configuration, so replies
may be lost on broker failure even if requests are not.
So I believe that various failures may cause an RPC request to fail
(i.e. to time out). However, it seems this is not universally expected,
so I am not sure how many OpenStack services using oslo.messaging expect
and handle such failures.
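
As a rough illustration of what "expecting and handling" a timeout could look like on the calling side, here is a minimal sketch. Everything in it is an assumption for the sake of the example: RpcTimeout stands in for whatever timeout error the RPC client raises, and call_with_retry and FlakyServer are hypothetical helpers, not part of oslo.messaging.

```python
class RpcTimeout(Exception):
    """Hypothetical stand-in for the timeout error an RPC client raises."""


def call_with_retry(call, request_id, max_attempts=3):
    """Retry an RPC call on timeout, reusing one request id throughout.

    Reusing the id lets the server recognise a duplicate when the request
    was processed and only the reply was lost.
    """
    for attempt in range(max_attempts):
        try:
            return call(request_id)
        except RpcTimeout:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller see the timeout


class FlakyServer:
    """Hypothetical server whose first reply is lost after processing."""

    def __init__(self):
        self.results = {}
        self.executions = 0
        self._replies_to_lose = 1

    def handle(self, request_id):
        # Do the work only once per request id (idempotent handling).
        if request_id not in self.results:
            self.executions += 1
            self.results[request_id] = "done"
        # Simulate the reply being lost, e.g. with the reply queue.
        if self._replies_to_lose > 0:
            self._replies_to_lose -= 1
            raise RpcTimeout()
        return self.results[request_id]
```

Calling call_with_retry(server.handle, "req-1") against a FlakyServer succeeds on the second attempt while the work is executed only once. Whether a blind retry is safe depends entirely on the operation being idempotent, which is exactly the kind of expectation that needs to be stated.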
--Gordon
[1] The impl_qpid driver by contrast sends messages synchronously - i.e.
blocking until confirmed, but on the receive side it does not use
acknowledgements, so again message loss is possible.