[openstack-dev] [Openstack-operators] [Fuel][Oslo][RabbitMQ][Shovel] Deprecate mirrored queues from HA AMQP cluster scenario

Michael Klishin mklishin at pivotal.io
Mon Jun 8 12:34:27 UTC 2015

On 8 June 2015 at 15:10:15, Davanum Srinivas (davanum at gmail.com) wrote:
> I'd like to bring out a poll about deprecating the RabbitMQ mirrored
> queues for HA layout and replacing the AMQP clustering by shovel [0],
> [1]. I guess the federation would not be a good option, but let's
> consider it as well.

RabbitMQ team member here. 

Neither Shovel nor Federation will replace mirroring. Shovel moves messages
from a queue to an exchange (within a single node or between remote nodes and/or clusters).
It doesn't replicate anything.
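
To make the "moves, not replicates" point concrete, here is a toy in-memory sketch. It is not the Shovel plugin or any RabbitMQ API; `ToyExchange` and `shovel` are hypothetical names, and the exchange is assumed to behave like a simple fanout.

```python
from collections import deque


class ToyExchange:
    """Hypothetical in-memory fanout exchange: routes each message to every bound queue."""

    def __init__(self):
        self.bound_queues = []

    def bind(self, queue):
        self.bound_queues.append(queue)

    def publish(self, message):
        for queue in self.bound_queues:
            queue.append(message)


def shovel(source_queue, destination_exchange):
    """Consume every message from source_queue and republish it to destination_exchange."""
    moved = 0
    while source_queue:
        # The message is taken off the source queue, so no copy stays behind.
        message = source_queue.popleft()
        destination_exchange.publish(message)
        moved += 1
    return moved


source = deque(["m1", "m2", "m3"])
destination_queue = deque()
exchange = ToyExchange()
exchange.bind(destination_queue)

moved = shovel(source, exchange)
print(moved, list(source), list(destination_queue))
```

After the run the source queue is empty and the messages exist only at the destination, so losing the destination loses the messages. That is why this cannot stand in for mirroring.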

Federation has two parts to it:

 * Queue federation: does not replicate anything. It distributes messages from a single
   logical queue between N nodes or clusters, when there are no local consumers to consume them.
 * Exchange federation: replicates a stream of messages going through an exchange.
   As messages are consumed upstream, downstream has no way of knowing about it.
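
The exchange federation caveat can be sketched with another toy model. Again, this is not the Federation plugin; `publish_federated` is a hypothetical helper, and each side is assumed to have one queue bound to its exchange.

```python
from collections import deque


def publish_federated(message, upstream_queue, downstream_queue):
    """Toy exchange federation: a message published upstream is also delivered downstream."""
    upstream_queue.append(message)
    downstream_queue.append(message)


upstream = deque()
downstream = deque()
for message in ["m1", "m2"]:
    publish_federated(message, upstream, downstream)

# A consumer takes a message upstream; nothing tells the downstream side about it.
consumed = upstream.popleft()
print(consumed, list(upstream), list(downstream))
```

The downstream queue still holds the message that was already consumed upstream, so the two sides drift apart as soon as consumption starts. A mirrored queue is expected to track consumption as well as publishing.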

> Why this must be done? The answer is that the rabbit cluster cannot
> detect and survive "micro outages" well and just ending up with some
> queues stuck and as a result, the rabbitmqctl control plane hanged
> completely unresponsive (until the rabbit node erased and recovered its
> cluster membership). These outages could be caused either by the network
> *or* by CPU load spikes. For example, like this bug in Fuel project [2]
> and this mail thread [3].

The right thing to do here is to introduce timeouts to rabbitmqctl. That work was 99% finished
in the past, but some RabbitMQ team members felt it should produce more detailed
error messages, which extended the scope of the change significantly.
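
Until rabbitmqctl has timeouts of its own, a caller can bound the wait from the outside. The following is a minimal sketch of such a workaround, not the change described above; `run_with_timeout` is a hypothetical helper built on Python's standard `subprocess.run`.

```python
import subprocess


def run_with_timeout(command, timeout_seconds):
    """Run command and return (exit_code, stdout), or None if it exceeds the timeout."""
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so a hung command
        # does not linger after the timeout.
        return None
    return completed.returncode, completed.stdout


# Example call (needs rabbitmqctl on PATH; the 30 second limit is an arbitrary choice):
# result = run_with_timeout(["rabbitmqctl", "status"], 30)
```

A `None` result only means the command did not answer in time. It says nothing about why, which is the part the more detailed error messages were meant to cover.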

> This seems rather the Erlang's 
> Mnesia generic clustering issue, than something what could be just fixed 
> in RabbitMQ, unless the mnesia based clustering would be dropped 
> completely ;)

While Mnesia indeed needs to be replaced to introduce AP (as in CAP) style mirroring,
the issue you're bringing up here has nothing to do with Mnesia.
Mnesia is not used by rabbitmqctl, and it is not used to store messages.
It's a rabbitmqctl issue, and potentially a hint that you may want to reduce the
net_ticktime value (say, to 5-10 seconds) so that queue master unavailability
is detected faster.
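
As a rough guide to what a lower value buys, here is a small sketch of the arithmetic. It assumes the behaviour described in the Erlang kernel documentation, where an unresponsive node is detected between T - T/4 and T + T/4 seconds for net_ticktime T, with a default of 60 seconds. `detection_window` is a hypothetical helper name.

```python
def detection_window(net_ticktime_seconds):
    """Return the (min, max) seconds within which an unresponsive node is detected."""
    quarter = net_ticktime_seconds / 4
    return net_ticktime_seconds - quarter, net_ticktime_seconds + quarter


for ticktime in (60, 10, 5):
    low, high = detection_window(ticktime)
    print(f"net_ticktime={ticktime}s: detected within {low:g} to {high:g} seconds")
```

Under that assumption, going from 60 to 10 seconds shrinks the worst case from 75 to 12.5 seconds. The trade-off is that short values make nodes more likely to be declared down during brief network or CPU stalls.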

1. http://www.rabbitmq.com/nettick.html

Staff Software Engineer, Pivotal/RabbitMQ  
