[oslo.messaging][large-scale] RabbitMQ and streams

Arnaud Morin arnaud.morin at gmail.com
Wed Aug 16 08:47:50 UTC 2023


Hey all,

Following up on this discussion, we implemented a few things in
oslo.messaging:
- rely on streams instead of fanouts
- switch ALL queues to quorum (no more transient/classic queues)
- switch to consistent queue naming (no more random uuid4 in queue
  names)
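
As a rough illustration of the first two points (not the actual
oslo.messaging patches), RabbitMQ selects the queue type through the
"x-queue-type" argument at declaration time, and a stream consumer
chooses where to start reading with "x-stream-offset". The helper names
below are hypothetical; only the argument keys and values come from
RabbitMQ itself:

```python
# Hypothetical helpers: they only build the AMQP argument tables that
# RabbitMQ expects. They are not oslo.messaging functions.

VALID_QUEUE_TYPES = ("quorum", "stream")


def queue_declare_arguments(queue_type):
    """Arguments to pass to queue.declare for a quorum or stream queue.

    Both types must also be declared durable and non-exclusive.
    """
    if queue_type not in VALID_QUEUE_TYPES:
        raise ValueError("unsupported queue type: %r" % (queue_type,))
    return {"x-queue-type": queue_type}


def stream_consume_arguments(offset="next"):
    """Arguments to pass to basic.consume when reading from a stream.

    "next" delivers only messages published after the consumer attaches,
    which is the closest match to fanout behaviour. "first" and "last"
    are other offsets accepted by RabbitMQ.
    """
    return {"x-stream-offset": offset}


if __name__ == "__main__":
    print(queue_declare_arguments("quorum"))
    print(queue_declare_arguments("stream"))
    print(stream_consume_arguments())
```

With streams, all consumers read from one shared log instead of each
getting a private copy of every message, which is why they can stand in
for a fanout exchange bound to one queue per agent.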

For the last point, we did this for two main purposes:
- being able to easily identify a queue based on its name (we added
  the compute hostname and process name to the queue name; as
  operators, this allows us to quickly identify which server/service a
  queue belongs to)
- re-using the queues after a service restart (we identified that high
  queue churn is problematic for rabbitmq)
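
To make the naming idea concrete, here is a minimal sketch. The name
layout, the function names and the example topic are assumptions for
illustration; the real format is whatever the patches under review
define:

```python
import os
import socket
import sys
import uuid


def random_queue_name(topic):
    """Old style: a fresh uuid4 per start, so every restart makes a new queue."""
    return "%s_fanout_%s" % (topic, uuid.uuid4().hex)


def stable_queue_name(topic, hostname=None, processname=None):
    """New style (hypothetical layout): derived from host and process name.

    The same service on the same host always gets the same name, so the
    queue is re-used after a restart and is easy to map back to its owner.
    """
    hostname = hostname or socket.gethostname()
    processname = processname or os.path.basename(sys.argv[0]) or "unknown"
    return "%s_fanout_%s:%s" % (topic, hostname, processname)


if __name__ == "__main__":
    # Example values only; the topic and agent name are placeholders.
    print(random_queue_name("some-topic"))
    print(stable_queue_name("some-topic", "compute-01", "neutron-agent"))
```

Two calls to stable_queue_name() with the same inputs return the same
string, while random_queue_name() never repeats, which is where the
queue churn on restart comes from.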

The results are pretty awesome! (pictures attached to the lp bug)
- the CPU load was reduced a lot (divided by 5)
- the memory usage was also reduced (divided by 2)
- the number of queues was divided by 2 (neutron relies heavily on
  fanouts for remote cache population; we are using OVS-based agents)

However, the network traffic increased (multiplied by 3), mostly due to
the switch from classic queues (only a few of them classic HA) to quorum
queues (all HA).

We pushed some of our patches to the oslo.messaging repo here:
https://review.opendev.org/q/topic:bug-2031497
all related to this bug:
https://bugs.launchpad.net/oslo.messaging/+bug/2031497

Feel free to review/comment.


The bonus of switching all queues to quorum (HA) is that we can now
easily drain a rabbit node without affecting the openstack region (i.e.
rabbit is no longer a SPOF).

Cheers,
Arnaud, on behalf of OVHcloud team.

On 23.07.23 - 13:03, Arnaud Morin wrote:
> Hey all,
> 
> Is there any chance that someone has already worked on integrating
> RabbitMQ streams (see [1]) to replace fanouts in the oslo rabbitmq
> driver?
> 
> We are wondering if this would help lower the rabbit usage, and maybe
> increase the rabbit stability?
> 
> Our deployment relies a lot on fanouts to deliver messages to
> computes (mostly neutron, e.g. secgroup update / remote cache population);
> this leads to a massive number of messages being delivered per
> second.
> 
> Cheers,
> Arnaud
> 
> [1] https://www.rabbitmq.com/streams.html


