RabbitMQ performance improvement for single-control node
Hi *,

I would greatly appreciate it if anyone could help identify some tweaks, if that's possible at all.

This is a single-control-node environment with 19 compute nodes which are heavily used. It's still running Pike and we won't be able to upgrade until a complete redeployment. I was able to improve performance a lot by increasing the number of workers etc., but it seems to me the next bottleneck is rabbitmq. The rabbit process shows heavy CPU usage, almost constantly around 150% according to 'top', and I see heartbeat errors in the logs (e.g. cinder). Since there are so many config options I haven't touched them yet. The tuning docs I've read often refer to HA environments, which is not the case here.

I can paste some of the config snippets, maybe that already helps identify something:

# nova.conf
[oslo_messaging_rabbit]
pool_max_size = 1600
pool_max_overflow = 1600
pool_timeout = 120
rpc_response_timeout = 120
rpc_conn_pool_size = 600
Sorry, I accidentally hit "send"... adding more information.

# nova.conf
[oslo_messaging_notifications]
driver = messagingv2

[oslo_messaging_rabbit]
amqp_durable_queues = false
rabbit_ha_queues = false
ssl = false
heartbeat_timeout_threshold = 10

# neutron.conf
[oslo_messaging_rabbit]
pool_max_size = 5000
pool_max_overflow = 2000
pool_timeout = 60

These are the basics in the OpenStack services. I don't have access to the rabbit.conf right now, but I can add more information as soon as I regain access to that machine.

If there's anything else I can provide to improve the user experience at least a bit, please let me know. It would help a lot!

Regards,
Eugen

Zitat von Eugen Block <eblock@nde.ag>:
Do you actually need notifications to be enabled? If not, switch the driver to noop and make sure you purge the notification queues.

On Fri, Aug 27, 2021 at 4:56 AM Eugen Block <eblock@nde.ag> wrote:
--
Mohammed Naser
VEXXHOST, Inc.
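For reference, a minimal sketch of that change, assuming the oslo.messaging default queue names (notifications.info etc.) and the usual config file locations; adjust both for your deployment:

```shell
# In nova.conf, neutron.conf, cinder.conf etc., turn the driver off:
#   [oslo_messaging_notifications]
#   driver = noop
# Restart the affected services, then drain whatever notification
# queues are left over:
rabbitmqctl list_queues name messages | grep '^notifications'
rabbitmqctl purge_queue notifications.info
rabbitmqctl purge_queue notifications.error
# Purge whatever queue names the list_queues output actually shows.
```

Purging matters because with no consumers the queues only ever grow; once the driver is noop they should stop filling up again.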
Yes, notifications are the biggest enemy of RabbitMQ, especially if you don't purge them from time to time. Also increase the stats collection interval of the RabbitMQ monitor, if you are using it.

Sent from my iPhone
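The stats interval lives in the RabbitMQ configuration itself. A sketch, assuming the classic /etc/rabbitmq/rabbitmq.config format used by Pike-era RabbitMQ packages; the values are illustrative, not a recommendation:

```erlang
%% /etc/rabbitmq/rabbitmq.config -- restart rabbitmq-server afterwards.
[
 %% Emit queue/channel statistics every 30 s instead of the 5 s default.
 {rabbit, [{collect_statistics_interval, 30000}]},
 %% Optionally stop computing per-object message rates in the management
 %% plugin, which is often a large share of the rabbit process's CPU time.
 {rabbitmq_management, [{rates_mode, none}]}
].
```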
On Aug 27, 2021, at 10:34 PM, Mohammed Naser <mnaser@vexxhost.com> wrote:
Hi,

thank you both for your helpful responses. There are no consumers for the notifications, so I'm indeed wondering who enabled them, since the default for this deployment method was also "false". Anyway, I will disable notifications and purge the queues, and hopefully see the load drop on the control node.

Thank you very much!
Eugen

Zitat von Satish Patel <satish.txt@gmail.com>:
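A quick way to confirm the cleanup took effect, again assuming the default queue names: after restarting the services with driver = noop, the notification queues should stay empty (or eventually disappear), and the rabbit process's CPU usage in 'top' should drop:

```shell
# Empty output, or queues with 0 messages, means nothing is piling up.
rabbitmqctl list_queues name messages consumers | grep '^notifications'
```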
participants (3)
- Eugen Block
- Mohammed Naser
- Satish Patel