Yes, notifications are the biggest enemy of RabbitMQ, especially if you don't purge the queues from time to time. Also, increase the statistics collection interval for RabbitMQ monitoring if you are using it.
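For the statistics interval, here is a minimal sketch. It assumes RabbitMQ 3.7 or newer with the new-style rabbitmq.conf. The value 30000 (milliseconds) is only an illustration; the default is 5000. On RabbitMQ 3.6, which uses the Erlang-term rabbitmq.config, the same key goes into the rabbit section as {collect_statistics_interval, 30000}.

```ini
# rabbitmq.conf (RabbitMQ 3.7+ format)
# Emit queue/channel/connection statistics every 30 s instead of the
# default 5 s. The value is in milliseconds and is only an example.
collect_statistics_interval = 30000
```

A longer interval means fewer statistics events for the node and the management plugin to process. The trade-off is coarser numbers in the monitoring UI.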


On Aug 27, 2021, at 10:34 PM, Mohammed Naser <mnaser@vexxhost.com> wrote:


Do you actually need notifications to be enabled? If not, switch the driver to noop and make sure you purge the notification queues.
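Switching to the noop driver is a one-line change per service. The sketch below shows nova.conf and assumes nothing else, such as ceilometer, consumes the notifications. The same section would need the same change in every service that emits notifications, such as cinder, neutron and glance. The change does not remove messages that are already queued. The existing notification queues still have to be purged separately; these queues are typically named after the default topic, for example notifications.info.

```ini
# nova.conf
[oslo_messaging_notifications]
# Discard notifications instead of publishing them to RabbitMQ
# (replaces driver = messagingv2).
driver = noop
```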

On Fri, Aug 27, 2021 at 4:56 AM Eugen Block <eblock@nde.ag> wrote:
Sorry, I accidentally hit "send" too early. Here is more information.

# nova.conf

[oslo_messaging_notifications]
driver = messagingv2
[oslo_messaging_rabbit]
amqp_durable_queues = false
rabbit_ha_queues = false
ssl = false
heartbeat_timeout_threshold = 10

# neutron.conf

[oslo_messaging_rabbit]
pool_max_size = 5000
pool_max_overflow = 2000
pool_timeout = 60


These are the basic settings in the OpenStack services. I don't have 
access to the rabbit.conf right now, but I can add more information as 
soon as I regain access to that machine.

If there's anything else I can provide that would help improve the user 
experience at least a bit, please let me know. It would help a lot!

Regards,
Eugen


Zitat von Eugen Block <eblock@nde.ag>:

> Hi *,
>
> I would greatly appreciate it if anyone could help identify some 
> tweaks, if possible at all.
>
> This is a single control node environment with 19 compute nodes 
> which are heavily used. It's still running on Pike and we won't be 
> able to upgrade until a complete redeployment.
> I was able to improve the performance a lot by increasing the number 
> of workers etc., but it seems to me that the next bottleneck is RabbitMQ.
> The rabbit process shows heavy CPU usage, almost constantly around 
> 150% according to 'top', and I see heartbeat errors in the logs (e.g. 
> cinder). Since there are so many config options, I haven't touched 
> them yet. Many of the tuning docs I've read refer to HA 
> environments, which is not the case here.
>
> I can paste some of the config snippets; maybe that already helps 
> identify something:
>
> # nova.conf
>
> [oslo_messaging_rabbit]
> pool_max_size = 1600
> pool_max_overflow = 1600
> pool_timeout = 120
> rpc_response_timeout = 120
> rpc_conn_pool_size = 600




--
Mohammed Naser
VEXXHOST, Inc.