Do you actually need notifications to be enabled? If not, switch the [oslo_messaging_notifications] driver to noop and make sure you purge the notification queues.
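
Something along these lines should do it (just a sketch; the exact
queue names and the vhost depend on your deployment, so check with
rabbitmqctl first):

# nova.conf, neutron.conf, cinder.conf, ...
[oslo_messaging_notifications]
driver = noop

# after restarting the services, drop whatever is still sitting in the
# notification queues, e.g. (assuming the default "/" vhost and the
# usual "notifications.*" queue names):
rabbitmqctl list_queues -p / name messages | grep notifications
rabbitmqctl purge_queue -p / notifications.info
rabbitmqctl purge_queue -p / notifications.error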

On Fri, Aug 27, 2021 at 4:56 AM Eugen Block <eblock@nde.ag> wrote:
Sorry, I accidentally hit "send"... adding more information.

# nova.conf

[oslo_messaging_notifications]
driver = messagingv2
[oslo_messaging_rabbit]
amqp_durable_queues = false
rabbit_ha_queues = false
ssl = false
heartbeat_timeout_threshold = 10

# neutron.conf

[oslo_messaging_rabbit]
pool_max_size = 5000
pool_max_overflow = 2000
pool_timeout = 60


These are the basic settings in the OpenStack services. I don't have 
access to the rabbit.conf right now, but I can add more information as 
soon as I regain access to that machine.

If there's anything else I can provide to improve the user experience 
at least a bit, please let me know. It would help a lot!

Regards,
Eugen


Quoting Eugen Block <eblock@nde.ag>:

> Hi *,
>
> I would greatly appreciate it if anyone could help identify some 
> tweaks, if that is possible at all.
>
> This is a single control node environment with 19 compute nodes 
> which are heavily used. It's still running on Pike, and we won't be 
> able to upgrade until a complete redeployment.
> I was able to improve performance a lot by increasing the number 
> of workers etc., but it seems to me that the next bottleneck is RabbitMQ.
> The rabbit process shows heavy CPU usage, almost constantly around 
> 150% according to 'top', and I see heartbeat errors in the logs (e.g. 
> cinder). Since there are so many config options, I haven't touched 
> them yet. The tuning docs I've read mostly refer to HA 
> environments, which is not the case here.
>
> I can paste some of the config snippets; maybe that already helps 
> to identify something:
>
> # nova.conf
>
> [oslo_messaging_rabbit]
> pool_max_size = 1600
> pool_max_overflow = 1600
> pool_timeout = 120
> rpc_response_timeout = 120
> rpc_conn_pool_size = 600




--
Mohammed Naser
VEXXHOST, Inc.