Re: RabbitMQ annoying disconnections

20 Sep 2021

Dominic,
according to the documentation:

heartbeat_rate = 2

integer value

How often times during the heartbeat_timeout_threshold we check the heartbeat.

heartbeat_timeout_threshold = 60

integer value

Number of seconds after which the Rabbit broker is considered down if heartbeat’s keep-alive fails (0 disables heartbeat).
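Read literally, the two option descriptions above imply a simple relationship: the heartbeat is checked `heartbeat_rate` times per `heartbeat_timeout_threshold` window, and the broker is declared down once `heartbeat_timeout_threshold` seconds pass without a heartbeat. The Python sketch below encodes only that reading of the documentation text. It is not oslo.messaging's actual implementation, whose internal timing may differ, and `check_period` is a hypothetical helper name.

```python
# Simplified model of the two [oslo_messaging_rabbit] options quoted above.
# Assumption: following the documentation wording only, the heartbeat is
# checked heartbeat_rate times within each heartbeat_timeout_threshold window.
# This is NOT a copy of oslo.messaging's internal code.

def check_period(heartbeat_timeout_threshold: int, heartbeat_rate: int) -> float:
    """Seconds between heartbeat checks implied by the documented semantics."""
    if heartbeat_timeout_threshold == 0:
        # Per the docs, 0 disables heartbeats entirely.
        raise ValueError("heartbeat_timeout_threshold = 0 disables heartbeats")
    if heartbeat_rate <= 0:
        raise ValueError("heartbeat_rate must be a positive integer")
    return heartbeat_timeout_threshold / heartbeat_rate


if __name__ == "__main__":
    # Defaults from the docs, then the values used in this thread.
    for threshold, rate in [(60, 2), (720, 4)]:
        print(f"threshold={threshold}s rate={rate}: check every "
              f"{check_period(threshold, rate):g}s, broker considered down "
              f"after {threshold}s without a heartbeat (per the docs)")
```

Under this reading, the value that decides when the broker is considered down is `heartbeat_timeout_threshold` itself; `heartbeat_rate` only changes how often the check runs.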

So, to avoid disconnections on the service side (nova, keystone, etc.), I’ve increased heartbeat_timeout_threshold from 60 to 720 and set heartbeat_rate to 4, so that Rabbit will be considered dead after 360s instead of the default 60s. In addition, I’ve increased heartbeat in rabbitmq.conf from 60 to 600. And again, according to the documentation:

The heartbeat timeout value defines after what period of time the peer TCP connection should be considered unreachable (down) by RabbitMQ and client libraries.

So disconnection from Rabbit’s side should happen only after 600s without a heartbeat.
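On the RabbitMQ side, the effective heartbeat timeout is negotiated per connection rather than taken from one side alone. According to the RabbitMQ heartbeats guide, if either peer proposes 0 the greater of the two values is used; otherwise the smaller one wins. The sketch below models just that rule. `negotiated_heartbeat` is a hypothetical helper, and it assumes (not confirmed in this thread) that the OpenStack client proposes its `heartbeat_timeout_threshold` as the heartbeat value; individual client libraries may add rules of their own.

```python
# Sketch of AMQP 0-9-1 heartbeat negotiation as described in the RabbitMQ
# heartbeats guide. Assumption: the client proposes heartbeat_timeout_threshold
# and the server proposes the rabbitmq.conf "heartbeat" value.

def negotiated_heartbeat(server_s: int, client_s: int) -> int:
    """Return the heartbeat timeout (seconds) both peers end up using."""
    if server_s == 0 or client_s == 0:
        # If either side proposes 0, the greater value is used
        # (so both must be 0 for heartbeats to be disabled).
        return max(server_s, client_s)
    # Otherwise the smaller of the two proposals wins.
    return min(server_s, client_s)


if __name__ == "__main__":
    # (rabbitmq.conf heartbeat, client proposal)
    for server, client in [(60, 60), (600, 720), (600, 0)]:
        print(f"server={server}s client={client}s -> "
              f"{negotiated_heartbeat(server, client)}s")
```

Under these assumptions, a 600s server value and a 720s client value would settle on 600s, so raising only one side beyond the other has no effect on the negotiated timeout.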

And I still got disconnections.

Best regards
Adam Tomas
...
On 17.09.2021, at 18:38, DHilsbos@performair.com wrote:
Adam,
If I'm reading this correctly, Rabbit is timing out, but you're increasing the heartbeat period of OpenStack. This would make the issue worse, wouldn't it?
It seems to me that you would want to lower the heartbeat interval of OpenStack and raise the timeout of Rabbit.
That said, it looks like you're using Kolla, and I know nothing about Kolla.
Thank you,
Dominic L. Hilsbos, MBA
Vice President – Information Technology
Perform Air International Inc.
DHilsbos@PerformAir.com
www.PerformAir.com
From: Adam Tomas [mailto:bkslash@poczta.onet.pl] 
Sent: Friday, September 17, 2021 5:01 AM
To: openstack-discuss
Subject: RabbitMQ annoying disconnections
Hi,
After some struggling I have almost "clear" logs (clear = error-free :) ). Almost... RabbitMQ keeps disconnecting sessions, and there is a huge number of disconnect errors in all logs (see below). I found this bug description:
https://bugzilla.redhat.com/show_bug.cgi?id=1711794
in which we can read the following: "this is a recoverable issue that is already handled by how oslo.messaging is designed. disconnection is not an error and should not be reported as such in the logs."
But... it is reported :( and it produces tons of logs.
I tried modifying the heartbeat values, which helped a little, but I had to increase [database] max_pool_size to 5, and that of course multiplied the number of disconnection errors by 5 :(
[oslo_messaging_rabbit]
heartbeat_timeout_threshold = 720
heartbeat_interval = 360
heartbeat_rate = 4