This sounds like you're hitting the heartbeat/eventlet bug. Basically eventlet won't schedule the heartbeat thread unless something is regularly waking up the api threads. Depending on what release you're running, you may be able to set heartbeat_in_pthread[0] to fix this.
0:
https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/_drivers/impl_rabbit.py#L90On 9/17/21 7:00 AM, Adam Tomas wrote:
Hi,
after some struggling I have almost „clear” logs (clear=error free :) ) Almost…. RabbitMQ keeps disconnecting sessions and there is a huge amount of disconnect errors in all logs (see below). I found this bug description:
https://bugzilla.redhat.com/show_bug.cgi?id=1711794 <https://bugzilla.redhat.com/show_bug.cgi?id=1711794>
in which we can read as follows: "this is a recoverable issue that is already handled by how oslo.messaging is designed. disconnection is not an error and should not be reported as such in the logs.”
but… It is reported :( And produces tons of logs.
I tried to modify heartbeat values - helped a little bit, but I had to increase [database] max_pool_size = 5 and that of course multiplied number of disconnection errors by 5 :(
[oslo_messaging_rabbit]
heartbeat_timeout_threshold = 720 heartbeat_interval = 360 heartbeat_rate = 4
How to avoid this disconnections?? Or what to do to avoid writing this „errors”to logs?
Best regards
Adam Tomas
/var/log/kolla/rabbitmq/rabbitXXX.log
2021-09-17 06:43:08.771 [error] <0.16056.678> closing AMQP connection <0.16056.678> (x.x.x.x00:44882 -> x.x.x.x00:5672 - mod_wsgi:959:aaa12af5-bb0b-456e-a129-a466eccc24f0):
missed heartbeats from client, timeout: 60s
2021-09-17 06:43:08.773 [info] <0.7412.679> Closing all channels from connection 'x.x.x.x00:44882 -> x.x.x.x00:5672' because it has been closed
2021-09-17 06:43:09.282 [info] <0.4574.679> accepting AMQP connection <0.4574.679> (x.x.x.x11:58170 -> x.x.x.x00:5672)
2021-09-17 06:43:09.283 [info] <0.28410.678> accepting AMQP connection <0.28410.678> (x.x.x.x11:58172 -> x.x.x.x00:5672)
2021-09-17 06:43:09.285 [info] <0.28410.678> Connection <0.28410.678> (x.x.x.x11:58172 -> x.x.x.x00:5672) has a client-provided name: magnum-conductor:954:c05a1221-6915-4414-ba93-7db5450ff394
2021-09-17 06:43:09.285 [info] <0.4574.679> Connection <0.4574.679> (x.x.x.x11:58170 -> x.x.x.x00:5672) has a client-provided name: magnum-conductor:967:08f29ce5-d6ee-43ae-87a9-e8c8856f8c93
2021-09-17 06:43:09.286 [info] <0.4574.679> connection <0.4574.679> (x.x.x.x11:58170 -> x.x.x.x00:5672 - magnum-conductor:967:08f29ce5-d6ee-43ae-87a9-e8c8856f8c93): user 'openstack' authenticated and granted access to vhost '/'
2021-09-17 06:43:09.286 [info] <0.28410.678> connection <0.28410.678> (x.x.x.x11:58172 -> x.x.x.x00:5672 - magnum-conductor:954:c05a1221-6915-4414-ba93-7db5450ff394): user 'openstack' authenticated and granted access to vhost '/'
2021-09-17 06:43:09.504 [error] <0.14465.678> closing AMQP connection <0.14465.678> (x.x.x.x00:44914 -> x.x.x.x00:5672 - mod_wsgi:964:ced74779-2794-40f6-a01e-276bceb5d7b3):
missed heartbeats from client, timeout: 60s
2021-09-17 06:43:09.507 [info] <0.14985.679> Closing all channels from connection 'x.x.x.x00:44914 -> x.x.x.x00:5672' because it has been closed
2021-09-17 06:43:09.521 [info] <0.11574.678> accepting AMQP connection <0.11574.678> (x.x.x.x00:46834 -> x.x.x.x00:5672)
2021-09-17 06:43:09.523 [info] <0.11574.678> Connection <0.11574.678> (x.x.x.x00:46834 -> x.x.x.x00:5672) has a client-provided name: mod_wsgi:964:ced74779-2794-40f6-a01e-276bceb5d7b3
2021-09-17 06:43:09.525 [info] <0.11574.678> connection <0.11574.678> (x.x.x.x00:46834 -> x.x.x.x00:5672 - mod_wsgi:964:ced74779-2794-40f6-a01e-276bceb5d7b3): user 'openstack' authenticated and granted access to vhost '/'
2021-09-17 06:43:09.654 [info] <0.31575.679> accepting AMQP connection <0.31575.679> (x.x.x.x01:55304 -> x.x.x.x00:5672)
2021-09-17 06:43:09.656 [info] <0.31575.679> Connection <0.31575.679> (x.x.x.x01:55304 -> x.x.x.x00:5672) has a client-provided name: magnum-conductor:7:5a1cc3cc-150d-417a-9652-1279597c3db7
2021-09-17 06:43:09.656 [info] <0.31180.678> accepting AMQP connection <0.31180.678> (x.x.x.x01:55306 -> x.x.x.x00:5672)
2021-09-17 06:43:09.658 [info] <0.31575.679> connection <0.31575.679> (x.x.x.x01:55304 -> x.x.x.x00:5672 - magnum-conductor:7:5a1cc3cc-150d-417a-9652-1279597c3db7): user 'openstack' authenticated and granted access to vhost '/'
2021-09-17 06:43:09.658 [info] <0.31180.678> Connection <0.31180.678> (x.x.x.x01:55306 -> x.x.x.x00:5672) has a client-provided name: magnum-conductor:954:c8deffa4-7d67-46be-9b9d-09fa20544a2b
2021-09-17 06:43:09.660 [info] <0.2757.679> accepting AMQP connection <0.2757.679> (x.x.x.x11:58174 -> x.x.x.x00:5672)
2021-09-17 06:43:09.660 [info] <0.31180.678> connection <0.31180.678> (x.x.x.x01:55306 -> x.x.x.x00:5672 - magnum-conductor:954:c8deffa4-7d67-46be-9b9d-09fa20544a2b): user 'openstack' authenticated and granted access to vhost '/'
2021-09-17 06:43:09.662 [info] <0.2757.679> Connection <0.2757.679> (x.x.x.x11:58174 -> x.x.x.x00:5672) has a client-provided name: magnum-conductor:7:cebf81c9-95f6-48fa-8bde-30c01243d315
2021-09-17 06:43:09.663 [info] <0.2757.679> connection <0.2757.679> (x.x.x.x11:58174 -> x.x.x.x00:5672 - magnum-conductor:7:cebf81c9-95f6-48fa-8bde-30c01243d315): user 'openstack' authenticated and granted access to vhost '/'
2021-09-17 06:43:09.814 [error] <0.22797.678> closing AMQP connection <0.22797.678> (x.x.x.x05:57676 -> x.x.x.x00:5672 - neutron-openvswitch-agent:7:78d30277-173a-47d4-b8a8-3edde75f115e):
missed heartbeats from client, timeout: 60s
2021-09-17 06:43:09.817 [info] <0.7903.679> Closing all channels from connection 'x.x.x.x05:57676 -> x.x.x.x00:5672' because it has been closed
2021-09-17 06:43:10.319 [error] <0.18268.678> closing AMQP connection <0.18268.678> (x.x.x.x00:44924 -> x.x.x.x00:5672 - nova-conductor:957:e3951cbd-37ca-4d9f-8fac-394c83b36f09):
missed heartbeats from client, timeout: 60s
2021-09-17 06:43:10.322 [info] <0.28032.678> Closing all channels from connection 'x.x.x.x00:44924 -> x.x.x.x00:5672' because it has been closed
2021-09-17 06:43:10.330 [info] <0.19596.678> accepting AMQP connection <0.19596.678> (x.x.x.x00:46850 -> x.x.x.x00:5672)
2021-09-17 06:43:10.333 [info] <0.19596.678> Connection <0.19596.678> (x.x.x.x00:46850 -> x.x.x.x00:5672) has a client-provided name: nova-conductor:957:e3951cbd-37ca-4d9f-8fac-394c83b36f09
2021-09-17 06:43:10.334 [info] <0.19596.678> connection <0.19596.678> (x.x.x.x00:46850 -> x.x.x.x00:5672 - nova-conductor:957:e3951cbd-37ca-4d9f-8fac-394c83b36f09): user 'openstack' authenticated and granted access to vhost '/'
2021-09-17 06:43:10.831 [info] <0.9838.678> accepting AMQP connection <0.9838.678> (x.x.x.x05:57878 -> x.x.x.x00:5672)
2021-09-17 06:43:10.834 [info] <0.9838.678> Connection <0.9838.678> (x.x.x.x05:57878 -> x.x.x.x00:5672) has a client-provided name: neutron-openvswitch-agent:7:78d30277-173a-47d4-b8a8-3edde75f115e
2021-09-17 06:43:10.835 [info] <0.9838.678> connection <0.9838.678> (x.x.x.x05:57878 -> x.x.x.x00:5672 - neutron-openvswitch-agent:7:78d30277-173a-47d4-b8a8-3edde75f115e): user 'openstack' authenticated and granted access to vhost '/'
2021-09-17 06:43:11.738 [error] <0.15275.678> closing AMQP connection <0.15275.678> (x.x.x.x01:53414 -> x.x.x.x00:5672 - mod_wsgi:968:b60f723c-882e-4780-a162-55c03ebe6179):
missed heartbeats from client, timeout: 60s
/var/log/kolla/glance/glance-api.log
2021-09-17 13:17:23.212 958 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:17:29.208 955 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:17:42.190 957 INFO eventlet.wsgi.server [-] x.x.x.x00 - - [17/Sep/2021 13:17:42] "GET / HTTP/1.1" 300 1446 0.001949
2021-09-17 13:17:46.351 957 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:18:12.336 955 INFO eventlet.wsgi.server [-] x.x.x.x00 - - [17/Sep/2021 13:18:12] "GET / HTTP/1.1" 300 1446 0.003754
2021-09-17 13:18:13.703 956 INFO eventlet.wsgi.server [req-1a00db34-349b-4ed3-8625-edfadc939be4 d33b56f82a984f6a99bad58717f6f0ef 6ee7e3517d254f9e9ee5eac7d56a3179 - default default] x.x.x.x09,x.x.x.x04 - - [17/Sep/2021 13:18:13] "GET /v2/images?limit=20 HTTP/1.1" 200 2990 0.139229
2021-09-17 13:18:33.065 955 INFO eventlet.wsgi.server [req-a57046eb-549e-4837-95c7-d0994c05449e d33b56f82a984f6a99bad58717f6f0ef 6ee7e3517d254f9e9ee5eac7d56a3179 - default default] x.x.x.x09,x.x.x.x04 - - [17/Sep/2021 13:18:33] "GET /v2/images?limit=20 HTTP/1.1" 200 2990 0.096302
2021-09-17 13:18:33.077 955 INFO eventlet.wsgi.server [req-3a8af281-6732-4afd-b3ec-e502367bce51 d33b56f82a984f6a99bad58717f6f0ef 6ee7e3517d254f9e9ee5eac7d56a3179 - default default] x.x.x.x09,x.x.x.x04 - - [17/Sep/2021 13:18:33] "GET /v2/schemas/image HTTP/1.1" 200 6259 0.003982
2021-09-17 13:18:36.747 959 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:18:36.759 959 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:18:42.460 957 INFO eventlet.wsgi.server [-] x.x.x.x00 - - [17/Sep/2021 13:18:42] "GET / HTTP/1.1" 300 1446 0.001383
2021-09-17 13:18:52.659 956 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:18:52.670 956 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:19:12.598 955 INFO eventlet.wsgi.server [-] x.x.x.x00 - - [17/Sep/2021 13:19:12] "GET / HTTP/1.1" 300 1446 0.001590
2021-09-17 13:19:16.367 957 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:19:27.127 955 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:19:27.184 955 INFO eventlet.wsgi.server [req-8f68efc9-f46a-450d-8138-928e4f2585b4 d33b56f82a984f6a99bad58717f6f0ef 6ee7e3517d254f9e9ee5eac7d56a3179 - default default] x.x.x.x09,x.x.x.x04 - - [17/Sep/2021 13:19:27] "GET /v2/images?limit=20 HTTP/1.1" 200 2990 0.032917
/var/log/kolla/keystone/keystone.log
2021-09-17 13:13:54.670 958 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:13:55.148 960 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:13:55.150 962 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:13:55.345 959 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:13:56.044 961 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:14:18.402 966 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:14:31.372 963 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:14:36.580 967 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:14:38.771 965 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:16:12.226 964 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
/var/log/kolla/nova/nova-compute.log
2021-09-17 13:10:07.710 7 INFO oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] Reconnected to AMQP server on x.x.x.x00:5672 via [amqp] client with port 56226.
2021-09-17 13:11:11.473 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:11:34.046 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:11:34.721 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:13:07.699 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] AMQP server on x.x.x.x00:5672 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: OSError: Server unexpectedly closed connection
2021-09-17 13:13:08.734 7 INFO oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] Reconnected to AMQP server on x.x.x.x00:5672 via [amqp] client with port 56434.
2021-09-17 13:14:11.501 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:14:34.070 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:14:34.745 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:16:08.724 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] AMQP server on x.x.x.x00:5672 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: OSError: Server unexpectedly closed connection
2021-09-17 13:16:09.756 7 INFO oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] Reconnected to AMQP server on x.x.x.x00:5672 via [amqp] client with port 56638.
2021-09-17 13:17:34.098 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:19:09.747 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] AMQP server on x.x.x.x00:5672 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: OSError: Server unexpectedly closed connection
2021-09-17 13:19:10.781 7 INFO oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] Reconnected to AMQP server on x.x.x.x00:5672 via [amqp] client with port 56834.
2021-09-17 13:20:34.124 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:20:34.777 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:20:51.789 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:22:10.772 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] AMQP server on x.x.x.x00:5672 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: OSError: Server unexpectedly closed connection
2021-09-17 13:22:11.806 7 INFO oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] Reconnected to AMQP server on x.x.x.x00:5672 via [amqp] client with port 57038.
2021-09-17 13:23:34.149 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:23:34.803 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:23:51.818 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:25:11.795 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] AMQP server on x.x.x.x00:5672 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: OSError: Server unexpectedly closed connection
2021-09-17 13:25:12.842 7 INFO oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] Reconnected to AMQP server on x.x.x.x00:5672 via [amqp] client with port 57240.
2021-09-17 13:26:34.176 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:26:34.830 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2021-09-17 13:28:12.812 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] AMQP server on x.x.x.x00:5672 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: OSError: Server unexpectedly closed connection
2021-09-17 13:28:13.848 7 INFO oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] Reconnected to AMQP server on x.x.x.x00:5672 via [amqp] client with port 57446.