Hi Ben, Thank you for the answer. I’m using open stack Victoria deployed with Kolla-ansible. So I assume I should add (in /etc/kolla/config/global.conf) [oslo_messaging_rabbit] heartbeat_in_pthread = True But for Victoria (nova sample configuration file) it seems to be deprecated? # DEPRECATED: Run the health check heartbeat thread through a native python # thread by default. If this option is equal to False then the health check # heartbeat will inherit the execution model from the parent process. For # example if the parent process has monkey patched the stdlib by using # eventlet/greenlet then the heartbeat will be run through a green thread # (boolean value) # This option is deprecated for removal. # Its value may be silently ignored in the future. #heartbeat_in_pthread = true So will it work? Best regards Adam Tomas
Wiadomość napisana przez Ben Nemec <openstack@nemebean.com> w dniu 17.09.2021, o godz. 18:57:
This sounds like you're hitting the heartbeat/eventlet bug. Basically eventlet won't schedule the heartbeat thread unless something is regularly waking up the api threads. Depending on what release you're running, you may be able to set heartbeat_in_pthread[0] to fix this.
0: https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messagin...
On 9/17/21 7:00 AM, Adam Tomas wrote:
Hi, after some struggling I have almost „clear” logs (clear=error free :) ) Almost…. RabbitMQ keeps disconnecting sessions and there is a huge amount of disconnect errors in all logs (see below). I found this bug description: https://bugzilla.redhat.com/show_bug.cgi?id=1711794 <https://bugzilla.redhat.com/show_bug.cgi?id=1711794> in which we can read as follows: "this is a recoverable issue that is already handled by how oslo.messaging is designed. disconnection is not an error and should not be reported as such in the logs.” but… It is reported :( And produces tons of logs. I tried to modify heartbeat values - helped a little bit, but I had to increase [database] max_pool_size = 5 and that of course multiplied number of disconnection errors by 5 :( [oslo_messaging_rabbit] heartbeat_timeout_threshold = 720 heartbeat_interval = 360 heartbeat_rate = 4 How to avoid this disconnections?? Or what to do to avoid writing this „errors”to logs? Best regards Adam Tomas /var/log/kolla/rabbitmq/rabbitXXX.log 2021-09-17 06:43:08.771 [error] <0.16056.678> closing AMQP connection <0.16056.678> (x.x.x.x00:44882 -> x.x.x.x00:5672 - mod_wsgi:959:aaa12af5-bb0b-456e-a129-a466eccc24f0): missed heartbeats from client, timeout: 60s 2021-09-17 06:43:08.773 [info] <0.7412.679> Closing all channels from connection 'x.x.x.x00:44882 -> x.x.x.x00:5672' because it has been closed 2021-09-17 06:43:09.282 [info] <0.4574.679> accepting AMQP connection <0.4574.679> (x.x.x.x11:58170 -> x.x.x.x00:5672) 2021-09-17 06:43:09.283 [info] <0.28410.678> accepting AMQP connection <0.28410.678> (x.x.x.x11:58172 -> x.x.x.x00:5672) 2021-09-17 06:43:09.285 [info] <0.28410.678> Connection <0.28410.678> (x.x.x.x11:58172 -> x.x.x.x00:5672) has a client-provided name: magnum-conductor:954:c05a1221-6915-4414-ba93-7db5450ff394 2021-09-17 06:43:09.285 [info] <0.4574.679> Connection <0.4574.679> (x.x.x.x11:58170 -> x.x.x.x00:5672) has a client-provided name: magnum-conductor:967:08f29ce5-d6ee-43ae-87a9-e8c8856f8c93 2021-09-17 06:43:09.286 [info] <0.4574.679> connection <0.4574.679> (x.x.x.x11:58170 -> x.x.x.x00:5672 - magnum-conductor:967:08f29ce5-d6ee-43ae-87a9-e8c8856f8c93): user 'openstack' authenticated and granted access to vhost '/' 2021-09-17 06:43:09.286 [info] <0.28410.678> connection <0.28410.678> (x.x.x.x11:58172 -> x.x.x.x00:5672 - magnum-conductor:954:c05a1221-6915-4414-ba93-7db5450ff394): user 'openstack' authenticated and granted access to vhost '/' 2021-09-17 06:43:09.504 [error] <0.14465.678> closing AMQP connection <0.14465.678> (x.x.x.x00:44914 -> x.x.x.x00:5672 - mod_wsgi:964:ced74779-2794-40f6-a01e-276bceb5d7b3): missed heartbeats from client, timeout: 60s 2021-09-17 06:43:09.507 [info] <0.14985.679> Closing all channels from connection 'x.x.x.x00:44914 -> x.x.x.x00:5672' because it has been closed 2021-09-17 06:43:09.521 [info] <0.11574.678> accepting AMQP connection <0.11574.678> (x.x.x.x00:46834 -> x.x.x.x00:5672) 2021-09-17 06:43:09.523 [info] <0.11574.678> Connection <0.11574.678> (x.x.x.x00:46834 -> x.x.x.x00:5672) has a client-provided name: mod_wsgi:964:ced74779-2794-40f6-a01e-276bceb5d7b3 2021-09-17 06:43:09.525 [info] <0.11574.678> connection <0.11574.678> (x.x.x.x00:46834 -> x.x.x.x00:5672 - mod_wsgi:964:ced74779-2794-40f6-a01e-276bceb5d7b3): user 'openstack' authenticated and granted access to vhost '/' 2021-09-17 06:43:09.654 [info] <0.31575.679> accepting AMQP connection <0.31575.679> (x.x.x.x01:55304 -> x.x.x.x00:5672) 2021-09-17 06:43:09.656 [info] <0.31575.679> Connection <0.31575.679> (x.x.x.x01:55304 -> x.x.x.x00:5672) has a client-provided name: magnum-conductor:7:5a1cc3cc-150d-417a-9652-1279597c3db7 2021-09-17 06:43:09.656 [info] <0.31180.678> accepting AMQP connection <0.31180.678> (x.x.x.x01:55306 -> x.x.x.x00:5672) 2021-09-17 06:43:09.658 [info] <0.31575.679> connection <0.31575.679> (x.x.x.x01:55304 -> x.x.x.x00:5672 - magnum-conductor:7:5a1cc3cc-150d-417a-9652-1279597c3db7): user 'openstack' authenticated and granted access to vhost '/' 2021-09-17 06:43:09.658 [info] <0.31180.678> Connection <0.31180.678> (x.x.x.x01:55306 -> x.x.x.x00:5672) has a client-provided name: magnum-conductor:954:c8deffa4-7d67-46be-9b9d-09fa20544a2b 2021-09-17 06:43:09.660 [info] <0.2757.679> accepting AMQP connection <0.2757.679> (x.x.x.x11:58174 -> x.x.x.x00:5672) 2021-09-17 06:43:09.660 [info] <0.31180.678> connection <0.31180.678> (x.x.x.x01:55306 -> x.x.x.x00:5672 - magnum-conductor:954:c8deffa4-7d67-46be-9b9d-09fa20544a2b): user 'openstack' authenticated and granted access to vhost '/' 2021-09-17 06:43:09.662 [info] <0.2757.679> Connection <0.2757.679> (x.x.x.x11:58174 -> x.x.x.x00:5672) has a client-provided name: magnum-conductor:7:cebf81c9-95f6-48fa-8bde-30c01243d315 2021-09-17 06:43:09.663 [info] <0.2757.679> connection <0.2757.679> (x.x.x.x11:58174 -> x.x.x.x00:5672 - magnum-conductor:7:cebf81c9-95f6-48fa-8bde-30c01243d315): user 'openstack' authenticated and granted access to vhost '/' 2021-09-17 06:43:09.814 [error] <0.22797.678> closing AMQP connection <0.22797.678> (x.x.x.x05:57676 -> x.x.x.x00:5672 - neutron-openvswitch-agent:7:78d30277-173a-47d4-b8a8-3edde75f115e): missed heartbeats from client, timeout: 60s 2021-09-17 06:43:09.817 [info] <0.7903.679> Closing all channels from connection 'x.x.x.x05:57676 -> x.x.x.x00:5672' because it has been closed 2021-09-17 06:43:10.319 [error] <0.18268.678> closing AMQP connection <0.18268.678> (x.x.x.x00:44924 -> x.x.x.x00:5672 - nova-conductor:957:e3951cbd-37ca-4d9f-8fac-394c83b36f09): missed heartbeats from client, timeout: 60s 2021-09-17 06:43:10.322 [info] <0.28032.678> Closing all channels from connection 'x.x.x.x00:44924 -> x.x.x.x00:5672' because it has been closed 2021-09-17 06:43:10.330 [info] <0.19596.678> accepting AMQP connection <0.19596.678> (x.x.x.x00:46850 -> x.x.x.x00:5672) 2021-09-17 06:43:10.333 [info] <0.19596.678> Connection <0.19596.678> (x.x.x.x00:46850 -> x.x.x.x00:5672) has a client-provided name: nova-conductor:957:e3951cbd-37ca-4d9f-8fac-394c83b36f09 2021-09-17 06:43:10.334 [info] <0.19596.678> connection <0.19596.678> (x.x.x.x00:46850 -> x.x.x.x00:5672 - nova-conductor:957:e3951cbd-37ca-4d9f-8fac-394c83b36f09): user 'openstack' authenticated and granted access to vhost '/' 2021-09-17 06:43:10.831 [info] <0.9838.678> accepting AMQP connection <0.9838.678> (x.x.x.x05:57878 -> x.x.x.x00:5672) 2021-09-17 06:43:10.834 [info] <0.9838.678> Connection <0.9838.678> (x.x.x.x05:57878 -> x.x.x.x00:5672) has a client-provided name: neutron-openvswitch-agent:7:78d30277-173a-47d4-b8a8-3edde75f115e 2021-09-17 06:43:10.835 [info] <0.9838.678> connection <0.9838.678> (x.x.x.x05:57878 -> x.x.x.x00:5672 - neutron-openvswitch-agent:7:78d30277-173a-47d4-b8a8-3edde75f115e): user 'openstack' authenticated and granted access to vhost '/' 2021-09-17 06:43:11.738 [error] <0.15275.678> closing AMQP connection <0.15275.678> (x.x.x.x01:53414 -> x.x.x.x00:5672 - mod_wsgi:968:b60f723c-882e-4780-a162-55c03ebe6179): missed heartbeats from client, timeout: 60s /var/log/kolla/glance/glance-api.log 2021-09-17 13:17:23.212 958 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:17:29.208 955 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:17:42.190 957 INFO eventlet.wsgi.server [-] x.x.x.x00 - - [17/Sep/2021 13:17:42] "GET / HTTP/1.1" 300 1446 0.001949 2021-09-17 13:17:46.351 957 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:18:12.336 955 INFO eventlet.wsgi.server [-] x.x.x.x00 - - [17/Sep/2021 13:18:12] "GET / HTTP/1.1" 300 1446 0.003754 2021-09-17 13:18:13.703 956 INFO eventlet.wsgi.server [req-1a00db34-349b-4ed3-8625-edfadc939be4 d33b56f82a984f6a99bad58717f6f0ef 6ee7e3517d254f9e9ee5eac7d56a3179 - default default] x.x.x.x09,x.x.x.x04 - - [17/Sep/2021 13:18:13] "GET /v2/images?limit=20 HTTP/1.1" 200 2990 0.139229 2021-09-17 13:18:33.065 955 INFO eventlet.wsgi.server [req-a57046eb-549e-4837-95c7-d0994c05449e d33b56f82a984f6a99bad58717f6f0ef 6ee7e3517d254f9e9ee5eac7d56a3179 - default default] x.x.x.x09,x.x.x.x04 - - [17/Sep/2021 13:18:33] "GET /v2/images?limit=20 HTTP/1.1" 200 2990 0.096302 2021-09-17 13:18:33.077 955 INFO eventlet.wsgi.server [req-3a8af281-6732-4afd-b3ec-e502367bce51 d33b56f82a984f6a99bad58717f6f0ef 6ee7e3517d254f9e9ee5eac7d56a3179 - default default] x.x.x.x09,x.x.x.x04 - - [17/Sep/2021 13:18:33] "GET /v2/schemas/image HTTP/1.1" 200 6259 0.003982 2021-09-17 13:18:36.747 959 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:18:36.759 959 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:18:42.460 957 INFO eventlet.wsgi.server [-] x.x.x.x00 - - [17/Sep/2021 13:18:42] "GET / HTTP/1.1" 300 1446 0.001383 2021-09-17 13:18:52.659 956 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:18:52.670 956 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:19:12.598 955 INFO eventlet.wsgi.server [-] x.x.x.x00 - - [17/Sep/2021 13:19:12] "GET / HTTP/1.1" 300 1446 0.001590 2021-09-17 13:19:16.367 957 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:19:27.127 955 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:19:27.184 955 INFO eventlet.wsgi.server [req-8f68efc9-f46a-450d-8138-928e4f2585b4 d33b56f82a984f6a99bad58717f6f0ef 6ee7e3517d254f9e9ee5eac7d56a3179 - default default] x.x.x.x09,x.x.x.x04 - - [17/Sep/2021 13:19:27] "GET /v2/images?limit=20 HTTP/1.1" 200 2990 0.032917 /var/log/kolla/keystone/keystone.log 2021-09-17 13:13:54.670 958 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:13:55.148 960 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:13:55.150 962 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:13:55.345 959 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:13:56.044 961 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:14:18.402 966 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:14:31.372 963 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:14:36.580 967 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:14:38.771 965 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:16:12.226 964 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer /var/log/kolla/nova/nova-compute.log 2021-09-17 13:10:07.710 7 INFO oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] Reconnected to AMQP server on x.x.x.x00:5672 via [amqp] client with port 56226. 2021-09-17 13:11:11.473 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:11:34.046 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:11:34.721 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:13:07.699 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] AMQP server on x.x.x.x00:5672 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: OSError: Server unexpectedly closed connection 2021-09-17 13:13:08.734 7 INFO oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] Reconnected to AMQP server on x.x.x.x00:5672 via [amqp] client with port 56434. 2021-09-17 13:14:11.501 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:14:34.070 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:14:34.745 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:16:08.724 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] AMQP server on x.x.x.x00:5672 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: OSError: Server unexpectedly closed connection 2021-09-17 13:16:09.756 7 INFO oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] Reconnected to AMQP server on x.x.x.x00:5672 via [amqp] client with port 56638. 2021-09-17 13:17:34.098 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:19:09.747 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] AMQP server on x.x.x.x00:5672 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: OSError: Server unexpectedly closed connection 2021-09-17 13:19:10.781 7 INFO oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] Reconnected to AMQP server on x.x.x.x00:5672 via [amqp] client with port 56834. 2021-09-17 13:20:34.124 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:20:34.777 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:20:51.789 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:22:10.772 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] AMQP server on x.x.x.x00:5672 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: OSError: Server unexpectedly closed connection 2021-09-17 13:22:11.806 7 INFO oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] Reconnected to AMQP server on x.x.x.x00:5672 via [amqp] client with port 57038. 2021-09-17 13:23:34.149 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:23:34.803 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:23:51.818 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:25:11.795 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] AMQP server on x.x.x.x00:5672 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: OSError: Server unexpectedly closed connection 2021-09-17 13:25:12.842 7 INFO oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] Reconnected to AMQP server on x.x.x.x00:5672 via [amqp] client with port 57240. 2021-09-17 13:26:34.176 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:26:34.830 7 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer 2021-09-17 13:28:12.812 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] AMQP server on x.x.x.x00:5672 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: OSError: Server unexpectedly closed connection 2021-09-17 13:28:13.848 7 INFO oslo.messaging._drivers.impl_rabbit [-] [9d51e0ec-bd43-46b4-8c57-a9c6aa01903d] Reconnected to AMQP server on x.x.x.x00:5672 via [amqp] client with port 57446.