[rabbitmq][kolla-ansible] - RabbitMQ disconnects - 60s timeout
Hi,

I have a problem with RabbitMQ heartbeat timeouts (described also here: https://bugzilla.redhat.com/show_bug.cgi?id=1711794). Although I have threads=1, the problem still persists, generating a lot of messages in the logs:

    2021-03-18 15:17:17.482 [error] <0.122.51> closing AMQP connection <0.122.51> (x.x.x.100:60456 -> x.x.x.100:5672 - mod_wsgi:699:9a813fcb-c29f-4886-82bc-00bf478b6b64): missed heartbeats from client, timeout: 60s
    2021-03-18 15:17:17.484 [info] <0.846.51> Closing all channels from connection '<<"x.x.x.100:60456 -> x.x.x.100:5672">>' because it has been closed
    2021-03-18 15:18:15.918 [error] <0.150.51> closing AMQP connection <0.150.51> (x.x.x.111:41934 -> x.x.x.100:5672 - mod_wsgi:697:b608c7b8-9644-434e-93af-c00222c0a700): missed heartbeats from client, timeout: 60s
    2021-03-18 15:18:15.920 [error] <0.153.51> closing AMQP connection <0.153.51> (x.x.x.111:41936 -> x.x.x.100:5672 - mod_wsgi:697:77348197-148b-41a6-928f-c5eddfab57c9): missed heartbeats from client, timeout: 60s
    2021-03-18 15:18:15.920 [info] <0.1527.51> Closing all channels from connection '<<"x.x.x.111:41934 -> x.x.x.100:5672">>' because it has been closed
    2021-03-18 15:18:15.922 [info] <0.1531.51> Closing all channels from connection '<<"x.x.x.111:41936 -> x.x.x.100:5672">>' because it has been closed
    2021-03-18 15:20:16.080 [info] <0.2196.51> accepting AMQP connection <0.2196.51> (x.x.x.111:34826 -> x.x.x.100:5672)
    2021-03-18 15:20:16.080 [info] <0.2199.51> accepting AMQP connection <0.2199.51> (x.x.x.111:34828 -> x.x.x.100:5672)

I've set heartbeat = 600 in rabbitmq.conf and still get disconnections after the 60s timeout… How do I set a proper timeout to avoid these disconnections?

Best regards
Adam
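The "timeout: 60s" in these logs is most likely the client-requested value rather than the broker's: the AMQP 0-9-1 heartbeat is negotiated per connection, and oslo.messaging-based clients request their own heartbeat (heartbeat_timeout_threshold, default 60 seconds), so raising heartbeat in rabbitmq.conf alone does not change the negotiated value. One way to confirm this, assuming shell access to the rabbitmq container, is to list the negotiated heartbeat per connection:

    # the 'timeout' column is the negotiated heartbeat, in seconds
    rabbitmqctl list_connections name timeout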
On Thu, 18 Mar 2021 at 16:05, Adam Tomas <bkslash@poczta.onet.pl> wrote:
> I've set heartbeat = 600 in rabbitmq.conf and still get disconnections after the 60s timeout… How do I set a proper timeout to avoid these disconnections?
Hi Adam,

I have seen similar messages in the past, but haven't really looked into it. It seems to happen during some intensive processes like encrypted Cinder volume creation.

Have you tried configuring oslo.messaging? For example, [oslo_messaging_rabbit] heartbeat_timeout_threshold:
https://docs.openstack.org/oslo.messaging/latest/configuration/opts.html

Mark
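A minimal sketch of such an override, using the 360s threshold and heartbeat rate that come up later in this thread (both options live in oslo.messaging's [oslo_messaging_rabbit] group):

    [oslo_messaging_rabbit]
    # Seconds of missed heartbeats after which a connection is
    # considered dead and re-established.
    heartbeat_timeout_threshold = 360
    # Heartbeats are sent/checked this many times per threshold
    # interval (default 2), i.e. every 90s with the values above.
    heartbeat_rate = 4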
Hi Mark,

this happens without any noticeable load on the OpenStack deployment. Where should I put the [oslo_messaging_rabbit] heartbeat_timeout_threshold option in kolla-ansible? I can't find any oslo.config file… Should it go in the .conf file of every service?

Best regards
Adam
On 22.03.2021, at 10:49, Mark Goddard <mark@stackhpc.com> wrote:
> Have you tried configuring oslo.messaging? For example, [oslo_messaging_rabbit] heartbeat_timeout_threshold.
On Mon, 22 Mar 2021 at 13:22, Adam Tomas <bkslash@poczta.onet.pl> wrote:
> Where should I put the [oslo_messaging_rabbit] heartbeat_timeout_threshold option in kolla-ansible? I can't find any oslo.config file… Should it go in the .conf file of every service?
Yes, any service in which you are seeing heartbeat timeouts.
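In kolla-ansible terms, a sketch of where such an override could live, assuming the default node_custom_config directory (/etc/kolla/config): global.conf is merged into the configuration of every service, while a per-service file such as nova.conf targets just one.

    # /etc/kolla/config/global.conf -- merged into all services' configs
    [oslo_messaging_rabbit]
    heartbeat_timeout_threshold = 360
    heartbeat_rate = 4

The change is then pushed out with "kolla-ansible -i <inventory> reconfigure".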
Hi Mark,

I did that (heartbeat_timeout_threshold set to 360s, so 4 heartbeats in that interval instead of the default two), restarted all containers and… still got similar errors, but this time without the "60s timeout":

    /var/log/kolla/rabbitmq/rabbit@os-ppu-controller1.log:2021-03-24 08:01:55.622 [error] <0.24155.9> closing AMQP connection <0.24155.9> (x.x.x.100:40812 -> x.x.x.100:5672 - mod_wsgi:698:950db474-56b6-434c-80fa-6827f0a1ba34):
    /var/log/kolla/rabbitmq/rabbit@os-ppu-controller1.log:2021-03-24 08:01:55.625 [info] <0.24977.9> Closing all channels from connection '<<"x.x.x.100:40812 -> x.x.x.100:5672">>' because it has been closed

    /var/log/kolla/nova/nova-api-wsgi.log:2021-03-24 06:00:39.453 695 ERROR oslo.messaging._drivers.impl_rabbit [-] [899a88f9-d779-4747-8861-6734a2eb2924] AMQP server on x.x.x.100:5672 is unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1 seconds.: amqp.exceptions.RecoverableConnectionError: <RecoverableConnectionError: unknown error>
    /var/log/kolla/nova/nova-api-wsgi.log:2021-03-24 06:00:39.454 695 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: Too many heartbeats missed
    /var/log/kolla/nova/nova-api-wsgi.log:2021-03-24 07:53:55.611 696 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: Server unexpectedly closed connection
    /var/log/kolla/nova/nova-api-wsgi.log:2021-03-24 07:53:55.975 698 INFO oslo.messaging._drivers.impl_rabbit [-] [a8a0d91e-697d-4c7d-83c4-c753559cadf3] Reconnected to AMQP server on x.x.x.111:5672 via [amqp] client with port 33694.

In addition, another problem came up after restarting Neutron: the DHCP port in the Octavia management network went DOWN (which blocks creating new load balancers, because the amphora can't get an IP address), and I can't change its status to ACTIVE:

    | 778c7248-2224-43aa-a0cf-3c4a8e48e55a | | fa:16:3e:2b:38:d7 | ip_address='y.y.y.21', subnet_id='0aa85b81-b5d8-457a-85ab-51b7b6374e02' | DOWN

    $ neutron port-update 778c7248-2224-43aa-a0cf-3c4a8e48e55a --status ACTIVE
    neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
    Cannot update read-only attribute status
    Neutron server returns request_ids: ['req-5682582a-6db7-4cdc-b86a-f703cedf83fd']

I can ping this IP while it shows status DOWN!

Best regards,
Adam
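On the port question: as the error says, a port's status attribute is computed by Neutron and is read-only; only admin_state_up is operator-settable. A hedged sketch of steps sometimes tried to force a port to rebind (using the non-deprecated openstack CLI; not a confirmed fix for this particular case):

    # toggle the port's admin state, then re-check its computed status
    $ openstack port set --disable 778c7248-2224-43aa-a0cf-3c4a8e48e55a
    $ openstack port set --enable 778c7248-2224-43aa-a0cf-3c4a8e48e55a
    $ openstack port show 778c7248-2224-43aa-a0cf-3c4a8e48e55a -c status -c admin_state_up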
participants (2):
- Adam Tomas
- Mark Goddard