[rabbitmq][kolla-ansible] - RabbitMQ disconnects - 60s timeout

Adam Tomas bkslash at poczta.onet.pl
Wed Mar 24 08:39:53 UTC 2021


Hi Mark,
I did it (heartbeat_timeout_threshold set to 360s, so 4 heartbeats within that interval instead of the default two), restarted all containers and… still got

/var/log/kolla/rabbitmq/rabbit@os-ppu-controller1.log:2021-03-24 08:01:55.622 [error] <0.24155.9> closing AMQP connection <0.24155.9> (x.x.x.100:40812 -> x.x.x.100:5672 - mod_wsgi:698:950db474-56b6-434c-80fa-6827f0a1ba34):

/var/log/kolla/rabbitmq/rabbit@os-ppu-controller1.log:2021-03-24 08:01:55.625 [info] <0.24977.9> Closing all channels from connection '<<"x.x.x.100:40812 -> x.x.x.100:5672">>' because it has been closed

So, similar errors, but this time without the "60s timeout" message.
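
For reference, the override itself is just an oslo.messaging snippet dropped into the kolla-ansible custom config directory (the path below assumes the default /etc/kolla/config location; note that the option group in the rendered .conf files is spelled [oslo_messaging_rabbit]):

# /etc/kolla/config/global.conf - merged by kolla-ansible into every service's .conf;
# a per-service file such as /etc/kolla/config/nova.conf works the same way
[oslo_messaging_rabbit]
# consider the connection dead only after 360s without heartbeats
heartbeat_timeout_threshold = 360
# send 4 heartbeats per timeout interval instead of the default 2
heartbeat_rate = 4

followed by a "kolla-ansible reconfigure" run so the containers pick it up.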


/var/log/kolla/nova/nova-api-wsgi.log:2021-03-24 06:00:39.453 695 ERROR oslo.messaging._drivers.impl_rabbit [-] [899a88f9-d779-4747-8861-6734a2eb2924] AMQP server on x.x.x.100:5672 is unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1 seconds.: amqp.exceptions.RecoverableConnectionError: <RecoverableConnectionError: unknown error>

/var/log/kolla/nova/nova-api-wsgi.log:2021-03-24 06:00:39.454 695 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: Too many heartbeats missed

/var/log/kolla/nova/nova-api-wsgi.log:2021-03-24 07:53:55.611 696 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: Server unexpectedly closed connection

/var/log/kolla/nova/nova-api-wsgi.log:2021-03-24 07:53:55.975 698 INFO oslo.messaging._drivers.impl_rabbit [-] [a8a0d91e-697d-4c7d-83c4-c753559cadf3] Reconnected to AMQP server on x.x.x.111:5672 via [amqp] client with port 33694.
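
For what it's worth, one way to confirm that the override actually landed is to grep the rendered config inside the containers, e.g. (container and file names are from my deployment and may differ):

docker exec nova_api grep -F -A 3 '[oslo_messaging_rabbit]' /etc/nova/nova.conf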


In addition, another problem came up after restarting Neutron: the DHCP port in the Octavia management network went down (which blocks creating new load balancers, because the amphora can’t get an IP address), and I can’t change its status to ACTIVE:

| 778c7248-2224-43aa-a0cf-3c4a8e48e55a |                                                                                         | fa:16:3e:2b:38:d7 | ip_address='y.y.y.21', subnet_id='0aa85b81-b5d8-457a-85ab-51b7b6374e02'  | DOWN

$ neutron port-update 778c7248-2224-43aa-a0cf-3c4a8e48e55a --status ACTIVE
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
Cannot update read-only attribute status
Neutron server returns request_ids: ['req-5682582a-6db7-4cdc-b86a-f703cedf83fd']

I can ping this IP while it shows status DOWN!
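
Since the status attribute is read-only, one workaround I’m considering is toggling the port’s admin state to make Neutron/the DHCP agent re-sync it, something like (not verified yet):

openstack port set --disable 778c7248-2224-43aa-a0cf-3c4a8e48e55a
openstack port set --enable 778c7248-2224-43aa-a0cf-3c4a8e48e55a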

Best regards,
Adam

> On 22.03.2021, at 15:45, Mark Goddard <mark at stackhpc.com> wrote:
> 
> On Mon, 22 Mar 2021 at 13:22, Adam Tomas <bkslash at poczta.onet.pl> wrote:
>> 
>> Hi Mark,
>> this happens without any noticeable load on OpenStack. Where should I put [oslo_messaging_rabbitmq] heartbeat_timeout_threshold in kolla-ansible? I can’t find any oslo.config file… Should it be in the .conf file of every service?
>> Best regards
>> Adam
> 
> Yes, any service in which you are seeing heartbeat timeouts.
>> 
>> 
>> On 22.03.2021, at 10:49, Mark Goddard <mark at stackhpc.com> wrote:
>> 
>> On Thu, 18 Mar 2021 at 16:05, Adam Tomas <bkslash at poczta.onet.pl> wrote:
>> 
>> 
>> Hi,
>> I have a problem with RabbitMQ heartbeat timeouts (also described here: https://bugzilla.redhat.com/show_bug.cgi?id=1711794). Although I have threads=1, the problem still persists, generating a lot of messages in the logs:
>> 
>> 2021-03-18 15:17:17.482 [error] <0.122.51> closing AMQP connection <0.122.51> (x.x.x.100:60456 -> x.x.x.100:5672 - mod_wsgi:699:9a813fcb-c29f-4886-82bc-00bf478b6b64):
>> missed heartbeats from client, timeout: 60s
>> 2021-03-18 15:17:17.484 [info] <0.846.51> Closing all channels from connection '<<"x.x.x.100:60456 -> x.x.x.100:5672">>' because it has been closed
>> 2021-03-18 15:18:15.918 [error] <0.150.51> closing AMQP connection <0.150.51> (x.x.x.111:41934 -> x.x.x.100:5672 - mod_wsgi:697:b608c7b8-9644-434e-93af-c00222c0a700):
>> missed heartbeats from client, timeout: 60s
>> 2021-03-18 15:18:15.920 [error] <0.153.51> closing AMQP connection <0.153.51> (x.x.x.111:41936 -> x.x.x.100:5672 - mod_wsgi:697:77348197-148b-41a6-928f-c5eddfab57c9):
>> missed heartbeats from client, timeout: 60s
>> 2021-03-18 15:18:15.920 [info] <0.1527.51> Closing all channels from connection '<<"x.x.x.111:41934 -> x.x.x.100:5672">>' because it has been closed
>> 2021-03-18 15:18:15.922 [info] <0.1531.51> Closing all channels from connection '<<"x.x.x.111:41936 -> x.x.x.100:5672">>' because it has been closed
>> 2021-03-18 15:20:16.080 [info] <0.2196.51> accepting AMQP connection <0.2196.51> (x.x.x.111:34826 -> x.x.x.100:5672)
>> 2021-03-18 15:20:16.080 [info] <0.2199.51> accepting AMQP connection <0.2199.51> (x.x.x.111:34828 -> x.x.x.100:5672)
>> 
>> I’ve set heartbeat = 600 in rabbitmq.conf and still get disconnections after the 60s timeout… How do I set the timeout properly to avoid these disconnections?
>> 
>> 
>> Hi Adam,
>> 
>> I have seen similar messages in the past, but haven't really looked
>> into it. It seems to happen during some intensive processes like
>> encrypted cinder volume creation.
>> 
>> Have you tried configuring oslo.messaging? For example,
>> [oslo_messaging_rabbitmq] heartbeat_timeout_threshold.
>> 
>> https://docs.openstack.org/oslo.messaging/latest/configuration/opts.html
>> 
>> Mark
>> 
>> 
>> Best regards
>> Adam
>> 
>> 



