DuplicateMessageError after restart of a control node
Hello everybody,

after a restart of one of the control nodes I see this error message every few seconds in the logs:

@timestamp: July 23rd 2019, 16:44:08.927
Hostname: ctlX
Logger: openstack.nova
Payload: Failed to process message ... skipping it.: DuplicateMessageError: Found duplicate message(f2178cffb5d249f3a1d2df1c0322f18c). Skipping it.
2019-07-23 16:44:08.927 82 ERROR oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2019-07-23 16:44:08.927 82 ERROR oslo.messaging._drivers.impl_rabbit   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 361, in _callback
2019-07-23 16:44:08.927 82 ERROR oslo.messaging._drivers.impl_rabbit     self.callback(RabbitMessage(message))
2019-07-23 16:44:08.927 82 ERROR oslo.messaging._drivers.impl_rabbit   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 251, in __call__
2019-07-23 16:44:08.927 82 ERROR oslo.messaging._drivers.impl_rabbit     unique_id = self.msg_id_cache.check_duplicate_message(message)
2019-07-23 16:44:08.927 82 ERROR oslo.messaging._drivers.impl_rabbit   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqp.py", line 122, in check_duplicate_message
2019-07-23 16:44:08.927 82 ERROR oslo.messaging._drivers.impl_rabbit     raise rpc_common.DuplicateMessageError(msg_id=msg_id)
2019-07-23 16:44:08.927 82 ERROR oslo.messaging._drivers.impl_rabbit DuplicateMessageError: Found duplicate message(f2178cffb5d249f3a1d2df1c0322f18c). Skipping it.
2019-07-23 16:44:08.927 82 ERROR oslo.messaging._drivers.impl_rabbit
Pid: 82
Timestamp: 2019-07-23 16:44:08.927
_id: AWwfSjEtsndUu9YeotwF
_index: flog-2019.07.23
_score: -
_type: fluentd
log_level: ERROR
programname: nova-scheduler
python_module: oslo.messaging._drivers.impl_rabbit

As soon as the restarted node is taken out of the cluster, the errors disappear. Any idea how to fix this?

Thank you
Pawel
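(Not from the original report, just a hedged first check for anyone hitting the same symptom.) Since the errors start right after one controller is restarted and stop as soon as that node is taken out of the cluster, it is worth confirming that the node actually rejoined the RabbitMQ cluster cleanly. A minimal sketch, assuming a kolla-ansible deployment where the RabbitMQ container is named 'rabbitmq':

# Run on the restarted controller: check cluster membership and look for partitions
docker exec rabbitmq rabbitmqctl cluster_status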
Hello everybody,

after some investigation into the RabbitMQ problems we found some duplicated messages and timeouts in the logs. Restarting the whole RabbitMQ cluster (stop all rabbitmq containers and start them one by one) solved the problems for now.

The main cause of this issue seems to be the nova notifications configuration that was deployed by kolla-ansible. If searchlight is not installed, 'notifications/notification_format' should be set to 'unversioned'. The default is 'both', so nova also sends every notification to the versioned_notifications queue, which has no consumer. In our case that queue accumulated a huge amount of messages, which made the RabbitMQ cluster more and more unstable, see: https://bugzilla.redhat.com/show_bug.cgi?id=1592528

The following setting in nova.conf may solve this issue, but we haven't tested it yet:

[notifications]
notification_format = unversioned

BR
Pawel
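A quick way to confirm that it really is the versioned notification queue that is piling up (a sketch, assuming the kolla RabbitMQ container is named 'rabbitmq' and the default vhost '/'; oslo.messaging typically names the queue versioned_notifications.info):

# List queues with their message and consumer counts;
# a huge message count combined with 0 consumers confirms the problem
docker exec rabbitmq rabbitmqctl list_queues name messages consumers | grep versioned_notifications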
Hi,

I am not sure about the actual default value of notification_format (I recall there was some debate about it in the nova community recently), but the solution should be to select unversioned, as most consumers of nova notifications use the legacy unversioned notifications. So if the config is 'both', the new versioned notifications can cause trouble on the message bus because nobody fetches them.

Lajos

Pawel Konczalski <pawel.konczalski@everyware.ch> wrote (on 24 July 2019, Wed, 18:42):
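As a follow-up (not something tested in this thread, just a sketch): after switching notification_format to unversioned and restarting the nova services, the messages already sitting in the unconsumed versioned notification queue will not disappear by themselves, so they can be purged. This assumes RabbitMQ 3.5.4 or newer (where rabbitmqctl purge_queue is available), the kolla container name 'rabbitmq', and the usual oslo.messaging queue name versioned_notifications.info:

# Drop the backlog of versioned notifications that nothing will ever consume
docker exec rabbitmq rabbitmqctl purge_queue versioned_notifications.info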
participants (2)
- Lajos Katona
- Pawel Konczalski