RE: Problems with Fluentd buffer [fluentd][kolla-ansible]
Hi, I’ve turned off „central_logging” and now I have only several errors like below on all nodes (3-7 per day) 0 failed to flush the buffer. retry_time=0 next_retry_seconds=2021-09-27 04:49:32.728326628 +0200 chunk="5ccf11fdc0d6876abdef813211371285" error_class=RestClient::RequestTimeout error="408 Request Timeout” On compute node, which had errors like this : /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'" tag=„openstack_python" I’ve restarted fluentd container on that node and now it’s not showing errors like this anymore. So what’s the problem? Performance of Monasca/Kibana/Elasticsearch? But the node (telemetry) on which this stack is running is not overloaded - 20-30% CPU (24 cores on that node) , 40% RAM (there is 64GB), a lot of space on SSD. Best regards Adam Tomas
Hi, after some time of using newly deployed kolla-ansible I have fluentd errors:
on all controllers, storage and network nodes:
/var/log/kolla/fluentd/fluentd.log:2021-09-27 01:40:46 +0200 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2021-09-27 01:40:47.847564807 +0200 chunk="5ccee7b1762157f0ae10dcbeb4e14fd8" error_class=RestClient::GatewayTimeout error="504 Gateway Timeout" /var/log/kolla/fluentd/fluentd.log:2021-09-27 04:49:31 +0200 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2021-09-27 04:49:32.728326628 +0200 chunk="5ccf11fdc0d6876abdef813211371285" error_class=RestClient::RequestTimeout error="408 Request Timeout”
on compute nodes:
/var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'" tag="openstack_python" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'" tag="kolla.var.log.kolla.monasca.agent-statsd.log" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'" tag="fluent.warn" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [error]: #0 failed to emit fluentd's log event tag="fluent.warn" event={"action"=>:throw_exception, "message"=>"failed to write data into buffer by buffer overflow action=:throw_exception"} error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'" tag="fluent.warn" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [error]: #0 failed to emit fluentd's log event tag="fluent.warn" event={"error"=>"#<Fluent::Plugin::Buffer::BufferOverflowError: buffer space has too many data>", "location"=>"/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'", "tag"=>"openstack_python", "message"=>"emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error=\"buffer space has too many data\" location=\"/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'\" tag=\"openstack_python\""} error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'" tag="fluent.warn" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [error]: #0 failed to emit fluentd's log event tag="fluent.warn" event={"error"=>"#<Fluent::Plugin::Buffer::BufferOverflowError: buffer space has too many data>", "location"=>"/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'", "tag"=>"kolla.var.log.kolla.monasca.agent-statsd.log", "message"=>"emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error=\"buffer space has too many data\" location=\"/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'\" tag=\"kolla.var.log.kolla.monasca.agent-statsd.log\""} error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'" tag="openstack_python" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'" tag="fluent.warn" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'" tag="kolla.var.log.kolla.monasca.agent-forwarder.log" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [error]: #0 failed to emit fluentd's log event tag="fluent.warn" event={"action"=>:throw_exception, "message"=>"failed to write data into buffer by buffer overflow action=:throw_exception"} error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data”
and on monitoring node:
2021-09-27 14:40:50 +0200 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=84.05643947119825 slow_flush_log_threshold=20.0 plugin_id="object:bff4" 2021-09-27 14:42:53 +0200 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=122.65328024700284 slow_flush_log_threshold=20.0 plugin_id="object:bff4" 2021-09-27 14:44:15 +0200 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=82.32426812895574 slow_flush_log_threshold=20.0 plugin_id="object:bff4" 2021-09-27 14:46:05 +0200 [warn]: #0 Bulk sending messages to monasca-api threw exception exceptionew=#<RestClient::Exceptions::ReadTimeout: Timed out reading data from server> 2021-09-27 14:46:05 +0200 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2021-09-27 14:46:06.042150289 +0200 chunk="5ccf89c9ec54b09e616f7167d5f93cc1" error_class=RestClient::Exceptions::ReadTimeout error="Timed out reading data from server" 2021-09-27 14:46:05 +0200 [warn]: #0 suppressed same stacktrace 2021-09-27 14:47:58 +0200 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=112.90601075813174 slow_flush_log_threshold=20.0 plugin_id="object:bff4" 2021-09-27 14:47:58 +0200 [warn]: #0 retry succeeded. chunk_id="5ccf89c9ec54b09e616f7167d5f93cc1" 2021-09-27 14:50:16 +0200 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=136.24769522389397 slow_flush_log_threshold=20.0 plugin_id="object:bff4" 2021-09-27 14:52:27 +0200 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=129.86474119895138 slow_flush_log_threshold=20.0 plugin_id="object:bff4" 2021-09-27 14:54:08 +0200 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=100.71324555086903 slow_flush_log_threshold=20.0 plugin_id="object:bff4" 2021-09-27 14:56:31 +0200 [warn]: #0 Bulk sending messages to monasca-api threw exception exceptionew=#<RestClient::Exceptions::ReadTimeout: Timed out reading data from server>
On Tue, 28 Sept 2021 at 07:54, Adam Tomas <bkslash@poczta.onet.pl> wrote:
Hi,
I’ve turned off „central_logging” and now I have only several errors like below on all nodes (3-7 per day)
0 failed to flush the buffer. retry_time=0 next_retry_seconds=2021-09-27 04:49:32.728326628 +0200 chunk="5ccf11fdc0d6876abdef813211371285" error_class=RestClient::RequestTimeout error="408 Request Timeout”
On compute node, which had errors like this :
/var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'" tag=„openstack_python"
I’ve restarted fluentd container on that node and now it’s not showing errors like this anymore.
So what’s the problem? Performance of Monasca/Kibana/Elasticsearch? But the node (telemetry) on which this stack is running is not overloaded - 20-30% CPU (24 cores on that node) , 40% RAM (there is 64GB), a lot of space on SSD.
Hi Adam, Have you checked whether the logs are being pushed to Elasticsearch via the monasca log API? That might cause fluentd to start overflowing its buffers. We do have an issue where our monasca logging CI job is failing currently, but we have not yet had time to find the root cause. Thanks, Mark
Best regards Adam Tomas
Hi, after some time of using newly deployed kolla-ansible I have fluentd errors:
on all controllers, storage and network nodes:
/var/log/kolla/fluentd/fluentd.log:2021-09-27 01:40:46 +0200 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2021-09-27 01:40:47.847564807 +0200 chunk="5ccee7b1762157f0ae10dcbeb4e14fd8" error_class=RestClient::GatewayTimeout error="504 Gateway Timeout" /var/log/kolla/fluentd/fluentd.log:2021-09-27 04:49:31 +0200 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2021-09-27 04:49:32.728326628 +0200 chunk="5ccf11fdc0d6876abdef813211371285" error_class=RestClient::RequestTimeout error="408 Request Timeout”
on compute nodes:
/var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'" tag="openstack_python" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'" tag="kolla.var.log.kolla.monasca.agent-statsd.log" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'" tag="fluent.warn" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [error]: #0 failed to emit fluentd's log event tag="fluent.warn" event={"action"=>:throw_exception, "message"=>"failed to write data into buffer by buffer overflow action=:throw_exception"} error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'" tag="fluent.warn" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [error]: #0 failed to emit fluentd's log event tag="fluent.warn" event={"error"=>"#<Fluent::Plugin::Buffer::BufferOverflowError: buffer space has too many data>", "location"=>"/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'", "tag"=>"openstack_python", "message"=>"emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error=\"buffer space has too many data\" location=\"/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'\" tag=\"openstack_python\""} error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'" tag="fluent.warn" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [error]: #0 failed to emit fluentd's log event tag="fluent.warn" event={"error"=>"#<Fluent::Plugin::Buffer::BufferOverflowError: buffer space has too many data>", "location"=>"/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'", "tag"=>"kolla.var.log.kolla.monasca.agent-statsd.log", "message"=>"emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error=\"buffer space has too many data\" location=\"/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'\" tag=\"kolla.var.log.kolla.monasca.agent-statsd.log\""} error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'" tag="openstack_python" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'" tag="fluent.warn" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.2/lib/fluent/plugin/buffer.rb:293:in `write'" tag="kolla.var.log.kolla.monasca.agent-forwarder.log" /var/log/kolla/fluentd/fluentd.log:2021-09-27 06:41:38 +0200 [error]: #0 failed to emit fluentd's log event tag="fluent.warn" event={"action"=>:throw_exception, "message"=>"failed to write data into buffer by buffer overflow action=:throw_exception"} error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data”
and on monitoring node:
2021-09-27 14:40:50 +0200 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=84.05643947119825 slow_flush_log_threshold=20.0 plugin_id="object:bff4" 2021-09-27 14:42:53 +0200 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=122.65328024700284 slow_flush_log_threshold=20.0 plugin_id="object:bff4" 2021-09-27 14:44:15 +0200 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=82.32426812895574 slow_flush_log_threshold=20.0 plugin_id="object:bff4" 2021-09-27 14:46:05 +0200 [warn]: #0 Bulk sending messages to monasca-api threw exception exceptionew=#<RestClient::Exceptions::ReadTimeout: Timed out reading data from server> 2021-09-27 14:46:05 +0200 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2021-09-27 14:46:06.042150289 +0200 chunk="5ccf89c9ec54b09e616f7167d5f93cc1" error_class=RestClient::Exceptions::ReadTimeout error="Timed out reading data from server" 2021-09-27 14:46:05 +0200 [warn]: #0 suppressed same stacktrace 2021-09-27 14:47:58 +0200 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=112.90601075813174 slow_flush_log_threshold=20.0 plugin_id="object:bff4" 2021-09-27 14:47:58 +0200 [warn]: #0 retry succeeded. chunk_id="5ccf89c9ec54b09e616f7167d5f93cc1" 2021-09-27 14:50:16 +0200 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=136.24769522389397 slow_flush_log_threshold=20.0 plugin_id="object:bff4" 2021-09-27 14:52:27 +0200 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=129.86474119895138 slow_flush_log_threshold=20.0 plugin_id="object:bff4" 2021-09-27 14:54:08 +0200 [warn]: #0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=100.71324555086903 slow_flush_log_threshold=20.0 plugin_id="object:bff4" 2021-09-27 14:56:31 +0200 [warn]: #0 Bulk sending messages to monasca-api threw exception exceptionew=#<RestClient::Exceptions::ReadTimeout: Timed out reading data from server>
participants (2)
-
Adam Tomas
-
Mark Goddard