Description
Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.
Describe the bug
The computation of stage_size in buffer plugin is not thread safe. This is the line which can cause synchronization issues because the after the value staged_bytesize
is computed, a chunk could be enqueued by the enqueue thread causing incorrect computation of stage_size
To Reproduce
Expected behavior
Your Environment
- Fluentd or td-agent version: 1.4.1, 1.6.3
- Operating system: CentOS 7
- Kernel version: 3.10.0-1062.4.3.el7.x86_64
If you hit the problem with older fluentd version, try latest version first.
Your Configuration
<system>
# process_name fluentd-data
# log_level info
log_level debug
# log_level trace
suppress_repeated_stacktrace
</system>
#
# datastore pipeline
#
<source>
@type monitor_agent
bind 0.0.0.0
port 25220
</source>
<source>
@type forward
port 25224
</source>
<match fluent.**>
# @type stdout
@type null
</match>
#
# references.*
# flows.*
#
<match *.*>
@type file
path /xx/xxx/../${tag[1]}/${tag[0]}/%Y/%m/%d/%H
append false
compress gzip
<buffer tag,time>
@type file
path <path-to-buffer>
flush_mode interval
flush_interval 10m
flush_at_shutdown false
timekey 86400
timekey_use_utc
chunk_limit_size 10mb
chunk_limit_records 100000
flush_thread_count 5
</buffer>
<format>
@type json
</format>
</match>
Your Error Log
As the computation of stage_size is not thread safe, it results in BufferOverflowError
after a while even though the actual buffer directory has not reached the limit.
3602 2019-11-09 14:10:29 +0000 [warn]: failed to write data into buffer by buffer overflow action=:throw_exception
1 2019-11-09 14:10:29 +0000 [warn]: emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/bu
2 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/buffer.rb:269:in `write'
3 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/output.rb:954:in `block in handle_stream_with_custom_format'
4 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/output.rb:880:in `write_guard'
5 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/output.rb:953:in `handle_stream_with_custom_format'
6 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/output.rb:872:in `execute_chunking'
7 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/output.rb:793:in `emit_buffered'
8 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/event_router.rb:96:in `emit_stream'
9 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/in_forward.rb:300:in `on_message'
10 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/in_forward.rb:211:in `block in handle_connection'
11 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/in_forward.rb:248:in `block (3 levels) in read_messages'
12 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/in_forward.rb:247:in `feed_each'
13 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/in_forward.rb:247:in `block (2 levels) in read_messages'
14 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/in_forward.rb:256:in `block in read_messages'
15 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin_helper/server.rb:612:in `on_read_without_connection'
16 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/cool.io-1.5.3/lib/cool.io/io.rb:123:in `on_readable'
17 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/cool.io-1.5.3/lib/cool.io/io.rb:186:in `on_readable'
18 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/cool.io-1.5.3/lib/cool.io/loop.rb:88:in `run_once'
19 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/cool.io-1.5.3/lib/cool.io/loop.rb:88:in `run'
20 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
21 2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
Additional context
Here is the snippet of the monitoring agent output and the disk usage of buffer directory
curl http://127.0.0.1:25220/api/plugins.json
{"plugins":[{"plugin_id":"object:c626f4","plugin_category":"input","type":"monitor_agent","config":{"@type":"monitor_agent","bind":"0.0.0.0","port":"25220"},"output_plugin":false,"retry_count":null},{"plugin_id":"object:ca2c40","plugin_category":"input","type":"forward","config":{"@type":"forward","port":"25224"},"output_plugin":false,"retry_count":null},{"plugin_id":"object:47a018","plugin_category":"output","type":"null","config":{"@type":"null"},"output_plugin":true,"retry_count":0,"retry":{}},{"plugin_id":"object:819890","plugin_category":"output","type":"file","config":{"@type":"file","path":"/var/lib/datastore/fluentd/${tag[1]}/${tag[0]}/%Y/%m/%d/%H","append":"false","compress":"gzip"},"output_plugin":true,"buffer_queue_length":0,"buffer_total_queued_size":418693164,"retry_count":0,"retry":{}}]}
Disk Utilization of buffer directory:
du -sh /var/lib/datastore/fluentd/buffer
93M /var/lib/datastore/fluentd/buffer
As you can see from the above, the monitoring agent says the buffer_total_queued_size
is ~400M but the actual utilization on disk is only 93M