Skip to content

stage size computation in buffer plugin is not thread safe #2712

Closed
@hlakshmi

Description

@hlakshmi

Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.

Describe the bug
The computation of stage_size in buffer plugin is not thread safe. This is the line which can cause synchronization issues because the after the value staged_bytesize is computed, a chunk could be enqueued by the enqueue thread causing incorrect computation of stage_size

To Reproduce

Expected behavior

Your Environment

  • Fluentd or td-agent version: 1.4.1, 1.6.3
  • Operating system: CentOS 7
  • Kernel version: 3.10.0-1062.4.3.el7.x86_64

If you hit the problem with older fluentd version, try latest version first.

Your Configuration

<system>
# process_name fluentd-data
# log_level info
  log_level debug
# log_level trace
  suppress_repeated_stacktrace
</system>

#
# datastore pipeline
#
<source>
  @type monitor_agent
  bind 0.0.0.0
  port 25220
</source>

<source>
  @type forward
  port 25224
</source>

<match fluent.**>
# @type stdout
  @type null
</match>

#
# references.*
# flows.*
#
<match *.*>
  @type file
  path /xx/xxx/../${tag[1]}/${tag[0]}/%Y/%m/%d/%H
  append false
  compress gzip
  <buffer tag,time>
    @type file
    path <path-to-buffer>
    flush_mode interval
    flush_interval 10m
    flush_at_shutdown false
    timekey 86400
    timekey_use_utc
    chunk_limit_size 10mb
    chunk_limit_records 100000
    flush_thread_count 5
  </buffer>
  <format>
    @type json
  </format>
</match>

Your Error Log
As the computation of stage_size is not thread safe, it results in BufferOverflowError after a while even though the actual buffer directory has not reached the limit.

3602  2019-11-09 14:10:29 +0000 [warn]: failed to write data into buffer by buffer overflow action=:throw_exception
    1 2019-11-09 14:10:29 +0000 [warn]: emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/bu
    2   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/buffer.rb:269:in `write'
    3   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/output.rb:954:in `block in handle_stream_with_custom_format'
    4   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/output.rb:880:in `write_guard'
    5   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/output.rb:953:in `handle_stream_with_custom_format'
    6   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/output.rb:872:in `execute_chunking'
    7   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/output.rb:793:in `emit_buffered'
    8   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/event_router.rb:96:in `emit_stream'
    9   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/in_forward.rb:300:in `on_message'
   10   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/in_forward.rb:211:in `block in handle_connection'
   11   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/in_forward.rb:248:in `block (3 levels) in read_messages'
   12   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/in_forward.rb:247:in `feed_each'
   13   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/in_forward.rb:247:in `block (2 levels) in read_messages'
   14   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/in_forward.rb:256:in `block in read_messages'
   15   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin_helper/server.rb:612:in `on_read_without_connection'
   16   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/cool.io-1.5.3/lib/cool.io/io.rb:123:in `on_readable'
   17   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/cool.io-1.5.3/lib/cool.io/io.rb:186:in `on_readable'
   18   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/cool.io-1.5.3/lib/cool.io/loop.rb:88:in `run_once'
   19   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/cool.io-1.5.3/lib/cool.io/loop.rb:88:in `run'
   20   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
   21   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

Additional context
Here is the snippet of the monitoring agent output and the disk usage of buffer directory

curl http://127.0.0.1:25220/api/plugins.json
{"plugins":[{"plugin_id":"object:c626f4","plugin_category":"input","type":"monitor_agent","config":{"@type":"monitor_agent","bind":"0.0.0.0","port":"25220"},"output_plugin":false,"retry_count":null},{"plugin_id":"object:ca2c40","plugin_category":"input","type":"forward","config":{"@type":"forward","port":"25224"},"output_plugin":false,"retry_count":null},{"plugin_id":"object:47a018","plugin_category":"output","type":"null","config":{"@type":"null"},"output_plugin":true,"retry_count":0,"retry":{}},{"plugin_id":"object:819890","plugin_category":"output","type":"file","config":{"@type":"file","path":"/var/lib/datastore/fluentd/${tag[1]}/${tag[0]}/%Y/%m/%d/%H","append":"false","compress":"gzip"},"output_plugin":true,"buffer_queue_length":0,"buffer_total_queued_size":418693164,"retry_count":0,"retry":{}}]}

Disk Utilization of buffer directory:

du -sh /var/lib/datastore/fluentd/buffer
93M       /var/lib/datastore/fluentd/buffer

As you can see from the above, the monitoring agent says the buffer_total_queued_size is ~400M but the actual utilization on disk is only 93M

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions