Skip to content

Buffer files are not cleared after retry_timeout time passes. #3610

Closed
@LukasJerabek

Description

@LukasJerabek

Describe the bug

Given my configuration below - the problem is that if out_forward does not reach the target destination, it produces lots of buffer files, which one would assume that go eventually into secondary folder after reaching retry_timeout time.

In my case I have set retry_timeout to 50 seconds, 4 tags to buffer and buffer by timekey 10 seconds, there is 2 files created for every buffer (normal + .meta). I would expect that every ten seconds there is 8 files created (one normal and one meta buffer file for each tag), which I can confirm (see Additional context), and I would expect that every one of them would be moved to secondary after 50 seconds - which does not always happen. Some are much later, which causes rising number of buffer files and not having enough moved soon enough to secondary.

Rising number seems to never stop, I have stopped the fluentd after it has created approximately 2000 buffer files, although the maximum should be stabilized at 40 (50s*8files/10s). It is creating some secondary files, but not as fast as new buffer files are created.

How should I optimize the system so that if target instance of forward plugin does not accept I would not run out of file descriptors?

To Reproduce

Use configuration below, but create files for certificates - bad ones, failing, so that the connection could not be established. Replace {{ home }} for you home path and make sure all paths really exist.

Expected behavior

After reaching retry_timeout time, number of buffer files reaches its maximum. Having secondary files created as fast as new buffer files are created.

Your Environment

- Fluentd version: 1.14.3
- TD Agent version: - 
- Operating system: Centos 7
- Kernel version: 3.10.0-1160.49.1.el7.x86_64

Your Configuration

<source>
  @type sample
  auto_increment_key foo_key
  sample {"hello":"infin","foo_key":0}
  tag nginx.status.infin
</source>

<source>
  @type sample
  auto_increment_key foo_key
  sample {"hello":"platba","foo_key":0}
  tag nginx.status.platba
</source>

<source>
  @type sample
  auto_increment_key foo_key
  sample {"hello":"api","foo_key":0}
  tag nginx.status.api
</source>

<source>
  @type sample
  auto_increment_key foo_key
  sample {"hello":"aukce","foo_key":0}
  tag nginx.status.aukce
</source>

<match nginx.status.*>
  @type forward
  transport tls
  tls_client_cert_path {{ home }}/certs/fluentd/origin_cert.crt
  tls_client_private_key_path {{ home }}/certs/fluentd/origin_key.key
  tls_cert_path {{ home }}/certs/fluentd/target_cert.crt
  tls_verify_hostname false
  <server>
    name monitoring
    host 127.0.0.1
    port 24224
    weight 60
  </server>
  <buffer tag,time>
    @type file
    path {{ home }}/log/fluentd/nginx.status.buff
    timekey 10s
    timekey_wait 0s
    flush_mode lazy
    total_limit_size 800MB
    retry_timeout 50
    retry_wait 13s
    retry_forever false
    retry_randomize false
    retry_type periodic
    flush_thread_count 4
  </buffer>
  <secondary>
    @type secondary_file
    directory {{ home }}/log/fluentd/secondary/
    basename dump.${chunk_id}
  </secondary>
</match>

Your Error Log

2022-01-26 20:11:14 +0100 [warn]: #0 failed to flush the buffer. retry_times=1 next_retry_time=2022-01-26 20:11:27 +0100 chunk="5d680f189937638692f2f4afcc0a6071" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2022-01-26 20:11:14 +0100 [warn]: #0 /home/vagrant/.gem/ruby/gems/fluentd-1.14.3/lib/fluent/plugin/out_forward/load_balancer.rb:55:in `select_healthy_node'
  2022-01-26 20:11:14 +0100 [warn]: #0 /home/vagrant/.gem/ruby/gems/fluentd-1.14.3/lib/fluent/plugin_helper/service_discovery/manager.rb:108:in `select_service'
  2022-01-26 20:11:14 +0100 [warn]: #0 /home/vagrant/.gem/ruby/gems/fluentd-1.14.3/lib/fluent/plugin_helper/service_discovery.rb:82:in `service_discovery_select_service'
  2022-01-26 20:11:14 +0100 [warn]: #0 /home/vagrant/.gem/ruby/gems/fluentd-1.14.3/lib/fluent/plugin/out_forward.rb:365:in `write'
  2022-01-26 20:11:14 +0100 [warn]: #0 /home/vagrant/.gem/ruby/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:1179:in `try_flush'
  2022-01-26 20:11:14 +0100 [warn]: #0 /home/vagrant/.gem/ruby/gems/fluentd-1.14.3/lib/fluent/plugin/out_forward.rb:353:in `try_flush'
  2022-01-26 20:11:14 +0100 [warn]: #0 /home/vagrant/.gem/ruby/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:1491:in `flush_thread_run'
  2022-01-26 20:11:14 +0100 [warn]: #0 /home/vagrant/.gem/ruby/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
  2022-01-26 20:11:14 +0100 [warn]: #0 /home/vagrant/.gem/ruby/gems/fluentd-1.14.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2022-01-26 20:11:27 +0100 [warn]: #0 failed to flush the buffer. retry_times=2 next_retry_time=2022-01-26 20:11:40 +0100 chunk="5d680f189937638692f2f4afcc0a6071" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2022-01-26 20:11:27 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:11:40 +0100 [warn]: #0 failed to flush the buffer. retry_times=3 next_retry_time=2022-01-26 20:11:41 +0100 chunk="5d680f189937638692f2f4afcc0a6071" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2022-01-26 20:11:40 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:11:41 +0100 [warn]: #0 retry succeeded by secondary. chunk_id="5d680f1899a1a00b215af05d1dc6537d"
2022-01-26 20:11:42 +0100 [warn]: #0 failed to flush the buffer. retry_times=0 next_retry_time=2022-01-26 20:11:55 +0100 chunk="5d680f22322ac907cff2f4ca85066230" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2022-01-26 20:11:42 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:11:55 +0100 [warn]: #0 failed to flush the buffer. retry_times=1 next_retry_time=2022-01-26 20:12:08 +0100 chunk="5d680f22322ac907cff2f4ca85066230" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2022-01-26 20:11:55 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:12:08 +0100 [warn]: #0 failed to flush the buffer. retry_times=2 next_retry_time=2022-01-26 20:12:21 +0100 chunk="5d680f22322ac907cff2f4ca85066230" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2022-01-26 20:12:08 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:12:21 +0100 [warn]: #0 failed to flush the buffer. retry_times=3 next_retry_time=2022-01-26 20:12:22 +0100 chunk="5d680f22322ac907cff2f4ca85066230" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2022-01-26 20:12:21 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:12:22 +0100 [warn]: #0 retry succeeded by secondary. chunk_id="5d680f22329accc8480d02c9d0961c30"
2022-01-26 20:12:22 +0100 [warn]: #0 failed to flush the buffer. retry_times=0 next_retry_time=2022-01-26 20:12:35 +0100 chunk="5d680f2232d0fe1fffd93a17cb23f682" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2022-01-26 20:12:22 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:12:22 +0100 [warn]: #0 retry succeeded by secondary. chunk_id="5d680f22326be06201e6259d360eee00"
2022-01-26 20:12:22 +0100 [warn]: #0 failed to flush the buffer. retry_times=0 next_retry_time=2022-01-26 20:12:35 +0100 chunk="5d680f2232d0fe1fffd93a17cb23f682" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2022-01-26 20:12:22 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:12:35 +0100 [warn]: #0 failed to flush the buffer. retry_times=1 next_retry_time=2022-01-26 20:12:48 +0100 chunk="5d680f2232d0fe1fffd93a17cb23f682" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2022-01-26 20:12:35 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:12:48 +0100 [warn]: #0 failed to flush the buffer. retry_times=2 next_retry_time=2022-01-26 20:13:01 +0100 chunk="5d680f2232d0fe1fffd93a17cb23f682" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2022-01-26 20:12:48 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:13:01 +0100 [warn]: #0 failed to flush the buffer. retry_times=3 next_retry_time=2022-01-26 20:13:02 +0100 chunk="5d680f2232d0fe1fffd93a17cb23f682" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2022-01-26 20:13:01 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:13:02 +0100 [warn]: #0 retry succeeded by secondary. chunk_id="5d680f2bb03e296a898e5ba8726e27dd"
2022-01-26 20:13:02 +0100 [warn]: #0 failed to flush the buffer. retry_times=0 next_retry_time=2022-01-26 20:13:15 +0100 chunk="5d680f2bb089ae25b4c06615cb9fc85b" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2022-01-26 20:13:02 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:13:02 +0100 [warn]: #0 retry succeeded by secondary. chunk_id="5d680f2232d0fe1fffd93a17cb23f682"
2022-01-26 20:13:03 +0100 [warn]: #0 failed to flush the buffer. retry_times=0 next_retry_time=2022-01-26 20:13:16 +0100 chunk="5d680f2bb089ae25b4c06615cb9fc85b" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2022-01-26 20:13:03 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:13:16 +0100 [warn]: #0 failed to flush the buffer. retry_times=1 next_retry_time=2022-01-26 20:13:29 +0100 chunk="5d680f2bb089ae25b4c06615cb9fc85b" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2022-01-26 20:13:16 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:13:29 +0100 [warn]: #0 failed to flush the buffer. retry_times=2 next_retry_time=2022-01-26 20:13:42 +0100 chunk="5d680f2bb089ae25b4c06615cb9fc85b" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2022-01-26 20:13:29 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:13:42 +0100 [warn]: #0 failed to flush the buffer. retry_times=3 next_retry_time=2022-01-26 20:13:43 +0100 chunk="5d680f2bb089ae25b4c06615cb9fc85b" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2022-01-26 20:13:42 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:13:43 +0100 [warn]: #0 retry succeeded by secondary. chunk_id="5d680f2bb089ae25b4c06615cb9fc85b"

Additional context

I am occasionaly issuing this command to see how many files are in buffers:
[vagrant@devcentos7 fluentd]$ find * -type f | wc -l
1227
[vagrant@devcentos7 fluentd]$ find * -type f | wc -l
1223
[vagrant@devcentos7 fluentd]$ find * -type f | wc -l
1231
[vagrant@devcentos7 fluentd]$ find * -type f | wc -l
1231
[vagrant@devcentos7 fluentd]$ find * -type f | wc -l
1231
[vagrant@devcentos7 fluentd]$ find * -type f | wc -l
1239

You can see that I get 8 files created as I wrote in description approx. every ten seconds. But when secondary is created number of files is lowered by 4. It seems like if creating secondary files was too much behind.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions