Description
Describe the bug
Given my configuration below - the problem is that if out_forward does not reach the target destination, it produces lots of buffer files, which one would assume that go eventually into secondary folder after reaching retry_timeout time.
In my case I have set retry_timeout to 50 seconds, 4 tags to buffer and buffer by timekey 10 seconds, there is 2 files created for every buffer (normal + .meta). I would expect that every ten seconds there is 8 files created (one normal and one meta buffer file for each tag), which I can confirm (see Additional context), and I would expect that every one of them would be moved to secondary after 50 seconds - which does not always happen. Some are much later, which causes rising number of buffer files and not having enough moved soon enough to secondary.
Rising number seems to never stop, I have stopped the fluentd after it has created approximately 2000 buffer files, although the maximum should be stabilized at 40 (50s*8files/10s). It is creating some secondary files, but not as fast as new buffer files are created.
How should I optimize the system so that if target instance of forward plugin does not accept I would not run out of file descriptors?
To Reproduce
Use configuration below, but create files for certificates - bad ones, failing, so that the connection could not be established. Replace {{ home }} for you home path and make sure all paths really exist.
Expected behavior
After reaching retry_timeout time, number of buffer files reaches its maximum. Having secondary files created as fast as new buffer files are created.
Your Environment
- Fluentd version: 1.14.3
- TD Agent version: -
- Operating system: Centos 7
- Kernel version: 3.10.0-1160.49.1.el7.x86_64
Your Configuration
<source>
@type sample
auto_increment_key foo_key
sample {"hello":"infin","foo_key":0}
tag nginx.status.infin
</source>
<source>
@type sample
auto_increment_key foo_key
sample {"hello":"platba","foo_key":0}
tag nginx.status.platba
</source>
<source>
@type sample
auto_increment_key foo_key
sample {"hello":"api","foo_key":0}
tag nginx.status.api
</source>
<source>
@type sample
auto_increment_key foo_key
sample {"hello":"aukce","foo_key":0}
tag nginx.status.aukce
</source>
<match nginx.status.*>
@type forward
transport tls
tls_client_cert_path {{ home }}/certs/fluentd/origin_cert.crt
tls_client_private_key_path {{ home }}/certs/fluentd/origin_key.key
tls_cert_path {{ home }}/certs/fluentd/target_cert.crt
tls_verify_hostname false
<server>
name monitoring
host 127.0.0.1
port 24224
weight 60
</server>
<buffer tag,time>
@type file
path {{ home }}/log/fluentd/nginx.status.buff
timekey 10s
timekey_wait 0s
flush_mode lazy
total_limit_size 800MB
retry_timeout 50
retry_wait 13s
retry_forever false
retry_randomize false
retry_type periodic
flush_thread_count 4
</buffer>
<secondary>
@type secondary_file
directory {{ home }}/log/fluentd/secondary/
basename dump.${chunk_id}
</secondary>
</match>
Your Error Log
2022-01-26 20:11:14 +0100 [warn]: #0 failed to flush the buffer. retry_times=1 next_retry_time=2022-01-26 20:11:27 +0100 chunk="5d680f189937638692f2f4afcc0a6071" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
2022-01-26 20:11:14 +0100 [warn]: #0 /home/vagrant/.gem/ruby/gems/fluentd-1.14.3/lib/fluent/plugin/out_forward/load_balancer.rb:55:in `select_healthy_node'
2022-01-26 20:11:14 +0100 [warn]: #0 /home/vagrant/.gem/ruby/gems/fluentd-1.14.3/lib/fluent/plugin_helper/service_discovery/manager.rb:108:in `select_service'
2022-01-26 20:11:14 +0100 [warn]: #0 /home/vagrant/.gem/ruby/gems/fluentd-1.14.3/lib/fluent/plugin_helper/service_discovery.rb:82:in `service_discovery_select_service'
2022-01-26 20:11:14 +0100 [warn]: #0 /home/vagrant/.gem/ruby/gems/fluentd-1.14.3/lib/fluent/plugin/out_forward.rb:365:in `write'
2022-01-26 20:11:14 +0100 [warn]: #0 /home/vagrant/.gem/ruby/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:1179:in `try_flush'
2022-01-26 20:11:14 +0100 [warn]: #0 /home/vagrant/.gem/ruby/gems/fluentd-1.14.3/lib/fluent/plugin/out_forward.rb:353:in `try_flush'
2022-01-26 20:11:14 +0100 [warn]: #0 /home/vagrant/.gem/ruby/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:1491:in `flush_thread_run'
2022-01-26 20:11:14 +0100 [warn]: #0 /home/vagrant/.gem/ruby/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
2022-01-26 20:11:14 +0100 [warn]: #0 /home/vagrant/.gem/ruby/gems/fluentd-1.14.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2022-01-26 20:11:27 +0100 [warn]: #0 failed to flush the buffer. retry_times=2 next_retry_time=2022-01-26 20:11:40 +0100 chunk="5d680f189937638692f2f4afcc0a6071" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
2022-01-26 20:11:27 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:11:40 +0100 [warn]: #0 failed to flush the buffer. retry_times=3 next_retry_time=2022-01-26 20:11:41 +0100 chunk="5d680f189937638692f2f4afcc0a6071" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
2022-01-26 20:11:40 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:11:41 +0100 [warn]: #0 retry succeeded by secondary. chunk_id="5d680f1899a1a00b215af05d1dc6537d"
2022-01-26 20:11:42 +0100 [warn]: #0 failed to flush the buffer. retry_times=0 next_retry_time=2022-01-26 20:11:55 +0100 chunk="5d680f22322ac907cff2f4ca85066230" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
2022-01-26 20:11:42 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:11:55 +0100 [warn]: #0 failed to flush the buffer. retry_times=1 next_retry_time=2022-01-26 20:12:08 +0100 chunk="5d680f22322ac907cff2f4ca85066230" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
2022-01-26 20:11:55 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:12:08 +0100 [warn]: #0 failed to flush the buffer. retry_times=2 next_retry_time=2022-01-26 20:12:21 +0100 chunk="5d680f22322ac907cff2f4ca85066230" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
2022-01-26 20:12:08 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:12:21 +0100 [warn]: #0 failed to flush the buffer. retry_times=3 next_retry_time=2022-01-26 20:12:22 +0100 chunk="5d680f22322ac907cff2f4ca85066230" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
2022-01-26 20:12:21 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:12:22 +0100 [warn]: #0 retry succeeded by secondary. chunk_id="5d680f22329accc8480d02c9d0961c30"
2022-01-26 20:12:22 +0100 [warn]: #0 failed to flush the buffer. retry_times=0 next_retry_time=2022-01-26 20:12:35 +0100 chunk="5d680f2232d0fe1fffd93a17cb23f682" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
2022-01-26 20:12:22 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:12:22 +0100 [warn]: #0 retry succeeded by secondary. chunk_id="5d680f22326be06201e6259d360eee00"
2022-01-26 20:12:22 +0100 [warn]: #0 failed to flush the buffer. retry_times=0 next_retry_time=2022-01-26 20:12:35 +0100 chunk="5d680f2232d0fe1fffd93a17cb23f682" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
2022-01-26 20:12:22 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:12:35 +0100 [warn]: #0 failed to flush the buffer. retry_times=1 next_retry_time=2022-01-26 20:12:48 +0100 chunk="5d680f2232d0fe1fffd93a17cb23f682" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
2022-01-26 20:12:35 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:12:48 +0100 [warn]: #0 failed to flush the buffer. retry_times=2 next_retry_time=2022-01-26 20:13:01 +0100 chunk="5d680f2232d0fe1fffd93a17cb23f682" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
2022-01-26 20:12:48 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:13:01 +0100 [warn]: #0 failed to flush the buffer. retry_times=3 next_retry_time=2022-01-26 20:13:02 +0100 chunk="5d680f2232d0fe1fffd93a17cb23f682" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
2022-01-26 20:13:01 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:13:02 +0100 [warn]: #0 retry succeeded by secondary. chunk_id="5d680f2bb03e296a898e5ba8726e27dd"
2022-01-26 20:13:02 +0100 [warn]: #0 failed to flush the buffer. retry_times=0 next_retry_time=2022-01-26 20:13:15 +0100 chunk="5d680f2bb089ae25b4c06615cb9fc85b" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
2022-01-26 20:13:02 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:13:02 +0100 [warn]: #0 retry succeeded by secondary. chunk_id="5d680f2232d0fe1fffd93a17cb23f682"
2022-01-26 20:13:03 +0100 [warn]: #0 failed to flush the buffer. retry_times=0 next_retry_time=2022-01-26 20:13:16 +0100 chunk="5d680f2bb089ae25b4c06615cb9fc85b" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
2022-01-26 20:13:03 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:13:16 +0100 [warn]: #0 failed to flush the buffer. retry_times=1 next_retry_time=2022-01-26 20:13:29 +0100 chunk="5d680f2bb089ae25b4c06615cb9fc85b" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
2022-01-26 20:13:16 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:13:29 +0100 [warn]: #0 failed to flush the buffer. retry_times=2 next_retry_time=2022-01-26 20:13:42 +0100 chunk="5d680f2bb089ae25b4c06615cb9fc85b" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
2022-01-26 20:13:29 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:13:42 +0100 [warn]: #0 failed to flush the buffer. retry_times=3 next_retry_time=2022-01-26 20:13:43 +0100 chunk="5d680f2bb089ae25b4c06615cb9fc85b" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
2022-01-26 20:13:42 +0100 [warn]: #0 suppressed same stacktrace
2022-01-26 20:13:43 +0100 [warn]: #0 retry succeeded by secondary. chunk_id="5d680f2bb089ae25b4c06615cb9fc85b"
Additional context
I am occasionaly issuing this command to see how many files are in buffers:
[vagrant@devcentos7 fluentd]$ find * -type f | wc -l
1227
[vagrant@devcentos7 fluentd]$ find * -type f | wc -l
1223
[vagrant@devcentos7 fluentd]$ find * -type f | wc -l
1231
[vagrant@devcentos7 fluentd]$ find * -type f | wc -l
1231
[vagrant@devcentos7 fluentd]$ find * -type f | wc -l
1231
[vagrant@devcentos7 fluentd]$ find * -type f | wc -l
1239
You can see that I get 8 files created as I wrote in description approx. every ten seconds. But when secondary is created number of files is lowered by 4. It seems like if creating secondary files was too much behind.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status