-
Notifications
You must be signed in to change notification settings - Fork 1.4k
formatter_csv: fix memory leak #4864
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5111400
to
20f5a97
Compare
20f5a97
to
e6dc411
Compare
Signed-off-by: Shizuo Fujita <fujita@clear-code.com>
e6dc411
to
9dd9c1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Good catch!
It seems that in #2529, input plugins were assumed to spawn only a fixed number of threads for emitting.
However, in_exec
for example, makes threads for emitting without limit.
The cache would then become infinitely large.
We need to fix this cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thread key would need to be unique for each plugin instance.
The following out_http
code has a similar feature.
I want to align the code with this one unless there is a special reason not to.
fluentd/lib/fluent/plugin/out_http.rb
Lines 109 to 111 in 2e896ec
def connection_cache_id_thread_key | |
"#{plugin_id}_connection_cache_id" | |
end |
For example, I want the following getter.
def csv_thread_key
"#{plugin_id}_csv"
end
@daipom Thanks. I fixed the key name at last commit. |
Signed-off-by: Shizuo Fujita <fujita@clear-code.com>
7bad1c1
to
b0b1e88
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I forgot that this is a owned
plugin.
It would not be expected to have an owned plugin with PluginId
.
So, could you please fix the following points?
(Owned plugins are unique to the owner's id and usage
.)
Signed-off-by: Shizuo Fujita <fujita@clear-code.com>
When it invoke |
Signed-off-by: Shizuo Fujita <fujita@clear-code.com>
Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
Sorry, I missed this point. @Watson1978 Could you please review this commit? The aim of this commit is to ensure that the compat layer does not affect this cache feature. |
Signed-off-by: Shizuo Fujita <fujita@clear-code.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing the CI failures my commit caused!
It seems possible that there is no owner. (FluentBinlogReader)
It would be a good solution to disable the cache feature in such a case.
I commented on some minor points.
Please check them.
Co-authored-by: Daijiro Fukuda <fukuda@clear-code.com> Signed-off-by: Shizuo Fujita <fujita@clear-code.com>
2c0611e
to
2591868
Compare
Signed-off-by: Shizuo Fujita <fujita@clear-code.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for your contribution!
**Which issue(s) this PR fixes**: Fixes fluent#3627 **What this PR does / why we need it**: Seems `in_exec` plugin creats Thread object each interval and it invoke formatter_csv on the thread. If it use a Thread object as hash key when caching with formatter_csv, that thread will be permanently referenceable and will not be collected by the GC. Here is memory usage of Ruby's process.  ### Reproduce See fluent#3627 ``` <system> process_name fluentd_monitor </system> <source> @type exec tag hardware.memory run_interval 0.1 command /home/watson/prj/sandbox/fluentd/get_memory_metrics.sh <parse> @type json types mem_available:float,swap_total:float,swap_free:float </parse> </source> <match **> @type exec command /home/watson/prj/sandbox/fluentd/psql_memory.sh <format> @type csv force_quotes false delimiter | fields time_local,mem_available,swap_total,swap_free </format> <buffer time> @type file path /home/watson/prj/sandbox/fluentd/log timekey 50s timekey_wait 0s flush_mode lazy total_limit_size 200MB retry_type periodic # exponential backoff is broken fluent#3609 retry_max_times 0 # need to go to secondary as soon as possible fluent#3610 </buffer> </match> ``` **Docs Changes**: **Release Note**: formatter_csv: fix memory leak --------- Signed-off-by: Shizuo Fujita <fujita@clear-code.com> Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com> Co-authored-by: Daijiro Fukuda <fukuda@clear-code.com> Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>
**Which issue(s) this PR fixes**: Fixes fluent#3627 **What this PR does / why we need it**: Seems `in_exec` plugin creats Thread object each interval and it invoke formatter_csv on the thread. If it use a Thread object as hash key when caching with formatter_csv, that thread will be permanently referenceable and will not be collected by the GC. Here is memory usage of Ruby's process.  ### Reproduce See fluent#3627 ``` <system> process_name fluentd_monitor </system> <source> @type exec tag hardware.memory run_interval 0.1 command /home/watson/prj/sandbox/fluentd/get_memory_metrics.sh <parse> @type json types mem_available:float,swap_total:float,swap_free:float </parse> </source> <match **> @type exec command /home/watson/prj/sandbox/fluentd/psql_memory.sh <format> @type csv force_quotes false delimiter | fields time_local,mem_available,swap_total,swap_free </format> <buffer time> @type file path /home/watson/prj/sandbox/fluentd/log timekey 50s timekey_wait 0s flush_mode lazy total_limit_size 200MB retry_type periodic # exponential backoff is broken fluent#3609 retry_max_times 0 # need to go to secondary as soon as possible fluent#3610 </buffer> </match> ``` **Docs Changes**: **Release Note**: formatter_csv: fix memory leak --------- Signed-off-by: Shizuo Fujita <fujita@clear-code.com> Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com> Co-authored-by: Daijiro Fukuda <fukuda@clear-code.com> Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>
**Which issue(s) this PR fixes**: Fixes fluent#3627 **What this PR does / why we need it**: Seems `in_exec` plugin creats Thread object each interval and it invoke formatter_csv on the thread. If it use a Thread object as hash key when caching with formatter_csv, that thread will be permanently referenceable and will not be collected by the GC. Here is memory usage of Ruby's process.  ### Reproduce See fluent#3627 ``` <system> process_name fluentd_monitor </system> <source> @type exec tag hardware.memory run_interval 0.1 command /home/watson/prj/sandbox/fluentd/get_memory_metrics.sh <parse> @type json types mem_available:float,swap_total:float,swap_free:float </parse> </source> <match **> @type exec command /home/watson/prj/sandbox/fluentd/psql_memory.sh <format> @type csv force_quotes false delimiter | fields time_local,mem_available,swap_total,swap_free </format> <buffer time> @type file path /home/watson/prj/sandbox/fluentd/log timekey 50s timekey_wait 0s flush_mode lazy total_limit_size 200MB retry_type periodic # exponential backoff is broken fluent#3609 retry_max_times 0 # need to go to secondary as soon as possible fluent#3610 </buffer> </match> ``` **Docs Changes**: **Release Note**: formatter_csv: fix memory leak --------- Signed-off-by: Shizuo Fujita <fujita@clear-code.com> Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com> Co-authored-by: Daijiro Fukuda <fukuda@clear-code.com> Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>
**Which issue(s) this PR fixes**: Fixes fluent#3627 **What this PR does / why we need it**: Seems `in_exec` plugin creats Thread object each interval and it invoke formatter_csv on the thread. If it use a Thread object as hash key when caching with formatter_csv, that thread will be permanently referenceable and will not be collected by the GC. Here is memory usage of Ruby's process.  ### Reproduce See fluent#3627 ``` <system> process_name fluentd_monitor </system> <source> @type exec tag hardware.memory run_interval 0.1 command /home/watson/prj/sandbox/fluentd/get_memory_metrics.sh <parse> @type json types mem_available:float,swap_total:float,swap_free:float </parse> </source> <match **> @type exec command /home/watson/prj/sandbox/fluentd/psql_memory.sh <format> @type csv force_quotes false delimiter | fields time_local,mem_available,swap_total,swap_free </format> <buffer time> @type file path /home/watson/prj/sandbox/fluentd/log timekey 50s timekey_wait 0s flush_mode lazy total_limit_size 200MB retry_type periodic # exponential backoff is broken fluent#3609 retry_max_times 0 # need to go to secondary as soon as possible fluent#3610 </buffer> </match> ``` **Docs Changes**: **Release Note**: formatter_csv: fix memory leak --------- Signed-off-by: Shizuo Fujita <fujita@clear-code.com> Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com> Co-authored-by: Daijiro Fukuda <fukuda@clear-code.com> Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>
**Which issue(s) this PR fixes**: Fixes fluent#3627 **What this PR does / why we need it**: Seems `in_exec` plugin creats Thread object each interval and it invoke formatter_csv on the thread. If it use a Thread object as hash key when caching with formatter_csv, that thread will be permanently referenceable and will not be collected by the GC. Here is memory usage of Ruby's process.  ### Reproduce See fluent#3627 ``` <system> process_name fluentd_monitor </system> <source> @type exec tag hardware.memory run_interval 0.1 command /home/watson/prj/sandbox/fluentd/get_memory_metrics.sh <parse> @type json types mem_available:float,swap_total:float,swap_free:float </parse> </source> <match **> @type exec command /home/watson/prj/sandbox/fluentd/psql_memory.sh <format> @type csv force_quotes false delimiter | fields time_local,mem_available,swap_total,swap_free </format> <buffer time> @type file path /home/watson/prj/sandbox/fluentd/log timekey 50s timekey_wait 0s flush_mode lazy total_limit_size 200MB retry_type periodic # exponential backoff is broken fluent#3609 retry_max_times 0 # need to go to secondary as soon as possible fluent#3610 </buffer> </match> ``` **Docs Changes**: **Release Note**: formatter_csv: fix memory leak --------- Signed-off-by: Shizuo Fujita <fujita@clear-code.com> Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com> Co-authored-by: Daijiro Fukuda <fukuda@clear-code.com> Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>
**Which issue(s) this PR fixes**: Fixes fluent#3627 **What this PR does / why we need it**: Seems `in_exec` plugin creats Thread object each interval and it invoke formatter_csv on the thread. If it use a Thread object as hash key when caching with formatter_csv, that thread will be permanently referenceable and will not be collected by the GC. Here is memory usage of Ruby's process.  ### Reproduce See fluent#3627 ``` <system> process_name fluentd_monitor </system> <source> @type exec tag hardware.memory run_interval 0.1 command /home/watson/prj/sandbox/fluentd/get_memory_metrics.sh <parse> @type json types mem_available:float,swap_total:float,swap_free:float </parse> </source> <match **> @type exec command /home/watson/prj/sandbox/fluentd/psql_memory.sh <format> @type csv force_quotes false delimiter | fields time_local,mem_available,swap_total,swap_free </format> <buffer time> @type file path /home/watson/prj/sandbox/fluentd/log timekey 50s timekey_wait 0s flush_mode lazy total_limit_size 200MB retry_type periodic # exponential backoff is broken fluent#3609 retry_max_times 0 # need to go to secondary as soon as possible fluent#3610 </buffer> </match> ``` **Docs Changes**: **Release Note**: formatter_csv: fix memory leak --------- Signed-off-by: Shizuo Fujita <fujita@clear-code.com> Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com> Co-authored-by: Daijiro Fukuda <fukuda@clear-code.com> Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>
**Which issue(s) this PR fixes**: Fixes fluent#3627 **What this PR does / why we need it**: Seems `in_exec` plugin creats Thread object each interval and it invoke formatter_csv on the thread. If it use a Thread object as hash key when caching with formatter_csv, that thread will be permanently referenceable and will not be collected by the GC. Here is memory usage of Ruby's process.  ### Reproduce See fluent#3627 ``` <system> process_name fluentd_monitor </system> <source> @type exec tag hardware.memory run_interval 0.1 command /home/watson/prj/sandbox/fluentd/get_memory_metrics.sh <parse> @type json types mem_available:float,swap_total:float,swap_free:float </parse> </source> <match **> @type exec command /home/watson/prj/sandbox/fluentd/psql_memory.sh <format> @type csv force_quotes false delimiter | fields time_local,mem_available,swap_total,swap_free </format> <buffer time> @type file path /home/watson/prj/sandbox/fluentd/log timekey 50s timekey_wait 0s flush_mode lazy total_limit_size 200MB retry_type periodic # exponential backoff is broken fluent#3609 retry_max_times 0 # need to go to secondary as soon as possible fluent#3610 </buffer> </match> ``` **Docs Changes**: **Release Note**: formatter_csv: fix memory leak --------- Signed-off-by: Shizuo Fujita <fujita@clear-code.com> Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com> Co-authored-by: Daijiro Fukuda <fukuda@clear-code.com> Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>
**Which issue(s) this PR fixes**: Backport #4864 Fixes #3627 **What this PR does / why we need it**: Seems `in_exec` plugin creats Thread object each interval and it invoke formatter_csv on the thread. If it use a Thread object as hash key when caching with formatter_csv, that thread will be permanently referenceable and will not be collected by the GC. Here is memory usage of Ruby's process.  ### Reproduce See #3627 ``` <system> process_name fluentd_monitor </system> <source> @type exec tag hardware.memory run_interval 0.1 command /home/watson/prj/sandbox/fluentd/get_memory_metrics.sh <parse> @type json types mem_available:float,swap_total:float,swap_free:float </parse> </source> <match **> @type exec command /home/watson/prj/sandbox/fluentd/psql_memory.sh <format> @type csv force_quotes false delimiter | fields time_local,mem_available,swap_total,swap_free </format> <buffer time> @type file path /home/watson/prj/sandbox/fluentd/log timekey 50s timekey_wait 0s flush_mode lazy total_limit_size 200MB retry_type periodic # exponential backoff is broken #3609 retry_max_times 0 # need to go to secondary as soon as possible #3610 </buffer> </match> ``` **Docs Changes**: **Release Note**: formatter_csv: fix memory leak Signed-off-by: Shizuo Fujita <fujita@clear-code.com> Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com> Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com> Co-authored-by: Shizuo Fujita <fujita@clear-code.com> Co-authored-by: Daijiro Fukuda <fukuda@clear-code.com>
Which issue(s) this PR fixes:
Fixes #3627
What this PR does / why we need it:
Seems
in_exec
plugin creats Thread object each interval and it invoke formatter_csv on the thread.If it use a Thread object as hash key when caching with formatter_csv, that thread will be permanently referenceable and will not be collected by the GC.
Here is memory usage of Ruby's process.

Reproduce
See #3627
Docs Changes:
Release Note:
formatter_csv: fix memory leak