[filelogreceiver] CPU consumption increases (roughly) linearly with number of files watched #27404
Comments
Performance improvements are always welcome.
Pinging code owners for receiver/filelog: @djaglowski. See Adding Labels via Comments if you do not have permissions to add labels yourself.
Pinging code owners for pkg/stanza: @djaglowski. See Adding Labels via Comments if you do not have permissions to add labels yourself.
I'll see how far I can get with just some flamegraph analysis.
Thanks @cwegener. Feel free to share any insights you discover here.
My first instinct is that allocations are escaping to the heap. I'll look into which allocations those might be and whether the code can be changed so that the compiler uses stack-based allocation instead.
Looking at this again after some related discussions, I think I can explain what is going on here. Basically, we remember N files (roughly 3x what we find in a poll interval). Then, for each of the M files we find in a poll interval, we open it, read a fingerprint from it, and compare it directly against the N files we remember, stopping early if we find a match. In the worst case that is on the order of M x N comparisons per poll. There is clearly opportunity for improvement here, and I think recent refactoring has put us closer to a trie-based solution, which would give us much more efficient cross-referencing of files. Let's keep this open until we can make a substantial gain on this front.
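To make the comparison above concrete, here is a minimal, language-neutral sketch in Python. It is not the stanza implementation (which is written in Go). It assumes that a remembered fingerprint "matches" when it is a prefix of the newly read fingerprint, and the names `match_linear` and `FingerprintTrie` are hypothetical, chosen only for illustration.

```python
from typing import Any, List, Optional, Tuple

def match_linear(fp: bytes, known: List[bytes]) -> Tuple[Optional[int], int]:
    """Compare fp against each remembered fingerprint in turn.

    Returns (index of the first match or None, number of comparisons made).
    Assumption: a remembered fingerprint matches if it is a prefix of fp.
    """
    comparisons = 0
    for i, old in enumerate(known):
        comparisons += 1
        if fp.startswith(old):
            return i, comparisons  # stop short on the first match
    return None, comparisons

_END = object()  # sentinel key marking "a remembered fingerprint ends here"

class FingerprintTrie:
    """Byte-wise trie; lookup cost depends on len(fp), not on how many are stored."""

    def __init__(self) -> None:
        self.root: dict = {}

    def insert(self, fp: bytes, value: Any) -> None:
        node = self.root
        for b in fp:
            node = node.setdefault(b, {})
        node[_END] = value

    def find_prefix(self, fp: bytes) -> Optional[Any]:
        """Return the value of the longest stored fingerprint that prefixes fp."""
        node = self.root
        match = node.get(_END)
        for b in fp:
            node = node.get(b)
            if node is None:
                break
            if _END in node:
                match = node[_END]
        return match

known = [b"aaa", b"abc", b"xyz"]
print(match_linear(b"xyz-more", known))  # (2, 3): scanned all three entries

trie = FingerprintTrie()
for name, fp in zip(["a.log", "b.log", "c.log"], known):
    trie.insert(fp, name)
print(trie.find_prefix(b"xyz-more"))  # c.log
```

With the linear scan, a file that matches nothing costs N comparisons, so M unmatched files cost about M x N per poll. The trie walk is bounded by the fingerprint length regardless of N, which is the kind of gain the trie-based idea is aiming for.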
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
Hello. Just a small comment/feedback.
Thank you for the additional feedback @afoninsky. We will take a serious look at this before moving the component to GA.
The most impactful mitigation I've found is to lengthen the With a value of 200ms my understanding is the system checks all files 5 times per second. With 1000+ files, this creates significant overhead. With
With
The next mitigation will be to reduce the Hope this could help mitigate readers here. |
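The arithmetic behind the comment above can be sketched as follows. This is only a back-of-the-envelope model, assuming every watched file is checked once per poll; the function name and the parameter `poll_interval_s` are hypothetical and are not configuration keys.

```python
def file_checks_per_second(num_files: int, poll_interval_s: float) -> float:
    """Approximate per-file checks per second if each poll touches every file."""
    if poll_interval_s <= 0:
        raise ValueError("poll interval must be positive")
    return num_files / poll_interval_s

# 1000 files polled every 200ms: 5 polls per second, about 5000 checks per second.
print(file_checks_per_second(1000, 0.2))

# The same 1000 files polled once per second: about 1000 checks per second.
print(file_checks_per_second(1000, 1.0))
```

Under this model the work scales with the number of files and inversely with the interval, which is consistent with lengthening the interval being the most impactful mitigation reported.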
Component(s)
pkg/stanza, receiver/filelog
What happened?
Description
The CPU consumption of the filelog receiver increases (roughly) linearly as the number of files to be watched increases.
Steps to Reproduce
mkdir -p /tmp/logs; i=1; while [ $i -le 100 ]; do dd if=/dev/urandom bs=1000 count=1 | base64 > /tmp/logs/$i.log; i=$(($i+1)); done
filelogreceiver -> loggingexporter and send the 100 files through the pipeline
Expected Result
Good question.
I think that ultimately, the expected result is that the default configuration of the filelog receiver protects the administrator from putting unexpected burden on the host CPU.
Actual Result
With a default configuration, the filelog receiver can cause a significant increase in CPU utilization.
Collector version
v0.86.0
Environment information
Environment
OS: Arch Linux x86_64
Compiler (if manually compiled): N/A. Using binaries from GitHub for testing.
OpenTelemetry Collector configuration
Log output
Additional context
A quick look at pprof shows that most CPU time is spent on GC.