filelog receiver not reading logs from new files automatically using the poll interval #34395
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
I'm curious how you concluded that this is related to the poll interval. I can't see how that would be the case. The poll interval just defines how often we look for files. If you use any reasonable value then it's effectively just checking for the files repeatedly. If it can't find them, that's because they don't exist or because the collector doesn't have access to them (in which case they might as well not exist as far as the collector is concerned).
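For readers unfamiliar with the option being discussed: a minimal filelog receiver config sketch showing where `poll_interval` sits (the include path is illustrative; `200ms` is the documented default, and the setting only controls how often the include globs are re-evaluated):

```yaml
receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log   # illustrative path; adjust to your log layout
    poll_interval: 200ms          # how often the receiver checks for (new) matching files
```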
Not a conclusion, but I assumed it might be an issue there since that was something I was able to find in the docs; it could be something else altogether.
@pr0PM Based on the config it appears there is no storage extension involved. Can you confirm that is the case?
@djaglowski that's right, no storage extensions. Would using one help us here? I've never tried any of them.
Using a storage extension is helpful in many cases but for the sake of diagnosing this issue it would only complicate things. Given that there is no extension, you can be sure there is no state shared between the collectors. Therefore, I think you should look at this issue from the perspective of one node that is not behaving as expected. If I understand correctly, that means for a "dormant" node:
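For context, wiring in a storage extension would look roughly like the sketch below (the directory path is hypothetical). The `file_storage` extension persists the filelog receiver's file offsets across collector restarts, which is why it would add shared state to the diagnosis:

```yaml
extensions:
  file_storage:
    directory: /var/lib/otelcol/file_storage   # hypothetical host directory

receivers:
  filelog:
    include: [ /var/log/pods/*/*/*.log ]       # illustrative path
    storage: file_storage                      # persist read offsets here

service:
  extensions: [file_storage]
```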
IMO, the behavior you described is very unlikely to be caused by this sequence of events. Much more likely, there is some incorrect assumption about what is actually happening. If possible, I would consider simplifying this to a 1-node cluster until you get it working. I don't see any reason why having multiple nodes would explain any behavior here, unless you're misunderstanding which pods are being deployed to which nodes.
Trying to capture the process in this image for clarity, to explain what I meant by a dormant node and to answer the questions better. Here the collector is deployed on k8s as a DaemonSet, so each node runs a collector pod by default. First box:
Second box:
Third Box:
Now for the last 3 questions:
Yes, it is, as explained above.
Yes, I even SSH'd into the node to confirm this.
Yes, the k8s DaemonSet makes sure it's deployed on each node.
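For anyone reproducing this setup, the relevant part of such a collector DaemonSet is roughly the following (names are illustrative): the hostPath mount is what gives each node's collector pod read access to that node's pod log files.

```yaml
# Illustrative excerpt of a collector DaemonSet pod spec
spec:
  containers:
    - name: otel-collector
      volumeMounts:
        - name: varlogpods
          mountPath: /var/log/pods   # filelog include patterns resolve against this mount
          readOnly: true
  volumes:
    - name: varlogpods
      hostPath:
        path: /var/log/pods          # the node's pod log directory
```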
Thanks for the diagram and detailed answers. Can you try enabling debug logging for the collector and sharing a more complete log?
There were too many logs here, so I redacted the most redundant parts; please let me know if I should share more detailed ones. node-1 while the target workload is on it:
node-1 after removal of the target service
node-2 logs the same thing continuously since startup; nothing new here even when the new workload pod gets scheduled on it (2nd box)
node-2 after restart, with logging working again (3rd box)
I am unable to reproduce this on a single node.
@djaglowski can you try reproducing this with a StatefulSet if possible? This is where we saw the issue happening.
@djaglowski the above keycloak pod runs as a StatefulSet, not a Deployment.
@rpsadarangani I tested with a StatefulSet and get the same result. Not sure if related, but do you know why your logs show a malformed list of paths, but only when the list is empty?
Since I'm unable to reproduce the issue and cannot come up with any theory that explains the described behavior, I'll have to reiterate my request that you reduce the complexity of the scenario. If you can provide a concrete set of k8s specs and commands that demonstrate the issue, I can look into it further.
The problem here is that I am using the following in the
which is breaking as described above, while we need to use the following instead, and then all problems disappear
The difference being the extra |
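As a general illustration of why one extra wildcard level in an include pattern matters (the directory names below are made up, and this plain-Python sketch uses stdlib `glob` rather than the receiver's actual matching library): a pattern that is one directory level too shallow simply never matches the nested log files, so the receiver has nothing to pick up no matter how often it polls.

```python
import glob
import os
import tempfile

# Simulate a nested k8s-style pod log layout: <root>/pods/<pod>/<container>/0.log
root = tempfile.mkdtemp()
log_dir = os.path.join(root, "pods", "ns_mypod_uid", "keycloak")
os.makedirs(log_dir)
open(os.path.join(log_dir, "0.log"), "w").close()

# A pattern one level too shallow matches nothing...
shallow = glob.glob(os.path.join(root, "pods", "*", "*.log"))
# ...while adding the extra wildcard level finds the file.
deep = glob.glob(os.path.join(root, "pods", "*", "*", "*.log"))
print(len(shallow), len(deep))  # 0 1
```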
Component(s)
receiver/filelog
What happened?
Description
The filelog receiver does not pick up matching log files (unless restarted) when new pods with a matching pattern get scheduled on the k8s node.
I'll try to explain this with a detailed example:
Let's say I have a 5-node cluster and my filelog config matches files corresponding to a deployment with 3 replicas (each on a unique node). In this case 3 OTEL collector pods will start processing the logs, while 2 OTEL collector pods will be idle.
Let's say there was a rollout restart of the target deployment and 2 pods got scheduled on new nodes where the OTEL collector pods were idle.
In my case the idle OTEL collector pods don't pick up the logs from the files and stay dormant.
I read that the filelog receiver has a poll_interval config option, but it doesn't seem to be working in our case.
When otelcol starts reading logs from a file, its log output looks something like this if matching pods are present on the node:
and for idle otelcol DaemonSet pods it looks something like this:
Now if a pod moves to a node where the OTEL collector pod was dormant, that otelcol pod doesn't start processing.
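To make the expectation concrete, here is a conceptual Python sketch (not the receiver's actual implementation) of what a poll loop is expected to do: on every tick the include globs are re-evaluated, so a file that newly matches should start being tailed on the next tick without a restart.

```python
import time

def poll(find_files, on_new_file, ticks, poll_interval=0.0):
    """Re-evaluate the file match every tick; start tailing newly matched files."""
    known = set()
    for _ in range(ticks):
        for path in find_files():   # re-evaluate the glob match
            if path not in known:
                known.add(path)
                on_new_file(path)   # begin tailing the new file
        time.sleep(poll_interval)
    return known

# Simulate a file appearing between two poll ticks.
batches = iter([["a.log"], ["a.log", "b.log"]])
seen = []
poll(lambda: next(batches), seen.append, ticks=2)
print(seen)  # ['a.log', 'b.log']
```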
Steps to Reproduce
no files match the configured criteria
in that case)
Expected Result
OTEL collector pod should be polling for log files and pick up any new files as they arrive.
Actual Result
OTEL collector doesn't start processing the log files automatically as they arrive unless I forcefully restart the OTEL collector pod in the node where previously no matching log files were present.
Collector version
v0.101.0
Environment information
Environment
OS: Amazon Linux 2
EKS v1.26.12-eks Node
OpenTelemetry Collector configuration
Log output
No response
Additional context
No response