Updated aggregation range works incorrectly when computer sleeps #13403
Comments
Unfortunately, changing that could create periods in time where we skip data during normal operations. For example, there could be a small period where I think what we could do is detect that time.now is more than a config.period away from the until (meaning we are already past the entire period) and update the until to be @srebhan thoughts?
Hmmm I think we might want to do both. :-) My idea would be to create "bins" from the last configuration window until "now" and sort the metrics into them, because during sleep we might have metrics in the pipe that are stuck right before the aggregator... Furthermore, we might read metrics from the past that would otherwise be lost, e.g. a tail from a network drive that was filled during sleep or a service endpoint with historic data...
Is there any progress on this issue? In my case, telegraf runs on an edge device that does not have the correct date/time after power-on. It takes some time for NTP to synchronize the clock. As a result, telegraf starts with an old date/time, and merging metrics does not work correctly once the date/time is updated.
@JSchy65 no I don't think so. Happy to review a pull-request though.
I have probably fixed this issue with the following commit: Unfortunately, for legal reasons, my company cannot sign the "Corporate Contributor License Agreement", so I am not allowed to create a pull request. All I can do is suggest using the above fix.
@Schachi for legal reasons we cannot just copy the code above if you did not sign the CLA. Maybe your company permits that you submit the PR as a private person?
Hello,
I have a telegraf config with an input and a basicstats aggregator. When the computer resumes from sleep, the aggregator's timekeeping is left behind. As a consequence, aggregated stats are dropped because they don't fit in the current aggregation window. The problem is that the aggregation window then remains in the past forever.
After setting debug = true in agent config, I find these messages in the logs (note time differences):
Looking in the code, I suspect the issue comes from this line:
https://github.com/influxdata/telegraf/blob/master/models/running_aggregator.go#L174
... as the aggregation window is advanced by the period each time, and I see nothing there that accounts for time drift (such as we have when the machine sleeps).
I think we should change the code to set until to something like now() + c.Config.Period, so that we accommodate time drift in either direction.
Thanks for your help on this,
Catalin.