Description
Relevant telegraf.conf:
interval = "1m"
round_interval = true
flush_interval = "1m"
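With `round_interval = true` and `interval = "1m"`, collection is expected to land on whole-minute boundaries (`:00` of each minute), which is why the expected timestamps below all end in `:00`. The following is a minimal sketch of that alignment arithmetic only; `next_aligned` is a hypothetical helper for illustration and is not Telegraf's actual code.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def next_aligned(now: datetime, interval: timedelta) -> datetime:
    """Return the first interval boundary strictly after `now`.

    Hypothetical helper: boundaries are counted from the Unix epoch.
    """
    # Number of whole intervals elapsed since the epoch (timedelta // timedelta -> int).
    periods = (now - EPOCH) // interval
    # Step one interval past the boundary at or before `now`.
    return EPOCH + (periods + 1) * interval

if __name__ == "__main__":
    now = datetime(2019, 1, 23, 17, 55, 32, tzinfo=timezone.utc)
    print(next_aligned(now, timedelta(minutes=1)).isoformat())
```

If the alignment is computed only once, at startup, a later jump in the wall clock is not taken into account.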
System info:
CentOS 7
Telegraf 1.9.3 (git: HEAD 6ad8c8b)
Telegraf 1.9.1 (git: HEAD 2063609) (on a second VM)
InfluxDB shell version: 1.7.3
Steps to reproduce:
- Make sure Telegraf is running and feeding data to InfluxDB.
- Check the current system time with `date`.
- Change the time to one with a different second mark, for example `sudo date +%T -s "17:55:32"` (make sure the current time is not close to second 32, otherwise the second mark may not actually change).
- Wait for your interval and flush interval to cycle once.
- Enter InfluxDB through the CLI and use the appropriate database.
- Set `precision rfc3339`.
- Make a simple query like `select pid from procstat where time > now() - 5m`.
Expected behavior:
2019-01-23T16:28:00Z 4944
2019-01-23T16:28:00Z 2796
2019-01-23T16:28:00Z 2115
2019-01-23T16:28:00Z 2169
2019-01-23T17:56:00Z 2115
Actual behavior:
2019-01-23T16:28:00Z 4944
2019-01-23T16:28:00Z 2796
2019-01-23T16:28:00Z 2115
2019-01-23T16:28:00Z 2169
2019-01-23T17:56:16Z 2115
Additional info:
Restarting Telegraf fixes this issue. I found this because I was running some VMs on a computer that was put into power saving mode during lunchtime, and the VMs' clocks desynchronized. Using `chronyc makestep` produced this unexpected behavior.
This is probably the root cause of other similar issues, because if a system clock desynchronizes itself from official time and is later updated by chronyd, Telegraf will start recording metrics with the wrong timestamps.
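One possible explanation, which this report does not confirm, is that ticks keep firing at a fixed period of elapsed time after the clock is stepped, so they stay aligned to the old clock rather than the new one. The sketch below models only that assumption. `tick_timestamps` is a hypothetical helper, and the step moment of 16:28:16 is an assumed value chosen because it would produce the 16-second offset seen above.

```python
from datetime import datetime, timedelta, timezone

def tick_timestamps(first_tick, interval, step_at, new_wall, count):
    """Wall-clock readings for `count` ticks fired every `interval` of elapsed time.

    The wall clock is stepped to `new_wall` at the moment it read `step_at`.
    Ticks are never re-aligned after the step (the assumption being modelled).
    """
    shift = new_wall - step_at  # how far the wall clock jumped
    stamps = []
    for i in range(count):
        old_reading = first_tick + i * interval  # what the un-stepped clock would show
        # Ticks after the step are stamped with the shifted wall clock.
        stamps.append(old_reading + shift if old_reading > step_at else old_reading)
    return stamps

if __name__ == "__main__":
    utc = timezone.utc
    stamps = tick_timestamps(
        first_tick=datetime(2019, 1, 23, 16, 28, 0, tzinfo=utc),
        interval=timedelta(minutes=1),
        step_at=datetime(2019, 1, 23, 16, 28, 16, tzinfo=utc),   # assumed
        new_wall=datetime(2019, 1, 23, 17, 55, 32, tzinfo=utc),  # from the report
        count=2,
    )
    for s in stamps:
        print(s.isoformat())  # second line ends in 17:56:16
```

Under this model every later tick carries the same 16-second offset until the process restarts and aligns again, which would match the observation that restarting Telegraf fixes the issue.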
One problem this causes: a query that uses `group by time` between 2019-01-23T17:50:00Z and 2019-01-23T17:56:00Z with `fill(0)` can give you false nulls if there is other data with correct timestamps.
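As a rough illustration of how the offset can affect a time-bounded, bucketed query, the sketch below counts points per one-minute bucket over an inclusive range and fills empty buckets with 0. It is a simplified model and not InfluxDB's implementation; `count_per_bucket` is a hypothetical helper. A point stamped 17:56:00 falls inside the range, while the same point stamped 17:56:16 falls outside it.

```python
from datetime import datetime, timedelta, timezone

def count_per_bucket(points, start, end, width):
    """Count points with start <= t <= end per bucket of `width`; empty buckets are 0."""
    buckets = {}
    t = start
    while t <= end:
        buckets[t] = 0  # pre-fill every bucket, mimicking fill(0)
        t += width
    for p in points:
        if start <= p <= end:
            index = (p - start) // width  # whole bucket widths since `start`
            buckets[start + index * width] += 1
    return buckets

if __name__ == "__main__":
    utc = timezone.utc
    start = datetime(2019, 1, 23, 17, 50, 0, tzinfo=utc)
    end = datetime(2019, 1, 23, 17, 56, 0, tzinfo=utc)
    minute = timedelta(minutes=1)
    on_time = [datetime(2019, 1, 23, 17, 56, 0, tzinfo=utc)]
    drifted = [datetime(2019, 1, 23, 17, 56, 16, tzinfo=utc)]
    print(count_per_bucket(on_time, start, end, minute)[end])  # 1
    print(count_per_bucket(drifted, start, end, minute)[end])  # 0
```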