Skip to content

Missing data in aggregates across DST transitions in timezones using InfluxQL #25303

Open
@ag-cordulus

Description

I'm trying to create aggregates of measurements in different timezones and different resolutions. However, I'm seeing some strange results, and having some trouble determining a common rule for calculating the proper offsets of a timezone, its resolution, and time range.

I'm running InfluxDB v2.7.10 through Docker:

 docker run -p 8086:8086 \
  --name influxdb \
  influxdb:2.7.10

Using a database dst_test with a retention policy with an infinite duration, I insert data every 10 minutes from 2021-01-01T00:00:00Z through 2022-07-01T00:00:00Z:

inch -v -time 792000m -start-time '2021-01-01T00:00:00Z' -t 1 -p 79200 -db dst_test -v2 true

I'm using my own timezone Europe/Copenhagen as example, where:

  • In 2021, DST starts at 2021-03-28T02:00:00+01:00 and ends in 2021-10-31T03:00:00+02:00.
  • In 2022, DST starts at 2021-03-27T02:00:00+01:00 and ends in 2021-10-30T03:00:00+02:00.

Now, creating aggregates for every 2 day from 2021-01-24T00:00:00+02:00 to 2022-07-01T00:00:00+02:00 in timezone Europe/Copenhagen, if I use no offset:

> select count(*) from "m0" where time >= '2021-01-24T00:00:00+01:00' and time < '2022-07-01T00:00:00+01:00' group by time(2d) fill(null) tz('Europe/Copenhagen');
name: m0
time                      count_v0
----                      --------
2021-01-23T00:00:00+01:00 144 <- wrong timestamp and bucket is missing one day of data
2021-01-25T00:00:00+01:00 288
...

My query does not align with the preset time boundaries, and therefore it needs an offset. Well, I can tell it's off by one day, as each day has exactly 144 measurements, so I add an offset of one day:

select count(*) from "m0" where time >= '2021-01-24T00:00:00+01:00' and time < '2022-07-01T00:00:00+01:00' group by time(2d, 24h) fill(null) tz('Europe/Copenhagen');
name: m0
time                      count_v0
----                      --------
2021-01-24T00:00:00+01:00 288 <- correct timestamp of first bucket and right amount of data
2021-01-26T00:00:00+01:00 288
...
2021-03-25T00:00:00+01:00 288
2021-03-27T00:00:00+01:00 282 <- enter DST
2021-03-29T00:00:00+02:00 0
2021-03-31T00:00:00+02:00 0
2021-04-02T00:00:00+02:00 0
2021-04-04T00:00:00+02:00 132
2021-04-06T00:00:00+02:00 288
...
2022-03-24T00:00:00+01:00 288
2022-03-26T00:00:00+01:00 282 <- enter DST again
2022-03-28T00:00:00+02:00 0
2022-03-30T00:00:00+02:00 0
2022-04-01T00:00:00+02:00 0
2022-04-03T00:00:00+02:00 132
2022-04-05T00:00:00+02:00 288
2022-04-07T00:00:00+02:00 288

Usually I get nulls instead of zeros through the query API endpoint, but that's not the point. A bunch of data is missing now.

Through some experimentation I figured I can get a correct first DST transition and reduce the amount of missing data by reversing the offset to -24h, however leaving DST results in missing data:

select count(*) from "m0" where time >= '2021-01-24T00:00:00+01:00' and time < '2022-07-01T00:00:00+01:00' group by time(2d, -24h) fill(null) tz('Europe/Copenhagen');
name: m0
time                      count_v0
----                      --------
2021-01-24T00:00:00+01:00 288 <- Timestamp of first bucket is correct
2021-01-26T00:00:00+01:00 288
2021-01-28T00:00:00+01:00 288
...
2021-03-25T00:00:00+01:00 288
2021-03-27T00:00:00+01:00 282 <- Enter DST
2021-03-29T00:00:00+02:00 288
...
2021-10-27T00:00:00+02:00 288
2021-10-29T00:00:00+02:00 288
2021-10-31T00:00:00+02:00 0 <- leave DST
2021-11-02T00:00:00+01:00 0
2021-11-04T00:00:00+01:00 0
2021-11-06T00:00:00+01:00 0
2021-11-08T00:00:00+01:00 282
2021-11-10T00:00:00+01:00 288

I've had some success with a hunch I had, that the offset had to be reversed whenever my offset was greater than the distance between the start of the bucket containing the first DST transition, and the DST transition, but this rule doesn't always hold true, and I end up with buckets containing no data.

As a last resort, I've tried all offsets between -48h to 48h for the given query where all attempts fail due to some bucket(s) missing data or improper contents of the bucket containing a DST transition.

While searching through issues, I stumbled upon a very relevant commit commit which was merged before 2.7.10 was released. I wonder if you @davidby-influx can help provide some insights as you seem to have experience with aggregates across timezones with DST?

Metadata

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions