Skip to content

[Feature] Include time attribute with selectors #5926

Open
@jsternberg

Description

Background

InfluxQL includes, at least, two different types of function calls: aggregates and selectors. mean(), median(), and percentile() are examples of aggregates because they take an interval and create a new point value that represents the interval. max(), min(), first(), and last() are examples of selectors that have a relevant time field that may be useful to those querying for those points.

What was done previously (<= 0.10.x)

If there was no GROUP BY time(...) in the select, the time for selectors would equal the column that was chosen. If multiple selectors were used, the time would be zero. If there was a GROUP BY time(...) interval, the time was always supposed to be the time associated with the start time of the interval, but I wasn't able to confirm that with 0.10.2.

What will be done in 0.11

In 0.11, selectors and aggregators have been normalized to return zero for the time no matter what. This creates greater parity with the aggregate version and removes the weird behavior where two selectors would return a zero time. The time column will always be the start time of the interval or zero if there is no interval.

This is better because the time column always plays a consistent role and doesn't implicitly change what it means depending on the query. Unfortunately, it means that getting the time of the first point in an interval is no longer possible with any query.

Proposal

We need to make a modification to the syntax so a person can request the time of a selector. One proposal is to have syntax like this:

SELECT max(value) AS max, max(value).time AS max_time FROM cpu

This syntax is clear and describes exactly what the query would do. I think that this syntax may be harder to implement than we like though. The max(value) call will make an iterator that includes the time in it, but with this method we either have to make two iterators for the same selector or try to match these two together in the query. Matching the queries can end up becoming even more difficult since the two are not guaranteed to be next to each other because they would be acting as separate fields in the selector. The benefit of this is it would be possible to only select the time rather than being forced to select both the value and the time by doing something like SELECT max(value).time FROM cpu.

An alternative syntax could also be:

SELECT max(value) AS max WITH time AS max_time FROM cpu`

This syntax reuses the WITH keyword to declare that the aggregate should include the time as an extra column. A slight negative of this approach is it would no longer be possible to get the number of columns that will be output from looking at the length of the fields slice in the select statement, but that should be a negligible problem. It also isn't possible with this proposal to omit the value itself without some additional syntax.

/cc @pauldix for comment.

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions