Skip to content

DERIVATIVE() edge conditions #4237

@caldwell

Description

@caldwell

We're running influxdb 0.9.4.1 on Ubuntu 14.04LTS.

We're trying to use DERIVATIVE() and running up against some issues. The data is an ever increasing nginx query count. Here's an example of our query without the DERIVATIVE:

> SELECT first(value) FROM "curl_json_value" WHERE "instance" = 'httpserver' AND "type" = 'http_requests' AND "host" =~ /^server-.*/ AND time > now() - 5m GROUP BY time(60s), "host" fill(previous)
name: curl_json_value
tags: host=server-2-e1c-01
time            first
----            -----
2015-09-25T22:10:00Z
2015-09-25T22:11:00Z    5.3238387e+07
2015-09-25T22:12:00Z    5.3241087e+07
2015-09-25T22:13:00Z    5.3243829e+07
2015-09-25T22:14:00Z    5.3246917e+07
2015-09-25T22:15:00Z    5.3246917e+07


name: curl_json_value
tags: host=server-2-e1d-01
time            first
----            -----
2015-09-25T22:10:00Z
2015-09-25T22:11:00Z    409642
2015-09-25T22:12:00Z    412407
2015-09-25T22:13:00Z    415038
2015-09-25T22:14:00Z    416994
2015-09-25T22:15:00Z    416994

When we add DERIVATIVE():

> SELECT derivative(first(value)) FROM "curl_json_value" WHERE "instance" = 'httpserver' AND "type" = 'http_requests' AND "host" =~ /^server-.*/ AND time > now() - 5m GROUP BY time(60s), "host" fill(previous)
name: curl_json_value
tags: host=server-2-e1c-01
time            derivative
----            ----------
2015-09-25T22:10:00Z    0


name: curl_json_value
tags: host=server-2-e1d-01
time            derivative
----            ----------
2015-09-25T22:10:00Z    0

I believe this is because the fill(previous) doesn't give a result for the first item [1].

fill(0) seems to give a result for the first value, but is incorrect in general for DERIVATIVE() because it gives the wrong values unless the GROUP BY time value is exactly correct [2]. We're eventually going to use this query in Grafana, which means we don't get to choose the time period specifically.

Since the times sometimes line up, the DERIVATIVE() query does sometimes return results. These results look ok, except the first and last values are always garbage (large jumps positive or negative). Maybe DERIVATIVE() should query extra value on either side so it can give back valid answers over the given WHERE value, or perhaps it should take the values that it was asked for and throw out the first and last value of the result, giving back a little less data than was asked for.

Interestingly, in 0.9.3 (before bug #3718) we didn't need the FIRST() function inside DERIVATIVE() and everything seemed to work. So, we might not have been doing things right, but 0.9.4 appeared to break things for us.

From my perspective as a query writer, I think I'd like DERIVATIVE() for imply fill(previous) and have fill(previous) be magic enough to look into prior data so the first value is there. Then a simple query just works.

Or perhaps DERIVATIVE() shouldn't bail out early when it doesn't have leading or trailing data points, and instead return NULL or "no data" or whatever until it has a real difference. That would also fix the first output being a huge spike.

I've only just started thinking about all of this, so there are probably even better options than I've come up with so far. :-)


[1] Well, sometimes it doesn't: since the query has now() in it, sometimes things line up perfectly and the first result does have a value, but it's not consistent and can't be relied on.

[2] Picture data like this: [5,6,7] on a 5 second interval. fill(0) on a 1s interval makes the data [5,0,0,0,0,6,0,0,0,0,7,0,0,0,0] and then taking the derivative gives [5,-5,0,0,0,6,-6,0,0,0,7,-7,0,0,0]. What I expect is [5,0,0,0,0,1,0,0,0,0,1,0,0,0,0] or even [1,0,0,0,0,1,0,0,0,0,1,0,0,0,0] (though that requires looking at data outside what was asked for).

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions