-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Description
We're running influxdb 0.9.4.1 on Ubuntu 14.04LTS.
We're trying to use DERIVATIVE()
and running up against some issues. The data is an ever increasing nginx query count. Here's an example of our query without the DERIVATIVE
:
> SELECT first(value) FROM "curl_json_value" WHERE "instance" = 'httpserver' AND "type" = 'http_requests' AND "host" =~ /^server-.*/ AND time > now() - 5m GROUP BY time(60s), "host" fill(previous)
name: curl_json_value
tags: host=server-2-e1c-01
time first
---- -----
2015-09-25T22:10:00Z
2015-09-25T22:11:00Z 5.3238387e+07
2015-09-25T22:12:00Z 5.3241087e+07
2015-09-25T22:13:00Z 5.3243829e+07
2015-09-25T22:14:00Z 5.3246917e+07
2015-09-25T22:15:00Z 5.3246917e+07
name: curl_json_value
tags: host=server-2-e1d-01
time first
---- -----
2015-09-25T22:10:00Z
2015-09-25T22:11:00Z 409642
2015-09-25T22:12:00Z 412407
2015-09-25T22:13:00Z 415038
2015-09-25T22:14:00Z 416994
2015-09-25T22:15:00Z 416994
When we add DERIVATIVE()
:
> SELECT derivative(first(value)) FROM "curl_json_value" WHERE "instance" = 'httpserver' AND "type" = 'http_requests' AND "host" =~ /^server-.*/ AND time > now() - 5m GROUP BY time(60s), "host" fill(previous)
name: curl_json_value
tags: host=server-2-e1c-01
time derivative
---- ----------
2015-09-25T22:10:00Z 0
name: curl_json_value
tags: host=server-2-e1d-01
time derivative
---- ----------
2015-09-25T22:10:00Z 0
I believe this is because the fill(previous) doesn't give a result for the first item [1].
fill(0)
seems to give a result for the first value, but is incorrect in general for DERIVATIVE()
because it gives the wrong values unless the GROUP BY
time value is exactly correct [2]. We're eventually going to use this query in Grafana, which means we don't get to choose the time period specifically.
Since the times sometimes line up, the DERIVATIVE()
query does sometimes return results. These results look ok, except the first and last values are always garbage (large jumps positive or negative). Maybe DERIVATIVE()
should query extra value on either side so it can give back valid answers over the given WHERE
value, or perhaps it should take the values that it was asked for and throw out the first and last value of the result, giving back a little less data than was asked for.
Interestingly, in 0.9.3 (before bug #3718) we didn't need the FIRST()
function inside DERIVATIVE()
and everything seemed to work. So, we might not have been doing things right, but 0.9.4 appeared to break things for us.
From my perspective as a query writer, I think I'd like DERIVATIVE()
for imply fill(previous)
and have fill(previous)
be magic enough to look into prior data so the first value is there. Then a simple query just works.
Or perhaps DERIVATIVE()
shouldn't bail out early when it doesn't have leading or trailing data points, and instead return NULL
or "no data" or whatever until it has a real difference. That would also fix the first output being a huge spike.
I've only just started thinking about all of this, so there are probably even better options than I've come up with so far. :-)
[1] Well, sometimes it doesn't: since the query has now()
in it, sometimes things line up perfectly and the first result does have a value, but it's not consistent and can't be relied on.
[2] Picture data like this: [5,6,7]
on a 5 second interval. fill(0)
on a 1s interval makes the data [5,0,0,0,0,6,0,0,0,0,7,0,0,0,0]
and then taking the derivative gives [5,-5,0,0,0,6,-6,0,0,0,7,-7,0,0,0]
. What I expect is [5,0,0,0,0,1,0,0,0,0,1,0,0,0,0]
or even [1,0,0,0,0,1,0,0,0,0,1,0,0,0,0]
(though that requires looking at data outside what was asked for).