Write metrics to influx without timestamp #15219

Closed
knollet opened this issue Apr 24, 2024 · 7 comments · Fixed by #15220
Labels
feature request Requests for new plugin and for new features to existing plugins

Comments

@knollet
Contributor

knollet commented Apr 24, 2024

Use Case

InfluxDB allows metrics to be sent without a timestamp, in which case it sets the timestamp itself. This can be a useful feature, but Telegraf's outputs.influxdb does not support it.

Our use case is as follows:
We have a "heartbeat" metric for our hosts, and Grafana alerts if select count(*) from ... where ... group by time(5m), hostname yields 0, meaning there was no metric for host hostname in the last 5 minutes.
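
The alert rule can be modeled roughly as follows. This is a simplified Python sketch, not InfluxQL's exact semantics: windows are aligned to multiples of 5 minutes, timestamps are nanoseconds, and every expected host gets an entry so that a missing heartbeat shows up as a count of 0. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

MIN_NS = 60 * 10**9        # one minute in nanoseconds
WINDOW_NS = 5 * MIN_NS     # group by time(5m)


def heartbeat_counts(
    points: Iterable[Tuple[str, int]],   # (hostname, timestamp_ns)
    hostnames: Iterable[str],
    start_ns: int,
    end_ns: int,
) -> Dict[Tuple[int, str], int]:
    """Count heartbeats per (window start, hostname), including empty windows."""
    counts: Counter = Counter()
    for host, ts in points:
        if start_ns <= ts < end_ns:
            counts[(ts - ts % WINDOW_NS, host)] += 1
    first = start_ns - start_ns % WINDOW_NS
    hosts = list(hostnames)  # iterated once per window below
    return {
        (w, h): counts[(w, h)]   # Counter yields 0 for windows without points
        for w in range(first, end_ns, WINDOW_NS)
        for h in hosts
    }


def silent_windows(points, hostnames, start_ns, end_ns) -> List[Tuple[int, str]]:
    """Return the (window start, hostname) pairs with a count of 0, i.e. the ones that alert."""
    counts = heartbeat_counts(points, hostnames, start_ns, end_ns)
    return sorted(key for key, n in counts.items() if n == 0)


pts = [("web1", 1 * MIN_NS), ("web1", 2 * MIN_NS), ("web1", 11 * MIN_NS)]
print(silent_windows(pts, ["web1"], 0, 15 * MIN_NS))  # [(300000000000, 'web1')]
```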

Let's say, for the example's sake, there aren't any because of a network outage: Telegraf can't deliver any metrics to InfluxDB, but it still generates and buffers them.

So Grafana alerts us. So far, so good.

Now the network outage is resolved, Telegraf reaches InfluxDB again and delivers all the metrics it has buffered up to this point. The gap in the metric stream that caused the Grafana alert disappears: a dashboard visualizing count(*) group by time(5m), hostname shows no gap.

Say you (as an admin) don't react that fast: you get the alert "no heartbeat for 5 minutes" on your phone, boot up your notebook, log in, yadda yadda, and check Grafana 15 minutes later. InfluxDB is reachable again, the buffered backlog has been delivered, no gap is visible, and you're left wondering why the alert rule triggered in the first place.

Enter "timestamp-dropping metrics": this is the only way to have InfluxDB metrics timestamped on delivery rather than on creation by an [[inputs.*]] plugin or by some other predetermined timestamp. The gap would then stay visible, and the heartbeat metric and alerting would work as intended and be comprehensible.
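
The difference between keeping and dropping the timestamp can be simulated with a small Python sketch. It assumes a heartbeat every minute, an outage during which Telegraf buffers points and flushes them all when the network returns, and, for the timestamp-less case, that the server stamps every buffered point with the delivery time. All names are illustrative.

```python
from typing import List

MIN_NS = 60 * 10**9        # one minute in nanoseconds
WINDOW_NS = 5 * MIN_NS     # alert window, as in group by time(5m)


def stored_timestamps(created: List[int], outage_start: int, outage_end: int,
                      keep_timestamp: bool) -> List[int]:
    """Timestamps the database ends up with for each heartbeat.

    Points created during [outage_start, outage_end) wait in the output
    buffer and are delivered at outage_end. With keep_timestamp they are
    stored at their creation time; without it, the server stamps them
    with the delivery time.
    """
    return [
        outage_end if (outage_start <= ts < outage_end and not keep_timestamp) else ts
        for ts in created
    ]


def empty_windows(timestamps: List[int], start: int, end: int) -> List[int]:
    """Start times of the alert windows in [start, end) that contain no point."""
    occupied = {ts - ts % WINDOW_NS for ts in timestamps}
    return [w for w in range(start, end, WINDOW_NS) if w not in occupied]


heartbeats = [i * MIN_NS for i in range(20)]     # minutes 0..19
outage = (5 * MIN_NS, 12 * MIN_NS)               # network down from minute 5 to 12

kept = stored_timestamps(heartbeats, *outage, keep_timestamp=True)
dropped = stored_timestamps(heartbeats, *outage, keep_timestamp=False)
print(empty_windows(kept, 0, 20 * MIN_NS))     # [] (the backlog fills the gap)
print(empty_windows(dropped, 0, 20 * MIN_NS))  # [300000000000] (the gap stays visible)
```

The sketch only looks at which windows end up empty. In practice, several timestamp-less points of the same series arriving in one request would also share that single delivery timestamp.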

Thank you

Expected behavior

The feature request is for outputs.influxdb to support this.

A possible implementation, I think, would be three config options: preserve_metric_timestamp (default true), preserve_metric_timestamp_tag (default nil), and drop_preserve_metric_timestamp_tag (default false).

preserve_metric_timestamp_tag would name a tag whose value is either the string true or the string false. If it is false, the metric's timestamp would be dropped on write; drop_preserve_metric_timestamp_tag would remove that tag from the metric before writing.
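
The proposed options could behave roughly like the Python sketch below. The option names come from this proposal and are not real Telegraf settings (#15220 instead added influx_omit_timestamp); the Metric class is a hypothetical stand-in, and treating any tag value other than "false" as "keep" is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Metric:
    name: str
    tags: Dict[str, str]
    fields: Dict[str, object]
    timestamp_ns: Optional[int]   # None means "let the server assign it"


@dataclass
class TimestampOptions:
    # Hypothetical options named after the proposal in this issue.
    preserve_metric_timestamp: bool = True
    preserve_metric_timestamp_tag: Optional[str] = None
    drop_preserve_metric_timestamp_tag: bool = False


def apply_timestamp_options(m: Metric, opts: TimestampOptions) -> Metric:
    """Return a copy of m with its timestamp kept or dropped as proposed."""
    tags = dict(m.tags)
    keep = opts.preserve_metric_timestamp
    tag = opts.preserve_metric_timestamp_tag
    if tag is not None and tag in tags:
        # The per-metric tag overrides the global default; only "false" drops it (assumption).
        keep = tags[tag] != "false"
        if opts.drop_preserve_metric_timestamp_tag:
            del tags[tag]
    return Metric(m.name, tags, dict(m.fields), m.timestamp_ns if keep else None)


hb = Metric("heartbeat", {"host": "web1", "keep_ts": "false"}, {"alive": 1}, 123)
opts = TimestampOptions(preserve_metric_timestamp_tag="keep_ts",
                        drop_preserve_metric_timestamp_tag=True)
print(apply_timestamp_options(hb, opts))
```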

Or perhaps (drop_)?drop_timestamp(_tag)? is better. I don't really care ;)

Actual behavior

You can't drop the timestamp, even though InfluxDB supports that and it's a genuinely useful feature.

Additional info


*    = now
X    = heartbeat metric
x    = heartbeat metric delivered late
.    = missing metric
|  | = alert rule window 

--------------

this is green:

   *
   v
XXXX         (metrics in influxdb)
 | |


--------------

network outage, this is red, alarm is triggered

      *
      v
XXXX...     (metrics in influxdb)
    | |

....XXX     (metrics accumulating in the telegraf output buffer)

--------------

network goes back up, alarm reason is camouflaged


no gap  *
     v  v
XXXXxxxXX   (metrics in influxdb, no gap visible)
      | |
    ^^^
..........  (telegraf output buffer now empty, no metric lost)

--------------


There is already a possible hack to deliver (certain) metrics without a timestamp:
use [[outputs.http]] with the csv format and output only a single CSV field (so no commas), which one hacks together into InfluxDB line protocol with Starlark, leaving out the timestamp.
This is ugly and I couldn't make it work with quotes (neither " nor \); an "official" way would be greatly appreciated.
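
For reference, the quoting trouble comes from line protocol's escaping rules: in string field values, double quotes and backslashes must be backslash-escaped; measurement names escape commas and spaces; tag keys, tag values and field keys also escape equals signs. The sketch below is plain Python (not Starlark), does not cover every edge case, and its function names are made up for illustration. It appends the timestamp only when one is given.

```python
from typing import Dict, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape(text: str, specials: str) -> str:
    """Backslash-escape each character of `specials` in `text`."""
    for ch in specials:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field_value(value: FieldValue) -> str:
    if isinstance(value, bool):          # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"               # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        # Escape backslashes first so the backslashes added for quotes are not doubled.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, FieldValue],
                     timestamp_ns: Optional[int] = None) -> str:
    """Serialize one point; the timestamp is appended only if given."""
    key = _escape(measurement, ", ")
    for k in sorted(tags):               # tags sorted by key
        key += f",{_escape(k, ',= ')}={_escape(tags[k], ',= ')}"
    field_set = ",".join(
        f"{_escape(k, ',= ')}={format_field_value(v)}" for k, v in fields.items()
    )
    line = f"{key} {field_set}"
    return line if timestamp_ns is None else f"{line} {timestamp_ns}"


print(to_line_protocol("heartbeat", {"host": "web 1"}, {"msg": 'say "hi"'}))
# heartbeat,host=web\ 1 msg="say \"hi\""
```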

knollet added the feature request label Apr 24, 2024
powersj added a commit to powersj/telegraf that referenced this issue Apr 24, 2024
@powersj
Contributor

powersj commented Apr 24, 2024

Hi,

Because the timestamp is technically optional and this would be opt-in, I have put up #15220. Please set the influx_omit_timestamp config option and let me know if this gets what you are after.

> The Feature Request is for the outputs.influxdb to allow for that.

It is not the output that adds the timestamp, it is the influx serializer: data is serialized into valid InfluxDB line protocol with a timestamp. As a result, this option can be used not only with the influxdb output but also with influxdb_v2, file, etc.
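
To illustrate why the option belongs to the serializer rather than to a particular output, here is a toy Python model in which two different outputs share one serializer, so a single omit flag affects both. The class names are hypothetical and do not mirror Telegraf's Go types; only the omit-timestamp idea comes from this thread.

```python
import io
from typing import Dict, List, Tuple

# (measurement, tags, fields, timestamp_ns): a toy stand-in for a Telegraf metric
Metric = Tuple[str, Dict[str, str], Dict[str, object], int]


class LineProtocolSerializer:
    """Toy serializer (no escaping, no type suffixes) that alone decides on the timestamp."""

    def __init__(self, omit_timestamp: bool = False) -> None:
        self.omit_timestamp = omit_timestamp

    def serialize(self, metric: Metric) -> str:
        name, tags, fields, ts = metric
        tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
        field_part = ",".join(f"{k}={v}" for k, v in fields.items())
        line = f"{name}{tag_part} {field_part}"
        return line if self.omit_timestamp else f"{line} {ts}"

    def serialize_batch(self, metrics: List[Metric]) -> bytes:
        return "".join(self.serialize(m) + "\n" for m in metrics).encode()


class FileOutput:
    """Writes serialized batches to a binary stream."""

    def __init__(self, serializer: LineProtocolSerializer, stream: io.BytesIO) -> None:
        self.serializer, self.stream = serializer, stream

    def write(self, metrics: List[Metric]) -> None:
        self.stream.write(self.serializer.serialize_batch(metrics))


class HTTPLikeOutput:
    """Collects request bodies instead of sending them over the network."""

    def __init__(self, serializer: LineProtocolSerializer) -> None:
        self.serializer = serializer
        self.bodies: List[bytes] = []

    def write(self, metrics: List[Metric]) -> None:
        self.bodies.append(self.serializer.serialize_batch(metrics))


batch = [("metric", {}, {"value": 42}, 1234567890123123123)]
shared = LineProtocolSerializer(omit_timestamp=True)
buf = io.BytesIO()
FileOutput(shared, buf).write(batch)
http_out = HTTPLikeOutput(shared)
http_out.write(batch)
print(buf.getvalue())    # b'metric value=42\n'
print(http_out.bodies)   # [b'metric value=42\n']
```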

> This is ugly and I couldn't make it work with quotes (neither " or \), an "official" way would be greatly appreciated.

Adding this option is potentially uglier for our users, as it no longer allows a user to write multiple datapoints of the same series in one request (they would all receive the same server-assigned timestamp and overwrite each other), leading to the very gaps you want. I can already see the "why isn't all my data showing up" support request 😱
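
The batching concern can be shown with a toy model of the server side. It assumes that points sent without a timestamp in one request all receive the same server-assigned time, and it models duplicate points (same measurement, tag set and timestamp) as merging their fields, with later values winning. This is an illustrative Python sketch, not InfluxDB's storage engine.

```python
from typing import Dict, List, Optional, Tuple

SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]
Point = Tuple[str, Dict[str, str], Dict[str, float], Optional[int]]


def write_request(store: Dict[Tuple[SeriesKey, int], Dict[str, float]],
                  points: List[Point], received_at_ns: int) -> None:
    """Apply one write request to a toy time-series store.

    Points without a timestamp get `received_at_ns`, the same value for the
    whole request. Points of one series with equal timestamps are merged,
    later field values overwriting earlier ones.
    """
    for measurement, tags, fields, ts in points:
        series: SeriesKey = (measurement, tuple(sorted(tags.items())))
        key = (series, received_at_ns if ts is None else ts)
        store.setdefault(key, {}).update(fields)


batch = [("heartbeat", {"host": "web1"}, {"seq": float(i)}, None) for i in range(3)]
store: Dict = {}
write_request(store, batch, received_at_ns=1_000)
print(len(store), list(store.values()))   # 1 [{'seq': 2.0}]
```

Three buffered heartbeats collapse into a single stored point, which is the "why isn't all my data showing up" scenario.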

powersj added the waiting for response label Apr 24, 2024
@knollet
Contributor Author

knollet commented Apr 26, 2024

Hi,

I tried to use it with [[outputs.influxdb]], and it seems the output instantiates a serializer but doesn't pass on the influx_omit_timestamp option.
I get the following error:

... file /etc/telegraf/telegraf.d/heartbeat.dev.conf: plugin outputs.influxdb: line 54: configuration specified the fields ["influx_omit_timestamp"], but they were not used. This is either a typo or this config ...

I don't see how I could configure the serializer used by [[outputs.influxdb]] myself.

Thanks

telegraf-tiger bot removed the waiting for response label Apr 26, 2024
@powersj
Contributor

powersj commented Apr 26, 2024

> but they were not used. This is either a typo or this config

That doesn't sound like you are using the right artifact from the PR. Can you include the full version?

[agent]
  debug = true
  omit_hostname = true

[[inputs.exec]]
commands = ["echo metric value=42 1234567890123123123"]
data_format = "influx"

[[outputs.file]]
influx_omit_timestamp = true

$ ./telegraf --config config.toml --once
2024-04-26T13:27:11Z I! Loading config: config.toml
2024-04-26T13:27:11Z I! Starting Telegraf 1.31.0-6ba14734 brought to you by InfluxData the makers of InfluxDB
2024-04-26T13:27:11Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 25 parsers, 60 outputs, 6 secret-stores
2024-04-26T13:27:11Z I! Loaded inputs: exec
2024-04-26T13:27:11Z I! Loaded aggregators: 
2024-04-26T13:27:11Z I! Loaded processors: 
2024-04-26T13:27:11Z I! Loaded secretstores: 
2024-04-26T13:27:11Z I! Loaded outputs: file
2024-04-26T13:27:11Z I! Tags enabled: 
2024-04-26T13:27:11Z D! [agent] Initializing plugins
2024-04-26T13:27:11Z D! [agent] Connecting outputs
2024-04-26T13:27:11Z D! [agent] Attempting connection to [outputs.file]
2024-04-26T13:27:11Z D! [agent] Successfully connected to outputs.file
2024-04-26T13:27:11Z D! [agent] Starting service inputs
2024-04-26T13:27:11Z D! [agent] Stopping service inputs
2024-04-26T13:27:11Z D! [agent] Input channel closed
2024-04-26T13:27:11Z I! [agent] Hang on, flushing any cached metrics before shutdown
metric value=42
2024-04-26T13:27:11Z D! [outputs.file] Wrote batch of 1 metrics in 17.66µs
2024-04-26T13:27:11Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
2024-04-26T13:27:11Z I! [agent] Stopping running outputs
2024-04-26T13:27:11Z D! [agent] Stopped Successfully

powersj added the waiting for response label Apr 26, 2024
@knollet
Contributor Author

knollet commented Apr 26, 2024

I did a yum localinstall telegraf-1.31.0~6ba14734-0.x86_64.rpm --nogpgcheck

That should be the right version.

Your example doesn't use [[outputs.influxdb]] but [[outputs.file]], which forwards the config options to the serializer.
[[outputs.influxdb]] doesn't do that; it only forwards influx_uint_support, as can be seen here:
https://github.com/influxdata/telegraf/blob/master/plugins/outputs/influxdb/influxdb.go#L189

In your PR you didn't touch that file.
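
Why outputs.file accepted the option while outputs.influxdb rejected it can be modeled as a config-forwarding problem: an output only hands its serializer the keys it explicitly knows about, and anything else left in its table is reported as unused. The Python below is a hypothetical model, not Telegraf's Go config loader, and the option lists are an illustrative subset.

```python
from typing import Dict, Iterable, List, Tuple


def check_output_config(config: Dict[str, object],
                        own_keys: Iterable[str],
                        serializer_keys: Iterable[str]) -> Tuple[Dict[str, object], List[str]]:
    """Split an output's config table into serializer options and unused keys.

    Keys the plugin neither uses itself nor forwards to its serializer are
    returned as `unused`; a loader would report them in the spirit of the
    "configuration specified the fields [...], but they were not used" error.
    """
    own, forwarded = set(own_keys), set(serializer_keys)
    serializer_opts = {k: v for k, v in config.items() if k in forwarded}
    unused = sorted(k for k in config if k not in own and k not in forwarded)
    return serializer_opts, unused


config = {"urls": ["http://localhost:8086"], "database": "telegraf",
          "influx_omit_timestamp": True}
own = ["urls", "database"]   # illustrative subset of outputs.influxdb options

before = check_output_config(config, own, ["influx_uint_support"])
after = check_output_config(config, own, ["influx_uint_support", "influx_omit_timestamp"])
print(before[1])   # ['influx_omit_timestamp']
print(after)       # ({'influx_omit_timestamp': True}, [])
```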

telegraf-tiger bot removed the waiting for response label Apr 26, 2024
@powersj
Contributor

powersj commented Apr 26, 2024

I've updated the PR, please give the new artifacts (up in ~30mins) a try.

powersj added the waiting for response label Apr 26, 2024
powersj added a commit to powersj/telegraf that referenced this issue Apr 30, 2024
@knollet
Contributor Author

knollet commented May 10, 2024

Sorry for only replying now.
I tested it and it works. Thank you.

telegraf-tiger bot removed the waiting for response label May 10, 2024
@powersj
Contributor

powersj commented May 10, 2024

No worries, thanks!


2 participants