feat: [outputs.datadog] to support submitting metrics alongside Datadog Agents #13561

Closed
jdheyburn opened this issue Jul 6, 2023 · 8 comments · Fixed by #15702
Labels
feature request Requests for new plugin and for new features to existing plugins

Comments

@jdheyburn
Contributor

jdheyburn commented Jul 6, 2023

Use Case

Some deployments of ours write to Datadog agents (via DogStatsD), others to telegraf (via inputs.statsd).

We would like the telegraf agent to be able to submit metrics to Datadog in the same format that the Datadog agent already uses. Currently, because of the way the telegraf output plugin for Datadog is implemented, the two cannot run side-by-side.

@jrimmer-housecallpro had already done some work on this in #10979, but stated there that:

Note that this behavior does not play
super-well if running simultaneously with current Datadog agents; they
will attempt to change to Rate with interval=10. We prefer this
method, however, as it reflects the raw data more accurately.

Expected behavior

N/A

Actual behavior

N/A

Additional info

I think it would make sense to add a flag to the datadog output config that lets it play nicely with Datadog agents by:

  • Allowing the interval to be configurable (or perhaps defaulting to 10)
  • Calculating the value of the metric from the interval
  • Setting the metric type to rate

Example pseudo (ish) code:

var tname string
interval := int64(1) // status quo: one-second interval for every type
switch m.Type() {
  case telegraf.Counter:
    if d.ShouldPlayNicelyWithDatadogAgent() {
      interval = 10 // could pull this from config
      // convert the counted value into a per-second rate over the interval
      dogM[1] = dogM[1] / float64(interval)
      tname = "rate"
    } else {
      // this is the status quo: no change to dogM or interval
      tname = "count"
    }
  case telegraf.Gauge:
    tname = "gauge"
  default:
    tname = ""
}
metric := &Metric{
  Metric: dname,
  Tags: metricTags,
  Host: host,
  Type: tname,
  Interval: interval,
}
metric.Points[0] = dogM
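
For anyone who wants to experiment with the idea outside of the plugin, the pseudocode above can be fleshed out into a small self-contained sketch. Everything in it is an assumption made for illustration: `Series`, `MetricKind`, `buildSeries` and the `rateInterval` parameter are hypothetical stand-ins, not the actual types or config options of outputs.datadog. It only shows the proposed arithmetic: when a rate interval is set, a counter value is divided by that interval and tagged as a rate, otherwise it is left as a count.

```go
package main

import "fmt"

// Point is a [timestamp, value] pair, as used for dogM in the pseudocode.
type Point [2]float64

// Series is a simplified, hypothetical stand-in for the Metric struct
// in the pseudocode (tags and host are left out for brevity).
type Series struct {
	Metric   string
	Points   []Point
	Type     string
	Interval int64
}

// MetricKind is a hypothetical stand-in for the telegraf value type.
type MetricKind int

const (
	Counter MetricKind = iota
	Gauge
	Untyped
)

// buildSeries converts a single point into a series entry.
// rateInterval is in seconds; a value <= 0 keeps the status quo
// (counters are sent as "count" with interval 1).
func buildSeries(name string, kind MetricKind, p Point, rateInterval int64) Series {
	s := Series{Metric: name, Interval: 1}
	switch kind {
	case Counter:
		if rateInterval > 0 {
			// Report the counter as a per-second rate over the interval.
			p[1] /= float64(rateInterval)
			s.Type = "rate"
			s.Interval = rateInterval
		} else {
			s.Type = "count"
		}
	case Gauge:
		s.Type = "gauge"
	default:
		s.Type = ""
	}
	// p is an array, so it was copied on the way in; the caller's point is untouched.
	s.Points = []Point{p}
	return s
}

func main() {
	p := Point{1688601600, 50} // example timestamp and counter value

	fmt.Printf("%+v\n", buildSeries("requests", Counter, p, 10)) // sent as a rate
	fmt.Printf("%+v\n", buildSeries("requests", Counter, p, 0))  // status quo count
}
```

Multiplying the submitted rate by the interval gives back the original count, so no information is lost by sending the rate form.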
@jdheyburn added the "feature request" label Jul 6, 2023
@srebhan
Member

srebhan commented Jul 10, 2023

@jdheyburn given that you already show some code, would you be so kind as to submit a PR?

@jdheyburn
Contributor Author

@srebhan Yes of course, I am just in the process of testing locally before I raise a PR for the same - assuming that other folks agree that this would be a useful addition 👍🏻

@techministrator

Hi @jdheyburn, this is definitely useful. May I ask whether there has been any progress on creating the PR for this?

@jdheyburn
Contributor Author

@techministrator I did some testing with a local change and pushed it out to some of our environments, but I reverted it after seeing a strange sawtooth effect on the graphs in Datadog (see attached).

image

I've not had much time to dive into it further. I should have some cycles come October.

@jdheyburn
Contributor Author

To close the loop, I won't be able to contribute to this. We are migrating off telegraf for this particular use case.

@powersj
Contributor

powersj commented Jan 4, 2024

@techministrator,

If we continued to work on this would you be able to test a change?

@jdheyburn,

Thanks for following up. Did you have anything you could contribute?

@powersj added the "waiting for response" label Jan 4, 2024
@jdheyburn
Contributor Author

Sorry for the double U-turn, I'm going to be re-exploring this.

I missed in my original notes that the sawtooth effect was fixed by treating the metric as a rate as opposed to a count.

The original screenshot from my message on 2023-08-23 shows the metric with as_count():
image2023-7-11_17-2-13

The same metric, set to as_rate():
image2023-7-11_17-2-47

I'll share a code snippet once I get there.

@telegraf-tiger bot removed the "waiting for response" label Jan 16, 2024
@jdheyburn
Contributor Author

I'm starting to think that the sawtooth above is a by-product of the metric being incorrectly displayed with as_count(). Given that this is a rate metric, it should be displayed as a rate.

In either case, when you zoom out to a larger timeframe, Datadog aggregates (rolls up) the datapoints anyway, so switching between the two makes no difference.

As count:

image-2024-1-16_19-9-23

As rate:
image-2024-1-16_19-9-46
