
Add StatsD receiver #290

Closed
sonofachamp opened this issue Jun 6, 2020 · 31 comments

@sonofachamp
Contributor

Add a receiver that listens for incoming DogStatsD messages and translates them into OT Metrics.

@sonofachamp
Contributor Author

I'm interested in picking this up.

@sonofachamp
Contributor Author

I have the skeleton of a dogstatsdreceiver plugin put together, but I'm having issues importing the official DataDog agent code as a library. I opened an issue to track it.

@jmacd do you have any thoughts regarding implementation strategy here? Should we look to leverage the official DataDog implementation for the DogStatsD server?

@jmacd
Contributor

jmacd commented Jun 9, 2020

We should talk about what we're trying to achieve, first. I don't know the DD agent code well enough to know if it's meant for re-use in this way.

I'm assuming there are at least two ways this will be used.

  1. Users may want to receive DogStatsD data, convert it into the common representation used in the otel-collector, then take advantage of the OTel collector's facilities such as filtering and remapping, before re-exporting data. Whether configured as an agent or a collector, users will send DogStatsD data over UDP or TCP or UNIX sockets. I imagine such a receiver could be built from scratch pretty easily, just reading from sockets and parsing DogStatsD packets.

  2. Users may want to export metric data arriving via DogStatsD or other sources (OpenCensus, OTLP, Prometheus) using the Datadog agent code base, which is ordinarily the thing that receives DogStatsD data and transforms it into the protocol used by DD. I believe that using part or all of the DD agent for this purpose makes sense, but again I'm not too familiar with the code. I wonder if the agent code is somewhat factored to support using only the logic we need, not the entire package.

Which application are you thinking of? I admit being more interested in (1) because it will provide benefit to a larger community of users than (2).

@jmacd
Contributor

jmacd commented Jun 9, 2020

By the way, is there a branch with your skeleton that I could look at? Thanks!

@sonofachamp
Contributor Author

Here's the branch. There's nothing useful in there regarding DogStatsD; this was more about getting familiar with OT. It builds and I can enable it via the config file, but it doesn't do anything yet 😄

That's a good point about the existing code being built in a reusable way, I'm not familiar enough either yet.

What you outlined in 1 is what I have in mind: ingest DogStatsD, convert to OT metrics to be further processed by existing OT pipeline functionality/other plugins. Regarding protocol, should we aim to support all 3 mentioned (UDP, TCP, and Unix sockets)? It seems the existing DogStatsD portion of the agent only supports UDP.

@jmacd
Contributor

jmacd commented Jun 9, 2020

I'm glad you're more interested in (1)! I think that DataDog should get involved if they're interested in exporting from otel-collector into their system (@mikezvi take note).

UDP is great. I am familiar with uses of the datadog agent that use UNIX socketpairs, but that can be addressed later if needed (it's trivial). I've seen documentation about statsd-over-TCP but never seen it used with DD.

@sonofachamp
Contributor Author

sonofachamp commented Jun 9, 2020

Sounds good, thanks for the help. I'll take a swing at listening on a UDP port and processing incoming data in DogStatsD format out to OT Metrics.
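For reference, a minimal sketch of that UDP read loop in Go (the address, buffer size, and logging are illustrative, not the eventual receiver code):

```go
package main

import (
	"log"
	"net"
)

func main() {
	// Listen on the conventional statsd port; the address is illustrative.
	conn, err := net.ListenPacket("udp", "localhost:8125")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	buf := make([]byte, 65535) // maximum UDP payload
	for {
		n, _, err := conn.ReadFrom(buf)
		if err != nil {
			log.Printf("read error: %v", err)
			continue
		}
		// A packet may hold several newline-separated statsd messages;
		// each would be parsed and converted to an OT metric here.
		log.Printf("received: %q", buf[:n])
	}
}
```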

@jrcamp
Contributor

jrcamp commented Jun 9, 2020

@sonofachamp I don't think we want to import the entire datadog agent. Either statsd should be separated out into its own library that multiple agents use or we implement from scratch. (btw, can it just be called statsdreceiver but have an option to support dd-style tags?).

A number of others are interested in having statsd support as well. It may be easier to have a high-level design doc to work out any issues before going into implementation. I'll forward this thread to them.

@sonofachamp
Contributor Author

@sonofachamp I don't think we want to import the entire datadog agent. Either statsd should be separated out into its own library that multiple agents use or we implement from scratch.

That makes sense.

Should DogStatsD be supported through a "mode" of a more generic statsdreceiver? It seems DogStatsD brings more than just tag functionality. I'm not sure what implications that has for deciding between a modal plugin and separate plugins. I need to dig more to get a better understanding. I'm interested in others' thoughts on that.

Where should I create a design doc for this?

@jrcamp
Contributor

jrcamp commented Jun 9, 2020

Ah, yeah, if you plan on doing their whole protocol rather than just a slightly enhanced statsd, it probably makes sense to make a DD-specific receiver.

You could start here with what the config file would look like and any other high level decisions that need to be made.

@jmacd
Contributor

jmacd commented Jun 9, 2020

I don't believe anyone is asking for the "Additional" features that DataDog has implemented. I would strongly support a "statsdreceiver" that recognizes DataDog-style tags. I think we should also support the "d"-type statsd message that DataDog has added to indicate a distribution.

@jrcamp What did you mean about the "whole protocol"? We're talking about statsd protocols, not the protocol DD uses to report to itself from the agent, I think.

I am less interested in designing and implementing the kind of statsd rewriter found in https://github.com/prometheus/statsd_exporter.

@jmacd
Contributor

jmacd commented Jun 9, 2020

Related note: I implemented the OTel-Go contrib dogstatsd exporter, which does not take a dependency on the DD-go statsd client. It's a lot faster.

@jrcamp
Contributor

jrcamp commented Jun 9, 2020

@jrcamp What did you mean about the "whole protocol"? We're talking about statsd protocols, not the protocol DD uses to report to itself from the agent, I think.

That's originally what I thought we were talking about which is why I suggested making it a generic statsd receiver with DD tag support. However @sonofachamp linked to https://docs.datadoghq.com/developers/dogstatsd/?tab=hostagent#dive-into-dogstatsd which includes things like events and service checks.

@sonofachamp
Contributor Author

We're primarily interested in the tagging aspects DogStatsD provides, and if we think it's fine to support the tagging functionality under a statsdreceiver, then I'm good with that. I was assuming any DogStatsD-specific functionality should go under a dogstatsdreceiver, but it sounds like the tagging functionality, as well as the "d"-type StatsD messages @jmacd referenced above, is more widely used and can exist under a statsdreceiver.

@jmacd
Contributor

jmacd commented Jun 10, 2020

Yes, I 💯 agree that we can create one receiver that supports both "plain" statsd and "dog" statsd. I'd like to focus on the dogstatsd support first, because transforming plain statsd messages into labeled metrics is a substantial and separate problem.

I'm also keen on writing a new specification, since there isn't one for statsd. I would call it "OpenStatsD", and it would have an option for properly escaped labels (which IMO is a major problem w/ the de-facto syntax given to us by dogstatsd).

So, let's focus on receiving (dog)statsd data and making it possible to re-export that data. If/when DataDog becomes interested in exporting from the collector, they can contribute a DD exporter to the collector.

@sonofachamp
Contributor Author

sonofachamp commented Jun 15, 2020

I'm thinking something as simple as:

```yaml
receivers:
  statsd:
    # By default a UDP listener
    endpoint: "host:port" # default "localhost:8125"

    # The format of the incoming UDP packets
    encoding: "dogstatsd" # no default until "statsd" is supported and becomes the default? Another option would be to make "dogstatsd" the default
```

From an implementation standpoint it seems we can simply parse the incoming StatsD messages directly out to the relevant OpenTelemetry metric types. I can't see any immediate value in interpreting the incoming messages as StatsD types and then mapping StatsD types to OpenTelemetry metrics.
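To make that concrete, here is a rough sketch of the message parsing (the `statsdMetric` struct and its field names are hypothetical, just to show the shape of a `name:value|type|@rate|#tags` line):

```go
package statsd

import (
	"errors"
	"strings"
)

// statsdMetric is a hypothetical intermediate form; the real receiver
// might map straight to OTLP types instead.
type statsdMetric struct {
	Name       string
	Value      string
	Type       string // "c", "g", "h", "ms", "d", ...
	SampleRate string
	Tags       []string
}

func parseLine(line string) (statsdMetric, error) {
	var m statsdMetric
	nameAndRest := strings.SplitN(line, ":", 2)
	if len(nameAndRest) != 2 {
		return m, errors.New("missing ':' separator")
	}
	m.Name = nameAndRest[0]
	parts := strings.Split(nameAndRest[1], "|")
	if len(parts) < 2 {
		return m, errors.New("missing '|' separator")
	}
	m.Value, m.Type = parts[0], parts[1]
	for _, p := range parts[2:] {
		switch {
		case strings.HasPrefix(p, "@"): // sample rate
			m.SampleRate = p[1:]
		case strings.HasPrefix(p, "#"): // DogStatsD-style tags
			m.Tags = strings.Split(p[1:], ",")
		}
	}
	return m, nil
}
```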

From the Metrics spec I see the pre-aggregated types support counters and gauges, so those seem like natural mappings for the StatsD Gauge and Counter types.

Metric Type mappings:

| StatsD       | OpenTelemetry             |
| ------------ | ------------------------- |
| Gauge        | OT Gauge                  |
| Counter      | OT Counter                |
| Timer        | OT Histogram?             |
| Histogram    | OT Histogram              |
| Meter        | OT Metric Raw Measurement |
| Set          | OT Metric Raw Measurement |
| Distribution | OT Metric Raw Measurement |

I'm less certain about the Timer, Histogram, Set, Meter, and Distribution types, other than that it looks like they will be supported through raw measurements. The DogStatsD docs say Timers are not directly supported, but Histograms are basically an alias for Timers.

I'm digging more into the Measure and Measurement metric types to understand how to properly use them. Do you know of any further examples that might give insight into how they will play into the StatsD mappings?
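For illustration, the mapping table above as a switch (the `otKind` constants are placeholders for whichever OTLP types we settle on):

```go
// otKind is a placeholder for the eventual OTLP metric type.
type otKind int

const (
	otGauge otKind = iota
	otCounter
	otHistogram
	otRawMeasurement
)

// mapType mirrors the mapping table above; unknown types report false.
func mapType(statsdType string) (otKind, bool) {
	switch statsdType {
	case "g": // Gauge
		return otGauge, true
	case "c": // Counter
		return otCounter, true
	case "ms", "h": // Timer, Histogram
		return otHistogram, true
	case "m", "s", "d": // Meter, Set, Distribution
		return otRawMeasurement, true
	default:
		return 0, false
	}
}
```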

@bsherrod

Do you think that OT Histogram (https://github.com/open-telemetry/opentelemetry-proto/blob/master/opentelemetry/proto/metrics/v1/metrics.proto#L295) can be used for Datadog Histograms? There may be some data conversion necessary depending on how the counts and Exemplar.value are interpreted, but it seems like a pretty close fit.

@sonofachamp
Contributor Author

Oh yeah, great point @bsherrod, thanks! I'm a bit confused about the relationship between HistogramDataPoint, Bucket, and Exemplar. Is the receiver plugin going to have to be stateful to collect multiple data points, manage counts, and populate the Buckets, or will the receiver plugin in this case just publish HistogramDataPoints one to one with DogStatsD Histogram metrics?

@jmacd
Contributor

jmacd commented Jun 16, 2020

@sonofachamp You're running into some issues that are currently under review in the metrics SIG. I'm working on a document that will address the correct default translation to and from dogstatsd. I think it would be best if you focused for now on the receiver configuration and the basic code path. I will have a document on the topic of standard translation to and from both dogstatsd and Prometheus for the next SIG meeting.

@jmacd
Contributor

jmacd commented Jun 16, 2020

The config stanza above looks great!


@jmacd
Contributor

jmacd commented Jun 18, 2020

Here is the above-mentioned document: open-telemetry/oteps#118

@sonofachamp
Contributor Author

Here's a summary of my takeaways from yesterday's discussion @jmacd, please correct me if I misunderstood anything.

There are a couple of potential routes we could take regarding aggregation:

  1. We can bake aggregation into the receiver plugin and expose some configuration to the user to be able to override some aggregation defaults for the supported StatsD metric types. I believe this type of configuration has been discussed in several other places and is regarded as a "Views API" that could be some common configuration block/mechanism for reuse across the collector.
  2. Make the statsdreceiver a simple mapper between incoming StatsD messages and raw OTLP Metric formats (the key here is simple 1-1 mappings, no aggregation). We would defer aggregation of the raw OTLP metrics to a processor (notably not exporters). These raw OTLP formats are theoretical at this point, to my understanding. We will need a way to tell a downstream processor which raw data points should be grouped. The processor could have default aggregations applied as well as expose aggregation configuration to users via the "Views API" mentioned above.

In the short term we can focus the initial development of this plugin on the UDP listening functionality and parsing StatsD messages into the well-defined OT metric types while OT metrics are being further discussed and iterated on.

We've also scoped the initial StatsD supported types down to: Counters (c), Gauges (g), Histograms (h), and Timers (ms).

@bsherrod what are your thoughts? I know you're interested in this plugin.

Related resources:

@jmacd
Contributor

jmacd commented Jun 24, 2020

See also #332
This is mostly related to point (2) above.
There is support for raw data points in the current OTLP protocol, but there are a few ambiguities remaining. We'll discuss this in tomorrow's Metrics SIG.

@lubingfeng

@jmacd @bogdandrutu I discussed with the team and also checked other receivers (SignalFx), which are similar to the StatsD receiver (they receive metrics in a certain format). We did not see aggregation there.

  1. The long-term solution is to leave aggregation to the OT Processor, which is intended to handle the aggregation / batching / filtering part.
  2. For the short term, we are thinking of accumulating the counter / gauge metric types every second, i.e. providing one data point for each second, while leaving timing/histogram as-is. We do not want to implement something that will be removed/changed once the OT Processor evolves.

Example:
  • Counter:
    Input: 3 data points in one second: 2, 5, 7
    Output: 1 data point: 14 (= 2 + 5 + 7)
  • Gauge:
    Input: 4 data points in one second: +1, -2, +3, -4, assuming the original value is 10.
    Output: 1 data point: 8 (= 10 + 1 - 2 + 3 - 4)

Want to get your thoughts on this.
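A minimal sketch of that per-second accumulation, using the example numbers above (all names here are illustrative, not proposed receiver code):

```go
package main

import "fmt"

// accumulator collects statsd points for one flush interval (one
// second in the proposal above).
type accumulator struct {
	counters map[string]int64
	gauges   map[string]float64
}

func newAccumulator() *accumulator {
	return &accumulator{counters: map[string]int64{}, gauges: map[string]float64{}}
}

func (a *accumulator) addCounter(name string, v int64)    { a.counters[name] += v }
func (a *accumulator) adjustGauge(name string, d float64) { a.gauges[name] += d }

func main() {
	acc := newAccumulator()

	// Counter: 3 data points in one second (2, 5, 7) -> one point, 14.
	for _, v := range []int64{2, 5, 7} {
		acc.addCounter("requests", v)
	}

	// Gauge: original value 10, then +1, -2, +3, -4 -> one point, 8.
	acc.adjustGauge("queue_depth", 10)
	for _, d := range []float64{+1, -2, +3, -4} {
		acc.adjustGauge("queue_depth", d)
	}

	// On each one-second tick, the pipeline would receive these points.
	fmt.Println(acc.counters["requests"], acc.gauges["queue_depth"]) // 14 8
}
```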

@lubingfeng

@jmacd @bogdandrutu any comments on this?

@jmacd
Contributor

jmacd commented Sep 28, 2020

Hi @lubingfeng, sorry for the long delay.

I am interested in helping you unblock this. As you may know, I'm interested in using the OTel-Go SDK to act as a processing pipeline component inside the OTel-Collector. To that end, I drafted a PR that would add "transient descriptor" support to the SDK. This support is not required for the API or SDK specification, but would be an added benefit for users wanting to re-use the OTel-Go SDK for metrics processing.

Here: jmacd/opentelemetry-go#59

The use of the transient Accumulator developed in that PR means that each statsd data point can be turned into a metric event; you can then use the OTel-Go SDK's OTLP exporter to format OTLP and emit it directly into a collector pipeline, flushing once per second, for example. This approach could be beneficial in a number of situations.

Also I'd like to establish a timeline for getting this package and the other receivers to use OTLP natively, instead of OpenCensus data points. Using the OTel-Go SDK as I've proposed would help with that point.

@lubingfeng I'd be happy to meet and discuss this.

@wyTrivail
Contributor

@jmacd Thank you for the help!

So basically we need to make changes to the statsd receiver to do two things; correct me if I'm wrong.

  1. Use the OTel-Go SDK OTLP exporter to create OTLP metrics.
  2. Batch metrics every second and then send them to the pipeline.

Based on this I have several questions, if you don't mind.

  1. If I understand correctly, the counter/gauge metrics will be accumulated within the SDK; what's the accumulation interval for that?

  2. Do we need to batch histogram metrics as well? I believe the counter and gauge metrics will be accumulated, but not histograms?

  3. Will the histogram metrics be aggregated in the SDK?

  4. What's the reason for batching? Performance, I guess?

@jmacd
Contributor

jmacd commented Sep 29, 2020

Use the OTel-Go SDK OTLP exporter to create OTLP metrics.
Batch metrics every second and then send them to the pipeline.

Yes, with the "transient descriptor" support that I mentioned.

If I understand correctly, the counter/gauge metrics will be accumulated within the SDK; what's the accumulation interval for that?

This would be determined by the push controller in opentelemetry-go/sdk/metric/controller/push.

Do we need to batch histogram metrics as well? I believe the counter and gauge metrics will be accumulated, but not histograms?

This would be handled automatically by the SDK (as well as for any other built-in aggregators).

Will the histogram metrics be aggregated in the SDK?

Yes.

What's the reason for batching? Performance, I guess?

Yes. This seems like it could be a significant win. Ideally this could be done as a general-purpose OTel-Collector pipeline stage, but it seems appropriate to experiment with this approach in one SDK at first. I am most interested in trying out statsd support this way.

@lubingfeng

lubingfeng commented Oct 7, 2020

@jmacd I would like to have a meeting with you this Thursday 10/8 or Friday 10/9 to discuss what's next. Let me know how I can send you the meeting invite.

  1. The community seems to be leaning toward doing data aggregation in a processor after OT GA: Issue#1422 Metric Aggregation Processor Proposal
  2. Currently the statsd receiver converts the statsd counter / gauge data types to OTLP types
    • Need to check whether we should handle Histogram / Timer in the receiver or rely on a processor (for aggregation), as issue#1422 mentions an Accumulator, which is the entry point for OTLP metric events and manages incremental changes of each metric
  3. We have not seen receivers doing aggregation so far. We do not want to do it in the statsd receiver and have it thrown away later on.
    • Or we just do simple batching for performance considerations, as mentioned by @wyTrivail.

tigrannajaryan pushed a commit that referenced this issue Oct 26, 2020
…nsfer counter to int only. (#1361)

- Add sample rate support for counters
  If we receive `counterName:10|c|@0.1`, we pass the value `10/0.1 = 100` on to the following process
- Transfer gauges to double only
  After discussion, we plan to transfer gauges to double only, no matter what we receive:
  `gaugeName:86|g` will be transferred to 86.0 as Double_Gauge only
- Transfer counters to int only
  After discussion, we plan to transfer counters to int only, no matter what we receive:
  `counterName:86|c` will be transferred to 86 as Int only

**Link to tracking Issue:** 
- #290

**Testing:** 
- Added unit tests
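A sketch of the conversions described in that commit message (the function names are illustrative, not the actual receiver code):

```go
package statsd

import "strconv"

// parseCounter applies the rules above: counters become ints, and a
// sample rate scales the value ("counterName:10|c|@0.1" -> 10/0.1 = 100).
func parseCounter(raw string, sampleRate float64) (int64, error) {
	v, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return 0, err
	}
	if sampleRate > 0 && sampleRate < 1 {
		v = int64(float64(v) / sampleRate)
	}
	return v, nil
}

// parseGauge applies the rule above: gauges become doubles, no matter
// what we receive ("gaugeName:86|g" -> 86.0).
func parseGauge(raw string) (float64, error) {
	return strconv.ParseFloat(raw, 64)
}
```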
@tigrannajaryan
Member

The statsd receiver exists now, closing.

bogdandrutu pushed a commit that referenced this issue May 12, 2022
…290)

Bumps [github.com/onsi/ginkgo](https://github.com/onsi/ginkgo) from 1.16.4 to 1.16.5.
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v1.16.4...v1.16.5)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>