
Add StatsD receiver #290

Closed
sonofachamp opened this issue Jun 6, 2020 · 31 comments

@sonofachamp
Contributor

Add a receiver that listens for incoming DogStatsD messages and translates them into OT Metrics.

@sonofachamp
Contributor Author

I'm interested in picking this up.

@sonofachamp
Contributor Author

I have the skeleton of a dogstatsdreceiver plugin put together, but I'm having issues importing the official DataDog agent code as a library. I opened an issue to track it.

@jmacd do you have any thoughts regarding implementation strategy here? Should we look to leverage the official DataDog implementation for the DogStatsD server?

@jmacd
Contributor

jmacd commented Jun 9, 2020

We should talk about what we're trying to achieve, first. I don't know the DD agent code well enough to know if it's meant for re-use in this way.

I'm assuming there are at least two ways this will be used.

  1. Users may want to receive DogStatsD data, convert it into the common representation used in the otel-collector, then take advantage of the OTel collector's facilities such as filtering and remapping, before re-exporting data. Whether configured as an agent or a collector, users will send DogStatsD data over UDP or TCP or UNIX sockets. I imagine such a receiver could be built from scratch pretty easily, just reading from sockets and parsing DogStatsD packets.

  2. Users may want to export metric data arriving via DogStatsD or other sources (OpenCensus, OTLP, Prometheus) using the Datadog agent code base, which is ordinarily the thing that receives DogStatsD data and transforms it into the protocol used by DD. I believe that using part or all of the DD agent for this purpose makes sense, but again I'm not too familiar with the code. I wonder if the agent code is somewhat factored to support using only the logic we need, not the entire package.

Which application are you thinking of? I admit being more interested in (1) because it will provide benefit to a larger community of users than (2).

@jmacd
Contributor

jmacd commented Jun 9, 2020

By the way, is there a branch with your skeleton that I could look at? Thanks!

@sonofachamp
Contributor Author

Here's the branch. There's nothing useful in there regarding DogStatsD; this was more about getting familiar with OT. It builds and I can enable it via the config file, but it doesn't do anything yet 😄

That's a good point about the existing code being built in a reusable way, I'm not familiar enough either yet.

What you outlined in 1 is what I have in mind: ingest DogStatsD, convert to OT metrics to be further processed by existing OT pipeline functionality/other plugins. Regarding protocol, should we aim to support all 3 mentioned (UDP, TCP, and Unix sockets)? It seems the existing DogStatsD portion of the agent only supports UDP.

@jmacd
Contributor

jmacd commented Jun 9, 2020

I'm glad you're more interested in (1)! I think that DataDog should get involved if they're interested in exporting from otel-collector into their system (@mikezvi take note).

UDP is great. I am familiar with uses of the datadog agent that use UNIX socketpairs, but that can be addressed later if needed (it's trivial). I've seen documentation about statsd-over-TCP but never seen it used with DD.

@sonofachamp
Contributor Author

sonofachamp commented Jun 9, 2020

Sounds good, thanks for the help. I'll take a swing at listening on a UDP port and processing incoming data in DogStatsD format out to OT Metrics.
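For reference, a minimal sketch of that UDP read loop in Go (the address, buffer size, and logging are illustrative, not the eventual receiver code):

```go
package main

import (
	"log"
	"net"
)

func main() {
	// Listen on the conventional statsd port; the address is illustrative.
	conn, err := net.ListenPacket("udp", "localhost:8125")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	buf := make([]byte, 65535) // maximum UDP payload
	for {
		n, _, err := conn.ReadFrom(buf)
		if err != nil {
			log.Printf("read error: %v", err)
			continue
		}
		// A packet may hold several newline-separated statsd messages;
		// each would be parsed and converted to an OT metric here.
		log.Printf("received: %q", buf[:n])
	}
}
```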

@jrcamp
Contributor

jrcamp commented Jun 9, 2020

@sonofachamp I don't think we want to import the entire datadog agent. Either statsd should be separated out into its own library that multiple agents use or we implement from scratch. (btw, can it just be called statsdreceiver but have an option to support dd-style tags?).

A number of others are interested in having statsd support as well. It may be easier to have a high-level design doc to work out any issues before going into implementation. I'll forward this thread to them.

@sonofachamp
Contributor Author

@sonofachamp I don't think we want to import the entire datadog agent. Either statsd should be separated out into its own library that multiple agents use or we implement from scratch.

That makes sense.

Should DogStatsD be supported through a "mode" of a more generic statsdreceiver? It seems DogStatsD brings more than just tag functionality. I'm not sure what implications that has for deciding between a modal plugin and separate plugins. I need to dig more to get a better understanding. I'm interested in others' thoughts on that.

Where should I create a design doc for this?

@jrcamp
Contributor

jrcamp commented Jun 9, 2020

Ah, yeah, if you plan on doing their whole protocol rather than just a slightly enhanced statsd, it probably makes sense to make a DD-specific receiver.

You could start here with what the config file would look like and any other high level decisions that need to be made.

@jmacd
Contributor

jmacd commented Jun 9, 2020

I don't believe anyone is asking for the "Additional" features that DataDog has implemented. I would strongly support a "statsdreceiver" that recognizes DataDog-style tags. I think we should also support the "d"-type statsd message that DataDog has added to indicate a distribution.

@jrcamp What did you mean about the "whole protocol"? We're talking about statsd protocols, not the protocol DD uses to report to itself from the agent, I think.

I am less interested in designing and implementing the kind of statsd rewriter found in https://github.com/prometheus/statsd_exporter.

@jmacd
Contributor

jmacd commented Jun 9, 2020

Related note: I implemented the OTel-Go contrib dogstatsd exporter, which does not take a dependency on the DD-go statsd client. It's a lot faster.

@jrcamp
Contributor

jrcamp commented Jun 9, 2020

@jrcamp What did you mean about the "whole protocol"? We're talking about statsd protocols, not the protocol DD uses to report to itself from the agent, I think.

That's originally what I thought we were talking about which is why I suggested making it a generic statsd receiver with DD tag support. However @sonofachamp linked to https://docs.datadoghq.com/developers/dogstatsd/?tab=hostagent#dive-into-dogstatsd which includes things like events and service checks.

@sonofachamp
Contributor Author

We're primarily interested in the tagging aspects DogStatsD provides, and if we think it's fine to support the tagging functionality under a statsdreceiver, then I'm good with that. I was assuming any DogStatsD-specific functionality should go under a dogstatsdreceiver, but it sounds like the tagging functionality, as well as the "d"-type StatsD messages @jmacd referenced above, is more widely used and can exist under a statsdreceiver.

@jmacd
Contributor

jmacd commented Jun 10, 2020

Yes, I 💯 agree that we can create one receiver that supports both "plain" statsd and "dog" statsd. I'd like to focus on the dogstatsd support first, because transforming plain statsd messages into labeled metrics is a substantial and separate problem.

I'm also keen on writing a new specification, since there isn't one for statsd. I would call it "OpenStatsD", and it would have an option for properly escaped labels (which IMO is a major problem w/ the de-facto syntax given to us by dogstatsd).

So, let's focus on receiving (dog)statsd data and making it possible to re-export that data. If/when DataDog becomes interested in exporting from the collector, they can contribute a DD exporter to the collector.

@sonofachamp
Contributor Author

sonofachamp commented Jun 15, 2020

I'm thinking something as simple as:

```yaml
receivers:
  statsd:
    # By default a UDP listener
    endpoint: "host:port" # default "localhost:8125"

    # The format of the incoming UDP packets
    encoding: "dogstatsd" # no default until "statsd" is supported and becomes the default? Another option would be to make "dogstatsd" the default
```

From an implementation standpoint it seems we can simply parse the incoming StatsD messages directly out to the relevant OpenTelemetry metric types. I can't see any immediate value in interpreting the incoming messages as StatsD types and then mapping StatsD types to OpenTelemetry metrics.
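To make that concrete, here is a rough sketch of the message parsing (the `statsdMetric` struct and its field names are hypothetical, just to show the shape of a `name:value|type|@rate|#tags` line):

```go
package statsd

import (
	"errors"
	"strings"
)

// statsdMetric is a hypothetical intermediate form; the real receiver
// might map straight to OTLP types instead.
type statsdMetric struct {
	Name       string
	Value      string
	Type       string // "c", "g", "h", "ms", "d", ...
	SampleRate string
	Tags       []string
}

func parseLine(line string) (statsdMetric, error) {
	var m statsdMetric
	nameAndRest := strings.SplitN(line, ":", 2)
	if len(nameAndRest) != 2 {
		return m, errors.New("missing ':' separator")
	}
	m.Name = nameAndRest[0]
	parts := strings.Split(nameAndRest[1], "|")
	if len(parts) < 2 {
		return m, errors.New("missing '|' separator")
	}
	m.Value, m.Type = parts[0], parts[1]
	for _, p := range parts[2:] {
		switch {
		case strings.HasPrefix(p, "@"): // sample rate
			m.SampleRate = p[1:]
		case strings.HasPrefix(p, "#"): // DogStatsD-style tags
			m.Tags = strings.Split(p[1:], ",")
		}
	}
	return m, nil
}
```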

From the Metrics spec I see the pre-aggregated types support counters and gauges, so those seem like natural mappings for the StatsD Gauge and Counter types.

Metric Type mappings:

| StatsD       | OpenTelemetry             |
| ------------ | ------------------------- |
| Gauge        | OT Gauge                  |
| Counter      | OT Counter                |
| Timer        | OT Histogram?             |
| Histogram    | OT Histogram              |
| Meter        | OT Metric Raw Measurement |
| Set          | OT Metric Raw Measurement |
| Distribution | OT Metric Raw Measurement |

I'm less certain about the Timer, Histogram, Set, Meter, and Distribution types, other than that it looks like they will be supported through raw measurements. The DogStatsD docs say Timers are not directly supported, but Histograms are basically an alias for Timers.

I'm digging more into the Measure and Measurement metric types to understand how to properly use them. Do you know of any further examples that might give insight into how they will play into the StatsD mappings?
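For illustration, the mapping table above as a switch (the `otKind` constants are placeholders for whichever OTLP types we settle on):

```go
// otKind is a placeholder for the eventual OTLP metric type.
type otKind int

const (
	otGauge otKind = iota
	otCounter
	otHistogram
	otRawMeasurement
)

// mapType mirrors the mapping table above; unknown types report false.
func mapType(statsdType string) (otKind, bool) {
	switch statsdType {
	case "g": // Gauge
		return otGauge, true
	case "c": // Counter
		return otCounter, true
	case "ms", "h": // Timer, Histogram
		return otHistogram, true
	case "m", "s", "d": // Meter, Set, Distribution
		return otRawMeasurement, true
	default:
		return 0, false
	}
}
```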

@bsherrod

Do you think that OT Histogram (https://github.com/open-telemetry/opentelemetry-proto/blob/master/opentelemetry/proto/metrics/v1/metrics.proto#L295) can be used for Datadog Histograms? There may be some data conversion necessary depending on how the counts and Exemplar.value are interpreted, but it seems like a pretty close fit.

@sonofachamp
Contributor Author

Oh yeah, great point @bsherrod, thanks! I'm a bit confused about the relationship between HistogramDataPoint, Bucket, and Exemplar. Is the receiver plugin going to have to be stateful to collect multiple data points, manage counts, and populate the Buckets, or will the receiver plugin in this case just publish HistogramDataPoints one to one with DogStatsD Histogram metrics?

@jmacd
Contributor

jmacd commented Jun 16, 2020

@sonofachamp You're running into some issues that are currently under review in the metrics SIG. I'm working on a document that will address the correct default translation to and from dogstatsd. I think it would be best if you focused for now on the receiver configuration and the basic code path. I will have a document on the topic of standard translation to and from both dogstatsd and Prometheus for the next SIG meeting.

@jmacd
Contributor

jmacd commented Jun 16, 2020

The config stanza above looks great!


@jmacd
Contributor

jmacd commented Jun 18, 2020

Here is the above-mentioned document: open-telemetry/oteps#118

@sonofachamp
Contributor Author

Here's a summary of my takeaways from yesterday's discussion @jmacd, please correct me if I misunderstood anything.

There are a couple of potential routes we could take regarding aggregation:

  1. We can bake aggregation into the receiver plugin and expose some configuration to the user to be able to override some aggregation defaults for the supported StatsD metric types. I believe this type of configuration has been discussed in several other places and is regarded as a "Views API" that could be some common configuration block/mechanism for reuse across the collector.
  2. Make the statsdreceiver a simple mapper between incoming StatsD messages and raw OTLP Metric formats (the key here is simple 1-1 mappings, no aggregation). We would defer aggregation of the raw OTLP metrics to a processor (notably not exporters). These raw OTLP formats are theoretical at this point, to my understanding. We will need a way to tell a downstream processor which raw data points should be grouped. The processor could have default aggregations applied as well as expose aggregation configuration to users via the "Views API" mentioned above.

In the short term we can focus the initial development of this plugin on the UDP listening functionality and parsing StatsD messages into the well-defined OT metric types while OT metrics are being further discussed and iterated on.

We've also scoped the initial StatsD supported types down to: Counters (c), Gauges (g), Histograms (h), and Timers (ms).

@bsherrod what are your thoughts? I know you're interested in this plugin.

Related resources:

@jmacd
Contributor

jmacd commented Jun 24, 2020

See also #332
This is mostly related to point (2) above.
There is support for raw data points in the current OTLP protocol, but there are a few ambiguities remaining. We'll discuss this in tomorrow's Metrics SIG.

@lubingfeng

@jmacd @bogdandrutu I discussed with the team and also checked other receivers (SignalFx), which are similar to the StatsD receiver (they receive metrics in a certain format). We did not see aggregation there.

  1. The long-term solution is to leave aggregation to the OT Processor, which is intended to handle the aggregation / batching / filtering part.
  2. For the short term, we are thinking of accumulating the counter / gauge metric types every second, i.e. providing one data point for each second, while leaving timing/histogram as-is. We do not want to implement something that will be removed/changed once the OT Processor evolves.

Example:
  • Counter:
    Input: 3 data points in one second: 2, 5, 7
    Output: 1 data point: 14 (= 2 + 5 + 7)
  • Gauge:
    Input: 4 data points in one second: +1, -2, +3, -4, assuming the original value is 10.
    Output: 1 data point: 8 (= 10 + 1 - 2 + 3 - 4)

Want to get your thoughts on this.
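A minimal sketch of that per-second accumulation, using the example numbers above (all names here are illustrative, not proposed receiver code):

```go
package main

import "fmt"

// accumulator collects statsd points for one flush interval (one
// second in the proposal above).
type accumulator struct {
	counters map[string]int64
	gauges   map[string]float64
}

func newAccumulator() *accumulator {
	return &accumulator{counters: map[string]int64{}, gauges: map[string]float64{}}
}

func (a *accumulator) addCounter(name string, v int64)    { a.counters[name] += v }
func (a *accumulator) adjustGauge(name string, d float64) { a.gauges[name] += d }

func main() {
	acc := newAccumulator()

	// Counter: 3 data points in one second (2, 5, 7) -> one point, 14.
	for _, v := range []int64{2, 5, 7} {
		acc.addCounter("requests", v)
	}

	// Gauge: original value 10, then +1, -2, +3, -4 -> one point, 8.
	acc.adjustGauge("queue_depth", 10)
	for _, d := range []float64{+1, -2, +3, -4} {
		acc.adjustGauge("queue_depth", d)
	}

	// On each one-second tick, the pipeline would receive these points.
	fmt.Println(acc.counters["requests"], acc.gauges["queue_depth"]) // 14 8
}
```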

@lubingfeng

@jmacd @bogdandrutu any comments on this?

@jmacd
Contributor

jmacd commented Sep 28, 2020

Hi @lubingfeng, sorry for the long delay.

I am interested in helping you unblock this. As you may know, I'm interested in using the OTel-Go SDK to act as a processing pipeline component inside the OTel-Collector. To that end, I drafted a PR that would add "transient descriptor" support to the SDK. This support is not required for the API or SDK specification, but would be an added benefit for users wanting to re-use the OTel-Go SDK for metrics processing.

Here: jmacd/opentelemetry-go#59

The use of the transient Accumulator developed in that PR means that each statsd data point can be turned into a metric event; you can then use the OTel-Go SDK's OTLP exporter to format OTLP and emit it directly into a collector pipeline, flushing once per second, for example. This approach could be beneficial in a number of situations.

Also I'd like to establish a timeline for getting this package and the other receivers to use OTLP natively, instead of OpenCensus data points. Using the OTel-Go SDK as I've proposed would help with that point.

@lubingfeng I'd be happy to meet and discuss this.

@wyTrivail
Contributor

@jmacd Thank you for the help!

So basically we need to make changes to the statsd receiver to do two things; correct me if I'm wrong.

  1. Use the OTel-Go SDK OTLP exporter to create OTLP metrics.
  2. Batch metrics every second and then send them to the pipeline.

Based on this I have several questions, if you don't mind.

  1. If I understand correctly, the counter/gauge metrics will be accumulated within the SDK; what's the accumulation interval for that?

  2. Do we need to batch histogram metrics as well? I believe the counter and gauge metrics will be accumulated, but not histograms?

  3. Will the histogram metrics be aggregated in the SDK?

  4. What's the reason for batching? Performance, I guess?

@jmacd
Contributor

jmacd commented Sep 29, 2020

Use the OTel-Go SDK OTLP exporter to create OTLP metrics.
Batch metrics every second and then send them to the pipeline.

Yes, with the "transient descriptor" support that I mentioned.

If I understand correctly, the counter/gauge metrics will be accumulated within the SDK; what's the accumulation interval for that?

This would be determined by the push controller in opentelemetry-go/sdk/metric/controller/push.

Do we need to batch histogram metrics as well? I believe the counter and gauge metrics will be accumulated, but not histograms?

This would be handled automatically by the SDK (as well as for any other built-in aggregators).

Will the histogram metrics be aggregated in the SDK?

Yes.

What's the reason for batching? Performance, I guess?

Yes. This seems like it could be a significant win. Ideally this could be done as a general-purpose OTel-Collector pipeline stage, but it seems appropriate to experiment with this approach in one SDK at first. I am most interested in trying out statsd support this way.

@lubingfeng

lubingfeng commented Oct 7, 2020

@jmacd I would like to have a meeting with you this Thursday 10/8 or Friday 10/9 to discuss what's next. Let me know how I can send you the meeting invite.

  1. The community seems to be leaning toward doing data aggregation in a processor after OT GA: Issue#1422 Metric Aggregation Processor Proposal
  2. Currently the statsd receiver converts the statsd counter / gauge data types to OTLP types
    • Need to check whether we should handle Histogram / Timer in the receiver or rely on a processor (for aggregation), as issue#1422 mentions an Accumulator, which is the entry point for OTLP metric events and manages incremental changes of each metric
  3. We have not seen receivers doing aggregation so far. We do not want to do it in the statsd receiver and have it thrown away later on.
    • Or we just do simple batching for performance considerations, as mentioned by @wyTrivail.

tigrannajaryan pushed a commit that referenced this issue Oct 26, 2020
…nsfer counter to int only. (#1361)

- Add sample rate support for counters
  If we receive `counterName:10|c|@0.1`, we pass the value `10/0.1 = 100` on to the following process
- Transfer gauges to double only
  After discussion, we plan to transfer gauges to double only, no matter what we receive:
  `gaugeName:86|g` will be transferred to 86.0 as Double_Gauge only
- Transfer counters to int only
  After discussion, we plan to transfer counters to int only, no matter what we receive:
  `counterName:86|c` will be transferred to 86 as Int only

**Link to tracking Issue:** 
- #290

**Testing:** 
- Added unit tests
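A sketch of the conversions described in that commit message (the function names are illustrative, not the actual receiver code):

```go
package statsd

import "strconv"

// parseCounter applies the rules above: counters become ints, and a
// sample rate scales the value ("counterName:10|c|@0.1" -> 10/0.1 = 100).
func parseCounter(raw string, sampleRate float64) (int64, error) {
	v, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return 0, err
	}
	if sampleRate > 0 && sampleRate < 1 {
		v = int64(float64(v) / sampleRate)
	}
	return v, nil
}

// parseGauge applies the rule above: gauges become doubles, no matter
// what we receive ("gaugeName:86|g" -> 86.0).
func parseGauge(raw string) (float64, error) {
	return strconv.ParseFloat(raw, 64)
}
```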
@tigrannajaryan
Member

The statsd receiver exists now, closing.

bogdandrutu pushed a commit that referenced this issue May 12, 2022
…290)

Bumps [github.com/onsi/ginkgo](https://github.com/onsi/ginkgo) from 1.16.4 to 1.16.5.
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v1.16.4...v1.16.5)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>