-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add statsdreceiver skeleton #566
Conversation
Codecov Report
@@ Coverage Diff @@
## master #566 +/- ##
==========================================
+ Coverage 87.57% 87.68% +0.10%
==========================================
Files 211 217 +6
Lines 11488 11693 +205
==========================================
+ Hits 10061 10253 +192
- Misses 1087 1098 +11
- Partials 340 342 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
|
||
switch metricType { | ||
case "c": | ||
i, err := strconv.ParseInt(metricValue, 10, 64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does statsd restrict counter values to only ints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question, I'm not actually sure. I can't find anything that definitively states only ints, so I must have assumed this from all the examples showing ints. I'll expand this to support both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to influxdata/telegraf#556 at least consul emits floats for counter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like in that scenario it's still being cast to an int64
, so it's not exactly supporting a float counter, but is able to gracefully handle it. Do you think we should do the same here or look to support Point_DoubleValue
metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should support integers and floating points using either the Int64 or Double data point representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really nice. Great work! 💯
I've written a few suggestions, but all of them can be treated as TODOs for a future PR.
receiver/statsdreceiver/README.md
Outdated
General format is: | ||
|
||
`<name>:<value>|<type>|@<sample-rate>|#<tag1-key>:<tag1-value>,<tag2-k/v>` | ||
|
||
### Counter | ||
|
||
`<name>:<value>|c` | ||
|
||
<!-- ### Gauge | ||
|
||
`<name>:<value>|g` | ||
|
||
### Timer/Histogram | ||
|
||
`<name>:<value>|<ms/h>|@<sample-rate>` --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @sample-rate parameter won't have a place to go in the protocol yet, but we can leave a TODO about how to handle it.
These Timer/Histogram events should be translated the same way a ValueRecorder event from the OTel API would be, and we are meant to multiply the Count of this event by 1/sample-rate. In open-telemetry/opentelemetry-proto#159 we have discussed a potential future for OTLP exemplars that include this inverse-of-sample-rate value as well, which we termed sample-count. In other words, I like the idea of sample-rate, but I prefer to think in terms of its inverse; sample-count is a floating-point > 1, whereas sample-rate is a floating-point > 0 and < 1. (It's one less byte when sample-rate is the inverse of an integer, due to the .
, which is a common case for simple probability sampling.)
What interested me the most here was that you omitted @sample-rate from Counter and Gauge events. I believe the @sample-rate
may be used with Counter and Gauge events as well as Timer/Histogram events. In the PR linked above, any raw or exemplar value could include an optional sample-count
> 1 to indicate that it was probabilistically sampled. There are not many (if any) definitive documents about the statsd format across the internet, so this README will be a great resource. (I think I'm asking that you add @sample-rate to the Counter and Gauge syntax above, otherwise to be proven wrong and that there exists a StatsD specification 😀 .)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting that we should expect @<sample-count>
instead @<sample-rate>
on incoming messages, or when processing incoming @<sample-rate>
to translate it into sample-count? I expect sample-rate on incoming messages currently as that seems to be standard.
I will update the Counter and Gauge examples to include sample rates 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect @<sample-rate>
, I was just explaining my thoughts for how to get this information into OTLP at a later date.
|
||
metricType := parts[1] | ||
|
||
// TODO: add sample rate and tag parsing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's focus on tag parsing first. 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add to this the ability to generate OTel labels from metric name patterns. For example, when monitoring AWS AppMesh (and the underlying Envoy instances), you will encounter metric names like
cluster.cds_egress_ecommerce-demo-mesh_gateway-vn_tcp_8080.upstream_rq_3xx:8|c
Where the metric name is of the form:
cluster.cds_{traffic}_{mesh}_{service}-vn_{port}.{action}
Ideally this would be an OTel metric with a name like cluster_cds_{action}
with the labels being all the other pattern groups.
We have seen this not infrequently with other statsd services as well.
I looked at the Metric Transform Processor proposal but didn't see anything that would let you derive labels from a metric name pattern, but perhaps that should be part of that processor. Either way, I think this type of transformation is essential for avoiding metric name cardinality explosion and better usability of metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see if we can incorporate logic and configuration from https://github.com/prometheus/statsd_exporter for this task. Hopefully we can think of this as a separate processor stage to be combined with the statsdreceiver, and I would still prioritize parsing DogStatsd-style tags.
|
||
switch metricType { | ||
case "c": | ||
i, err := strconv.ParseInt(metricValue, 10, 64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should support integers and floating points using either the Int64 or Double data point representation.
@bogdandrutu can we merge this? I've added TODOs for feedback from @jmacd and everything else has been addressed for now I believe. |
It seems the Lint step failed due to
Is there a way to kick off the CircleCI build manually, I don't guess I have permission to do so. |
Follow up from feedback in #566
* Update README.md * Update extension/README.md Co-Authored-By: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com> * Update extension/README.md Co-Authored-By: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com> * Update extension/README.md Co-Authored-By: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com> Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Description:
Adding the scaffolding and some basic starting parsing for a
statsdreceiver
plugin. It is functional, but incomplete. I'm not entirely sure what to do about aggregation yet and whether it should be included in the receiver plugin or deferred to a processor plugin, so I've started with raw mappings.Link to tracking Issue:
#290
Testing:
There are some basic unit tests and I did some manual testing writing out ingested metrics to a file locally.
Documentation:
For now I added some basic details in the
README
that outlines the configuration options and supported message formats.