-
Notifications
You must be signed in to change notification settings - Fork 349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missing Start Timestamp for Cumulative Metrics in Skipper Metrics #3087
Comments
Please no cache, there is no way how to cleanup and basically it's an unbounded data sync |
AlexanderYastrebov
added a commit
that referenced
this issue
May 30, 2024
Add start label to each counter with the value of counter creation timestamp as unix nanoseconds. This enables OpenTelemetry cumulative temporality, see https://opentelemetry.io/docs/specs/otel/metrics/data-model/#temporality Example: ``` ~$ curl -s localhost:9911/metrics | grep host_count # HELP skipper_serve_host_count Total number of requests of serving a host. # TYPE skipper_serve_host_count counter skipper_serve_host_count{code="200",host="bar_test",method="GET",start="1717066533598500794"} 1 skipper_serve_host_count{code="200",host="foo_test",method="GET",start="1717066538031805059"} 2 ``` Fixes #3087
AlexanderYastrebov
added a commit
that referenced
this issue
May 30, 2024
Add start label to each counter with the value of counter creation timestamp as unix nanoseconds. This enables OpenTelemetry cumulative temporality, see https://opentelemetry.io/docs/specs/otel/metrics/data-model/#temporality Example: ``` ~$ curl -s localhost:9911/metrics | grep host_count # HELP skipper_serve_host_count Total number of requests of serving a host. # TYPE skipper_serve_host_count counter skipper_serve_host_count{code="200",host="bar_test",method="GET",start="1717066533598500794"} 1 skipper_serve_host_count{code="200",host="foo_test",method="GET",start="1717066538031805059"} 2 ``` Fixes #3087 Signed-off-by: Alexander Yastrebov <alexander.yastrebov@zalando.de>
AlexanderYastrebov
added a commit
that referenced
this issue
May 31, 2024
Add start label to each counter with the value of counter creation timestamp as unix nanoseconds. This enables OpenTelemetry cumulative temporality, see https://opentelemetry.io/docs/specs/otel/metrics/data-model/#temporality Example: ``` ~$ curl -s localhost:9911/metrics | grep host_count # HELP skipper_serve_host_count Total number of requests of serving a host. # TYPE skipper_serve_host_count counter skipper_serve_host_count{code="200",host="bar_test",method="GET",start="1717066533598500794"} 1 skipper_serve_host_count{code="200",host="foo_test",method="GET",start="1717066538031805059"} 2 ``` Fixes #3087 Signed-off-by: Alexander Yastrebov <alexander.yastrebov@zalando.de>
AlexanderYastrebov
added a commit
that referenced
this issue
Jun 3, 2024
Add start label to each counter with the value of counter creation timestamp as unix nanoseconds. This enables OpenTelemetry cumulative temporality, see https://opentelemetry.io/docs/specs/otel/metrics/data-model/#temporality Example: ``` ~$ curl -s localhost:9911/metrics | grep host_count # HELP skipper_serve_host_count Total number of requests of serving a host. # TYPE skipper_serve_host_count counter skipper_serve_host_count{code="200",host="bar_test",method="GET",start="1717066533598500794"} 1 skipper_serve_host_count{code="200",host="foo_test",method="GET",start="1717066538031805059"} 2 ``` Fixes #3087 Signed-off-by: Alexander Yastrebov <alexander.yastrebov@zalando.de>
JanardhanSharma
pushed a commit
to JanardhanSharma/skipper
that referenced
this issue
Jul 19, 2024
Add start label to each counter with the value of counter creation timestamp as unix nanoseconds. This enables OpenTelemetry cumulative temporality, see https://opentelemetry.io/docs/specs/otel/metrics/data-model/#temporality Example: ``` ~$ curl -s localhost:9911/metrics | grep host_count # HELP skipper_serve_host_count Total number of requests of serving a host. # TYPE skipper_serve_host_count counter skipper_serve_host_count{code="200",host="bar_test",method="GET",start="1717066533598500794"} 1 skipper_serve_host_count{code="200",host="foo_test",method="GET",start="1717066538031805059"} 2 ``` Fixes zalando#3087 Signed-off-by: Alexander Yastrebov <alexander.yastrebov@zalando.de>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Is your feature request related to a problem? Please describe.
In OpenTelemetry, every metric of cumulative type is required to have a
StartTimestamp
, which records when a metric is first recorded. However, Prometheus exporter doesn't provideStartTimestamp
by default. The absence ofStartTimestamp
leads to inaccuracies that distort the telemetry data.Example 01: When new
StartTimestamp
is less than actualStartTimestamp
, it creates "fake tail" in metric valuesExample 02: When new
StartTimestamp
is more than actualStartTimestamp
, it creates "fake peak" in metric valuesAs you can see from the examples above, incorrect choice of
StartTimestamp
on OpenTelemetry side always leads to distortions in collected metrics data. And withoutStartTimestamp
on Skipper side there is no chance to correctly guessStartTimestamp
on telemetry collector side.Describe the solution you would like
To remove these side effects in telemetry data, we need StartTimestamp to be set for each cumulative metric by Skipper. The value of StartTimestamp for a particular metric must be equal to the UNIX timestamp (in nanoseconds) of when this metric, with the exact same combination of attributes, was first seen by Skipper.
Describe alternatives you've considered (optional)
In our setup we also tested out start timestamp normalization algorithm, introduced by OpenTelemetry in this article. It describes a set of transformations to the cumulative sum metrics that helps handling unknown start timestamp. This approach can be used as a workaround in situations, where getting start timestamp of stream is not possible at all. But like any other hacks, this one has its own disadvantages. During collector restarts metric data will be completely lost:
This is a critical problem, since usually one OpenTelemetry collector is used per cluster / installation, so restart will cause loss of data. This can be definetly avoided in case we have
StartTimestamp
for each cumulative metric on Skipper side.Additional context (optional)
Prometheus Data Model: https://prometheus.io/docs/concepts/metric_types/
OpenTelemetry Data Model: https://opentelemetry.io/docs/specs/otel/metrics/data-model/
Would you like to work on it?
Yes
The text was updated successfully, but these errors were encountered: