Collector internal telemetry updates #4867
Conversation
I just came across open-telemetry/opentelemetry-collector#9315 and open-telemetry/opentelemetry-collector#9759, so now I am not sure this is correct. Perhaps it is because I am consuming the metrics on localhost:8888/metrics. Anyway, hopefully a maintainer can clarify.
Thanks @danelson, it's a bit confusing, but these metrics listed in the table will not be prefixed because they're emitted by instrumentation libraries rather than by collector components.
| Metric name | Description | Type |
| --- | --- | --- |
| `otelcol_http_client_active_requests` | Number of active HTTP client requests. | Counter |
| `otelcol_http_client_connection_duration` | Measures the duration of the successfully established outbound HTTP connections. | Histogram |
| `otelcol_http_client_open_connections` | Number of outbound HTTP connections that are active or idle on the client. | Counter |
| `otelcol_http_client_request_body_size` | Measures the size of HTTP client request bodies. | Histogram |
| `otelcol_http_client_request_duration` | Measures the duration of HTTP client requests. | Histogram |
| `otelcol_http_client_response_body_size` | Measures the size of HTTP client response bodies. | Histogram |
| `otelcol_http_server_active_requests` | Number of active HTTP server requests. | Counter |
| `otelcol_http_server_request_body_size` | Measures the size of HTTP server request bodies. | Histogram |
| `otelcol_http_server_request_duration` | Measures the duration of HTTP server requests. | Histogram |
| `otelcol_http_server_response_body_size` | Measures the size of HTTP server response bodies. | Histogram |
| `otelcol_rpc_client_duration` | Measures the duration of outbound RPC. | Histogram |
| `otelcol_rpc_client_request_size` | Measures the size of RPC request messages (uncompressed). | Histogram |
| `otelcol_rpc_client_requests_per_rpc` | Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
| `otelcol_rpc_client_response_size` | Measures the size of RPC response messages (uncompressed). | Histogram |
| `otelcol_rpc_client_responses_per_rpc` | Measures the number of messages sent per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
| `otelcol_rpc_server_duration` | Measures the duration of inbound RPC. | Histogram |
| `otelcol_rpc_server_request_size` | Measures the size of RPC request messages (uncompressed). | Histogram |
| `otelcol_rpc_server_requests_per_rpc` | Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
| `otelcol_rpc_server_response_size` | Measures the size of RPC response messages (uncompressed). | Histogram |
| `otelcol_rpc_server_responses_per_rpc` | Measures the number of messages sent per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
These metrics are generated by the underlying instrumentation library, not by collector components. This means that the prefix will not be present here.
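For reference, a minimal sketch (not an authoritative config) of the internal telemetry setting that surfaces these HTTP/RPC instrumentation metrics; they are only emitted when the level is raised to `detailed`, and the address shown is an assumption about where they are exposed:

```yaml
# Sketch: enable the detailed level so the HTTP/RPC instrumentation
# metrics listed above are emitted by the collector's internal telemetry.
service:
  telemetry:
    metrics:
      level: detailed
      address: 0.0.0.0:8888 # assumed Prometheus endpoint for internal metrics
```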
@codeboten are you saying to revert this? Also, can you clarify how this affects consuming the data when scraping the metrics with the prometheus receiver? I am running 0.104.0 with the config below and I see metrics with names like `otelcol_http_server_response_size` (note it is not `response_body_size` either).
OTel config
```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
  prometheus/collector:
    config:
      scrape_configs:
        - job_name: "internal"
          scrape_interval: 10s
          static_configs:
            - targets:
                - "localhost:8888"

processors:
  filter/collector:
    error_mode: ignore
    metrics:
      include:
        match_type: regexp
        metric_names:
          - .*http_server.*

exporters:
  debug:
    verbosity: detailed

service:
  telemetry:
    metrics:
      level: detailed
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [debug]
    metrics:
      receivers: [prometheus/collector]
      processors: [filter/collector]
      exporters: [debug]
```
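As an aside, a prefix-agnostic filter (just a sketch, not something I have verified) should keep matching whether or not the `otelcol_` prefix is present on these names:

```yaml
processors:
  filter/collector:
    error_mode: ignore
    metrics:
      include:
        match_type: regexp
        metric_names:
          # matches http_server_* names with or without the otelcol_ prefix
          - (otelcol_)?http_server_.*
```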
Right, until v0.106.1, the prometheus exporter configured inside the collector was setting a prometheus namespace of `otelcol` to prefix all metrics exported via prometheus. This was inconsistent with the metrics exported via other exporters (OTLP, console). This was addressed by manually prefixing all collector-component-generated metrics with `otelcol_` to provide a consistent metric name across all exporters. Note that when I use the term exporters here, I mean the exporters configured inside the Collector for use by the OTel Go SDK.

This means that all metrics generated by instrumentation libraries will keep the names those libraries intended, as in the example below, where `http_server_response_size` used to be prefixed with `otelcol_` and will now look like this:

```
http_server_response_size{http_method="POST",http_scheme="http",http_status_code="200",net_host_name="127.0.0.1",net_host_port="4318",net_protocol_name="http",net_protocol_version="1.1",service_instance_id="aa3d8988-fdf1-4023-8fff-193877983817",service_name="otelcontribcol",service_version="0.106.1-dev"} 2
```
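For contrast, metrics generated by collector components themselves (for example `otelcol_exporter_sent_spans`) keep the `otelcol_` prefix regardless of which exporter is configured for internal telemetry.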
Thank you. The disconnect I was having was between collector-generated and instrumentation-library-generated metrics.
Two small changes, along with Alex's note. Thanks!
@danelson please take a look at the requested changes, thanks!
Co-authored-by: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com>
I think this should be good now. Thank you for the feedback.
@codeboten PTAL!
Thanks @danelson!
/fix:format
You triggered fix:format action run at https://github.com/open-telemetry/opentelemetry.io/actions/runs/10726531500
/fix:all
You triggered fix:all action run at https://github.com/open-telemetry/opentelemetry.io/actions/runs/10726991011
Co-authored-by: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com>
Co-authored-by: Alex Boten <223565+codeboten@users.noreply.github.com>
Co-authored-by: opentelemetrybot <107717825+opentelemetrybot@users.noreply.github.com>
Co-authored-by: Phillip Carter <pcarter@fastmail.com>
- Internal collector detailed metrics were missing the `otelcol_` prefix.
- `log_records` were missing from the critical monitoring sections (I assume this is not by design).