Description
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Problem
When using the opentelemetry sink in Vector to send metrics derived from logs to an OpenTelemetry Collector, Vector repeatedly fails with 400 Bad Request responses. These errors appear in the Vector agent logs, but the OTel Collector does not show any related error logs or any other indication that it received a malformed payload. As a result, the metrics are not processed by the OTel Collector as expected.
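For reference, one way to inspect the exact metric events that feed the failing sink is to tap the output of the log_to_metric transform with vector tap. This is only a debugging sketch, not part of the original report: it assumes the Vector GraphQL API is enabled (api.enabled: true) and uses hypothetical pod and namespace names; the component ID follows the naming seen in the error logs below.

# Hypothetical pod/namespace names; adjust to the actual Vector agent deployment.
kubectl exec -n vector -it vector-agent-0 -- \
  vector tap 'log-level-metrics-pipeline-log_to_metric'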
Configuration
apiVersion: observability.kaasops.io/v1alpha1
kind: ClusterVectorPipeline
metadata:
  name: log-level-metrics-pipeline
spec:
  sources:
    kubernetes_logs:
      type: kubernetes_logs
      pod_annotation_fields:
        container_image: container_image
        container_name: container_name
        pod_name: pod_name
        pod_namespace: pod_namespace
      fingerprint_lines: 1
      ignore_older_secs: 600
  transforms:
    log_level_tagger:
      type: remap
      inputs:
        - kubernetes_logs
      source: |
        if exists(.message) {
          log_message = string!(.message)
          log_level = "INFO"
          if contains(upcase(log_message), "ERROR") {
            log_level = "ERROR"
          } else if contains(upcase(log_message), "WARN") {
            log_level = "WARN"
          } else if contains(upcase(log_message), "DEBUG") {
            log_level = "DEBUG"
          }
          .log_level = log_level
          .attributes = {
            "log_level": log_level
          }
          if exists(.pod_name) {
            .attributes.pod_name = string!(.pod_name)
          } else {
            .attributes.pod_name = "unknown_pod"
          }
          if exists(.pod_namespace) {
            .attributes.pod_namespace = string!(.pod_namespace)
          } else {
            .attributes.pod_namespace = "unknown_namespace"
          }
          .timestamp = now()
        } else {
          .log_level = "UNKNOWN"
          .attributes = {
            "log_level": "UNKNOWN",
            "pod_name": "unknown_pod",
            "pod_namespace": "unknown_namespace"
          }
        }
    log_to_metric:
      type: log_to_metric
      inputs:
        - log_level_tagger
      metrics:
        - type: counter
          name: log_level_count
          field: log_level
          tags:
            log_level: "{{attributes.log_level}}"
            pod_name: "{{attributes.pod_name}}"
            pod_namespace: "{{attributes.pod_namespace}}"
  sinks:
    otel_collector_sink:
      type: opentelemetry
      inputs:
        - log_to_metric
      protocol:
        type: http
        uri: "http://otel-collector.otel:4318/v1/logs"
        method: post
        encoding:
          codec: json
        framing:
          method: newline_delimited
        batch:
          max_events: 100
          max_bytes: 1048576
          timeout_secs: 10
        retry:
          initial_interval_secs: 1
          max_interval_secs: 30
          max_retries: 5
        healthcheck:
          enabled: true
          interval_secs: 60
Version
0.43.0
Debug Output
2024-12-18T14:40:25.520217Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=456}: vector::sinks::util::retries: Not retriable; dropping the request. reason="Http status: 400 Bad Request" internal_log_rate_limit=true
2024-12-18T14:40:25.520229Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=456}: vector_common::internal_event::service: Internal log [Service call failed. No retries or retries exhausted.] has been suppressed 4 times.
2024-12-18T14:40:25.520231Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=456}: vector_common::internal_event::service: Service call failed. No retries or retries exhausted. error=None request_id=456 error_type="request_failed" stage="sending" internal_log_rate_limit=true
2024-12-18T14:40:25.520266Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=456}: vector_common::internal_event::component_events_dropped: Internal log [Events dropped] has been suppressed 4 times.
2024-12-18T14:40:25.520268Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=456}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count=2 reason="Service call failed. No retries or retries exhausted." internal_log_rate_limit=true
2024-12-18T14:40:26.554810Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=457}: vector::sinks::util::retries: Internal log [Not retriable; dropping the request.] is being suppressed to avoid flooding.
2024-12-18T14:40:26.554830Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=457}: vector_common::internal_event::service: Internal log [Service call failed. No retries or retries exhausted.] is being suppressed to avoid flooding.
2024-12-18T14:40:26.554840Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=457}: vector_common::internal_event::component_events_dropped: Internal log [Events dropped] is being suppressed to avoid flooding.
2024-12-18T14:40:43.994791Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=459}: vector::sinks::util::retries: Internal log [Not retriable; dropping the request.] has been suppressed 2 times.
2024-12-18T14:40:43.994821Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=459}: vector::sinks::util::retries: Not retriable; dropping the request. reason="Http status: 400 Bad Request" internal_log_rate_limit=true
2024-12-18T14:40:43.994858Z ERROR sink{component_kind="sink" component_id=log-level-metrics-pipeline-otel_collector_sink component_type=opentelemetry}:request{request_id=459}: vector_common::internal_event::service: Internal log [Service call failed. No retries or retries exhausted.] has been suppressed 2 times.
Example Data
No response
Additional Context
Both Vector and the OTel Collector are running in the same cluster. Even with debug logging enabled on the OTel Collector, there are no logs showing that it received the payload or encountered any issues. However, when the same payload is sent to the OTel Collector with a curl request, it is logged and processed correctly.
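The exact curl command is not reproduced here; a minimal sketch of such a manual check, assuming an OTLP/JSON log record and using the endpoint from the sink configuration (all payload values below are illustrative only), looks roughly like this:

curl -sS -X POST 'http://otel-collector.otel:4318/v1/logs' \
  -H 'Content-Type: application/json' \
  -d '{
    "resourceLogs": [{
      "resource": { "attributes": [{ "key": "service.name", "value": { "stringValue": "curl-test" } }] },
      "scopeLogs": [{
        "logRecords": [{
          "timeUnixNano": "1734532825000000000",
          "severityText": "INFO",
          "body": { "stringValue": "manual test log" }
        }]
      }]
    }]
  }'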
OpenTelemetry collector config:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch: {}
  memory_limiter:
    limit_mib: 1000
    spike_limit_mib: 512
    check_interval: 5s
extensions:
  zpages: {}
exporters:
  logging:
    loglevel: debug
    sampling_initial: 5
    sampling_thereafter: 200
  prometheus:
    endpoint: 0.0.0.0:8889
    metric_expiration: 1m
service:
  extensions: [zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging, prometheus]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging, file]
References
No response