Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kafka receiver does not render otelcol_kafka_receiver_current_offset correctly #30177

Closed
spadger opened this issue Dec 22, 2023 · 1 comment · Fixed by #30268
Closed

Kafka receiver does not render otelcol_kafka_receiver_current_offset correctly #30177

spadger opened this issue Dec 22, 2023 · 1 comment · Fixed by #30268
Labels
bug Something isn't working needs triage New item requiring triage

Comments

@spadger
Copy link
Contributor

spadger commented Dec 22, 2023

Component(s)

receiver/kafka

What happened?

Description

The Kafka receiver only reports otelcol_kafka_receiver_current_offset metrics for a single partition. e.g. If I have a 10-partition topic, and three collectors, each will render the metric once.

The value reported is correct for the single partition that is being reported on; the rest are just missing

I'm /guessing/ the issues is at kafka_receiver.go->ConsumeClaim(), where a partition tag needs to be introduced, with value=claim.Partition()

Steps to Reproduce

  • Create a multi-partition kafka topic
  • Produce otlp spans, ensuring all partitions contain records
  • Inspect http://:8888/metrics

Expected Result

Each partition has its offset reported, with a partition tag (ideally)

otelcol_kafka_receiver_current_offset{partition=0} 50,001
otelcol_kafka_receiver_current_offset{partition=3} 50,002
otelcol_kafka_receiver_current_offset{partition=6} 50,003
otelcol_kafka_receiver_current_offset{partition=9} 50,004

Actual Result

Only one partition has its offset reported. I omitted some uninteresting tag, but note that name was an empty string

otelcol_kafka_receiver_current_offset{name=""} 50,002

Collector version

0.91.0

Environment information

Environment

OS: linux
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

receivers:
  kafka:
    brokers:
      - some.kafka1:9092
      - some.kafka2:9092
      - some.kafka3:9092
    topic: otlp_spans
    encoding: otlp_proto
    group_id: otel-collector-backend
  
  processors:
    memory_limiter:
      limit_mib: 400
      spike_limit_mib: 60 # 15% of (pod memory - 100mi)
      check_interval: 5s
  exporters:
    otlp:
      endpoint: some-collector.telemetry.svc:4317
      tls:
        insecure: true
  extensions:
    health_check:
      endpoint: 0.0.0.0:13133
  service:
    extensions:
      - health_check
    pipelines:
      traces:
        receivers:
          - kafka
        processors:
          - memory_limiter
        exporters:
          - otlp

Log output

No response

Additional context

No response

@spadger spadger added bug Something isn't working needs triage New item requiring triage labels Dec 22, 2023
@spadger
Copy link
Contributor Author

spadger commented Jan 2, 2024

To add to the fun, I'd never expect the offset to decrease, but look at the graph. I'm guessing this is because the gauge just represents the last committed offset, so if there are e.g. 4 partitions being ready by this collector, we'll see the value fluctuate between 4 smooth lines
image

spadger added a commit to spadger/opentelemetry-collector-contrib that referenced this issue Jan 16, 2024
MovieStoreGuy added a commit that referenced this issue Jan 17, 2024
**Description:**

Fixes
[#30177](#30177)
The Kafka receiver now exposes the following metrics according to
partition. Beforehand, if a collector were consuming from 10 partitions,
one of each metric would be rendered, with its value fluctuating
according to the state of each partition's consumer. Now the metrics
endpoint will expose 10 sets of metrics, each with a `partition` tag.

* kafka_receiver_messages
* kafka_receiver_current_offset
* kafka_receiver_offset_lag

**Testing:**
* Unit tests were run
* Stats endpoint observed manually for correctness
* Scraped stats charted in Prometheus to ensure stability

Example output:
```
otelcol_kafka_receiver_messages{name="",partition="0",service_instance_id="db3bae97-1856-4b49-85cb-8559db48f345",service_name="otelcontribcol",service_version="0.91.0-dev"} 29
otelcol_kafka_receiver_messages{name="",partition="1",service_instance_id="db3bae97-1856-4b49-85cb-8559db48f345",service_name="otelcontribcol",service_version="0.91.0-dev"} 32
otelcol_kafka_receiver_messages{name="",partition="10",service_instance_id="db3bae97-1856-4b49-85cb-8559db48f345",service_name="otelcontribcol",service_version="0.91.0-dev"} 32
otelcol_kafka_receiver_messages{name="",partition="11",service_instance_id="db3bae97-1856-4b49-85cb-8559db48f345",service_name="otelcontribcol",service_version="0.91.0-dev"} 28
otelcol_kafka_receiver_messages{name="",partition="12",service_instance_id="db3bae97-1856-4b49-85cb-8559db48f345",service_name="otelcontribcol",service_version="0.91.0-dev"} 36
otelcol_kafka_receiver_messages{name="",partition="13",service_instance_id="db3bae97-1856-4b49-85cb-8559db48f345",service_name="otelcontribcol",service_version="0.91.0-dev"} 38
```

**Documentation:**
None added

Co-authored-by: Sean Marciniak <30928402+MovieStoreGuy@users.noreply.github.com>
mfyuce pushed a commit to mfyuce/opentelemetry-collector-contrib that referenced this issue Jan 18, 2024
…y#30177) (open-telemetry#30268)

**Description:**

Fixes
[open-telemetry#30177](open-telemetry#30177)
The Kafka receiver now exposes the following metrics according to
partition. Beforehand, if a collector were consuming from 10 partitions,
one of each metric would be rendered, with its value fluctuating
according to the state of each partition's consumer. Now the metrics
endpoint will expose 10 sets of metrics, each with a `partition` tag.

* kafka_receiver_messages
* kafka_receiver_current_offset
* kafka_receiver_offset_lag

**Testing:**
* Unit tests were run
* Stats endpoint observed manually for correctness
* Scraped stats charted in Prometheus to ensure stability

Example output:
```
otelcol_kafka_receiver_messages{name="",partition="0",service_instance_id="db3bae97-1856-4b49-85cb-8559db48f345",service_name="otelcontribcol",service_version="0.91.0-dev"} 29
otelcol_kafka_receiver_messages{name="",partition="1",service_instance_id="db3bae97-1856-4b49-85cb-8559db48f345",service_name="otelcontribcol",service_version="0.91.0-dev"} 32
otelcol_kafka_receiver_messages{name="",partition="10",service_instance_id="db3bae97-1856-4b49-85cb-8559db48f345",service_name="otelcontribcol",service_version="0.91.0-dev"} 32
otelcol_kafka_receiver_messages{name="",partition="11",service_instance_id="db3bae97-1856-4b49-85cb-8559db48f345",service_name="otelcontribcol",service_version="0.91.0-dev"} 28
otelcol_kafka_receiver_messages{name="",partition="12",service_instance_id="db3bae97-1856-4b49-85cb-8559db48f345",service_name="otelcontribcol",service_version="0.91.0-dev"} 36
otelcol_kafka_receiver_messages{name="",partition="13",service_instance_id="db3bae97-1856-4b49-85cb-8559db48f345",service_name="otelcontribcol",service_version="0.91.0-dev"} 38
```

**Documentation:**
None added

Co-authored-by: Sean Marciniak <30928402+MovieStoreGuy@users.noreply.github.com>
cparkins pushed a commit to AmadeusITGroup/opentelemetry-collector-contrib that referenced this issue Feb 1, 2024
…y#30177) (open-telemetry#30268)

**Description:**

Fixes
[open-telemetry#30177](open-telemetry#30177)
The Kafka receiver now exposes the following metrics according to
partition. Beforehand, if a collector were consuming from 10 partitions,
one of each metric would be rendered, with its value fluctuating
according to the state of each partition's consumer. Now the metrics
endpoint will expose 10 sets of metrics, each with a `partition` tag.

* kafka_receiver_messages
* kafka_receiver_current_offset
* kafka_receiver_offset_lag

**Testing:**
* Unit tests were run
* Stats endpoint observed manually for correctness
* Scraped stats charted in Prometheus to ensure stability

Example output:
```
otelcol_kafka_receiver_messages{name="",partition="0",service_instance_id="db3bae97-1856-4b49-85cb-8559db48f345",service_name="otelcontribcol",service_version="0.91.0-dev"} 29
otelcol_kafka_receiver_messages{name="",partition="1",service_instance_id="db3bae97-1856-4b49-85cb-8559db48f345",service_name="otelcontribcol",service_version="0.91.0-dev"} 32
otelcol_kafka_receiver_messages{name="",partition="10",service_instance_id="db3bae97-1856-4b49-85cb-8559db48f345",service_name="otelcontribcol",service_version="0.91.0-dev"} 32
otelcol_kafka_receiver_messages{name="",partition="11",service_instance_id="db3bae97-1856-4b49-85cb-8559db48f345",service_name="otelcontribcol",service_version="0.91.0-dev"} 28
otelcol_kafka_receiver_messages{name="",partition="12",service_instance_id="db3bae97-1856-4b49-85cb-8559db48f345",service_name="otelcontribcol",service_version="0.91.0-dev"} 36
otelcol_kafka_receiver_messages{name="",partition="13",service_instance_id="db3bae97-1856-4b49-85cb-8559db48f345",service_name="otelcontribcol",service_version="0.91.0-dev"} 38
```

**Documentation:**
None added

Co-authored-by: Sean Marciniak <30928402+MovieStoreGuy@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage New item requiring triage
Projects
None yet
1 participant