kafka_consumer: expose consumer group lag as internal metric #11231

Open
hackery opened this issue Jun 1, 2022 · 3 comments
Labels
area/kafka feature request Requests for new plugin and for new features to existing plugins

Comments

@hackery
Contributor

hackery commented Jun 1, 2022

Feature Request

Proposal:

Add the lag of the consumer group specified in [[inputs.kafka_consumer]] into the telegraf [[inputs.internal]] metrics.

Current behavior:

The input can fall behind the topic, and nothing exposes this.

Desired behavior:

When [[inputs.internal]] is enabled, the plugin adds selfstat items for the consumer group lag (other metrics might also be useful to add at this point). Sample output:

internal_kafka_consumer,instance=xxxx,consumer_group=tg-0,partition=0 current_offset=x,log_end_offset=y,lag=z 1654079199000000000

Use case:

When a Kafka consumer drops behind, it can be hard to diagnose. Kafka's own API does not expose consumer group offset metrics (they're stored in the offsets topics), so one might resort to the CLI tools, e.g.

GROUP TOPIC            PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG   CONSUMER-ID                                   HOST      CLIENT-ID
tg-0  metrics-hosepipe 0          4562002220      4562002452      232   Telegraf-41eef470-f8fc-402a-9e1f-41b50ac153ed /1.2.3.4  Telegraf
tg-0  metrics-hosepipe 1          4561999766      4561999985      219   Telegraf-41eef470-f8fc-402a-9e1f-41b50ac153ed /1.2.3.4  Telegraf 

While calls to the above could be wrapped in a script and called from Telegraf, the consumer input itself is in a better position to collect these metrics in context, apply tags etc.

@hackery hackery added the feature request Requests for new plugin and for new features to existing plugins label Jun 1, 2022
@reimda
Contributor

reimda commented Jun 6, 2022

Hi @hackery, it sounds like exposing these kafka stats through inputs.internal would be a helpful tool to shed light on kafka behavior. Are you able to put together a PR to add this functionality?

I'm not sure these metrics are available through the Kafka consumer library Telegraf uses, https://github.com/Shopify/sarama. There is a recent feature request in that project to add more consumer metrics, including lag: IBM/sarama#2235. Are you familiar enough with sarama to confirm whether it can provide the metrics you're interested in?

@hackery
Contributor Author

hackery commented Jun 7, 2022

I would love to work on this, although yes, it may need that Sarama work completing first - I shall have a look at whether I could take that on as well.

@sigurd-cp

Do you know if there is any progress on this topic?
