Skip to content

Conversation

@varunbharadwaj
Copy link
Contributor

@varunbharadwaj varunbharadwaj commented Oct 14, 2025

Description

Pull-based ingestion today only has stats for time based lag. This is computed based on incoming message timestamp. The time based lag will be 0 if ingestion is paused or there is a write block. This PR adds offset (pointer) based lag which gives back real lag metric irrespective of ingestion status. Note that this metric is only supported for Kafka. Kinesis implementation does not emit this metric.

The offset based lag looks up the latest offset available in the respective Kafka partition and computes lag as the difference between the Kafka end offset and last consumed offset. This is computed periodically once every 10 seconds by default, but is configurable. Setting the interval to 0 disables this metric.

For example, if we have a large batch of messages written in a short window, the offset based lag gives a better view of ingestion catching up. The following example below shows the lag when ingestion is slow.

image

Related Issues

Follow up for #19607. This metric should help detect growing lag when ingestion is paused.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

✅ Gradle check result for 11aedfc: SUCCESS

@codecov
Copy link

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 87.50000% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.10%. Comparing base (4094287) to head (3180493).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
...rch/indices/pollingingest/DefaultStreamPoller.java 84.21% 2 Missing and 1 partial ⚠️
...g/opensearch/cluster/metadata/IngestionSource.java 77.77% 0 Missing and 2 partials ⚠️
...arch/indices/pollingingest/PollingIngestStats.java 75.00% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #19635      +/-   ##
============================================
+ Coverage     73.08%   73.10%   +0.01%     
+ Complexity    70746    70737       -9     
============================================
  Files          5725     5725              
  Lines        323793   323845      +52     
  Branches      46882    46888       +6     
============================================
+ Hits         236659   236733      +74     
+ Misses        68083    67988      -95     
- Partials      19051    19124      +73     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link
Contributor

✅ Gradle check result for 0bbd545: SUCCESS

@github-actions
Copy link
Contributor

❌ Gradle check result for 05e29a6: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for 05e29a6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for f41b4f9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for f41b4f9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

✅ Gradle check result for f41b4f9: SUCCESS

@github-actions
Copy link
Contributor

✅ Gradle check result for d64b3c8: SUCCESS

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
@varunbharadwaj varunbharadwaj force-pushed the vb/offsetlag branch 4 times, most recently from 5498f7f to cc42f52 Compare October 16, 2025 22:32
@github-actions
Copy link
Contributor

❌ Gradle check result for cc42f52: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
@github-actions
Copy link
Contributor

✅ Gradle check result for 3180493: SUCCESS

@msfroh msfroh merged commit c9a5fab into opensearch-project:main Oct 17, 2025
35 of 36 checks passed
rgsriram pushed a commit to rgsriram/OpenSearch that referenced this pull request Oct 18, 2025
…sed ingestion (opensearch-project#19635)

* include pointer based lag metric for pull-based ingestion
* set -1 kafka lag if errors occur during offset lag computation
* add setting for pointer based lag interval
* update the lag interval setting to timeValue type

---------

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
kh3ra pushed a commit to kh3ra/OpenSearch that referenced this pull request Oct 23, 2025
…sed ingestion (opensearch-project#19635)

* include pointer based lag metric for pull-based ingestion
* set -1 kafka lag if errors occur during offset lag computation
* add setting for pointer based lag interval
* update the lag interval setting to timeValue type

---------

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants