-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[improve][broker] Optimize seeking by timestamp #22792
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #22792 +/- ##
============================================
+ Coverage 73.57% 73.96% +0.39%
+ Complexity 32624 3373 -29251
============================================
Files 1877 1943 +66
Lines 139502 149846 +10344
Branches 15299 16928 +1629
============================================
+ Hits 102638 110840 +8202
- Misses 28908 30387 +1479
- Partials 7956 8619 +663
Flags with carried forward coverage won't be shown. Click here to find out more.
|
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Outdated
Show resolved
Hide resolved
Can I help merge this PR? |
This PR needs more review |
@dao-jun please rebase |
# Conflicts: # managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java # managed-ledger/src/test/java/org/apache/bookkeeper/mledger/impl/ManagedCursorTest.java
@lhotari Rebased, please help review when you have time |
Closed and reopened to get the change to unblock CI. |
Hello folks, do you have any news about this pull request? |
close reopen to trigger the CI checks |
Fixes #22129
Motivation
Pulsar uses binary search to find the message by timestamp, it will reduce the number of iterations to find the message, and make it more efficient and faster.
Even though the current implementation is correct, and using binary search to speed-up, but it's still not efficient enough.
The current implementation is to scan all the ledgers to find the message by timestamp.
This is a performance bottleneck, especially for large topics with many messages.
Say, if there is a topic which has 1m entries, through the binary search, it will take 20 iterations to find the message.
In some extreme cases, it may lead to a timeout, and the client will not be able to seeking by timestamp.
The motivation of this PR is to optimize the finding message by timestamp, to make it more efficient and faster.
Modifications
Before search entires, calculate the
start
,end
position byLedgerInfo#timestamp
and only search entries in the range to avoid search the entire ledgers.Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: