Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve][offloaders] Automatically evict Offloaded Ledgers from memory #19783

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

eolivelli
Copy link
Contributor

@eolivelli eolivelli commented Mar 10, 2023

Motivation

ManagedLedgerImpl retains eagerly a cache of all the BookKeeper ReadHandles.
In case of Offloaded ReadHandler there is kind of a memory leak, as each BlobStoreBackedInputStreamImpl retains a DirectBuffer of 1MB, in the case of a topic with terabytes of data and thousands of ledger this leads to Direct OOM errors if something tries to open all the ledgers

Modifications

Add a new background activity that evicts from memory all the Offloaded ReadHandles and release memory.

The feature is controlled by the new configuration option managedLedgerInactiveOffloadedLedgerEvictionTimeSeconds=600

Unfortunately this fix cannot fully prevent a OODM error because there is no global count and limit of the memory retained by such Handles, it allows to mitigate the problem by releasing automatically unused Ledger Handlers.
The default value, 10 minutes, is very conservative, but it should work with real-world ledgers.

The worst case scenario is a topic with tens of thousands of small ledgers with a consumer that reads from the topic from the beginning, in this case the broker will open the ReadHandlers probably more quickly than the eviction process pace.

Verifying this change

This change added tests.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: eolivelli#22

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Mar 10, 2023
Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good work @eolivelli

@codelipenghui codelipenghui added this to the 3.0.0 milestone Mar 13, 2023
@codelipenghui
Copy link
Contributor

@eolivelli We should start with a proposal for a new behavior that will introduced to Pulsar, and it also introduced a new configuration.

Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have invalidateReadHandle method to invalidate the read handles. Why should we introduce a time-based invalidation for the offloaded read handles? Or, if the change wants to introduce time-based read handle invalidation, why it can't apply to the bookkeeper read handle?

@poorbarcode
Copy link
Contributor

Since we will start the RC version of 3.0.0 on 2023-04-11, I will change the label/milestone of PR who have not been merged.

  • The PR of type feature is deferred to 3.1.0
  • The PR of type fix is deferred to 3.0.1

So drag this PR to 3.0.1

@poorbarcode poorbarcode modified the milestones: 3.0.0, 3.1.0 Apr 10, 2023
@github-actions
Copy link

The pr had no activity for 30 days, mark with Stale label.

@dave2wave
Copy link
Member

@eolivelli Is this something for the near term?

@github-actions github-actions bot removed the Stale label Jul 17, 2023
@Technoboy- Technoboy- modified the milestones: 3.1.0, 3.2.0 Jul 31, 2023
@github-actions
Copy link

The pr had no activity for 30 days, mark with Stale label.

@Technoboy- Technoboy- added this to the 3.3.0 milestone Dec 22, 2023
@coderzc coderzc modified the milestones: 3.3.0, 3.4.0 May 8, 2024
@lhotari
Copy link
Member

lhotari commented Oct 10, 2024

@eolivelli @dlg99 Any chance to rebase this PR?

@lhotari lhotari modified the milestones: 4.0.0, 4.1.0 Oct 14, 2024
@lhotari lhotari added release/4.0.1 triage/lhotari/important lhotari's triaging label for important issues or PRs labels Oct 14, 2024
@lhotari
Copy link
Member

lhotari commented Oct 16, 2024

I have rebased this PR by merging in origin/master to this PR and resolving the conflicts.

@lhotari
Copy link
Member

lhotari commented Oct 16, 2024

We already have invalidateReadHandle method to invalidate the read handles. Why should we introduce a time-based invalidation for the offloaded read handles? Or, if the change wants to introduce time-based read handle invalidation, why it can't apply to the bookkeeper read handle?

Good point @codelipenghui

@lhotari lhotari requested a review from codelipenghui October 16, 2024 16:30
@lhotari lhotari removed the Stale label Oct 16, 2024
@codecov-commenter
Copy link

codecov-commenter commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 81.03448% with 11 lines in your changes missing coverage. Please review.

Project coverage is 74.34%. Comparing base (bbc6224) to head (0315714).
Report is 685 commits behind head on master.

Files with missing lines Patch % Lines
...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java 79.06% 5 Missing and 4 partials ⚠️
...ache/bookkeeper/mledger/OffloadedLedgerHandle.java 0.00% 1 Missing ⚠️
...oad/jcloud/impl/BlobStoreBackedReadHandleImpl.java 83.33% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #19783      +/-   ##
============================================
+ Coverage     73.57%   74.34%   +0.77%     
- Complexity    32624    34417    +1793     
============================================
  Files          1877     1944      +67     
  Lines        139502   147027    +7525     
  Branches      15299    16201     +902     
============================================
+ Hits         102638   109314    +6676     
- Misses        28908    29286     +378     
- Partials       7956     8427     +471     
Flag Coverage Δ
inttests 27.45% <48.07%> (+2.87%) ⬆️
systests 24.44% <57.69%> (+0.11%) ⬆️
unittests 73.72% <81.03%> (+0.87%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...apache/bookkeeper/mledger/ManagedLedgerConfig.java 96.57% <100.00%> (+0.27%) ⬆️
...org/apache/pulsar/broker/ServiceConfiguration.java 98.98% <100.00%> (-0.42%) ⬇️
...rg/apache/pulsar/broker/service/BrokerService.java 83.89% <100.00%> (+3.11%) ⬆️
...ache/bookkeeper/mledger/OffloadedLedgerHandle.java 0.00% <0.00%> (ø)
...oad/jcloud/impl/BlobStoreBackedReadHandleImpl.java 77.18% <83.33%> (-2.97%) ⬇️
...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java 81.72% <79.06%> (+1.06%) ⬆️

... and 646 files with indirect coverage changes

void internalTrimLedgers(boolean isTruncate, CompletableFuture<?> promise) {

internalEvictOffloadedLedgers();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see internalEvictOffloadedLedgers being called anywhere but tests, could you explain how it will be triggered?

Copy link
Member

@lhotari lhotari Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@poorbarcode It gets called when internalTrimLedgers is called.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs release/3.0.10 release/3.3.4 release/4.0.1 triage/lhotari/important lhotari's triaging label for important issues or PRs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants