[fix][ml] Fix memory leak due to duplicated RangeCache value retain operations #23955

Merged
merged 4 commits into from
Feb 10, 2025

Conversation

BewareMyPower
Contributor

Motivation

#23903 introduced a memory leak in RangeCache#removeEntry:

-        Value value = entryWrapper.getValue(key);
+        Value value = getValueMatchingEntry(entry);

Unlike entryWrapper.getValue, getValueMatchingEntry increases the reference count of the entry's value, so the retain operation that was already in removeEntry became a duplicate.
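The effect of a duplicated retain can be illustrated with a small, self-contained sketch. This is not the RangeCache code: `CountedValue` and `lookupAndRetain` are hypothetical stand-ins for a Netty-style reference-counted value and for a lookup that retains before returning, and the exact sequence of calls in `removeEntry` is assumed for illustration only.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DuplicateRetainSketch {
    /** Hypothetical stand-in for a reference-counted cache value (starts at refCnt = 1). */
    public static final class CountedValue {
        private final AtomicInteger refCnt = new AtomicInteger(1);

        public void retain() {
            // Simplified for a single-threaded demo; not safe against concurrent release.
            if (refCnt.get() <= 0) {
                throw new IllegalStateException("already released");
            }
            refCnt.incrementAndGet();
        }

        public boolean release() {
            int remaining = refCnt.decrementAndGet();
            if (remaining < 0) {
                throw new IllegalStateException("released too many times");
            }
            return remaining == 0; // true when the value is deallocated
        }

        public int refCnt() {
            return refCnt.get();
        }
    }

    /** Hypothetical lookup that retains the value before returning it. */
    public static CountedValue lookupAndRetain(CountedValue stored) {
        stored.retain();
        return stored;
    }

    /** Assumed shape of the leaky path: the lookup retains, then the caller retains again. */
    public static int removeWithDuplicateRetain(CountedValue stored) {
        CountedValue value = lookupAndRetain(stored); // refCnt 1 -> 2
        value.retain();                               // duplicated retain: 2 -> 3
        value.release();                              // drop the temporary reference: 3 -> 2
        value.release();                              // drop the cache's reference: 2 -> 1
        return value.refCnt();                        // 1: never reaches 0, so the value leaks
    }

    /** Same path with the duplicated retain removed. */
    public static int removeWithSingleRetain(CountedValue stored) {
        CountedValue value = lookupAndRetain(stored); // refCnt 1 -> 2
        value.release();                              // drop the temporary reference: 2 -> 1
        value.release();                              // drop the cache's reference: 1 -> 0
        return value.refCnt();                        // 0: the value is released
    }

    public static void main(String[] args) {
        System.out.println("duplicate retain, remaining refCnt = "
                + removeWithDuplicateRetain(new CountedValue()));
        System.out.println("single retain, remaining refCnt = "
                + removeWithSingleRetain(new CountedValue()));
    }
}
```

In this sketch the leaky path leaves the count at 1 after removal, which is the kind of imbalance a reference count check in a test would catch.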

Modifications

  • Remove the duplicated retain operation in RangeCache#removeEntry. Add API notes to the private methods that might increase the reference count of the value.
  • Add reference count validation to the RangeCache eviction test (RangeCacheTest.customTimeExtraction).

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@BewareMyPower BewareMyPower added type/bug The PR fixed a bug or issue reported a bug release/3.0.9 release/3.3.5 release/4.0.3 labels Feb 10, 2025
@BewareMyPower BewareMyPower self-assigned this Feb 10, 2025
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Feb 10, 2025
@BewareMyPower
Contributor Author

Though I think catching the IllegalReferenceCountException from value.retain() is not good practice and is error-prone, let's just keep the fix simple.

Member

@lhotari lhotari left a comment


Thanks for finding the problem area. However, this isn't currently the correct fix for the problem, as explained in #23955 (comment). There are also comments in the original code on lines 389 and 430, where the extra reference is removed.

Member

@lhotari lhotari left a comment


Oh sorry, I reviewed it again and noticed that it's fine.

@lhotari
Member

lhotari commented Feb 10, 2025

The removeEntry logic could be simplified further. The logic in this method should have been improved while adding the EntryWrapper in #22814. With EntryWrapper, it's possible to remove the exact entry from the map, which wasn't possible before adding EntryWrapper.
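Removing "the exact entry" can be sketched with the standard `ConcurrentMap.remove(key, value)` call, which removes a mapping only if the key is still mapped to the given value. The `Wrapper` class and the `Long` keys below are hypothetical and only illustrate the idea; this is not the actual EntryWrapper or RangeCache implementation.

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class ExactEntryRemovalSketch {
    /**
     * Hypothetical wrapper around a cached value. It does not override equals(),
     * so two wrappers are equal only if they are the same instance.
     */
    public static final class Wrapper {
        private final String payload;

        public Wrapper(String payload) {
            this.payload = payload;
        }

        public String payload() {
            return payload;
        }
    }

    /**
     * Removes the mapping only if the key is still mapped to exactly the expected wrapper.
     * Returns false if another thread has already replaced or removed it.
     */
    public static boolean removeExact(ConcurrentSkipListMap<Long, Wrapper> map,
                                      long key, Wrapper expected) {
        return map.remove(key, expected);
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Long, Wrapper> map = new ConcurrentSkipListMap<>();
        Wrapper stale = new Wrapper("old");
        Wrapper current = new Wrapper("new");

        map.put(1L, stale);
        map.put(1L, current); // the mapping is replaced, as a concurrent writer might do

        System.out.println("removed stale wrapper: " + removeExact(map, 1L, stale));
        System.out.println("removed current wrapper: " + removeExact(map, 1L, current));
    }
}
```

Because the removal is conditional on the wrapper instance, a caller holding a stale wrapper cannot accidentally remove a newer entry stored under the same key.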

@lhotari
Member

lhotari commented Feb 10, 2025

> Though I think catching the IllegalReferenceCountException from value.retain() is not a good practice and error-prone, let's just keep the fix simple.

This is necessary in this case due to the racy usage of the cache without synchronization.
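A minimal sketch of why the catch is needed when readers are not synchronized with removal: a reader can obtain a value from the map just before another thread releases it, so `retain()` may be called on a value whose count has already reached zero. The class and method names here are hypothetical, and `IllegalStateException` stands in for Netty's `IllegalReferenceCountException` so that the example needs no dependencies.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class RacyRetainSketch {
    /** Hypothetical reference-counted value with compare-and-set based retain/release. */
    public static final class CountedValue {
        private final AtomicInteger refCnt = new AtomicInteger(1);

        public void retain() {
            while (true) {
                int current = refCnt.get();
                if (current <= 0) {
                    // Stand-in for Netty's IllegalReferenceCountException.
                    throw new IllegalStateException("refCnt: " + current);
                }
                if (refCnt.compareAndSet(current, current + 1)) {
                    return;
                }
            }
        }

        public boolean release() {
            while (true) {
                int current = refCnt.get();
                if (current <= 0) {
                    throw new IllegalStateException("refCnt: " + current);
                }
                if (refCnt.compareAndSet(current, current - 1)) {
                    return current == 1; // true when this call released the last reference
                }
            }
        }

        public int refCnt() {
            return refCnt.get();
        }
    }

    private final ConcurrentHashMap<String, CountedValue> entries = new ConcurrentHashMap<>();

    public void put(String key, CountedValue value) {
        entries.put(key, value);
    }

    /**
     * Returns the value with an extra reference held for the caller, or null if the key
     * is absent or the value was released by another thread before it could be retained.
     */
    public CountedValue get(String key) {
        CountedValue value = entries.get(key);
        if (value == null) {
            return null;
        }
        try {
            value.retain();
            return value;
        } catch (IllegalStateException e) {
            // Lost the race with a concurrent release: treat it as a cache miss.
            return null;
        }
    }

    public static void main(String[] args) {
        RacyRetainSketch cache = new RacyRetainSketch();
        CountedValue value = new CountedValue();
        cache.put("k", value);

        System.out.println("hit: " + (cache.get("k") != null));
        // Simulate the window in which the value is fully released
        // but its mapping has not been removed yet.
        value.release();
        value.release();
        System.out.println("hit after release: " + (cache.get("k") != null));
    }
}
```

The alternative would be to synchronize lookups with removal, which trades the exception handling for lock contention on the read path.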

@codecov-commenter

codecov-commenter commented Feb 10, 2025

Codecov Report

Attention: Patch coverage is 80.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 74.23%. Comparing base (bbc6224) to head (9a9370a).
Report is 894 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...org/apache/bookkeeper/mledger/util/RangeCache.java | 80.00% | 2 Missing and 1 partial ⚠️ |

@@             Coverage Diff              @@
##             master   #23955      +/-   ##
============================================
+ Coverage     73.57%   74.23%   +0.66%     
+ Complexity    32624    31919     -705     
============================================
  Files          1877     1853      -24     
  Lines        139502   143828    +4326     
  Branches      15299    16344    +1045     
============================================
+ Hits         102638   106773    +4135     
+ Misses        28908    28674     -234     
- Partials       7956     8381     +425     
| Flag | Coverage Δ |
|---|---|
| inttests | 26.67% <53.33%> (+2.08%) ⬆️ |
| systests | 23.25% <53.33%> (-1.08%) ⬇️ |
| unittests | 73.77% <80.00%> (+0.93%) ⬆️ |

| Files with missing lines | Coverage Δ |
|---|---|
| ...org/apache/bookkeeper/mledger/util/RangeCache.java | 80.79% <80.00%> (-14.62%) ⬇️ |

... and 1041 files with indirect coverage changes

@lhotari
Member

lhotari commented Feb 10, 2025

I'll merge this as soon as CI completes, since I'm creating a PR to prevent future regressions by enabling the Netty leak detector in Pulsar CI, so that any detected leak will make the build fail.

@lhotari lhotari merged commit 20b3b22 into apache:master Feb 10, 2025
52 checks passed
lhotari pushed a commit that referenced this pull request Feb 10, 2025
…perations (#23955)

Co-authored-by: Lari Hotari <lhotari@apache.org>
(cherry picked from commit 20b3b22)
lhotari pushed a commit that referenced this pull request Feb 10, 2025
…perations (#23955)

Co-authored-by: Lari Hotari <lhotari@apache.org>
(cherry picked from commit 20b3b22)
lhotari pushed a commit that referenced this pull request Feb 10, 2025
…perations (#23955)

Co-authored-by: Lari Hotari <lhotari@apache.org>
(cherry picked from commit 20b3b22)
@lhotari
Member

lhotari commented Feb 10, 2025

I have created #23956 (WIP, draft) to address future memory leak regressions. It seems that there are quite a few tests where leaks happen currently and some of them are hard to fix. I have added a solution where it's possible to ignore specific test classes.

@BewareMyPower BewareMyPower deleted the bewaremypower/fix-read-limiter branch February 11, 2025 01:46
nodece pushed a commit to ascentstream/pulsar that referenced this pull request Feb 11, 2025
…perations (apache#23955)

Co-authored-by: Lari Hotari <lhotari@apache.org>
(cherry picked from commit 20b3b22)
hanmz pushed a commit to hanmz/pulsar that referenced this pull request Feb 12, 2025
…perations (apache#23955)

Co-authored-by: Lari Hotari <lhotari@apache.org>
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Feb 19, 2025
…perations (apache#23955)

Co-authored-by: Lari Hotari <lhotari@apache.org>
(cherry picked from commit 20b3b22)
(cherry picked from commit 224320e)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Feb 20, 2025
…perations (apache#23955)

Co-authored-by: Lari Hotari <lhotari@apache.org>
(cherry picked from commit 20b3b22)
(cherry picked from commit dafd347)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Feb 24, 2025
…perations (apache#23955)

Co-authored-by: Lari Hotari <lhotari@apache.org>
(cherry picked from commit 20b3b22)
(cherry picked from commit 224320e)
3 participants