Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#1355] fix(client): Netty client will leak when decoding responses #1455

Merged
merged 6 commits into from
Jan 19, 2024

Conversation

rickyma
Copy link
Contributor

@rickyma rickyma commented Jan 15, 2024

What changes were proposed in this pull request?

The current code logic is that the ByteBuf is only released when msg.body() == null. However, when msg.body != null, msg.body().byteBuf() returns a NettyManagedBuffer.EMPTY_BUFFER, and the ByteBuf is not released in this case, resulting in a memory leak issue every time decoding an RPC response from ShuffleServer.
Over time, if the Spark Job runs long enough and there are enough requests, it will eventually cause a significant memory leak on the client side (Spark Executor).
The modifications to the other code are mainly for readability and enhanced protection, and will not cause any side effects.

Why are the changes needed?

To fix the memory leak issue in the Netty client when decoding RPC responses.
For #1359

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Rerun successfully and no more Netty leak logs.

@rickyma
Copy link
Contributor Author

rickyma commented Jan 15, 2024

PTAL. @zuston @jerqi @leixm

@codecov-commenter
Copy link

codecov-commenter commented Jan 15, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (3180501) 54.19% compared to head (d2d7794) 55.43%.
Report is 2 commits behind head on master.

Files Patch % Lines
...he/uniffle/common/netty/TransportFrameDecoder.java 71.42% 0 Missing and 2 partials ⚠️
...uniffle/common/netty/protocol/ResponseMessage.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1455      +/-   ##
============================================
+ Coverage     54.19%   55.43%   +1.23%     
- Complexity     2775     2800      +25     
============================================
  Files           423      403      -20     
  Lines         24185    21849    -2336     
  Branches       2058     2062       +4     
============================================
- Hits          13108    12112     -996     
+ Misses        10266     8985    -1281     
+ Partials        811      752      -59     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jerqi
Copy link
Contributor

jerqi commented Jan 16, 2024

@leixm Could you help me review this pr?
@rickyma Could you add a ut for this pr?

@@ -293,6 +293,9 @@ public void checkProcessedBlockIds() {

@Override
public void close() {
if (sdr != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this modification? We have released readBuffer.

Copy link
Contributor Author

@rickyma rickyma Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will not cause any side effects, and it will be more readable? I can revert this if you want.

After my stress test, I found that we need this to avoid potential memory leaks.

try {
msg = Message.decode(curType, frame);
} finally {
boolean shouldRelease =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we release the frame if the msg is null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should. Although this can barely happen, we still need to fix this. Done.

msg = Message.decode(curType, frame);
} finally {
boolean shouldRelease =
msg != null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you extract a method for this logic? It's easy to test, too.

Copy link
Contributor Author

@rickyma rickyma Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean like this?

private static boolean shouldRelease(Message msg) {
    boolean shouldRelease =
        msg == null
            || (msg != null
                && (msg.body() == null
                    || (msg.body().byteBuf() != null
                        && msg.body().byteBuf().readableBytes() == 0)));
    return shouldRelease;
  }

For testing purposes, maybe private -> public?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For testing purposes, package-available is enough. We don't need public.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean like this?

private static boolean shouldRelease(Message msg) {
    boolean shouldRelease =
        msg == null
            || (msg != null
                && (msg.body() == null
                    || (msg.body().byteBuf() != null
                        && msg.body().byteBuf().readableBytes() == 0)));
    return shouldRelease;
  }

For testing purposes, maybe private -> public?

You can simplify the code like
private static boolean shouldRelease(Message msg) {
if (msg == null) {
return true;
}
if (mgs.body() == null) {
return true;
}
boolean shouldRelease = (msg.body().byteBuf() != null
&& msg.body().byteBuf().readableBytes() == 0)));
return shouldRelease;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Also, I remove the private keyword to make it package available.
@jerqi

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a UT for this pr?

Copy link
Contributor Author

@rickyma rickyma Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UTs added. @jerqi
Also, I have conveniently fixed an issue where a method parameter was missing. The error parameter was forgotten to be passed to the UnsupportedOperationException.
https://github.com/apache/incubator-uniffle/pull/1455/files#diff-312deeac76db5d5c48ab0caf09429615f3719eadc5ef7fbd8936d31b0d8049d3R32

@rickyma rickyma requested a review from jerqi January 16, 2024 03:50
static boolean shouldRelease(Message msg) {
if (msg == null || msg.body() == null) {
return true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (msg.body().byteBuf() == null) {
return true;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. PTAL.

Copy link
Contributor

@jerqi jerqi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @rickyma

@jerqi jerqi merged commit 04d6854 into apache:master Jan 19, 2024
36 checks passed
@rickyma rickyma deleted the issue-1355 branch May 5, 2024 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants