-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Fix Netty ByteBuf leaks in StoreResponse and RetryContextOnDiagnosticTest #47266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Netty ByteBuf leaks in StoreResponse and RetryContextOnDiagnosticTest #47266
Conversation
Change log level from debug to warn and catch Throwable instead of just IOException to make potential ByteBuf leak issues more visible. Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com>
|
/azp run java - cosmos - ci |
|
Azure Pipelines could not run because the pipeline triggers exclude this branch/path. |
|
@copilot - sure that change can be sueful but does not really explain the leak. Looking further into the logs I realized that the actual leak happens in the previous test class RetryContextOnDiagnosticTest - and is just reported later in FullFidelityChangeFeedProcessorTest. I updated the issue descritpion with the correct Netty leak records. Please take another look and see hwetehr you can find and fix the problem. |
I've found the actual leak in |
Changed from Mono.just() to Mono.fromCallable() to defer StoreResponse creation, ensuring ByteBuf lifecycle is properly managed within each subscription rather than eagerly at mock setup time. Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com>
FabianMeiswinkel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
06425eb
into
users/fabianm/NettyBufferLeakFixes
…ent instance detection during test execution (#47211) * Test changes to add leak detection * Update Configs.java * Update CosmosNettyLeakDetectorFactory.java * Update sdk/cosmos/azure-cosmos-encryption/src/test/java/com/azure/cosmos/encryption/CosmosNettyLeakDetectorFactory.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update CosmosNettyLeakDetectorFactory.java * Fixes * Fixes * Update CosmosNettyLeakDetectorFactory.java * Update RxDocumentClientImpl.java * Fixes * Update CosmosNettyLeakDetectorFactory.java * Fixes * Fixes * Iterating on tests * Fixing build warning * Fixing memory leak * Reverting production changes * Iterating on test tools * Cleaning-up dummy QueryFeedRangeState properly * Update test-resources.json * Update sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/CosmosDiagnosticsE2ETest.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [WIP] Fix Netty buffer and RxDocumentClientImpl leaks (#47213) * Initial plan * Improve JavaDoc phrasing in RxDocumentClientImpl Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com> Co-authored-by: Fabian Meiswinkel <fabianm@microsoft.com> * NITs * Move static block to class level in cosmos-encryption TestSuiteBase (#47216) * Initial plan * Move CosmosNettyLeakDetectorFactory.ingestIntoNetty() to class-level static block Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com> * Update SessionTest.java * Update tests.yml * Update CosmosNettyLeakDetectorFactory.java * Test config * Update CosmosNettyLeakDetectorFactory.java * Updating TestNG * Reverting TestNG to 7.9.0 (highest version still supporting Java8) * Switching back to TestNG 7.5.1 * Enabling leak detection in unit tests * Iterating on tests * Update pom.xml * Test changes (#47233) * Update RntbdTransportClientTest.java * Updating netty leak detection system properties * Update CosmosNettyLeakDetectorFactory.java * Test changes * Prod memory leak fixes * Test fixes * Test fixes * Users/fabianm/portfixes (#47252) * Update WebExceptionRetryPolicy.java * Update ThinClientStoreModel.java * Test fixes * Fix Netty ByteBuf leaks in StoreResponse and RetryContextOnDiagnosticTest (#47266) * Initial plan * Improve logging for ByteBufInputStream close failures Change log level from debug to warn and catch Throwable instead of just IOException to make potential ByteBuf leak issues more visible. Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com> * Fix ByteBuf leak in RetryContextOnDiagnosticTest Changed from Mono.just() to Mono.fromCallable() to defer StoreResponse creation, ensuring ByteBuf lifecycle is properly managed within each subscription rather than eagerly at mock setup time. Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com> * Update EncryptionAsyncApiCrudTest.java * Fix ByteBuf memory leak in TcpServerMock request decoders (#47269) * Initial plan * Fix memory leak in ServerRntbdRequestDecoder and ServerRntbdContextRequestDecoder When overriding channelRead() in ByteToMessageDecoder and bypassing the parent's decode logic by calling context.fireChannelRead() directly, the ByteBuf reference must be explicitly released to prevent memory leaks. Added ReferenceCountUtil.release() in try-finally blocks to properly manage buffer lifecycle. Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com> * Replace ReferenceCountUtil.release with safeRelease * Replace ReferenceCountUtil.release with safeRelease --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com> Co-authored-by: Fabian Meiswinkel <fabianm@microsoft.com> * Update ThinClientStoreModel.java * Fixing test issues * Disable netty leak detection in RetrycontextOnDiagnosticTest * Test and diagnostics improvements * Test fixes and more breadcrumbs * Test fixes * Test fixes * Test fixes * Update OrderbyDocumentQueryTest.java * Test fixes * Reverting too noisy logs * Test fixes * Fixing POMs * React to code review feedback * Update pom.xml * Addresses code review feedback * Update sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/rntbd/RntbdRequestDecoder.java Co-authored-by: Annie Liang <64233642+xinlian12@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com> Co-authored-by: Annie Liang <64233642+xinlian12@users.noreply.github.com>
Description
Netty leak detector reported ByteBuf leaks in CI pipeline. Investigation revealed two issues:
StoreResponsewere only logged at DEBUG level, making diagnosis impossibleRetryContextOnDiagnosticTestwhere ByteBuf lifecycle was not properly managedChanges
StoreResponse.java
StoreResponseconstructor:IOExceptiontoThrowableto capture Netty'sIllegalReferenceCountExceptionand similar runtime exceptionsdebugtowarnwith explicit message: "Failed to close content stream. This may cause a Netty ByteBuf leak."File:
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/StoreResponse.java(lines 77-79)RetryContextOnDiagnosticTest.java (Root Cause Fix)
backoffRetryUtilityExecuteRetry: Changed fromMono.just(new StoreResponse(...))toMono.fromCallable(() -> StoreResponseBuilder.create()...)backoffRetryUtilityExecuteAsync: Applied same fixMono.just()eagerly evaluates during mock setup, causing ByteBuf lifecycle issuesMono.fromCallable()defers StoreResponse creation until subscription, ensuring proper ByteBuf management per-subscriptionStoreResponseBuilderfor cleaner, more maintainable test codeFile:
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/RetryContextOnDiagnosticTest.javaAll SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines
Fixes #47265
Original prompt
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.