-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Fix Netty buffer leak in RxGatewayStoreModel during cancellation #47292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…mos/encryption/CosmosNettyLeakDetectorFactory.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…to users/fabianm/TestImprovements
…osmosDiagnosticsE2ETest.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Initial plan * Improve JavaDoc phrasing in RxDocumentClientImpl Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com> Co-authored-by: Fabian Meiswinkel <fabianm@microsoft.com>
…47216) * Initial plan * Move CosmosNettyLeakDetectorFactory.ingestIntoNetty() to class-level static block Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com>
…to users/fabianm/NettyBufferLeakFixes
…om/Azure/azure-sdk-for-java into users/fabianm/NettyBufferLeakFixes
…to users/fabianm/NettyBufferLeakFixes
Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com>
|
@copilot Please also consider these netty record leak logs for same test in different pipeline - mihgt give more insights and ways to validate any changes |
|
@copilot sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/ReactorNettyClient.java:361 safeRelease should still only happen when refCnt > 0 to aovid moisy logs - but touch and logger.info should have separate if with leakDetectionEnabled. This is not the real problem thought because the logs sow that leakDetectionDbeuggingEnabled == true in the run |
Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com>
Fixed in commit b120443. Separated the conditionals:
This avoids noisy logs while still performing proper leak detection when enabled. |
|
@copilot Fine - but this is not addressing the issue descirbed in the bug description - leakDetectionDebuggingEnabled was true there accoridng to the logs |
You're absolutely right. After reviewing both leak logs more carefully, the issue is NOT in ReactorNettyClient.java. The leak logs show:
The real leak is in RxGatewayStoreModel.java where:
My changes to ReactorNettyClient were addressing the wrong location based on misunderstanding the leak source. I need to investigate RxGatewayStoreModel's buffer lifecycle instead. |
Co-authored-by: FabianMeiswinkel <19165014+FabianMeiswinkel@users.noreply.github.com>
Description
Fixed Netty buffer leak in
RxGatewayStoreModel.javawhere retained buffers were not properly released during cancellation scenarios, as identified in OrderbyDocumentQueryTest leak detection logs.Root Cause Analysis
After investigating the leak logs, the actual issue was identified in
RxGatewayStoreModel.java, not inReactorNettyClient.javaas initially suspected.Evidence from Leak Logs:
reactor.netty.ByteBufFlux.lambda$aggregate$9(ByteBufFlux.java:276)retain()at RxGatewayStoreModel.java:434The Problem:
httpResponse.body()returns aggregated ByteBuf with refCnt=1bodyByteBuf.retain()increases refCnt to 2 for downstream workdoOnDiscardregistered on stream before map operationunwrapToStoreResponse()takes ownership and releases when successfuldoOnDiscardreleases once (refCnt→1), but the retained reference leaksChanges Made
RxGatewayStoreModel.java (Primary Fix)
doOnDiscardafter the map operation to handle cancellation/discard of retained buffer beforeunwrapToStoreResponsetakes ownershipReactorNettyClient.java (Code Quality Improvements)
leakDetectionDebuggingEnabledcheck (for touch/logging) fromrefCnt > 0check (for release).map()to.doOnNext()inreleaseOnNotSubscribedResponse()for semantic correctnessTesting
✅ Code compiles successfully
✅ No checkstyle violations
✅ Fix targets the exact location identified in leak logs (RxGatewayStoreModel.java:438)
✅ Handles all buffer lifecycle paths: normal completion, exceptions, and cancellations
Fixes [BUG]Investigate Netty buffer leak in OrderbyDocumentQueryTest #47291
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines
Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.