Disable the Netty recycler #22452
Netty plays a lot of games with recycling byte buffers in thread-local caches, and with using a pooled byte buffer allocator to reduce pressure on the garbage collector.

The recycler in particular appears to be fraught with peril. It appears that there are circumstances where the recycler does not recycle quickly enough and can exceed its capacity, leading to heap exhaustion and out of memory errors. If you spend a few minutes reading the history of the recycler on the Netty GitHub issues, it appears it has been nothing but a source of trouble, and the project itself has an open issue that proposes disabling it by default and possibly even removing it.

The pooled byte buffer allocator has problems of its own. It sizes the pool based on the number of runtime processors and can grab a very large percentage of the heap (in some cases 50% or more). Additionally, the Netty project continues to struggle with leaks here.

We are seeing users struggle with issues in 5.x that I think are largely driven by some of these problems with Netty. This change proposes to disable the recycler and to disable the pooled byte buffer allocator. I think that disabling these features will return some of the stability that they appear to be costing us.

I have done performance testing on my workstation with these disabled and I do not see a difference in performance. I propose that we make this change in master and let some nightly benchmarks run to confirm that there is no difference in performance. If we are comfortable with the performance results, I propose backporting this to all active branches.
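To make the "50% or more" concern about the pooled allocator concrete, here is a rough back-of-the-envelope sketch. It is not taken from this PR: the 16 MiB chunk size, the `min(2 * cores, heap / chunk / 2 / 3)` arena formula, and the three-chunks-per-arena figure are assumptions based on what Netty 4.1 defaults looked like around this time, and the class and method names are hypothetical. Check the Netty version you actually run before relying on any of these numbers.

```java
// Rough estimate of how much heap a pooled allocator could retain.
// All constants are ASSUMPTIONS about Netty 4.1-era defaults, not values from this PR.
final class ArenaEstimate {
    // Assumed default chunk size: 8 KiB pages with maxOrder 11, i.e. 16 MiB.
    static final long ASSUMED_CHUNK_SIZE = 8192L << 11;

    // Assumed default arena count: min(2 * cores, maxHeap / chunkSize / 2 / 3).
    static long estimateArenas(int cores, long maxHeapBytes) {
        long byCores = 2L * cores;
        long byHeap = maxHeapBytes / ASSUMED_CHUNK_SIZE / 2 / 3;
        return Math.max(0, Math.min(byCores, byHeap));
    }

    // Bytes retained if every arena holds `chunksPerArena` chunks.
    static long estimatePooledBytes(int cores, long maxHeapBytes, int chunksPerArena) {
        return estimateArenas(cores, maxHeapBytes) * chunksPerArena * ASSUMED_CHUNK_SIZE;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        int cores = rt.availableProcessors();
        long heap = rt.maxMemory();
        long pooled = estimatePooledBytes(cores, heap, 3);
        System.out.printf("cores=%d heap=%d MiB arenas=%d pooled<=%d MiB (%.0f%% of heap)%n",
                cores, heap >> 20, estimateArenas(cores, heap), pooled >> 20,
                100.0 * pooled / heap);
    }
}
```

Under these assumptions, an 8-core machine with a 1 GiB heap would get 10 arenas and could retain up to 480 MiB in pooled chunks, which is close to half the heap.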
LGTM
+1
LGTM
LGTM - man how much we gotta fiddle with this software after all...
wrong checkbox LGTM again
Netty plays a lot of games with recycling byte buffers in thread-local caches. The recycler in particular appears to be fraught with peril. It appears that there are circumstances where the recycler does not recycle quickly enough and can exceed its capacity, leading to heap exhaustion and out of memory errors. If you spend a few minutes reading the history of the recycler on the Netty GitHub issues, it appears it has been nothing but a source of trouble, and the project itself has an open issue that proposes disabling it by default and possibly even removing it. We are seeing users struggle with issues in 5.x that I think are largely driven by some of these problems with Netty. This change proposes to disable the recycler. I think that disabling this feature will return some of the stability that it appears to be costing us. Relates #22452
Our nightly benchmarks show a small increase in GC times in a few (but not all) of the tracks. I'm going to back out the pooled-to-unpooled change to isolate whether it is the cause, and I have only backported the Netty recycler change for now (as this is the most critical change for addressing the performance issues).
This commit reverts switching to the unpooled allocator (for now) to let some benchmarks run to see if this is the source of an increase in GC times. Relates #22452
@jasontedor can you update the version labels on this PR?
Thank you @ywelsch, sorry for missing that. I've updated the labels.
GC times on the few benchmarks where they had increased went back down after pulling the change that disabled pooling.
I've updated the title and body to reflect that disabling the pooled allocator has been reverted.
On ES 5.1.1, I've added the following to config/jvm.options and restarted ES, but I still have the problem.
I'm still experiencing the issue described here: catastrophic GC causing node failure, with netty4utils logging fatal errors regarding out of memory.
@dude0404 I glanced at your logs. I think that your issue has nothing to do with what was occurring here; I think that your heap is full. You should inspect a heap dump and see why your heap appears to be full. If you have additional questions, I suggest that you open a topic on the forum; Elastic reserves GitHub for verified bug reports and feature requests, not general discussion.
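As a quick first check before taking a full heap dump, heap occupancy can be read through the standard `java.lang.management` API. The sketch below is only illustrative and not part of this PR; the `HeapCheck` class is hypothetical, and it reports on whichever JVM it runs in, so it would need to execute inside the process of interest to say anything about that process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports how full the current JVM's heap is.
final class HeapCheck {
    // Fraction of the maximum heap in use, or -1 when the maximum is undefined.
    static double usedFraction(long usedBytes, long maxBytes) {
        return maxBytes <= 0 ? -1.0 : (double) usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        double fraction = usedFraction(heap.getUsed(), heap.getMax());
        System.out.printf("heap used=%d MiB, max=%d MiB, fraction=%.2f%n",
                heap.getUsed() >> 20, heap.getMax() >> 20, fraction);
    }
}
```

For the dump itself, the HotSpot option `-XX:+HeapDumpOnOutOfMemoryError` (optionally with `-XX:HeapDumpPath=...`) writes one automatically when the JVM runs out of heap.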
I am facing the same issue too, even with -Dio.netty.recycler.maxCapacityPerThread=0.
It is not the case that disabling the recycler would eliminate all causes of out of memory errors in the network layer, only that it eliminated those that arose because of issues in Netty, not because, say, too much data is being pumped through Elasticsearch faster than it can be consumed. Thus, if you're still seeing out of memory issues after disabling the recycler, it means that you need to inspect a heap dump and see where your heap is being consumed.
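Before digging into a heap dump, it may be worth ruling out a malformed setting. The sketch below is a hypothetical helper, not something shipped with Elasticsearch. It scans the lines of a JVM options file for the exact flag mentioned in this thread and ignores blank lines and `#` comments. The default path `config/jvm.options` is the one mentioned above; the matching rules are a simplification and an assumption about the file format.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: checks whether a jvm.options file disables the Netty recycler.
final class RecyclerFlagCheck {
    static final String FLAG = "-Dio.netty.recycler.maxCapacityPerThread=0";

    // True only if some non-comment line is exactly the flag, including the leading "-D".
    static boolean disablesRecycler(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            if (line.equals(FLAG)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "config/jvm.options");
        boolean ok = disablesRecycler(Files.readAllLines(path));
        System.out.println(ok ? "recycler flag present" : "recycler flag missing or malformed");
    }
}
```

A line that lacks the leading dash, or that is commented out, is reported as missing, since the JVM would not treat it as a system property.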
https://github.com/elastic/elasticsearch/issues/25860 Could you tell me what I can do?
@forestfantacy did you copy one Elasticsearch server to another?
Netty plays a lot of games with recycling byte buffers in thread local
caches.
The recycler in particular appears to be fraught with peril. It appears
that there are circumstances where the recycler does not recycle quickly
enough and can exceed its capacity leading to heap exhaustion and out of
memory errors. If you spend a few minutes reading the history of the
recycler on the Netty GitHub issues, it appears it has been nothing but
a source of trouble, and the project itself has an open issue that
proposes disabling by default and possibly even removing the recycler.
We are seeing users struggle with issues in 5.x that I think are largely
driven by some of the problems here with Netty.
This change proposes to disable the recycler. I think that disabling
this feature will return some of the stability that this feature appears
to be losing us.
I have done performance testing on my workstation with disabling this
and I do not see a difference in performance. I propose that we make
this change in master and let some nightly benchmarks run to confirm
that there is not a difference in performance. If we are comfortable
with the performance changes, I propose backporting this to all active
branches.
Relates netty/netty#5904, #22406, #22360, #22189