Disable the Netty recycler #22452

Merged
merged 1 commit into from
Jan 6, 2017

Member

@jasontedor jasontedor commented Jan 5, 2017

Netty plays a lot of games with recycling byte buffers in thread local
caches.

The recycler in particular appears to be fraught with peril. It appears
that there are circumstances where the recycler does not recycle quickly
enough and can exceed its capacity leading to heap exhaustion and out of
memory errors. If you spend a few minutes reading the history of the
recycler on the Netty GitHub issues, it appears it has been nothing but
a source of trouble, and the project itself has an open issue that
proposes disabling by default and possibly even removing the recycler.

We are seeing users struggle with issues in 5.x that I think are largely
driven by some of the problems here with Netty.

This change proposes to disable the recycler. I think that disabling
this feature will return some of the stability that this feature appears
to be losing us.

I have done performance testing on my workstation with disabling this
and I do not see a difference in performance. I propose that we make
this change in master and let some nightly benchmarks run to confirm
that there is not a difference in performance. If we are comfortable
with the performance changes, I propose backporting this to all active
branches.

Relates netty/netty#5904, #22406, #22360, #22189
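
For readers who want to see what disabling the recycler amounts to in practice, here is a minimal sketch. It assumes the switch is the system property `io.netty.recycler.maxCapacityPerThread` (the name quoted later in this thread) and that a value of `0` turns recycling off. The class and method names are hypothetical, and this is not necessarily how the merged commit applies the setting.

```java
public class RecyclerDefaults {
    // Property name as quoted in this thread; treating "0" as "recycling off" is an assumption.
    static final String RECYCLER_CAPACITY = "io.netty.recycler.maxCapacityPerThread";

    // Sets the property to "0" unless the operator already supplied a value,
    // then returns the value that is now in effect.
    static String disableRecyclerUnlessConfigured() {
        if (System.getProperty(RECYCLER_CAPACITY) == null) {
            System.setProperty(RECYCLER_CAPACITY, "0");
        }
        return System.getProperty(RECYCLER_CAPACITY);
    }

    public static void main(String[] args) {
        // This has to run before any Netty class is loaded, because a property
        // read during class initialization is not re-read afterwards.
        System.out.println(RECYCLER_CAPACITY + "=" + disableRecyclerUnlessConfigured());
    }
}
```

Passing the same property on the JVM command line (as the `-D` flags quoted further down do) avoids the ordering concern entirely, which is probably the safer route for a server process.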

Netty plays a lot of games with recycling byte buffers in thread local
caches, and using a pooled byte buffer allocator to reduce pressure on
the garbage collector.

The recycler in particular appears to be fraught with peril. It appears
that there are circumstances where the recycler does not recycle quickly
enough and can exceed its capacity leading to heap exhaustion and out of
memory errors. If you spend a few minutes reading the history of the
recycler on the Netty GitHub issues, it appears it has been nothing but
a source of trouble, and the project itself has an open issue that
proposes disabling by default and possibly even removing the recycler.

The pooled byte buffer allocator has problems itself. It sizes the pool
based on the number of runtime processors and can indeed grab a very
large percentage of the heap (in some cases 50% or more). Additionally,
the Netty project continues to struggle with leaks here.
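
To make the sizing concern concrete, here is a rough sketch of how the number of heap arenas might be derived from the processor count and the maximum heap. The constants (8 KiB pages, a maximum order of 11, hence 16 MiB chunks) and the formula are assumptions based on my reading of Netty 4.1 defaults of that era, not something stated in this thread, so check them against `PooledByteBufAllocator` for the version you run. The class and method names are hypothetical.

```java
public class ArenaEstimate {
    // Assumed defaults, not taken from this thread.
    static final long PAGE_SIZE = 8192;
    static final int MAX_ORDER = 11;
    static final long CHUNK_SIZE = PAGE_SIZE << MAX_ORDER; // 16 MiB

    // Assumed rule: twice the processor count, capped by a fraction of the heap.
    static long estimateArenas(int processors, long maxHeapBytes) {
        long byProcessors = 2L * processors;
        long byHeap = maxHeapBytes / CHUNK_SIZE / 2 / 3;
        return Math.max(0, Math.min(byProcessors, byHeap));
    }

    public static void main(String[] args) {
        Runtime runtime = Runtime.getRuntime();
        long arenas = estimateArenas(runtime.availableProcessors(), runtime.maxMemory());
        System.out.println("estimated heap arenas: " + arenas);
        System.out.println("bytes held if each arena keeps one chunk: " + arenas * CHUNK_SIZE);
    }
}
```

Under these assumptions the arena count grows with the core count, and each arena can hold more than one chunk, so the figure printed above is a lower bound on what the pool may retain rather than an upper one.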

We are seeing users struggle with issues in 5.x that I think are largely
driven by some of the problems here with Netty.

This change proposes to disable the recycler, and to disable the pooled
byte buffer allocator. I think that disabling these features will return
some of the stability that these features appear to be losing us.

I have done performance testing on my workstation with disabling these
and I do not see a difference in performance. I propose that we make
this change in master and let some nightly benchmarks run to confirm
that there is not a difference in performance. If we are comfortable
with the performance changes, I propose backporting this to all active
branches.
@jasontedor jasontedor added :Distributed Coordination/Network Http and internode communication implementations review v6.0.0-alpha1 labels Jan 5, 2017
Member

@jaymode jaymode left a comment

LGTM

Contributor

@jpountz jpountz left a comment

+1

Contributor

@Tim-Brooks Tim-Brooks left a comment

LGTM

Contributor

@s1monw s1monw left a comment

LGTM - man how much we gotta fiddle with this software after all...

Contributor

@s1monw s1monw left a comment

wrong checkbox LGTM again

@jasontedor jasontedor merged commit 9219d66 into elastic:master Jan 6, 2017
jasontedor added a commit that referenced this pull request Jan 10, 2017
Netty plays a lot of games with recycling byte buffers in thread local
caches.

The recycler in particular appears to be fraught with peril. It appears
that there are circumstances where the recycler does not recycle quickly
enough and can exceed its capacity leading to heap exhaustion and out of
memory errors. If you spend a few minutes reading the history of the
recycler on the Netty GitHub issues, it appears it has been nothing but
a source of trouble, and the project itself has an open issue that
proposes disabling by default and possibly even removing the recycler.

We are seeing users struggle with issues in 5.x that I think are largely
driven by some of the problems here with Netty.

This change proposes to disable the recycler. I think that disabling this
feature will return some of the stability that this feature appears to
be losing us.

Relates #22452
jasontedor added a commit that referenced this pull request Jan 10, 2017
jasontedor added a commit that referenced this pull request Jan 10, 2017
@jasontedor
Member Author

Our nightly benchmarks show a small increase in GC times in a few (but not all) of the tracks; I'm going to back out the pooled-to-unpooled change to isolate whether this is the cause, and have only backported the Netty recycler change for now (as this is the most critical change for addressing the performance issues).

jasontedor added a commit that referenced this pull request Jan 10, 2017
This commit reverts switching to the unpooled allocator (for now) to let
some benchmarks run to see if this is the source of an increase in GC
times.

Relates #22452
@jasontedor jasontedor deleted the ttyl-netty-games branch January 10, 2017 19:31
@ywelsch
Contributor

ywelsch commented Jan 11, 2017

@jasontedor can you update the version labels on this PR?

@jasontedor
Member Author

Thank you @ywelsch, sorry for missing that. I've updated the labels.

@jasontedor
Member Author

GC times on the few benchmarks where they had increased went back down after pulling the change that disabled pooling.

@jasontedor jasontedor changed the title Disable the Netty recycler and pooled allocator Disable the Netty recycler Jan 12, 2017
@jasontedor
Member Author

I've updated the title and body to reflect that disabling the pooled allocator has been reverted.

@duramen

duramen commented Jan 18, 2017

On ES 5.1.1, I added the following to config/jvm.options and restarted ES, but I still have the problem.

-Dio.netty.recycler.maxCapacityPerThread=0
-Dio.netty.allocator.type=unpooled
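
When a flag appears to have no effect, a useful first check is whether it actually reached the JVM; a system property only counts if the line carries the leading `-D`. The sketch below looks a property up in the JVM's input arguments. The class and helper names are hypothetical, and as written it inspects its own JVM, so it only says something about Elasticsearch if the equivalent check runs inside that process.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

public class NettyFlagCheck {
    // Returns the value passed as -D<key>=<value>, if such an argument is present.
    static Optional<String> findSystemPropertyFlag(List<String> jvmArgs, String key) {
        String prefix = "-D" + key + "=";
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith(prefix))
                .map(arg -> arg.substring(prefix.length()))
                .findFirst();
    }

    public static void main(String[] args) {
        // Arguments the current JVM was started with (heap sizes, -D flags, and so on).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String[] keys = {"io.netty.recycler.maxCapacityPerThread", "io.netty.allocator.type"};
        for (String key : keys) {
            String value = findSystemPropertyFlag(jvmArgs, key).orElse("(not passed on the command line)");
            System.out.println(key + " -> " + value);
        }
    }
}
```

For a running node, I believe the nodes info API reports the same list under the JVM section, which may be the easier place to look.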

@dude0404

dude0404 commented Feb 1, 2017

I'm still experiencing the issue described here: catastrophic GC causing node failure, with netty4utils logging fatal out-of-memory errors.
I'm running ES 5.1.2.
Log attached:
es512nettyerror.txt

@jasontedor
Member Author

@dude0404 I glanced at your logs. I think that your issue has nothing to do with what was occurring here; I think that your heap is full. You should inspect a heap dump and see why your heap appears to be full. If you have additional questions, I suggest that you open a topic on the forum; Elastic reserves GitHub for verified bug reports and feature requests, not general discussion.

@mozinrat

mozinrat commented Feb 4, 2017

I am facing the same issue on 5.1.1, even with the following settings:
-Dio.netty.recycler.maxCapacityPerThread=0
-Dio.netty.allocator.type=unpooled
Log attached:
es.txt

@jasontedor
Member Author

It is not the case that disabling the recycler would eliminate all causes of out of memory errors in the network layer, only that it eliminated those that arose because of issues in Netty, not because, say, too much data is being pumped through Elasticsearch faster than it can be consumed. Thus, if you're still seeing out of memory issues after disabling the recycler, it means that you need to inspect a heap dump and see where your heap is being consumed.
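
As an illustration of the "inspect a heap dump" advice, here is a minimal sketch that writes a dump of the current JVM through the HotSpot diagnostic MXBean. It assumes a HotSpot-based JVM, the class and method names are hypothetical, and it dumps its own process, so for Elasticsearch the dump would normally be taken from the node's JVM by other means.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumper {
    // Writes an .hprof dump of the current JVM into the given directory and returns its path.
    static Path dumpHeap(Path directory, boolean liveObjectsOnly) throws IOException {
        // The target file must not exist yet, so use a fresh name each time.
        Path target = directory.resolve("heap-" + System.nanoTime() + ".hprof");
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), liveObjectsOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path directory = Files.createTempDirectory("heapdump");
        Path dump = dumpHeap(directory, true);
        System.out.println("wrote " + dump + " (" + Files.size(dump) + " bytes)");
    }
}
```

The resulting file can be opened in a heap analyzer to see which objects dominate the heap. Starting the JVM with `-XX:+HeapDumpOnOutOfMemoryError` is an alternative that captures the dump at the moment of failure.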

@forestfantacy

forestfantacy commented May 2, 2018

https://github.com/elastic/elasticsearch/issues/25860
https://github.com/elastic/elasticsearch/issues/22189
This problem happens in ES 5.x with Netty 4.
It can be solved by downgrading to Netty 3,
but ES 6.2 no longer supports Netty 3, and Netty 4 still doesn't work, even after adding this config:
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dio.netty.allocator.type=unpooled

Could you tell me what I can do?

@divyanshsinghvi

@forestfantacy did you copy one Elasticsearch server to another?
Try deleting the data/nodes directory in both and then run
