-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-15327. Upgrade MR ShuffleHandler to use Netty4 #3259
Conversation
💔 -1 overall
This message was automatically generated. |
.../hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/resources/log4j.properties
Outdated
Show resolved
Hide resolved
...reduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/LoggingHttpResponseEncoder.java
Outdated
Show resolved
Hide resolved
...reduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/LoggingHttpResponseEncoder.java
Outdated
Show resolved
Hide resolved
...t/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
Outdated
Show resolved
Hide resolved
...t/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
Outdated
Show resolved
Hide resolved
...t/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
Outdated
Show resolved
Hide resolved
...t/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
Outdated
Show resolved
Hide resolved
...t/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
Outdated
Show resolved
Hide resolved
...t/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
Outdated
Show resolved
Hide resolved
...t/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
Outdated
Show resolved
Hide resolved
...doop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
Outdated
Show resolved
Hide resolved
...doop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
Outdated
Show resolved
Hide resolved
...doop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
Outdated
Show resolved
Hide resolved
...doop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
Show resolved
Hide resolved
...doop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
Outdated
Show resolved
Hide resolved
...doop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
Outdated
Show resolved
Hide resolved
...doop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
Outdated
Show resolved
Hide resolved
...doop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
Outdated
Show resolved
Hide resolved
...doop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
Show resolved
Hide resolved
...doop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
Outdated
Show resolved
Hide resolved
148287b
to
7922573
Compare
💔 -1 overall
This message was automatically generated. |
Is there any update on this PR? @szilard-nemeth We are also waiting for the change of using netty4 to replace netty3 due to its vulnerability issues. |
Hi @jasonwzs, |
Thanks @szilard-nemeth . Will there be any issue if we backport this change to 3.2.x? |
7922573
to
6ba9c41
Compare
Hopefully not but we need to take care of this. |
Hi @jasonwzs, |
💔 -1 overall
This message was automatically generated. |
Hi @szilard-nemeth , I referred PR provided in this jira for netty upgrade for fixing Netty CVE vulnerability and did some fixes for maven shading issues. I am happy to contribute the patch to open source. Can i attach patch in this PR or open a new PR. Let me know your thoughts. Thanks |
Hi @ayushpal24, |
💔 -1 overall
This message was automatically generated. |
d244fcf
to
31287c0
Compare
Hi @ayushpal24, |
Hi @szilard-nemeth , some imports got auto reformatted in previous commit , so i amended the changes in that commit. I have verified the changes are not significant .I will check for Maven shading issue after the current build . |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Hi @szilard-nemeth ,this is my first patch and i am not able to find any detail report to understand the reason of failures, any suggestions? |
Hi, |
💔 -1 overall
This message was automatically generated. |
31287c0
to
7f43fd6
Compare
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR and the huge effort on it @szilard-nemeth. I had a few suggestions/questions in the first round of my review.
...t/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
Outdated
Show resolved
Hide resolved
...doop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
Outdated
Show resolved
Hide resolved
...doop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
Show resolved
Hide resolved
...t/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
Outdated
Show resolved
Hide resolved
private final ChannelGroup accepted = | ||
new DefaultChannelGroup(GlobalEventExecutor.INSTANCE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually threadsafe according to the documentation. However, GlobalEventExecutor instance is a non-scalable option, perhaps a custom executor will be needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, please check again.
...t/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
Outdated
Show resolved
Hide resolved
...t/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
Show resolved
Hide resolved
...t/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
Outdated
Show resolved
Hide resolved
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
… 5 thread instance of DefaultEventExecutorGroup
f6c3739
to
6f36b38
Compare
💔 -1 overall
This message was automatically generated. |
Merged to trunk. Thanks @szilard-nemeth for the patch and @shuzirra @9uapaw @K0K0V0K @ashutoshcipher for the review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi thanks for working on this. It's not an easy feat to pull off. I do have a few questions& comments. If it passes unit tests and has used netty's memory leak detection tool, it should be okay to land it in the trunk branch for 3.4.0 release.
LOG.debug("Fetcher " + id + " going to fetch from " + host + " for: " | ||
+ maps); | ||
if (LOG.isDebugEnabled()) { | ||
LOG.debug("Fetcher " + id + " going to fetch from " + host + " for: " + maps); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slf4j logger messages can be rewritten using parameterized logging format. But let's not worry about that now. This PR is already too big.
import static io.netty.handler.codec.http.HttpResponseStatus.OK; | ||
import static io.netty.handler.codec.http.HttpResponseStatus.UNAUTHORIZED; | ||
import static io.netty.handler.codec.http.HttpVersion.HTTP_1_1; | ||
import static org.apache.hadoop.mapred.ShuffleHandler.NettyChannelHelper.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's not use wildcard import
private ServerBootstrap bootstrap; | ||
private Channel ch; | ||
private final ChannelGroup accepted = | ||
new DefaultChannelGroup(new DefaultEventExecutorGroup(5).next()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if I understand it correct from the context, the size of the channel group should be maxShuffleConnections, which is unlimited by default (configurable via mapreduce.shuffle.max.connections)
if ((maxShuffleConnections > 0) && (accepted.size() >= maxShuffleConnections)) { | ||
NettyChannelHelper.channelActive(ctx.channel()); | ||
int numConnections = activeConnections.incrementAndGet(); | ||
if ((maxShuffleConnections > 0) && (numConnections > maxShuffleConnections)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps these channel bookeeping is no longer needed given that channel size is limited.
sendError(ctx, BAD_REQUEST); | ||
return; | ||
} else if (cause instanceof IOException) { | ||
if (cause instanceof ClosedChannelException) { | ||
LOG.debug("Ignoring closed channel error", cause); | ||
LOG.debug("Ignoring closed channel error, channel id: " + ch.id(), cause); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use parameterized logging, otherwise wrap it in a LOG.isDebugEnabled() check.
Closing this PR as it's already merged into trunk. Thank you. |
…ntributed by Szilard Nemeth.
…ty4 apache#3259. Contributed by Szilard Nemeth. (cherry picked from commit 5bb11ce) Change-Id: If33ff1e8adda356dd5c6dadfb54fde9d31f84de7
…ty4 apache#3259. Contributed by Szilard Nemeth. (cherry picked from commit 5bb11ce) Change-Id: If33ff1e8adda356dd5c6dadfb54fde9d31f84de7
…e#3259. Contributed by Szilard Nemeth. netty - part2
…e#3259. Contributed by Szilard Nemeth. netty - part2
This list captures the current state of non-upstream changes in our branch that are not in the public repo. ---Changes cherry-picked to branch-3.3.6-dremio from branch-3.3.2-dremio--- The below changes were on branch-3.3.2-dremio and needed to be brought to branch-3.3.6-dremio to prevent regressing scenarios these changes addressed. HADOOP-18928: S3AFileSystem URL encodes twice where Path has trailing / (proposed) DX-69726: Bumping okie from 1.6.0 to 3.4.0 (CVE-2023-3635) DX-69726: Bumping okie from 1.6.0 to 3.4.0 (CVE-2023-3635) DX-66470: Allow for custom shared key signer for ABFS DX-66673: Backport HADOOP-18602. Remove netty3 dependency DX-66673: Backport MAPREDUCE-7434. Fix ShuffleHandler tests. Contributed by Tamas Domok DX-66673: Backport MAPREDUCE-7431. ShuffleHandler refactor and fix after Netty4 upgrade. (apache#5311) DX-66673: Backport HADOOP-15327. Upgrade MR ShuffleHandler to use Netty4 apache#3259. Contributed by Szilard Nemeth. DX-66673: Backport HADOOP-17115. Replace Guava Sets usage by Hadoop's own Sets in hadoop-common and hadoop-tools (apache#2985) HADOOP-18676. jettison dependency override in hadoop-common lib DX-52816: Downgrade azure-data-lake-store-sdk to 2.3.3 to support dremio version. DX-52701: Remove node based module by Naveen Kumar DX-32012: Adding BatchList Iterator for ListFiles by “ajmeera.nagaraju” DX-18552: Make file status check optional in S3AFileSystem create() Add flag to skip native tests by Laurent Goujon DX-21904: Support S3 requester-pays headers by Brandon Huang DX-21471: Fix checking of use of OAuth credentials with AzureNativeFileSystem DX-19314: make new kms format configurable DX-17058 Add FileSystem to META-INF/services DX-17317 Fix incorrect parameter passed into AzureADAuthenticator-getTokenUsingClientCreds by TiffanyLam DX-17276 Azure AD support for StorageV1 by James Duong DX-17276 Add Azure AD support in Dremio's hadoop-azure library for Storage V1 support unwraps BindException in HttpServer2 ---Changes picked up by moving to 3.3.6--- The below changes were changes on branch-3.3.2-dremio that did not need to come to branch-3.3.6-dremio as the public 3.3.6 branch contained the fixes already. DX-67500: Backport HADOOP-18136. Verify FileUtils.unTar() handling of missing .tar files. DX-66673: Backport HADOOP-18079. Upgrade Netty to 4.1.77. (apache#3977) DX-66673: Backport HADOOP-11245. Update NFS gateway to use Netty4 (apache#2832) (apache#4997) DX-64051: Bump jettison from 1.1 to 1.5.4 in hadoop/branch-3.3.2-dremio DX-64051: Bump jettison from 1.1 to 1.5.4 in hadoop/branch-3.3.2-dremio DX-63800 Bump commons-net from 3.6 to 3.9.0 to address CVE-2021-37533 DX-27168: removing org.codehaus.jackson Change-Id: I6cdb968e33826105caff96e1c3d2c6313a550689
* ODP-1095 CVE Fix jettison upgrade (cherry picked from commit c4e0492) * gson upgraded to 2.9.0 (cherry picked from commit 60d566d) * ODP-1098: Upgrade jackson from 2.10.5 to 2.13.2.2 (cherry picked from commit 5eb67fd) * ODP-1099 | Upgrade jetty version to 9.4.43.v20210629 (cherry picked from commit 1ac7946) * ODP-1104 | snappy-java to 1.1.10.4, snappy-java to 1.1.10.4 (cherry picked from commit 56016fb) * ODP-1103: HADOOP-11245. Update NFS gateway to use Netty4 (apache#2832) netty - part1 (cherry picked from commit d94759b) * ODP-1103: HADOOP-15327. Upgrade MR ShuffleHandler to use Netty4 apache#3259. Contributed by Szilard Nemeth. netty - part2 (cherry picked from commit ee0f478) # Conflicts: # hadoop-project/pom.xml * ODP-1104 | update guava.version to 32.0.1-jre (cherry picked from commit d45a329) * YARN-9081. Update jackson from 1.9.13 to 2.x in hadoop-yarn-services-core. HADOOP-15983. Use jersey-json that is built to use jackson2 (apache#3988) (cherry picked from commit 1545e6c) * ODP-1119-snakeyaml dependency: upgrade to v2.0 (cherry picked from commit 4d1c080) * ODP-1098: add javax.ws.rs-api - 2.1.1 dependency YARN-11558 - Fix dependency convergence error on hbase2 profile (cherry picked from commit 5a9a65d) * ODP-1104 | mvn dependency fix for snappy-java in hadoop-yarn-server-timelineservice-hbase-tests/pom.xml (cherry picked from commit a41ad5c) * TAG change 3.2.3.3.2.2.0-1095 (cherry picked from commit 8567f12) * TAG change2 3.2.3.3.2.2.0-1095 (cherry picked from commit 61e898d) * TAG change3 3.2.3.3.2.3.0-1095 (cherry picked from commit 58d1b91) * TAG change4 3.2.3.3.2.2.0-2 (cherry picked from commit a813e1d) * ODP-1095: set hadoop version as 3.2.3.3.2.2.0-1095 (cherry picked from commit d12de04) * HADOOP-18950. Use shaded avro jar (cherry picked from commit 509824a) * ODP-1103|netty4 upgrade to 4.1.94 (cherry picked from commit 627108d) * jettison dependency exclusion from hadoop-common (cherry picked from commit 0da4db2) * zookeeper release corrected to 3.2.2.0-1095 (cherry picked from commit 4a8bebf) * excluded jackson-core-asl from hadoop-yarn-server-timelineservice-hbase-tests * distribution management addition * ODP-1103: remove netty jar from hadoop-yarn-server-timelineservice-hbase-tests * HADOOP-18512. Upgrade woodstox-core to 5.4.0 for security fix * fixed typo * HADOOP-17033. Update commons-codec from 1.11 to 1.14 * Fixed maven pom across all pom * Removing not required file --------- Co-authored-by: manishsinghmowall <manishsingh@acceldata.io> Co-authored-by: kravii <ravi@acceldata.io> Co-authored-by: Prabhjyot Singh <prabhjyot@acceldata.io>
(cherry picked from commit e63bcae) (cherry picked from commit 8add13d) ODP-2013: ODP-1095 Critical CVE fixes (cherry picked from commit 4095c5c) (cherry picked from commit 22ccd3f) ODP-2013: ODP-1095 Critical CVE fixes -v2 (cherry picked from commit 567d6b5) (cherry picked from commit ba32cb6) ODP-2013: incorporating ODP-1095 | ODP-1415 | ODP-1098 | YARN-9081 | HADOOP-18950 -v3 (cherry picked from commit 755ce5e) (cherry picked from commit 3fa3225) ODP-2013: incorporating ODP-1415 | ODP-1098 | YARN-9081 | HADOOP-18950 (cherry picked from commit 3a7657d) (cherry picked from commit 1371b2e) ODP-1098: add javax.ws.rs-api - 2.1.1 dependency YARN-11558 - Fix dependency convergence error on hbase2 profile (cherry picked from commit 5a9a65d) (cherry picked from commit 4f81c22) (cherry picked from commit e67f524) (cherry picked from commit 041b2d9) ODP-1103|netty4 upgrade to 4.1.94 (cherry picked from commit 627108d) (cherry picked from commit 9030004) (cherry picked from commit 6b97c59) (cherry picked from commit 42d930e) ODP-1103: HADOOP-15327. Upgrade MR ShuffleHandler to use Netty4 apache#3259. Contributed by Szilard Nemeth. netty - part2 (cherry picked from commit ee0f478) (cherry picked from commit 1f5f021) (cherry picked from commit 41e0394) (cherry picked from commit 5e86fa7)
NOTICE
Please create an issue in ASF JIRA before opening a pull request,
and you need to set the title of the pull request which starts with
the corresponding JIRA issue number. (e.g. HADOOP-XXXXX. Fix a typo in YYY.)
For more details, please see https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute