Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YARN-11664: Remove HDFS Binaries/Jars Dependency From Yarn #6631

Merged
merged 4 commits into from
Sep 4, 2024

Conversation

shameersss1
Copy link
Contributor

@shameersss1 shameersss1 commented Mar 16, 2024

Description of PR

Remove HDFS Binaries/Jars Dependency From Yarn. Refer YARN-11664 for more details

How was this patch tested?

Manual Verification

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

This comment was marked as outdated.

@hadoop-yetus

This comment was marked as outdated.

@hadoop-yetus

This comment was marked as outdated.

@shameersss1
Copy link
Contributor Author

Note: All the Spotbug error seems unrelated.

@aajisaka
Copy link
Member

-1. Please do not change the following @Public and @Evolving classes:

  • QuotaExceededException.java
  • DSQuotaExceededException.java

https://apache.github.io/hadoop/hadoop-project-dist/hadoop-common/Compatibility.html
Evolving interfaces must not change between minor releases.

Can we use ClusterStorageCapacityExceededException (hadoop-common) instead of DSQuotaExceededException/QuotaExceededException (hadoop-hdfs) in YARN source code?

IOStreamPair.java is @Private and I think we can relocate to hadoop-common.

@shameersss1
Copy link
Contributor Author

-1. Please do not change the following @Public and @Evolving classes:

* QuotaExceededException.java

* DSQuotaExceededException.java

https://apache.github.io/hadoop/hadoop-project-dist/hadoop-common/Compatibility.html
Evolving interfaces must not change between minor releases.

Can we use ClusterStorageCapacityExceededException (hadoop-common) instead of DSQuotaExceededException/QuotaExceededException (hadoop-hdfs) in YARN source code?

IOStreamPair.java is @Private and I think we can relocate to hadoop-common.

ClusterStorageCapacityExceededException is a parent exception of DSQuotaExceededException and hence catching it will serve the purpose as well. I will raise a revision of this change.

@hadoop-yetus

This comment was marked as outdated.

@shameersss1 shameersss1 force-pushed the YARN-11664 branch 2 times, most recently from 5c7c9fd to c0b5143 Compare March 21, 2024 06:03
@hadoop-yetus

This comment was marked as outdated.

@hadoop-yetus

This comment was marked as outdated.

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok to me, but hdfs mail list should be invited to comment.

made some minor suggestions


import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.io.IOUtils;

/**
* A little struct class to wrap an InputStream and an OutputStream.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add that they both get closed in the close

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

/**
* This class contains constants for configuration keys and default values.
*/
public final class Constants {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not in this package. should go somewhere under filesystem, maybe a class like

org.apache.hadoop.fs.HdfsCommonConstants for all hdfs related constants which need to go into hadoop-common

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

*/
public final class Constants {

public static final Text HDFS_DELEGATION_KIND =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

javadocs with {@value} entries here and below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@shameersss1
Copy link
Contributor Author

looks ok to me, but hdfs mail list should be invited to comment.

made some minor suggestions

I have send an email to hdfs mailing list asking their opinion as well.

@hadoop-yetus

This comment was marked as outdated.

@steveloughran
Copy link
Contributor

waiting to see what hdfs people say; mentioned internally.

now, there is a way to do this with a smaller diff, specifically, move the IOPair class into hadoop common but keep with the same package name.

something to seriously consider. would reduce the risk of any code elsewhere making explicit use of the class then breaking.

@shameersss1
Copy link
Contributor Author

waiting to see what hdfs people say; mentioned internally.

now, there is a way to do this with a smaller diff, specifically, move the IOPair class into hadoop common but keep with the same package name.

something to seriously consider. would reduce the risk of any code elsewhere making explicit use of the class then breaking.

That is a great suggestion. It will help to reduce the diff. @steveloughran - i have incoporated that change as well in the new revision.

I have not heard from HDFS community so far. I suppose "no news is good news".

@hadoop-yetus

This comment was marked as outdated.

@steveloughran
Copy link
Contributor

please, avoid force push unless things are completely unreconcilable or nobody has reviewed yet. It really makes reviewing harder, which discourages people from reviewing.

@shameersss1
Copy link
Contributor Author

please, avoid force push unless things are completely unreconcilable or nobody has reviewed yet. It really makes reviewing harder, which discourages people from reviewing.

It make sense. I will keep this in mind next time. Sorry for the mess.

@hadoop-yetus

This comment was marked as outdated.

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spotbugs failure is unrelated; the returning null is the same as now.

javadocs does need a fix

[ERROR] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-6631/ubuntu-focal/src/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenIdentifier.java:46: error: reference not found
[ERROR]    * The value is referenced from {@link HdfsCommonConstants.HDFS_DELEGATION_KIND}.
[ERROR]                                          

@shameersss1
Copy link
Contributor Author

@steveloughran - Addressed the javadoc issue.

@hadoop-yetus

This comment was marked as outdated.

@shameersss1
Copy link
Contributor Author

@steveloughran - Can you please take a look ? The unit test failures are flaky and unrelated. Those failing tests are passing on local.

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1.

[ERROR] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-6631/ubuntu-focal/src/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java:386: error: reference not found
[ERROR]    * The value is referenced from {@link HdfsCommonConstants.DFS_ADMIN}.

@@ -381,7 +382,11 @@ public class DFSConfigKeys extends CommonConfigurationKeys {

public static final String DFS_NAMENODE_XATTRS_ENABLED_KEY = "dfs.namenode.xattrs.enabled";
public static final boolean DFS_NAMENODE_XATTRS_ENABLED_DEFAULT = true;
public static final String DFS_ADMIN = "dfs.cluster.administrators";
/**
* The value is referenced from {@link HdfsCommonConstants.DFS_ADMIN}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs to have a # to refer to field/method. Surprised your IDE didn't flag this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@steveloughran
Copy link
Contributor

keep going, we are close. but don't request a review from me until javadoc is fixed

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 11m 51s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 47s Maven dependency ordering for branch
+1 💚 mvninstall 33m 30s trunk passed
+1 💚 compile 17m 28s trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 compile 15m 49s trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 checkstyle 4m 2s trunk passed
+1 💚 mvnsite 5m 57s trunk passed
+1 💚 javadoc 4m 44s trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javadoc 5m 12s trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
-1 ❌ spotbugs 1m 11s /branch-spotbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core-warnings.html hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core in trunk has 1 extant spotbugs warnings.
+1 💚 shadedclient 34m 52s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 30s Maven dependency ordering for patch
+1 💚 mvninstall 3m 59s the patch passed
+1 💚 compile 16m 58s the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javac 16m 58s the patch passed
+1 💚 compile 15m 56s the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 javac 15m 56s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 3m 46s the patch passed
+1 💚 mvnsite 6m 9s the patch passed
+1 💚 javadoc 4m 46s the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javadoc 5m 11s the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 spotbugs 12m 26s the patch passed
-1 ❌ shadedclient 33m 59s patch has errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 19m 8s hadoop-common in the patch passed.
+1 💚 unit 2m 41s hadoop-hdfs-client in the patch passed.
+1 💚 unit 217m 28s hadoop-hdfs in the patch passed.
+1 💚 unit 5m 51s hadoop-yarn-common in the patch passed.
+1 💚 unit 21m 17s hadoop-yarn-services-core in the patch passed.
+1 💚 asflicense 1m 7s The patch does not generate ASF License warnings.
535m 29s
Subsystem Report/Notes
Docker ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6631/19/artifact/out/Dockerfile
GITHUB PR #6631
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux c2f3f5333083 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / aa5f5ae
Default Java Private Build-1.8.0_422-8u422-b05-1~20.04-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_422-8u422-b05-1~20.04-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6631/19/testReport/
Max. process+thread count 4014 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6631/19/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 31s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 15m 30s Maven dependency ordering for branch
+1 💚 mvninstall 32m 39s trunk passed
+1 💚 compile 17m 40s trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 compile 16m 9s trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 checkstyle 4m 26s trunk passed
+1 💚 mvnsite 6m 24s trunk passed
+1 💚 javadoc 5m 14s trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javadoc 5m 39s trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
-1 ❌ spotbugs 1m 13s /branch-spotbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core-warnings.html hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core in trunk has 1 extant spotbugs warnings.
+1 💚 shadedclient 34m 23s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 33s Maven dependency ordering for patch
+1 💚 mvninstall 4m 5s the patch passed
+1 💚 compile 16m 56s the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javac 16m 56s the patch passed
+1 💚 compile 16m 12s the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 javac 16m 12s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 4m 23s the patch passed
+1 💚 mvnsite 6m 24s the patch passed
+1 💚 javadoc 5m 15s the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javadoc 5m 31s the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 spotbugs 12m 49s the patch passed
-1 ❌ shadedclient 34m 44s patch has errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 19m 38s hadoop-common in the patch passed.
+1 💚 unit 2m 48s hadoop-hdfs-client in the patch passed.
-1 ❌ unit 120m 29s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch failed.
+1 💚 unit 3m 31s hadoop-yarn-common in the patch passed.
-1 ❌ unit 0m 50s /patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core.txt hadoop-yarn-services-core in the patch failed.
-1 ❌ asflicense 1m 8s /results-asflicense.txt The patch generated 149 ASF License warnings.
411m 5s
Reason Tests
Unreaped Processes hadoop-hdfs:2
Failed junit tests hadoop.hdfs.server.namenode.TestMetaSave
hadoop.hdfs.server.namenode.TestFileContextXAttr
hadoop.hdfs.server.namenode.TestHDFSConcat
hadoop.hdfs.server.namenode.TestNameNodeReconfigure
hadoop.hdfs.server.namenode.TestQuotaByStorageType
hadoop.hdfs.server.namenode.TestINodeFile
hadoop.hdfs.server.namenode.TestLargeDirectoryDelete
Subsystem Report/Notes
Docker ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6631/20/artifact/out/Dockerfile
GITHUB PR #6631
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 1892f6d67181 5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / ddd4f86
Default Java Private Build-1.8.0_422-8u422-b05-1~20.04-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_422-8u422-b05-1~20.04-b05
Unreaped Processes Log https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6631/20/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-reaper.txt
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6631/20/testReport/
Max. process+thread count 4122 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6631/20/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@shameersss1
Copy link
Contributor Author

@steveloughran - Javadoc is fixed. UT failures, Spotbugs and asf seems unrelated.

Copy link
Member

@aajisaka aajisaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steveloughran steveloughran merged commit 6c01490 into apache:trunk Sep 4, 2024
1 of 6 checks passed
@steveloughran
Copy link
Contributor

merged to trunk; will take a PR to branch-3.4

@steveloughran
Copy link
Contributor

I think this is triggering a regression in enforcer

[INFO]   Adding ignore: *
[WARNING] Rule 1: org.apache.maven.plugins.enforcer.BanDuplicateClasses failed with message:
Duplicate classes found:

  Found in:
    org.apache.hadoop:hadoop-client-minicluster:jar:3.5.0-SNAPSHOT:compile
    org.apache.hadoop:hadoop-client-api:jar:3.5.0-SNAPSHOT:compile
  Duplicate classes:
    org/apache/hadoop/hdfs/protocol/datatransfer/package-info.class

I'm going to revert the PR and we'll have to move that IOStreamPair class to a new package after all. pity

asfgit pushed a commit that referenced this pull request Sep 5, 2024
@steveloughran
Copy link
Contributor

yes, reverting fixes this.

@shameersss1
Copy link
Contributor Author

I think this is triggering a regression in enforcer

[INFO]   Adding ignore: *
[WARNING] Rule 1: org.apache.maven.plugins.enforcer.BanDuplicateClasses failed with message:
Duplicate classes found:

  Found in:
    org.apache.hadoop:hadoop-client-minicluster:jar:3.5.0-SNAPSHOT:compile
    org.apache.hadoop:hadoop-client-api:jar:3.5.0-SNAPSHOT:compile
  Duplicate classes:
    org/apache/hadoop/hdfs/protocol/datatransfer/package-info.class

I'm going to revert the PR and we'll have to move that IOStreamPair class to a new package after all. pity

Sure, @steveloughran - Instead of moving IOStreamPair to a new package , Can we ignore this specific org/apache/hadoop/hdfs/protocol/datatransfer/package-info.class class from BanDuplicateClasses enforcer? Anyhow package-info.class is not a critical class.

@steveloughran
Copy link
Contributor

removing the package-info class would be the simpler solution, but we need to understand how the regression got in. your PR seemed to take, but everything after it broke.

KeeProMise pushed a commit to KeeProMise/hadoop that referenced this pull request Sep 9, 2024
To support YARN deployments in clusters without HDFS
some changes have been made in packaging

* new hadoop-common class org.apache.hadoop.fs.HdfsCommonConstants
* hdfs class org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair moved
  from hdfs-client to hadoop-common
* YARN handlers for DSQuotaExceededException replaced by use of superclass
  ClusterStorageCapacityExceededException.  

Contributed by Syed Shameerur Rahman
KeeProMise pushed a commit to KeeProMise/hadoop that referenced this pull request Sep 9, 2024
Hexiaoqiao pushed a commit to Hexiaoqiao/hadoop that referenced this pull request Sep 12, 2024
To support YARN deployments in clusters without HDFS
some changes have been made in packaging

* new hadoop-common class org.apache.hadoop.fs.HdfsCommonConstants
* hdfs class org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair moved
  from hdfs-client to hadoop-common
* YARN handlers for DSQuotaExceededException replaced by use of superclass
  ClusterStorageCapacityExceededException.  

Contributed by Syed Shameerur Rahman
Hexiaoqiao pushed a commit to Hexiaoqiao/hadoop that referenced this pull request Sep 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants