HADOOP-16769. LocalDirAllocator to provide diagnostics when file creation fails #1768

Open
wants to merge 3 commits into
base: trunk

@ramesh0201 ramesh0201 commented Dec 17, 2019

No description provided.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 31m 23s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 20m 52s trunk passed
+1 💚 compile 19m 59s trunk passed
+1 💚 checkstyle 0m 47s trunk passed
+1 💚 mvnsite 1m 30s trunk passed
+1 💚 shadedclient 16m 39s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 32s trunk passed
+0 🆗 spotbugs 2m 27s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 2m 26s trunk passed
_ Patch Compile Tests _
+1 💚 mvninstall 1m 0s the patch passed
+1 💚 compile 20m 47s the patch passed
+1 💚 javac 20m 47s the patch passed
-0 ⚠️ checkstyle 0m 50s hadoop-common-project/hadoop-common: The patch generated 2 new + 39 unchanged - 0 fixed = 41 total (was 39)
+1 💚 mvnsite 1m 23s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 13m 53s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 29s the patch passed
+1 💚 findbugs 2m 33s the patch passed
_ Other Tests _
-1 ❌ unit 10m 28s hadoop-common in the patch failed.
+1 💚 asflicense 0m 50s The patch does not generate ASF License warnings.
149m 10s
Reason Tests
Failed junit tests hadoop.fs.TestLocalDirAllocator
hadoop.metrics2.impl.TestMetricsSystemImpl
Subsystem Report/Notes
Docker Client=19.03.5 Server=19.03.5 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1768/2/artifact/out/Dockerfile
GITHUB PR #1768
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 640c51c6887c 4.15.0-70-generic #79-Ubuntu SMP Tue Nov 12 10:36:11 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 7b93575
Default Java 1.8.0_232
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1768/2/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1768/2/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1768/2/testReport/
Max. process+thread count 1509 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1768/2/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@steveloughran steveloughran changed the title HADOOP-16769 Log details of requested size and available capacity whe… HADOOP-16769. LocalDirAllocator to provide diagnostics when file creation fails Jan 2, 2020
@apache apache deleted a comment from hadoop-yetus Jan 2, 2020
Contributor

@steveloughran steveloughran left a comment

Now, there's another failure mode: no write access to any of the specified dirs.

If that's the situation, we should be able to identify and report it as well. Indeed, now that we are being helpful on disk capacity, we probably need to differentiate the other failure mode to avoid confusion. Otherwise people who encounter permissions problems will be misled into thinking it's disk capacity.

Not sure of the best approach here. We would probably need to keep the DiskErrorException from createPath() and use that as the inner cause of the new failure. That is: move the catch() clause up to the caller, where the exception can be cached for later use.
This is probably broadly useful, as there are other failure modes which are probably worth reporting.

Goal: there's enough information in the strings and stack traces to identify the root cause without having to chase into the logs of machines.
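
The idea of keeping the per-directory exception and attaching it as the inner cause might look roughly like the sketch below. It is not the LocalDirAllocator code: `AllocationDiagnostics`, `checkDir` and `pickDir` are hypothetical names, plain `IOException` stands in for `DiskErrorException`, and the message wording is invented for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch; not the actual LocalDirAllocator implementation. */
class AllocationDiagnostics {

  /** Stand-in for a per-directory check; throws if the directory is unusable. */
  static void checkDir(File dir, long requiredBytes) throws IOException {
    if (!dir.isDirectory()) {
      throw new IOException("Not a directory: " + dir);
    }
    if (!dir.canWrite()) {
      throw new IOException("No write access to " + dir);
    }
    long usable = dir.getUsableSpace();
    if (usable < requiredBytes) {
      throw new IOException("Insufficient space in " + dir
          + ": requested=" + requiredBytes + ", available=" + usable);
    }
  }

  /**
   * Returns the first usable directory. On total failure, the thrown
   * exception lists every per-directory reason in its message, carries the
   * first failure as its cause, and the later ones as suppressed exceptions.
   */
  static File pickDir(List<File> dirs, long requiredBytes) throws IOException {
    IOException first = null;
    StringBuilder details = new StringBuilder();
    for (File dir : dirs) {
      try {
        checkDir(dir, requiredBytes);
        return dir;
      } catch (IOException e) {
        // The catch lives in the caller so the failure can be kept for later.
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e);
        }
        details.append("; ").append(e.getMessage());
      }
    }
    throw new IOException(
        "Could not find any valid local directory" + details, first);
  }
}
```

With this shape, a permissions failure and a capacity failure produce different text in the same stack trace, which is the distinction the comment above asks for.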

String dir0 = buildBufferDir(ROOT, 0);
String dir1 = buildBufferDir(ROOT, 1);
conf.set(CONTEXT, dir0 + "," + dir1);
try {
Contributor

Use LambdaTestUtils.intercept, which will do the catch, report on failure, and check the error message.
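
The pattern that such a helper wraps up can be sketched with the JDK alone. The `InterceptSketch.intercept` method below is a hypothetical stand-in written for illustration; it is not Hadoop's `LambdaTestUtils.intercept` and makes no claim to match its exact behaviour.

```java
import java.util.concurrent.Callable;

/** Hypothetical JDK-only stand-in for an "intercept" style test helper. */
class InterceptSketch {

  /**
   * Runs the callable and expects it to throw an instance of clazz whose
   * string form contains the given text. Returns the caught exception so the
   * test can make further checks on it.
   */
  static <E extends Throwable> E intercept(
      Class<E> clazz, String contained, Callable<?> eval) {
    Object result;
    try {
      result = eval.call();
    } catch (Throwable t) {
      if (!clazz.isInstance(t)) {
        throw new AssertionError(
            "Expected " + clazz.getName() + " but got " + t, t);
      }
      if (contained != null && !t.toString().contains(contained)) {
        throw new AssertionError(
            "Expected text \"" + contained + "\" but got: " + t, t);
      }
      return clazz.cast(t);
    }
    // No exception at all is also a test failure, and the result is reported.
    throw new AssertionError(
        "Expected " + clazz.getName() + " but call returned " + result);
  }
}
```

Compared with a hand-written try/catch, the test body shrinks to one call, and a call that unexpectedly succeeds is reported along with the value it returned.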

@@ -532,4 +532,23 @@ public void testGetLocalPathForWriteForInvalidPaths() throws Exception {
}
}

/**
* Test to check the LocalDirAllocation for the less space HADOOP-16769
Contributor

add trailing "." for javadocs

String dir1 = buildBufferDir(ROOT, 1);
conf.set(CONTEXT, dir0 + "," + dir1);
try {
dirAllocator.getLocalPathForWrite("p1/x", 3_000_000_000_000L, conf);
Contributor

what's going to happen on a disk with >3TB of capacity? Should we go for a bigger number?

Author

Sorry, I missed updating the pull request. I have a new code change that uses Long.MAX_VALUE instead of this hardcoded number and then uses a regex to match the error message. I will create a new pull request.
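
As a rough illustration of that approach: requesting `Long.MAX_VALUE` bytes cannot be satisfied by any real disk, and a regex keeps the assertion independent of the actual capacity figures. The message format and the names `ErrorMessageRegexSketch`, `describe` and `hasDiagnostics` below are assumptions made for this sketch; the real wording produced by the patch is not shown in this thread.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of matching a diagnostic message with a regex. */
class ErrorMessageRegexSketch {

  // Assumed message shape, invented for illustration only.
  static final Pattern DIAGNOSTIC = Pattern.compile(
      ".*requested size=(\\d+).*available capacity=(\\d+).*",
      Pattern.DOTALL);

  /** Builds a message in the assumed shape. */
  static String describe(long requested, long available) {
    return "No space available: requested size=" + requested
        + ", available capacity=" + available;
  }

  /** True if the message carries both figures, whatever their values. */
  static boolean hasDiagnostics(String message) {
    return DIAGNOSTIC.matcher(message).matches();
  }
}
```

A test can then assert `hasDiagnostics(e.getMessage())` on the caught exception without hardcoding a size that might one day fit on a large disk.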

@ramesh0201
Author

ramesh0201 commented Jan 7, 2020

I will address the above change to catch and throw the two errors, so they are nested into one another, as part of the new pull request. Thanks!

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 8s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 19m 2s trunk passed
+1 💚 compile 19m 57s trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 compile 16m 51s trunk passed with JDK Private Build-1.8.0_252-8u252-b09-1~18.04-b09
+1 💚 checkstyle 0m 51s trunk passed
+1 💚 mvnsite 1m 29s trunk passed
+1 💚 shadedclient 16m 54s branch has no errors when building and testing our client artifacts.
-1 ❌ javadoc 0m 45s hadoop-common in trunk failed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.
+1 💚 javadoc 1m 3s trunk passed with JDK Private Build-1.8.0_252-8u252-b09-1~18.04-b09
+0 🆗 spotbugs 2m 14s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 2m 12s trunk passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 52s the patch passed
+1 💚 compile 19m 15s the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javac 19m 15s the patch passed
+1 💚 compile 17m 14s the patch passed with JDK Private Build-1.8.0_252-8u252-b09-1~18.04-b09
+1 💚 javac 17m 14s the patch passed
-0 ⚠️ checkstyle 0m 50s hadoop-common-project/hadoop-common: The patch generated 2 new + 39 unchanged - 0 fixed = 41 total (was 39)
+1 💚 mvnsite 1m 25s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 13m 53s patch has no errors when building and testing our client artifacts.
-1 ❌ javadoc 0m 44s hadoop-common in the patch failed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.
+1 💚 javadoc 0m 59s the patch passed with JDK Private Build-1.8.0_252-8u252-b09-1~18.04-b09
+1 💚 findbugs 2m 17s the patch passed
_ Other Tests _
+1 💚 unit 9m 27s hadoop-common in the patch passed.
+1 💚 asflicense 0m 54s The patch does not generate ASF License warnings.
150m 14s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1768/3/artifact/out/Dockerfile
GITHUB PR #1768
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 5d0360a226f6 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / e756fe3
Default Java Private Build-1.8.0_252-8u252-b09-1~18.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_252-8u252-b09-1~18.04-b09
javadoc https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1768/3/artifact/out/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.txt
checkstyle https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1768/3/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt
javadoc https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1768/3/artifact/out/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.txt
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1768/3/testReport/
Max. process+thread count 3296 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1768/3/console
versions git=2.17.1 maven=3.6.0 findbugs=4.0.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

4 participants