Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDFS-16111. Add a configuration to RoundRobinVolumeChoosingPolicy to … #3175

Merged
merged 1 commit into from
Jul 28, 2021

Conversation

zhihaixu2012
Copy link
Contributor

…avoid failed volumes at datanodes.

Change-Id: Iead25812d4073e3980893e3e76f7d2b03b57442a

JIRA: https://issues.apache.org/jira/browse/HDFS-16111

there is a potential bug when picking a disk volume to write a new block file(replica). By default, Hadoop uses RoundRobinVolumeChoosingPolicy, The code to select a disk will check whether the available space on the selected disk is more than the size bytes of block file to store (https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/RoundRobinVolumeChoosingPolicy.java#L86) But when creating a new block, there will be two files created: one is the block file blk_XXXX, the other is block metadata file blk_XXXX_XXXX.meta, this is the code when finalizing a block, both block file size and meta data file size will be updated: https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java#L391 the current code only considers the size of block file and doesn't consider the size of block metadata file, when choosing a disk in RoundRobinVolumeChoosingPolicy. There can be a lot of on-going blocks received at the same time, the default maximum number of DataXceiver threads is 4096. This will underestimate the total size needed to write a block, which will potentially cause the disk full error(No space left on device) when writing a replica.

Since the size of the block metadata file is not fixed, I suggest to add a configuration(dfs.datanode.round-robin-volume-choosing-policy.additional-available-space) to safeguard the disk space when choosing a volume to write a new block data in RoundRobinVolumeChoosingPolicy.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 51s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 30m 57s trunk passed
+1 💚 compile 1m 22s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 1m 17s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 1m 6s trunk passed
+1 💚 mvnsite 1m 24s trunk passed
+1 💚 javadoc 0m 57s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 28s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 8s trunk passed
+1 💚 shadedclient 16m 2s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 14s the patch passed
+1 💚 compile 1m 12s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 1m 12s the patch passed
+1 💚 compile 1m 8s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 1m 8s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 57s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 6 new + 461 unchanged - 0 fixed = 467 total (was 461)
+1 💚 mvnsite 1m 12s the patch passed
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
+1 💚 javadoc 0m 46s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 23s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 8s the patch passed
+1 💚 shadedclient 16m 5s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 388m 19s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 46s The patch does not generate ASF License warnings.
472m 49s
Reason Tests
Failed junit tests hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS
hadoop.hdfs.web.TestWebHdfsFileSystemContract
hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor
hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
hadoop.hdfs.server.namenode.ha.TestEditLogTailer
hadoop.hdfs.server.namenode.TestDecommissioningStatus
hadoop.fs.viewfs.TestViewFileSystemOverloadSchemeHdfsFileSystemContract
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3175/1/artifact/out/Dockerfile
GITHUB PR #3175
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux 8b4e5220db5c 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / c00533cd419136ff13cf5d039306ffaccd33fe4a
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3175/1/testReport/
Max. process+thread count 2852 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3175/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 44s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 38m 10s trunk passed
+1 💚 compile 1m 43s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 1m 30s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 1m 14s trunk passed
+1 💚 mvnsite 1m 53s trunk passed
+1 💚 javadoc 1m 9s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 45s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 4m 3s trunk passed
+1 💚 shadedclient 18m 9s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 11s the patch passed
+1 💚 compile 1m 15s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 1m 15s the patch passed
+1 💚 compile 1m 9s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 1m 9s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 51s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 199 unchanged - 0 fixed = 201 total (was 199)
+1 💚 mvnsite 1m 15s the patch passed
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 javadoc 0m 47s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 23s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 9s the patch passed
+1 💚 shadedclient 15m 52s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 243m 7s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 46s The patch does not generate ASF License warnings.
338m 42s
Reason Tests
Failed junit tests hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand
hadoop.hdfs.server.balancer.TestBalancer
hadoop.hdfs.TestMiniDFSCluster
hadoop.hdfs.server.namenode.TestStripedINodeFile
hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles
hadoop.hdfs.server.mover.TestMover
hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement
hadoop.hdfs.server.namenode.sps.TestStoragePolicySatisfierWithStripedFile
hadoop.hdfs.server.balancer.TestBalancerLongRunningTasks
hadoop.hdfs.server.diskbalancer.TestDiskBalancer
hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyWriter
hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3175/2/artifact/out/Dockerfile
GITHUB PR #3175
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux d692d7176b36 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / d60d3227c853609ef3985c87e7843ea6201a0523
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3175/2/testReport/
Max. process+thread count 3213 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3175/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@zhihaixu2012
Copy link
Contributor Author

Updated the patch to fix the above test failures.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 47s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 30m 40s trunk passed
+1 💚 compile 1m 23s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 1m 18s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 1m 4s trunk passed
+1 💚 mvnsite 1m 25s trunk passed
+1 💚 javadoc 0m 57s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 32s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 12s trunk passed
+1 💚 shadedclient 16m 21s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 11s the patch passed
+1 💚 compile 1m 14s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
-1 ❌ javac 1m 14s /results-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 1 new + 467 unchanged - 1 fixed = 468 total (was 468)
+1 💚 compile 1m 9s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
-1 ❌ javac 1m 9s /results-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu120.04-b10 with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu120.04-b10 generated 1 new + 451 unchanged - 1 fixed = 452 total (was 452)
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 54s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 338 unchanged - 0 fixed = 341 total (was 338)
+1 💚 mvnsite 1m 14s the patch passed
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 javadoc 0m 49s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 23s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 11s the patch passed
+1 💚 shadedclient 16m 13s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 237m 2s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 45s The patch does not generate ASF License warnings.
321m 31s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3175/3/artifact/out/Dockerfile
GITHUB PR #3175
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux 4c30d6034656 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 1b52bd53b5c186a2737c39d5d7c6b8aaab924766
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3175/3/testReport/
Max. process+thread count 3138 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3175/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

…avoid failed volumes at datanodes.

Change-Id: Iead25812d4073e3980893e3e76f7d2b03b57442a
Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I'll merge this

@jojochuang jojochuang merged commit ae20516 into apache:trunk Jul 28, 2021
@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 25s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 38m 33s trunk passed
+1 💚 compile 1m 53s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 1m 47s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 1m 20s trunk passed
+1 💚 mvnsite 1m 48s trunk passed
+1 💚 javadoc 1m 20s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 52s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 3m 58s trunk passed
+1 💚 shadedclient 22m 18s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 38s the patch passed
+1 💚 compile 1m 45s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 1m 45s the patch passed
+1 💚 compile 1m 34s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 1m 34s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 4s the patch passed
+1 💚 mvnsite 1m 40s the patch passed
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
+1 💚 javadoc 1m 5s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 1m 45s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 4m 11s the patch passed
+1 💚 shadedclient 22m 58s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 418m 8s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 42s The patch does not generate ASF License warnings.
529m 45s
Reason Tests
Failed junit tests hadoop.hdfs.server.namenode.ha.TestEditLogTailer
hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor
hadoop.hdfs.server.namenode.TestDecommissioningStatus
hadoop.hdfs.web.TestWebHdfsFileSystemContract
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3175/4/artifact/out/Dockerfile
GITHUB PR #3175
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux 6fa4feae10d8 4.15.0-142-generic #146-Ubuntu SMP Tue Apr 13 01:11:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 24a19b0
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3175/4/testReport/
Max. process+thread count 1830 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3175/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

kiran-maturi pushed a commit to kiran-maturi/hadoop that referenced this pull request Nov 24, 2021
…avoid failed volumes at datanodes. (apache#3175)

Change-Id: Iead25812d4073e3980893e3e76f7d2b03b57442a

Co-authored-by: Zhihai Xu <zxu@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants