Skip to content

HDFS-17227. EC: Fix bug in choosing targets when racks is not enough. #6192

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: trunk
Choose a base branch
from

Conversation

zhangshuyan0
Copy link
Contributor

@zhangshuyan0 zhangshuyan0 commented Oct 16, 2023

Description of PR

Bug description
If,

  1. There is a striped block blockinfo1, which has an excess replica on datanodeA.
  2. blockinfo1 has an internal block that needs to be reconstruction.
  3. The number of racks is less than the number of internal blocks of blockinfo1.

Then, NN may choose datanodeA to reconstruct the internal block, resulting in two internal blocks of blockinfo1 on datanodeA, causing confusion.

Root cause and solution
When we use BlockPlacementPolicyRackFaultTolerant for choosing targets and the racks is insufficient, chooseEvenlyFromRemainingRacks will be called. Currently, chooseEvenlyFromRemainingRacks calls chooseOnce, chooseOnce use newExcludeNodes as parameter instead of excludedNodes. When we choose targets for reconstructing internal blocks, newExcludeNodes only includes datanodes that contain live replicas, and does not include datanodes that have excess replicas. This may result in datanodes with excess replicas to be chosen.

LOG.trace("Chosen nodes: {}", results);
LOG.trace("Excluded nodes: {}", excludedNodes);
LOG.trace("New Excluded nodes: {}", newExcludeNodes);
final int numOfReplicas = totalReplicaExpected - results.size();
numResultsOflastChoose = results.size();
try {
chooseOnce(numOfReplicas, writer, newExcludeNodes, blocksize,
++bestEffortMaxNodesPerRack, results, avoidStaleNodes,
storageTypes);

I think we do not need to use newExcludeNodes, just pass excludedNodes to chooseOnce.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 8m 0s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 34s trunk passed
+1 💚 compile 0m 52s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 48s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 46s trunk passed
+1 💚 mvnsite 0m 55s trunk passed
+1 💚 javadoc 0m 50s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 13s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 54s trunk passed
+1 💚 shadedclient 21m 42s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 44s the patch passed
+1 💚 compile 0m 46s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 46s the patch passed
+1 💚 compile 0m 40s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 40s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 33s the patch passed
+1 💚 mvnsite 0m 45s the patch passed
+1 💚 javadoc 0m 35s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 5s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 52s the patch passed
+1 💚 shadedclient 21m 51s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 197m 27s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 37s The patch does not generate ASF License warnings.
298m 24s
Reason Tests
Failed junit tests hadoop.hdfs.server.namenode.TestReconstructStripedBlocks
hadoop.hdfs.TestErasureCodingMultipleRacks
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6192/1/artifact/out/Dockerfile
GITHUB PR #6192
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 7b6acbc6acab 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 226a9b0
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6192/1/testReport/
Max. process+thread count 3518 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6192/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 26s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 28s trunk passed
+1 💚 compile 0m 53s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 49s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 45s trunk passed
+1 💚 mvnsite 0m 54s trunk passed
+1 💚 javadoc 0m 49s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 8s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 59s trunk passed
+1 💚 shadedclient 21m 50s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 41s the patch passed
+1 💚 compile 0m 46s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 46s the patch passed
+1 💚 compile 0m 41s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 41s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 34s the patch passed
+1 💚 mvnsite 0m 45s the patch passed
+1 💚 javadoc 0m 36s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 4s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 52s the patch passed
+1 💚 shadedclient 21m 31s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 192m 6s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 37s The patch does not generate ASF License warnings.
284m 22s
Reason Tests
Failed junit tests hadoop.hdfs.TestErasureCodingMultipleRacks
hadoop.hdfs.server.namenode.TestReconstructStripedBlocks
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6192/2/artifact/out/Dockerfile
GITHUB PR #6192
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux f27d0e89ef1f 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 226a9b0
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6192/2/testReport/
Max. process+thread count 3266 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6192/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@Hexiaoqiao Hexiaoqiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! How about to add one unit test to cover this case?

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 25s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 56s trunk passed
+1 💚 compile 0m 52s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 46s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 44s trunk passed
+1 💚 mvnsite 0m 55s trunk passed
+1 💚 javadoc 0m 50s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 12s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 59s trunk passed
+1 💚 shadedclient 21m 47s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 44s the patch passed
+1 💚 compile 0m 45s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 46s the patch passed
+1 💚 compile 0m 40s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 40s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 33s the patch passed
+1 💚 mvnsite 0m 45s the patch passed
+1 💚 javadoc 0m 38s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 3s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 54s the patch passed
+1 💚 shadedclient 21m 47s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 198m 31s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 37s The patch does not generate ASF License warnings.
291m 28s
Reason Tests
Failed junit tests hadoop.hdfs.TestErasureCodingMultipleRacks
hadoop.hdfs.server.namenode.TestReconstructStripedBlocks
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6192/3/artifact/out/Dockerfile
GITHUB PR #6192
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux ee50156ecd10 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 226a9b0
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6192/3/testReport/
Max. process+thread count 4038 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6192/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 45s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 50m 48s trunk passed
+1 💚 compile 1m 21s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 1m 16s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 1m 11s trunk passed
+1 💚 mvnsite 1m 21s trunk passed
-1 ❌ javadoc 1m 7s /branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04.txt hadoop-hdfs in trunk failed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04.
+1 💚 javadoc 1m 39s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 3m 26s trunk passed
+1 💚 shadedclient 41m 8s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 10s the patch passed
+1 💚 compile 1m 15s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 1m 15s the patch passed
+1 💚 compile 1m 7s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 1m 7s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 2s the patch passed
+1 💚 mvnsite 1m 16s the patch passed
-1 ❌ javadoc 0m 55s /patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04.txt hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04.
+1 💚 javadoc 1m 27s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 3m 19s the patch passed
+1 💚 shadedclient 40m 49s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 253m 42s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 49s The patch does not generate ASF License warnings.
410m 25s
Reason Tests
Failed junit tests hadoop.hdfs.server.datanode.TestDirectoryScanner
hadoop.hdfs.server.namenode.TestReconstructStripedBlocks
hadoop.hdfs.TestErasureCodingMultipleRacks
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6192/1/artifact/out/Dockerfile
GITHUB PR #6192
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 4e4431e313d6 5.15.0-78-generic #85-Ubuntu SMP Fri Jul 7 15:25:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 226a9b0
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6192/1/testReport/
Max. process+thread count 2455 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6192/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants