
HDFS-17885. Fix TestDFSAdmin.testAllDatanodesReconfig flaky test. #8269

Closed
slfan1989 wants to merge 1 commit into apache:trunk from slfan1989:HDFS-17885

@slfan1989 (Contributor) commented Feb 23, 2026

Description of PR

JIRA: HDFS-17885. Fix TestDFSAdmin.testAllDatanodesReconfig flaky test.

Problem

TestDFSAdmin.testAllDatanodesReconfig intermittently fails with the following error:

Expected size:<3> but was:<1> in:
<["Starting of reconfiguration task successful on 0 nodes, failed on 2 nodes."]>
at org.apache.hadoop.hdfs.tools.TestDFSAdmin.testAllDatanodesReconfig(TestDFSAdmin.java:1263)

Root Cause

The test conflicts with itself: it starts the reconfiguration task twice on the same DataNodes:

  • First call: admin.startReconfiguration("datanode", "livenodes") - Successfully starts reconfiguration on 2 DataNodes
  • Second call: reconfigurationOutErrFormatter("startReconfiguration", ...) - Internally calls admin.startReconfigurationUtil(...) again

The problem is that DataNode's startReconfigurationTask() does not allow concurrent reconfiguration: if a reconfiguration task is already running, it throws an IOException with the message "Another reconfiguration task is running."

Therefore, if the first task is still running when the second call arrives, the second invocation fails on both DataNodes, and the output contains only the summary line:

Starting of reconfiguration task successful on 0 nodes, failed on 2 nodes.

This causes the assertion assertThat(outsForStartReconf).hasSize(3) to fail because:

  • Expected: 2 "Started reconfiguration task on node" lines + 1 summary line = 3 lines
  • Actual: 0 success lines + 1 summary line = 1 line
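To make the timing dependence concrete, here is a minimal, self-contained Java sketch of the failure mode. It is a simplified model, not Hadoop code: FakeDataNode, startOnAll, and the node addresses are hypothetical, and the exact per-node success line format is assumed. Only the "Another reconfiguration task is running." message and the summary line format come from the description above. A task counts as still running until finish() is called, which models the unlucky interleaving where the second start request arrives before the first task completes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of starting reconfiguration twice on the same nodes. */
public class ReconfigGuardDemo {

  /** Hypothetical stand-in for a DataNode; NOT Hadoop's DataNode class. */
  static final class FakeDataNode {
    private final String address;
    private final Object lock = new Object();
    private boolean taskRunning = false; // stays true until finish() is called

    FakeDataNode(String address) {
      this.address = address;
    }

    String address() {
      return address;
    }

    /** Rejects a second task while one is running, mirroring the guard described above. */
    void startReconfigurationTask() throws IOException {
      synchronized (lock) {
        if (taskRunning) {
          throw new IOException("Another reconfiguration task is running.");
        }
        taskRunning = true; // a real task would now run asynchronously
      }
    }

    /** Marks the running task as completed. */
    void finish() {
      synchronized (lock) {
        taskRunning = false;
      }
    }
  }

  /** Sends a start request to every node and returns the stdout-style lines. */
  static List<String> startOnAll(List<FakeDataNode> nodes) {
    List<String> out = new ArrayList<>();
    int success = 0;
    int failed = 0;
    for (FakeDataNode dn : nodes) {
      try {
        dn.startReconfigurationTask();
        out.add("Started reconfiguration task on node [" + dn.address() + "].");
        success++;
      } catch (IOException e) {
        failed++; // error details would go to stderr, not to these lines
      }
    }
    out.add("Starting of reconfiguration task successful on " + success
        + " nodes, failed on " + failed + " nodes.");
    return out;
  }

  public static void main(String[] args) {
    List<FakeDataNode> nodes = List.of(new FakeDataNode("dn1"), new FakeDataNode("dn2"));
    System.out.println(startOnAll(nodes)); // first call: 2 success lines + summary
    System.out.println(startOnAll(nodes)); // second call, tasks still running: summary only
  }
}
```

Calling finish() on every node between the two startOnAll calls makes the second call succeed, which is why the real test only fails when the first task has not yet completed.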

Solution

Remove the duplicate invocation by:

  1. Calling admin.startReconfigurationUtil() only once
  2. Capturing its output directly in a ByteArrayOutputStream
  3. Parsing the captured output for the assertions
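The single-invocation approach can be sketched in plain Java as follows. This is an illustrative sketch under stated assumptions: the CliCommand interface is a hypothetical stand-in for admin.startReconfigurationUtil(...), whose exact parameter list is not given in this description, and it assumes the command writes its results to the PrintStreams it receives. The point is that the command runs exactly once and every assertion works on the captured lines.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Sketch: run a CLI-style command once and keep its output for assertions. */
public class CaptureOnceDemo {

  /** Hypothetical stand-in for admin.startReconfigurationUtil(...). */
  @FunctionalInterface
  interface CliCommand {
    int run(PrintStream out, PrintStream err);
  }

  /** Exit code plus the captured stdout and stderr lines of one invocation. */
  static final class Captured {
    final int exitCode;
    final List<String> outLines;
    final List<String> errLines;

    Captured(int exitCode, List<String> outLines, List<String> errLines) {
      this.exitCode = exitCode;
      this.outLines = outLines;
      this.errLines = errLines;
    }
  }

  /** Invokes the command exactly once, capturing both streams in memory. */
  static Captured runOnce(CliCommand cmd) {
    ByteArrayOutputStream outBuf = new ByteArrayOutputStream();
    ByteArrayOutputStream errBuf = new ByteArrayOutputStream();
    int exitCode;
    try (PrintStream out = new PrintStream(outBuf, true, StandardCharsets.UTF_8);
         PrintStream err = new PrintStream(errBuf, true, StandardCharsets.UTF_8)) {
      exitCode = cmd.run(out, err);
    }
    return new Captured(exitCode, toLines(outBuf), toLines(errBuf));
  }

  /** Splits captured text into non-empty lines, handling any line terminator. */
  static List<String> toLines(ByteArrayOutputStream buf) {
    return Arrays.stream(buf.toString(StandardCharsets.UTF_8).split("\\R"))
        .filter(line -> !line.isEmpty())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Captured c = runOnce((out, err) -> {
      out.println("Started reconfiguration task on node [dn1].");
      out.println("Started reconfiguration task on node [dn2].");
      out.println("Starting of reconfiguration task successful on 2 nodes, failed on 0 nodes.");
      return 0;
    });
    System.out.println(c.exitCode + " " + c.outLines.size() + " " + c.errLines.size());
  }
}
```

Asserting on outLines and errLines after a single call removes the second start request that collided with the first one.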

Additionally, improve the test's robustness by:

  • Using NUM_DATANODES constant instead of hardcoded values
  • Using stream filtering to count "Started reconfiguration" lines instead of relying on fixed line positions, which is more resilient to nondeterministic ordering of the per-node output
  • Removing unnecessary Thread.sleep(1000) before awaitReconfigurationFinished()
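The order-independent checks from the list above can be sketched like this. It is an illustration under assumptions rather than the patch itself: it uses plain checks instead of the test's AssertJ assertThat, takes the DataNode count as a numDatanodes parameter in place of the NUM_DATANODES constant, and assumes each per-node success line starts with "Started reconfiguration task on node". Counting matching lines with a stream filter accepts those lines in any order, whereas indexing fixed positions breaks if the per-node lines are interleaved differently.

```java
import java.util.List;

/** Sketch of order-independent assertions on captured reconfiguration output. */
public class ReconfigOutputCheck {

  // Assumed prefix of each per-node success line.
  static final String STARTED_PREFIX = "Started reconfiguration task on node";

  /** Counts per-node success lines wherever they appear in the output. */
  static long countStarted(List<String> outs) {
    return outs.stream().filter(line -> line.startsWith(STARTED_PREFIX)).count();
  }

  /** Throws AssertionError unless every node started and the summary line matches. */
  static void check(List<String> outs, int numDatanodes) {
    long started = countStarted(outs);
    if (started != numDatanodes) {
      throw new AssertionError(
          "expected " + numDatanodes + " started lines but got " + started);
    }
    String summary = "Starting of reconfiguration task successful on "
        + numDatanodes + " nodes, failed on 0 nodes.";
    if (!outs.contains(summary)) {
      throw new AssertionError("missing summary line: " + summary);
    }
    // numDatanodes success lines + 1 summary line
    if (outs.size() != numDatanodes + 1) {
      throw new AssertionError(
          "expected " + (numDatanodes + 1) + " lines but got " + outs.size());
    }
  }

  public static void main(String[] args) {
    // Position does not matter here; only the counts and the summary text do.
    check(List.of(
        "Started reconfiguration task on node [dn2].",
        "Starting of reconfiguration task successful on 2 nodes, failed on 0 nodes.",
        "Started reconfiguration task on node [dn1]."), 2);
    System.out.println("ok");
  }
}
```

Fed the single summary line from the failing run, check reports 0 started lines instead of 2, which mirrors the original hasSize(3) failure.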

How was this patch tested?

./mvnw -pl hadoop-hdfs-project/hadoop-hdfs -Dtest=TestDFSAdmin#testAllDatanodesReconfig test

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.hdfs.tools.TestDFSAdmin
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.421 s -- in org.apache.hadoop.hdfs.tools.TestDFSAdmin
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
[INFO] 

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

AI Tooling

If an AI tool was used:

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 13m 17s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 44m 31s trunk passed
+1 💚 compile 1m 47s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 compile 1m 49s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 checkstyle 1m 50s trunk passed
+1 💚 mvnsite 1m 55s trunk passed
+1 💚 javadoc 1m 32s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 1m 31s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 4m 9s trunk passed
+1 💚 shadedclient 31m 19s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 20s the patch passed
+1 💚 compile 1m 16s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javac 1m 16s the patch passed
+1 💚 compile 1m 17s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 javac 1m 17s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 13s the patch passed
+1 💚 mvnsite 1m 26s the patch passed
+1 💚 javadoc 0m 57s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 1m 2s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 3m 50s the patch passed
+1 💚 shadedclient 30m 0s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 215m 19s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 49s The patch does not generate ASF License warnings.
360m 19s
Subsystem Report/Notes
Docker ClientAPI=1.53 ServerAPI=1.53 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8269/1/artifact/out/Dockerfile
GITHUB PR #8269
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux d294d007e22e 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / c71bea9
Default Java Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8269/1/testReport/
Max. process+thread count 3629 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8269/1/console
versions git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

@cnauroth (Contributor) left a comment

+1

@cnauroth cnauroth closed this in ea8deca Feb 23, 2026
@cnauroth (Contributor)

I merged this to trunk. Thank you, @slfan1989.

eciuca pushed a commit to eciuca/hadoop that referenced this pull request Feb 26, 2026
Closes apache#8269

Signed-off-by: Chris Nauroth <cnauroth@apache.org>