
Backport "HBASE-28569: fix race condition during WAL splitting leading to corru…" to branch-2 #6884


Merged: 1 commit merged on Apr 7, 2025


@ciacono (Contributor) commented Apr 3, 2025

…pt recovered.edits

If an exception is thrown from finishWriterThreads in the org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink.close method, closeWriters should not be called: doing so can trigger a race condition that corrupts files if the regionserver aborts. In that case closeWriters writes the trailer while the writer threads may still be writing, which corrupts the file, and the corrupt file is then renamed and finalized when it should not be. This corruption causes problems later, when the region is assigned.
To fix this, when finishWriterThreads throws an exception or reports failure, the possibly corrupt files are not renamed or finalized.
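The ordering that the fix requires can be sketched in Java. This is a minimal, hypothetical model, not the actual RecoveredEditsOutputSink code: the class GuardedSinkSketch, the TempEditsFile holder, the anyFinalized helper, and the simulated writer failure are illustrative assumptions, and only the method names finishWriterThreads and closeWriters come from the description above. What it shows is that closeWriters, which here stands for writing trailers and renaming temp files, runs only after finishWriterThreads has joined every writer thread and reported success.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the fix: finalize output files only after every
// writer thread has finished cleanly. These names are NOT the HBase API.
class GuardedSinkSketch {

  /** Stand-in for one recovered.edits output: a temp file finalized by renaming. */
  static final class TempEditsFile {
    final String tmpName;
    boolean trailerWritten = false;
    boolean finalized = false;

    TempEditsFile(String tmpName) {
      this.tmpName = tmpName;
    }
  }

  private final List<TempEditsFile> files = new ArrayList<>();
  private final List<Thread> writerThreads = new ArrayList<>();
  private volatile Throwable writerFailure; // set by a writer thread that fails

  GuardedSinkSketch(int writers, boolean failOneWriter) {
    for (int i = 0; i < writers; i++) {
      files.add(new TempEditsFile("edits-" + i + ".temp"));
      boolean fail = failOneWriter && i == 0;
      writerThreads.add(new Thread(() -> {
        // A real writer would append WAL entries here; we only simulate a failure.
        if (fail) {
          writerFailure = new IOException("simulated write failure");
        }
      }));
    }
  }

  void start() {
    writerThreads.forEach(Thread::start);
  }

  /** Joins all writer threads; returns false if any of them reported a failure. */
  boolean finishWriterThreads() throws IOException {
    for (Thread t : writerThreads) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IOException("interrupted while waiting for writer threads", e);
      }
    }
    return writerFailure == null;
  }

  /** Writes each trailer and "renames" the temp file to its final name. */
  private List<String> closeWriters() {
    List<String> finalNames = new ArrayList<>();
    for (TempEditsFile f : files) {
      f.trailerWritten = true;
      f.finalized = true;
      finalNames.add(f.tmpName.replace(".temp", ""));
    }
    return finalNames;
  }

  /** Guarded close: closeWriters() runs only if finishWriterThreads() succeeded. */
  List<String> close() throws IOException {
    // If finishWriterThreads() throws, closeWriters() is never reached, so no trailer
    // is written while writer threads might still be appending.
    boolean ok = finishWriterThreads();
    if (!ok) {
      // Leave the temp files unfinalized instead of renaming possibly corrupt output.
      throw new IOException("writer threads failed; not finalizing recovered edits");
    }
    return closeWriters();
  }

  boolean anyFinalized() {
    return files.stream().anyMatch(f -> f.finalized);
  }

  public static void main(String[] args) throws IOException {
    GuardedSinkSketch clean = new GuardedSinkSketch(2, false);
    clean.start();
    System.out.println(clean.close()); // prints [edits-0, edits-1]

    GuardedSinkSketch failing = new GuardedSinkSketch(2, true);
    failing.start();
    try {
      failing.close();
    } catch (IOException e) {
      System.out.println("close failed, finalized=" + failing.anyFinalized());
    }
  }
}
```

Running main should print the two final file names for the clean run and `close failed, finalized=false` for the failing run, since close() throws before any trailer is written or any file is renamed. Propagating the failure instead of calling closeWriters unconditionally (for example from a finally block) is the design choice that keeps a possibly corrupt file from being finalized.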


@ciacono force-pushed the branch-2-HBASE-28569 branch from 56fedf7 to 0a71793 on April 4, 2025 17:23
@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 🆗 | reexec | 1m 15s | Docker mode activated. |
| | | | _Prechecks_ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| | | | _branch-2 Compile Tests_ |
| +1 💚 | mvninstall | 4m 45s | branch-2 passed |
| +1 💚 | compile | 3m 24s | branch-2 passed |
| +1 💚 | checkstyle | 0m 40s | branch-2 passed |
| +1 💚 | spotbugs | 1m 52s | branch-2 passed |
| +1 💚 | spotless | 1m 3s | branch has no errors when running spotless:check. |
| | | | _Patch Compile Tests_ |
| +1 💚 | mvninstall | 4m 12s | the patch passed |
| +1 💚 | compile | 3m 22s | the patch passed |
| +1 💚 | javac | 3m 22s | the patch passed |
| +1 💚 | blanks | 0m 0s | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 44s | the patch passed |
| +1 💚 | spotbugs | 1m 55s | the patch passed |
| +1 💚 | hadoopcheck | 23m 22s | Patch does not cause any errors with Hadoop 2.10.2 or 3.3.6 3.4.0. |
| +1 💚 | spotless | 0m 52s | patch has no errors when running spotless:check. |
| | | | _Other Tests_ |
| +1 💚 | asflicense | 0m 13s | The patch does not generate ASF License warnings. |
| | | 49m 31s | |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6884/2/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #6884 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux a12d74f14a93 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | branch-2 / 0a71793 |
| Default Java | Eclipse Adoptium-11.0.23+9 |
| Max. process+thread count | 83 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6884/2/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 🆗 | reexec | 0m 51s | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 4s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | | | _Prechecks_ |
| | | | _branch-2 Compile Tests_ |
| +1 💚 | mvninstall | 3m 2s | branch-2 passed |
| +1 💚 | compile | 1m 0s | branch-2 passed |
| +1 💚 | javadoc | 0m 29s | branch-2 passed |
| +1 💚 | shadedjars | 6m 8s | branch has no errors when building our shaded downstream artifacts. |
| | | | _Patch Compile Tests_ |
| +1 💚 | mvninstall | 3m 2s | the patch passed |
| +1 💚 | compile | 0m 55s | the patch passed |
| +1 💚 | javac | 0m 55s | the patch passed |
| +1 💚 | javadoc | 0m 26s | the patch passed |
| +1 💚 | shadedjars | 6m 7s | patch has no errors when building our shaded downstream artifacts. |
| | | | _Other Tests_ |
| +1 💚 | unit | 204m 20s | hbase-server in the patch passed. |
| | | 230m 51s | |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6884/2/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #6884 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux 5d3515fa6ae3 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | branch-2 / 0a71793 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6884/2/testReport/ |
| Max. process+thread count | 4457 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6884/2/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 🆗 | reexec | 1m 12s | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 4s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | | | _Prechecks_ |
| | | | _branch-2 Compile Tests_ |
| +1 💚 | mvninstall | 3m 57s | branch-2 passed |
| +1 💚 | compile | 0m 47s | branch-2 passed |
| +1 💚 | javadoc | 0m 29s | branch-2 passed |
| +1 💚 | shadedjars | 6m 7s | branch has no errors when building our shaded downstream artifacts. |
| | | | _Patch Compile Tests_ |
| +1 💚 | mvninstall | 3m 29s | the patch passed |
| +1 💚 | compile | 0m 59s | the patch passed |
| +1 💚 | javac | 0m 59s | the patch passed |
| +1 💚 | javadoc | 0m 27s | the patch passed |
| +1 💚 | shadedjars | 5m 59s | patch has no errors when building our shaded downstream artifacts. |
| | | | _Other Tests_ |
| +1 💚 | unit | 209m 50s | hbase-server in the patch passed. |
| | | 237m 31s | |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6884/2/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile |
| GITHUB PR | #6884 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux ccb9701dd452 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | branch-2 / 0a71793 |
| Default Java | Temurin-1.8.0_412-b08 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6884/2/testReport/ |
| Max. process+thread count | 4291 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6884/2/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 🆗 | reexec | 0m 47s | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 4s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | | | _Prechecks_ |
| | | | _branch-2 Compile Tests_ |
| +1 💚 | mvninstall | 3m 24s | branch-2 passed |
| +1 💚 | compile | 0m 54s | branch-2 passed |
| +1 💚 | javadoc | 0m 27s | branch-2 passed |
| +1 💚 | shadedjars | 6m 28s | branch has no errors when building our shaded downstream artifacts. |
| | | | _Patch Compile Tests_ |
| +1 💚 | mvninstall | 3m 6s | the patch passed |
| +1 💚 | compile | 0m 50s | the patch passed |
| +1 💚 | javac | 0m 50s | the patch passed |
| +1 💚 | javadoc | 0m 25s | the patch passed |
| +1 💚 | shadedjars | 6m 23s | patch has no errors when building our shaded downstream artifacts. |
| | | | _Other Tests_ |
| +1 💚 | unit | 216m 54s | hbase-server in the patch passed. |
| | | 244m 11s | |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6884/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
| GITHUB PR | #6884 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux 2488e104407e 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | branch-2 / 0a71793 |
| Default Java | Eclipse Adoptium-11.0.23+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6884/2/testReport/ |
| Max. process+thread count | 4438 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6884/2/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache9 added the backport label on Apr 7, 2025
@Apache9 merged commit 81f29ae into apache:branch-2 on Apr 7, 2025
1 check passed
Apache9 pushed a commit that referenced this pull request Apr 7, 2025
…t recovered.edits (#6884)

If an exception happens in the call to finishWriterThreads in the
org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink.close method,
the call to closeWriters should not execute, as it may lead to a race condition
that leads to file corruption if the regionserver aborts. The execution of
closeWriters in this case would write the trailer in parallel with writer threads,
causing corruption, and then the corrupt file would get renamed and finalized
when it should not be. This corruption causes problems when the region is then
to be assigned.
To fix this, when finishWriterThreads throws an exception or is not successful,
the corrupt files should not be renamed and finalized.

Signed-off-by: Duo Zhang <zhangduo@apache.org>
(cherry picked from commit 81f29ae)
Apache9 pushed a commit that referenced this pull request Apr 7, 2025
Labels: backport (This PR is a back port of some issue or issues already committed to master)
3 participants