HBASE-28569: fix race condition during WAL splitting leading to corru… #6266

Merged
Apache9 merged 1 commit into apache:master on Apr 7, 2025

Conversation

ciacono
Contributor

@ciacono ciacono commented Sep 18, 2024

…pt recovered.edits

If finishWriterThreads throws an exception in the org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink.close method, closeWriters should not run, because running it can race with the writer threads and corrupt files if the regionserver aborts. In that case closeWriters would write the trailer in parallel with the writer threads, corrupting the file, and the corrupt file would then be renamed and finalized when it should not be. The corruption later causes problems when the region is assigned. Removing the try/finally block means closeWriters no longer runs when finishWriterThreads throws, which should prevent this race and the resulting recovered.edits corruption.
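
The control-flow difference described above can be illustrated with a minimal sketch. This is not the HBase source: `SinkCloseSketch`, its `calls` list, and the `failFinish` flag are hypothetical stand-ins, and only the method names finishWriterThreads and closeWriters are borrowed from the description. It shows that a `finally` block runs closeWriters even when finishWriterThreads throws, while sequential calls skip it.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an output sink; NOT the real RecoveredEditsOutputSink. */
final class SinkCloseSketch {
  /** Records the order of calls so the control flow can be inspected. */
  final List<String> calls = new ArrayList<>();
  /** Assumption for illustration: lets the caller simulate a failure. */
  private final boolean failFinish;

  SinkCloseSketch(boolean failFinish) {
    this.failFinish = failFinish;
  }

  // Stand-in for finishWriterThreads: may fail while writer threads are still active.
  private void finishWriterThreads() throws IOException {
    calls.add("finishWriterThreads");
    if (failFinish) {
      throw new IOException("simulated failure while waiting for writer threads");
    }
  }

  // Stand-in for closeWriters: per the description, the real method writes
  // trailers and finalizes files. Here it only records that it ran.
  private void closeWriters() {
    calls.add("closeWriters");
  }

  /** Shape described as problematic: finally runs closeWriters unconditionally. */
  void closeWithFinally() throws IOException {
    try {
      finishWriterThreads();
    } finally {
      closeWriters();
    }
  }

  /** Shape proposed above: closeWriters is skipped when finishWriterThreads throws. */
  void closeWithoutFinally() throws IOException {
    finishWriterThreads();
    closeWriters();
  }
}
```

With `new SinkCloseSketch(true)`, `closeWithFinally()` records both calls before the exception propagates, whereas `closeWithoutFinally()` records only finishWriterThreads.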

@tsuna

tsuna commented Mar 12, 2025

Can we get some eyes on this review?

@ciacono ciacono marked this pull request as draft March 19, 2025 19:10
@ciacono ciacono marked this pull request as ready for review March 25, 2025 17:44
…pt recovered.edits

If finishWriterThreads throws an exception in the
org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink.close method,
closeWriters should not run, because it can race with the writer threads and
corrupt files if the regionserver aborts. In that case closeWriters would write
the trailer in parallel with the writer threads, corrupting the file, and the
corrupt file would then be renamed and finalized when it should not be. This
corruption causes problems when the region is later assigned.
To fix this, when finishWriterThreads throws an exception or is not successful,
the corrupt files should not be renamed and finalized.
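
The rule stated in this commit message, finalize only when finishWriterThreads succeeds, can be sketched as follows. This is a simplified illustration on the local filesystem, not the merged patch: `FinalizeGuardSketch`, the `.temp` suffix, and the `Callable<Boolean>` parameter are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical sketch: rename temp files to final names only after a clean finish. */
final class FinalizeGuardSketch {
  /** Assumed naming convention for in-progress files (illustration only). */
  static final String TEMP_SUFFIX = ".temp";

  /**
   * Runs the finish step, then finalizes files only if it reported success.
   * If the finish step throws, the exception propagates and no file is renamed.
   * Returns the list of finalized paths (empty when nothing was finalized).
   */
  static List<Path> close(Callable<Boolean> finishWriterThreads, List<Path> tempFiles)
      throws Exception {
    boolean finished = finishWriterThreads.call();
    List<Path> finalized = new ArrayList<>();
    if (!finished) {
      // Writers may still be active; leave the temp files untouched.
      return finalized;
    }
    for (Path temp : tempFiles) {
      String name = temp.getFileName().toString();
      if (!name.endsWith(TEMP_SUFFIX)) {
        throw new IOException("not a temp file: " + temp);
      }
      String finalName = name.substring(0, name.length() - TEMP_SUFFIX.length());
      Path dst = temp.resolveSibling(finalName);
      Files.move(temp, dst, StandardCopyOption.ATOMIC_MOVE);
      finalized.add(dst);
    }
    return finalized;
  }
}
```

The design point is that the rename is the step that makes a file visible as complete, so it is gated on the finish step both returning true and not throwing.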
@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 1m 1s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | | Patch does not have any anti-patterns. |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 5m 4s | | master passed |
| +1 💚 | compile | 4m 13s | | master passed |
| +1 💚 | checkstyle | 0m 53s | | master passed |
| +1 💚 | spotbugs | 2m 12s | | master passed |
| +1 💚 | spotless | 1m 12s | | branch has no errors when running spotless:check. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 4m 46s | | the patch passed |
| +1 💚 | compile | 4m 32s | | the patch passed |
| +1 💚 | javac | 4m 32s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 58s | | the patch passed |
| +1 💚 | spotbugs | 2m 18s | | the patch passed |
| +1 💚 | hadoopcheck | 20m 54s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. |
| +1 💚 | spotless | 0m 45s | | patch has no errors when running spotless:check. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 10s | | The patch does not generate ASF License warnings. |
| | | 59m 58s | | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6266/9/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #6266 |
| JIRA Issue | HBASE-28569 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux 44f24c89b97a 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / ca8c8a4 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 84 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6266/9/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 0m 37s | | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 3s | | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | | | | _ Prechecks _ |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 3m 55s | | master passed |
| +1 💚 | compile | 1m 18s | | master passed |
| +1 💚 | javadoc | 0m 40s | | master passed |
| +1 💚 | shadedjars | 7m 22s | | branch has no errors when building our shaded downstream artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 3m 47s | | the patch passed |
| +1 💚 | compile | 1m 6s | | the patch passed |
| +1 💚 | javac | 1m 6s | | the patch passed |
| +1 💚 | javadoc | 0m 30s | | the patch passed |
| +1 💚 | shadedjars | 6m 37s | | patch has no errors when building our shaded downstream artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 242m 53s | | hbase-server in the patch passed. |
| | | 273m 5s | | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6266/9/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #6266 |
| JIRA Issue | HBASE-28569 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux e4587a01c80d 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / ca8c8a4 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6266/9/testReport/ |
| Max. process+thread count | 4584 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6266/9/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@aaronbee
Contributor

aaronbee commented Apr 2, 2025

LGTM. @Apache9 Would you mind taking another look?

@Apache9
Contributor

Apache9 commented Apr 3, 2025

Could you please open a PR against branch-2 too? I will merge them together.

Thanks for the fix! @ciacono @aaronbee

@ciacono
Contributor Author

ciacono commented Apr 3, 2025

Thanks for reviewing @Apache9
Please see the PR against branch-2 here: #6884

@Apache9 Apache9 merged commit e2e21f1 into apache:master Apr 7, 2025
1 check passed
Apache9 pushed a commit that referenced this pull request Apr 7, 2025
…t recovered.edits (#6266)

Signed-off-by: Duo Zhang <zhangduo@apache.org>
(cherry picked from commit e2e21f1)