Skip to content

HBASE-28801 WALs are not cleaned even after all entries are flushed #6179

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: branch-2
Choose a base branch
from

Conversation

kiran-maturi
Copy link
Contributor

No description provided.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 41s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 1s Patch does not have any anti-patterns.
_ branch-2 Compile Tests _
+1 💚 mvninstall 3m 27s branch-2 passed
+1 💚 compile 2m 53s branch-2 passed
+1 💚 checkstyle 0m 39s branch-2 passed
+1 💚 spotbugs 1m 35s branch-2 passed
+1 💚 spotless 0m 46s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 13s the patch passed
+1 💚 compile 2m 54s the patch passed
+1 💚 javac 2m 54s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 36s /results-checkstyle-hbase-server.txt hbase-server: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2)
+1 💚 spotbugs 1m 43s the patch passed
+1 💚 hadoopcheck 16m 49s Patch does not cause any errors with Hadoop 2.10.2 or 3.3.6 3.4.0.
-1 ❌ spotless 0m 38s patch has 23 errors when running spotless:check, run spotless:apply to fix.
_ Other Tests _
+1 💚 asflicense 0m 11s The patch does not generate ASF License warnings.
37m 57s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6179
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux eb031ed13b25 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 2a1f223
Default Java Eclipse Adoptium-11.0.23+9
spotless https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/artifact/yetus-general-check/output/patch-spotless.txt
Max. process+thread count 77 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 11s Docker mode activated.
-0 ⚠️ yetus 0m 6s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 5m 5s branch-2 passed
+1 💚 compile 1m 31s branch-2 passed
+1 💚 javadoc 0m 50s branch-2 passed
+1 💚 shadedjars 8m 40s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 4m 43s the patch passed
+1 💚 compile 1m 28s the patch passed
+1 💚 javac 1m 28s the patch passed
+1 💚 javadoc 0m 39s the patch passed
+1 💚 shadedjars 8m 16s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 26m 16s /patch-unit-hbase-server.txt hbase-server in the patch failed.
60m 52s
Subsystem Report/Notes
Docker ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #6179
Optional Tests javac javadoc unit compile shadedjars
uname Linux 5043cc1c30df 5.4.0-186-generic #206-Ubuntu SMP Fri Apr 26 12:31:10 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 2a1f223
Default Java Eclipse Adoptium-11.0.23+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/testReport/
Max. process+thread count 1570 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 39s Docker mode activated.
-0 ⚠️ yetus 0m 5s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 3m 7s branch-2 passed
+1 💚 compile 0m 57s branch-2 passed
+1 💚 javadoc 0m 31s branch-2 passed
+1 💚 shadedjars 5m 34s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 56s the patch passed
+1 💚 compile 0m 57s the patch passed
+1 💚 javac 0m 57s the patch passed
+1 💚 javadoc 0m 29s the patch passed
+1 💚 shadedjars 5m 31s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 227m 1s /patch-unit-hbase-server.txt hbase-server in the patch failed.
252m 21s
Subsystem Report/Notes
Docker ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6179
Optional Tests javac javadoc unit compile shadedjars
uname Linux ade5416f6839 5.4.0-192-generic #212-Ubuntu SMP Fri Jul 5 09:47:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 2a1f223
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/testReport/
Max. process+thread count 4491 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 42s Docker mode activated.
-0 ⚠️ yetus 0m 5s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 2m 38s branch-2 passed
+1 💚 compile 0m 47s branch-2 passed
+1 💚 javadoc 0m 28s branch-2 passed
+1 💚 shadedjars 5m 1s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 31s the patch passed
+1 💚 compile 0m 45s the patch passed
+1 💚 javac 0m 45s the patch passed
+1 💚 javadoc 0m 27s the patch passed
+1 💚 shadedjars 5m 5s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 233m 59s hbase-server in the patch passed.
256m 49s
Subsystem Report/Notes
Docker ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #6179
Optional Tests javac javadoc unit compile shadedjars
uname Linux b77a4d8f57d3 5.4.0-192-generic #212-Ubuntu SMP Fri Jul 5 09:47:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 2a1f223
Default Java Temurin-1.8.0_412-b08
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/testReport/
Max. process+thread count 4373 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 45s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ branch-2 Compile Tests _
+1 💚 mvninstall 3m 7s branch-2 passed
+1 💚 compile 2m 51s branch-2 passed
+1 💚 checkstyle 0m 38s branch-2 passed
+1 💚 spotbugs 1m 32s branch-2 passed
+1 💚 spotless 0m 45s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 5s the patch passed
+1 💚 compile 2m 51s the patch passed
+1 💚 javac 2m 51s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 37s the patch passed
+1 💚 spotbugs 1m 41s the patch passed
+1 💚 hadoopcheck 16m 43s Patch does not cause any errors with Hadoop 2.10.2 or 3.3.6 3.4.0.
+1 💚 spotless 0m 46s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 9s The patch does not generate ASF License warnings.
37m 9s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6179
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 8324560a61f0 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 69cdb54
Default Java Eclipse Adoptium-11.0.23+9
Max. process+thread count 79 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 45s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 2m 41s branch-2 passed
+1 💚 compile 0m 42s branch-2 passed
+1 💚 javadoc 0m 24s branch-2 passed
+1 💚 shadedjars 5m 20s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 23s the patch passed
+1 💚 compile 0m 41s the patch passed
+1 💚 javac 0m 41s the patch passed
+1 💚 javadoc 0m 24s the patch passed
+1 💚 shadedjars 5m 15s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 227m 17s hbase-server in the patch passed.
251m 16s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #6179
Optional Tests javac javadoc unit compile shadedjars
uname Linux 0b3084b27d58 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 69cdb54
Default Java Temurin-1.8.0_412-b08
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/testReport/
Max. process+thread count 4288 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 40s Docker mode activated.
-0 ⚠️ yetus 0m 5s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 3m 5s branch-2 passed
+1 💚 compile 0m 58s branch-2 passed
+1 💚 javadoc 0m 31s branch-2 passed
+1 💚 shadedjars 5m 33s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 56s the patch passed
+1 💚 compile 0m 57s the patch passed
+1 💚 javac 0m 57s the patch passed
+1 💚 javadoc 0m 28s the patch passed
+1 💚 shadedjars 5m 30s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 226m 39s /patch-unit-hbase-server.txt hbase-server in the patch failed.
251m 39s
Subsystem Report/Notes
Docker ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6179
Optional Tests javac javadoc unit compile shadedjars
uname Linux 8c8fd75c39a9 5.4.0-192-generic #212-Ubuntu SMP Fri Jul 5 09:47:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 69cdb54
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/testReport/
Max. process+thread count 4555 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 41s Docker mode activated.
-0 ⚠️ yetus 0m 6s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 2m 53s branch-2 passed
+1 💚 compile 0m 53s branch-2 passed
+1 💚 javadoc 0m 27s branch-2 passed
+1 💚 shadedjars 5m 32s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 0s the patch passed
+1 💚 compile 0m 52s the patch passed
+1 💚 javac 0m 52s the patch passed
+1 💚 javadoc 0m 27s the patch passed
+1 💚 shadedjars 5m 31s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 241m 5s /patch-unit-hbase-server.txt hbase-server in the patch failed.
265m 54s
Subsystem Report/Notes
Docker ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #6179
Optional Tests javac javadoc unit compile shadedjars
uname Linux afb5001c2226 5.4.0-186-generic #206-Ubuntu SMP Fri Apr 26 12:31:10 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 69cdb54
Default Java Eclipse Adoptium-11.0.23+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/testReport/
Max. process+thread count 4403 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@virajjasani virajjasani requested a review from Apache9 August 28, 2024 15:33
@virajjasani
Copy link
Contributor

@Apache9 this is for 2.x versions. Would you like to take a look?

@Apache9
Copy link
Contributor

Apache9 commented Aug 29, 2024

Mind explaining more here? The old logic seems correct, only if there are no un flushed entries, we can clean the WAL file...

@kiran-maturi
Copy link
Contributor Author

kiran-maturi commented Aug 29, 2024

@Apache9 I have observed in production WAL files not being cleaned for days from the roll over. This was due to the WALs not being marked for close when they had unflushed entries.

if (!isUnflushedEntries()) {
              markClosedAndClean(oldPath);
            }

During clean up in cleanOldLogs

if (!e.getValue().closed) {
        LOG.debug("{} is not closed yet, will try archiving it next time", e.getKey());
        continue;
      }

The above log line was emitted for wal files that were rolled over days ago. Once its not marked close we will not be able to clean up there by leading lot of files to be processed during SCP .
The current change marks it close. The WAL file won't be cleaned as there is check to make sure all the entries related to WAL have been flushed

Map<byte[], Long> sequenceNums = e.getValue().encodedName2HighestSequenceId;
      if (this.sequenceIdAccounting.areAllLower(sequenceNums)) {
        if (logsToArchive == null) {
          logsToArchive = new ArrayList<>();
        }
        logsToArchive.add(Pair.newPair(log, e.getValue().logSize));
        if (LOG.isTraceEnabled()) {
          LOG.trace("WAL file ready for archiving " + log);
        }
      }

My understanding is it is safe to mark them closed even with unflushed entries as there is check in cleanup which will not let the wal file to be cleaned up.

@Apache9
Copy link
Contributor

Apache9 commented Aug 29, 2024

What I mean is that, the logic here is correct, so in general, if everything works as expected, you can not archive the file, even if you mark it as closed right? The later sequence id check should prevent you from deleting the file.

If not, as you described here, the WAL file can be deleted after you mark it as closed, then there must be something wrong, as there is some unflushed entries but the sequence id accounting tells us there is nothing unflushed?

Thanks.

@kiran-maturi
Copy link
Contributor Author

kiran-maturi commented Aug 29, 2024

@Apache9 Lets consider this scenario
At T1 wal roll over happend for WAL1 and new WAL2 has been created there are unflushed entries on the ring buffer at T1 ( highestUnsynced - highestSynced)
at T1 check sequenceId accounting for regions will also show the same there are unflushed entries for some region (this.sequenceIdAccounting.areAllLower(sequenceNums))
at T2 WAL2 gets rolled over
At T2 all the entries for WAL1 should have been flushed and we should be good to clean it up when cleanOldLogs is called
(this.sequenceIdAccounting.areAllLower(sequenceNums))
but we won't be able to clean it up as we have not marked it close
T2 - T1 > 10 mins

@kiran-maturi
Copy link
Contributor Author

@Apache9 can you please review

@Apache9
Copy link
Contributor

Apache9 commented Sep 4, 2024

So this is a problem when refactoring? IIRC we abstract some common logic to AbstractFSWAL on branch-2, but the implementation is still a bit different from master and branch-3.

We need to check the refactoring patch to see if it changed some semantics...

@Apache9
Copy link
Contributor

Apache9 commented Sep 4, 2024

OK, in the old time, if there are unflushed entries, we will throw IOException out and abort the region server...

When optimizing the close logic, we finally changed the code to the current situation.

I think the intention here is that, if there are still unflushed entries after closing writer, there should be an exception thrown out which aborts the region server.

But looking at the implementation of getUnflushedEntriesCount, since we do not block others threads from adding new entries to the ring buffer, the getUnflushedEntriesCount could be greater than 0 even if there are no errors, so I think we need to take a look at the whole logic again, the isUnflushedEntries is not what we want now I'm afraid...

@@ -390,10 +390,10 @@ protected void doReplaceWriter(Path oldPath, Path newPath, Writer nextWriter) th
try {
closeWriter(this.writer, oldPath, true);
} finally {
// closing this as there is no other chance we can set close to true
// during clean up we check for unflushed entries
markClosedAndClean(oldPath);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at the code again, the isUnflushedEntries should work as expected. The highestUnsyncedTxid will only be increased in appendEntry method, so in the doReplaceWriter, after arriving the safe point, the highestUnsyncedTxid will not be changed.

I think there is another problem that in closeWriter, we will also check isUnflushedEntries and also increase the closeErrorCount, which may affect the logic here. But for normal case, closeWriter is executed in a background thread pool, so it is not safe to call isUnflushedEntries as it does not reflect the real state before closing... And why we put the close in background is that, even if it fails, it is not a big deal as we can make sure that all entries have already been flushed, a close failure only means we may fail to write the trailer.

I think here, we should only increase the closeErrorCount when closing in foreground, and also, when calling closeWriter, we should pass the result of isUnflushedEntries in, instead of calling it everytime when we want to check this state.

Copy link
Contributor Author

@kiran-maturi kiran-maturi Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Apache9 Thanks for reviewing.

The highestUnsyncedTxid will only be increased in appendEntry method, so in the doReplaceWriter, after arriving the safe point, the highestUnsyncedTxid will not be changed.

Is it safe to assume that the other threads (calling appendWrite) won't increment highestUnsyncedTxid till the close happens

I think here, we should only increase the closeErrorCount when closing in foreground, and also, when calling closeWriter, we should pass the result of isUnflushedEntries in, instead of calling it everytime when we want to check this state.

Yes that is correct accessing in the isUnflushedEntries in the background will cause issues

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to fix the background close issue or you just want to add some logs in this issue and file new issues for addressing the background close issue?

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 43s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ branch-2 Compile Tests _
+1 💚 mvninstall 3m 21s branch-2 passed
+1 💚 compile 3m 1s branch-2 passed
+1 💚 checkstyle 0m 39s branch-2 passed
+1 💚 spotbugs 1m 38s branch-2 passed
+1 💚 spotless 0m 50s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 11s the patch passed
+1 💚 compile 3m 2s the patch passed
+1 💚 javac 3m 2s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 37s the patch passed
+1 💚 spotbugs 1m 44s the patch passed
+1 💚 hadoopcheck 17m 0s Patch does not cause any errors with Hadoop 2.10.2 or 3.3.6 3.4.0.
+1 💚 spotless 0m 46s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 11s The patch does not generate ASF License warnings.
38m 32s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6179
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 07e61a1813b3 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 17ce08c
Default Java Eclipse Adoptium-11.0.23+9
Max. process+thread count 78 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/3/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 4m 5s Docker mode activated.
-0 ⚠️ yetus 0m 5s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 3m 0s branch-2 passed
+1 💚 compile 0m 53s branch-2 passed
+1 💚 javadoc 0m 28s branch-2 passed
+1 💚 shadedjars 5m 30s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 2s the patch passed
+1 💚 compile 0m 52s the patch passed
+1 💚 javac 0m 52s the patch passed
+1 💚 javadoc 0m 28s the patch passed
+1 💚 shadedjars 5m 28s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 240m 36s /patch-unit-hbase-server.txt hbase-server in the patch failed.
269m 3s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/3/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #6179
Optional Tests javac javadoc unit compile shadedjars
uname Linux ac319b866957 5.4.0-186-generic #206-Ubuntu SMP Fri Apr 26 12:31:10 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 17ce08c
Default Java Eclipse Adoptium-11.0.23+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/3/testReport/
Max. process+thread count 4375 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/3/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 3m 31s Docker mode activated.
-0 ⚠️ yetus 0m 5s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ branch-2 Compile Tests _
+1 💚 mvninstall 3m 0s branch-2 passed
+1 💚 compile 0m 58s branch-2 passed
+1 💚 javadoc 0m 30s branch-2 passed
+1 💚 shadedjars 5m 27s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 5s the patch passed
+1 💚 compile 0m 57s the patch passed
+1 💚 javac 0m 57s the patch passed
+1 💚 javadoc 0m 29s the patch passed
+1 💚 shadedjars 5m 26s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 242m 45s hbase-server in the patch passed.
270m 58s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/3/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6179
Optional Tests javac javadoc unit compile shadedjars
uname Linux b715a7143e2a 5.4.0-192-generic #212-Ubuntu SMP Fri Jul 5 09:47:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2 / 17ce08c
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/3/testReport/
Max. process+thread count 4164 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/3/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@@ -393,6 +393,8 @@ protected void doReplaceWriter(Path oldPath, Path newPath, Writer nextWriter) th
inflightWALClosures.remove(oldPath.getName());
if (!isUnflushedEntries()) {
markClosedAndClean(oldPath);
} else {
LOG.debug("WAL has unflushed entries path: " + oldPath);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you add some metrics to track this event? Is this PR about adding logs to help debugging? is there more to the fix?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants