HBASE-28801 WALs are not cleaned even after all entries are flushed #6179

kiran-maturi · 2024-08-27T06:03:32Z

No description provided.

Apache-HBase · 2024-08-27T06:43:33Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 41s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 0s		codespell was not available.
+0 🆗	detsecrets	0m 0s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	hbaseanti	0m 1s		Patch does not have any anti-patterns.
			_ branch-2 Compile Tests _
+1 💚	mvninstall	3m 27s		branch-2 passed
+1 💚	compile	2m 53s		branch-2 passed
+1 💚	checkstyle	0m 39s		branch-2 passed
+1 💚	spotbugs	1m 35s		branch-2 passed
+1 💚	spotless	0m 46s		branch has no errors when running spotless:check.
			_ Patch Compile Tests _
+1 💚	mvninstall	3m 13s		the patch passed
+1 💚	compile	2m 54s		the patch passed
+1 💚	javac	2m 54s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
-0 ⚠️	checkstyle	0m 36s	/results-checkstyle-hbase-server.txt	hbase-server: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2)
+1 💚	spotbugs	1m 43s		the patch passed
+1 💚	hadoopcheck	16m 49s		Patch does not cause any errors with Hadoop 2.10.2 or 3.3.6 3.4.0.
-1 ❌	spotless	0m 38s		patch has 23 errors when running spotless:check, run spotless:apply to fix.
			_ Other Tests _
+1 💚	asflicense	0m 11s		The patch does not generate ASF License warnings.
		37m 57s

Subsystem	Report/Notes
Docker	ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR	#6179
Optional Tests	dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname	Linux eb031ed13b25 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	branch-2 / `2a1f223`
Default Java	Eclipse Adoptium-11.0.23+9
spotless	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/artifact/yetus-general-check/output/patch-spotless.txt
Max. process+thread count	77 (vs. ulimit of 30000)
modules	C: hbase-server U: hbase-server
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/console
versions	git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by	Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2024-08-27T07:06:32Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	1m 11s		Docker mode activated.
-0 ⚠️	yetus	0m 6s		Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
			_ Prechecks _
			_ branch-2 Compile Tests _
+1 💚	mvninstall	5m 5s		branch-2 passed
+1 💚	compile	1m 31s		branch-2 passed
+1 💚	javadoc	0m 50s		branch-2 passed
+1 💚	shadedjars	8m 40s		branch has no errors when building our shaded downstream artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	4m 43s		the patch passed
+1 💚	compile	1m 28s		the patch passed
+1 💚	javac	1m 28s		the patch passed
+1 💚	javadoc	0m 39s		the patch passed
+1 💚	shadedjars	8m 16s		patch has no errors when building our shaded downstream artifacts.
			_ Other Tests _
-1 ❌	unit	26m 16s	/patch-unit-hbase-server.txt	hbase-server in the patch failed.
		60m 52s

Subsystem	Report/Notes
Docker	ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR	#6179
Optional Tests	javac javadoc unit compile shadedjars
uname	Linux 5043cc1c30df 5.4.0-186-generic #206-Ubuntu SMP Fri Apr 26 12:31:10 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	branch-2 / `2a1f223`
Default Java	Eclipse Adoptium-11.0.23+9
Test Results	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/testReport/
Max. process+thread count	1570 (vs. ulimit of 30000)
modules	C: hbase-server U: hbase-server
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/console
versions	git=2.34.1 maven=3.9.8
Powered by	Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2024-08-27T10:17:54Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 39s		Docker mode activated.
-0 ⚠️	yetus	0m 5s		Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
			_ Prechecks _
			_ branch-2 Compile Tests _
+1 💚	mvninstall	3m 7s		branch-2 passed
+1 💚	compile	0m 57s		branch-2 passed
+1 💚	javadoc	0m 31s		branch-2 passed
+1 💚	shadedjars	5m 34s		branch has no errors when building our shaded downstream artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	2m 56s		the patch passed
+1 💚	compile	0m 57s		the patch passed
+1 💚	javac	0m 57s		the patch passed
+1 💚	javadoc	0m 29s		the patch passed
+1 💚	shadedjars	5m 31s		patch has no errors when building our shaded downstream artifacts.
			_ Other Tests _
-1 ❌	unit	227m 1s	/patch-unit-hbase-server.txt	hbase-server in the patch failed.
		252m 21s

Subsystem	Report/Notes
Docker	ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR	#6179
Optional Tests	javac javadoc unit compile shadedjars
uname	Linux ade5416f6839 5.4.0-192-generic #212-Ubuntu SMP Fri Jul 5 09:47:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	branch-2 / `2a1f223`
Default Java	Eclipse Adoptium-17.0.11+9
Test Results	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/testReport/
Max. process+thread count	4491 (vs. ulimit of 30000)
modules	C: hbase-server U: hbase-server
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/console
versions	git=2.34.1 maven=3.9.8
Powered by	Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2024-08-27T10:22:21Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 42s		Docker mode activated.
-0 ⚠️	yetus	0m 5s		Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
			_ Prechecks _
			_ branch-2 Compile Tests _
+1 💚	mvninstall	2m 38s		branch-2 passed
+1 💚	compile	0m 47s		branch-2 passed
+1 💚	javadoc	0m 28s		branch-2 passed
+1 💚	shadedjars	5m 1s		branch has no errors when building our shaded downstream artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	2m 31s		the patch passed
+1 💚	compile	0m 45s		the patch passed
+1 💚	javac	0m 45s		the patch passed
+1 💚	javadoc	0m 27s		the patch passed
+1 💚	shadedjars	5m 5s		patch has no errors when building our shaded downstream artifacts.
			_ Other Tests _
+1 💚	unit	233m 59s		hbase-server in the patch passed.
		256m 49s

Subsystem	Report/Notes
Docker	ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR	#6179
Optional Tests	javac javadoc unit compile shadedjars
uname	Linux b77a4d8f57d3 5.4.0-192-generic #212-Ubuntu SMP Fri Jul 5 09:47:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	branch-2 / `2a1f223`
Default Java	Temurin-1.8.0_412-b08
Test Results	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/testReport/
Max. process+thread count	4373 (vs. ulimit of 30000)
modules	C: hbase-server U: hbase-server
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/1/console
versions	git=2.34.1 maven=3.9.8
Powered by	Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2024-08-27T11:00:43Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 45s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 0s		codespell was not available.
+0 🆗	detsecrets	0m 0s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	hbaseanti	0m 0s		Patch does not have any anti-patterns.
			_ branch-2 Compile Tests _
+1 💚	mvninstall	3m 7s		branch-2 passed
+1 💚	compile	2m 51s		branch-2 passed
+1 💚	checkstyle	0m 38s		branch-2 passed
+1 💚	spotbugs	1m 32s		branch-2 passed
+1 💚	spotless	0m 45s		branch has no errors when running spotless:check.
			_ Patch Compile Tests _
+1 💚	mvninstall	3m 5s		the patch passed
+1 💚	compile	2m 51s		the patch passed
+1 💚	javac	2m 51s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	0m 37s		the patch passed
+1 💚	spotbugs	1m 41s		the patch passed
+1 💚	hadoopcheck	16m 43s		Patch does not cause any errors with Hadoop 2.10.2 or 3.3.6 3.4.0.
+1 💚	spotless	0m 46s		patch has no errors when running spotless:check.
			_ Other Tests _
+1 💚	asflicense	0m 9s		The patch does not generate ASF License warnings.
		37m 9s

Subsystem	Report/Notes
Docker	ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR	#6179
Optional Tests	dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname	Linux 8324560a61f0 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	branch-2 / `69cdb54`
Default Java	Eclipse Adoptium-11.0.23+9
Max. process+thread count	79 (vs. ulimit of 30000)
modules	C: hbase-server U: hbase-server
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/console
versions	git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by	Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2024-08-27T14:34:54Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 45s		Docker mode activated.
-0 ⚠️	yetus	0m 4s		Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
			_ Prechecks _
			_ branch-2 Compile Tests _
+1 💚	mvninstall	2m 41s		branch-2 passed
+1 💚	compile	0m 42s		branch-2 passed
+1 💚	javadoc	0m 24s		branch-2 passed
+1 💚	shadedjars	5m 20s		branch has no errors when building our shaded downstream artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	2m 23s		the patch passed
+1 💚	compile	0m 41s		the patch passed
+1 💚	javac	0m 41s		the patch passed
+1 💚	javadoc	0m 24s		the patch passed
+1 💚	shadedjars	5m 15s		patch has no errors when building our shaded downstream artifacts.
			_ Other Tests _
+1 💚	unit	227m 17s		hbase-server in the patch passed.
		251m 16s

Subsystem	Report/Notes
Docker	ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR	#6179
Optional Tests	javac javadoc unit compile shadedjars
uname	Linux 0b3084b27d58 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	branch-2 / `69cdb54`
Default Java	Temurin-1.8.0_412-b08
Test Results	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/testReport/
Max. process+thread count	4288 (vs. ulimit of 30000)
modules	C: hbase-server U: hbase-server
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/console
versions	git=2.34.1 maven=3.9.8
Powered by	Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2024-08-27T14:35:15Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 40s		Docker mode activated.
-0 ⚠️	yetus	0m 5s		Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
			_ Prechecks _
			_ branch-2 Compile Tests _
+1 💚	mvninstall	3m 5s		branch-2 passed
+1 💚	compile	0m 58s		branch-2 passed
+1 💚	javadoc	0m 31s		branch-2 passed
+1 💚	shadedjars	5m 33s		branch has no errors when building our shaded downstream artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	2m 56s		the patch passed
+1 💚	compile	0m 57s		the patch passed
+1 💚	javac	0m 57s		the patch passed
+1 💚	javadoc	0m 28s		the patch passed
+1 💚	shadedjars	5m 30s		patch has no errors when building our shaded downstream artifacts.
			_ Other Tests _
-1 ❌	unit	226m 39s	/patch-unit-hbase-server.txt	hbase-server in the patch failed.
		251m 39s

Subsystem	Report/Notes
Docker	ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR	#6179
Optional Tests	javac javadoc unit compile shadedjars
uname	Linux 8c8fd75c39a9 5.4.0-192-generic #212-Ubuntu SMP Fri Jul 5 09:47:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	branch-2 / `69cdb54`
Default Java	Eclipse Adoptium-17.0.11+9
Test Results	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/testReport/
Max. process+thread count	4555 (vs. ulimit of 30000)
modules	C: hbase-server U: hbase-server
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/console
versions	git=2.34.1 maven=3.9.8
Powered by	Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2024-08-27T14:49:42Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 41s		Docker mode activated.
-0 ⚠️	yetus	0m 6s		Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
			_ Prechecks _
			_ branch-2 Compile Tests _
+1 💚	mvninstall	2m 53s		branch-2 passed
+1 💚	compile	0m 53s		branch-2 passed
+1 💚	javadoc	0m 27s		branch-2 passed
+1 💚	shadedjars	5m 32s		branch has no errors when building our shaded downstream artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	3m 0s		the patch passed
+1 💚	compile	0m 52s		the patch passed
+1 💚	javac	0m 52s		the patch passed
+1 💚	javadoc	0m 27s		the patch passed
+1 💚	shadedjars	5m 31s		patch has no errors when building our shaded downstream artifacts.
			_ Other Tests _
-1 ❌	unit	241m 5s	/patch-unit-hbase-server.txt	hbase-server in the patch failed.
		265m 54s

Subsystem	Report/Notes
Docker	ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR	#6179
Optional Tests	javac javadoc unit compile shadedjars
uname	Linux afb5001c2226 5.4.0-186-generic #206-Ubuntu SMP Fri Apr 26 12:31:10 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	branch-2 / `69cdb54`
Default Java	Eclipse Adoptium-11.0.23+9
Test Results	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/testReport/
Max. process+thread count	4403 (vs. ulimit of 30000)
modules	C: hbase-server U: hbase-server
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/2/console
versions	git=2.34.1 maven=3.9.8
Powered by	Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

virajjasani · 2024-08-28T15:33:57Z

@Apache9 this is for 2.x versions. Would you like to take a look?

Apache9 · 2024-08-29T02:26:12Z

Mind explaining more here? The old logic seems correct, only if there are no un flushed entries, we can clean the WAL file...

kiran-maturi · 2024-08-29T03:10:02Z

@Apache9 I have observed in production WAL files not being cleaned for days from the roll over. This was due to the WALs not being marked for close when they had unflushed entries.

if (!isUnflushedEntries()) {
              markClosedAndClean(oldPath);
            }

During clean up in cleanOldLogs

if (!e.getValue().closed) {
        LOG.debug("{} is not closed yet, will try archiving it next time", e.getKey());
        continue;
      }

The above log line was emitted for wal files that were rolled over days ago. Once its not marked close we will not be able to clean up there by leading lot of files to be processed during SCP .
The current change marks it close. The WAL file won't be cleaned as there is check to make sure all the entries related to WAL have been flushed

Map<byte[], Long> sequenceNums = e.getValue().encodedName2HighestSequenceId;
      if (this.sequenceIdAccounting.areAllLower(sequenceNums)) {
        if (logsToArchive == null) {
          logsToArchive = new ArrayList<>();
        }
        logsToArchive.add(Pair.newPair(log, e.getValue().logSize));
        if (LOG.isTraceEnabled()) {
          LOG.trace("WAL file ready for archiving " + log);
        }
      }

My understanding is it is safe to mark them closed even with unflushed entries as there is check in cleanup which will not let the wal file to be cleaned up.

Apache9 · 2024-08-29T13:52:29Z

What I mean is that, the logic here is correct, so in general, if everything works as expected, you can not archive the file, even if you mark it as closed right? The later sequence id check should prevent you from deleting the file.

If not, as you described here, the WAL file can be deleted after you mark it as closed, then there must be something wrong, as there is some unflushed entries but the sequence id accounting tells us there is nothing unflushed?

Thanks.

kiran-maturi · 2024-08-29T23:47:41Z

@Apache9 Lets consider this scenario
At T1 wal roll over happend for WAL1 and new WAL2 has been created there are unflushed entries on the ring buffer at T1 ( highestUnsynced - highestSynced)
at T1 check sequenceId accounting for regions will also show the same there are unflushed entries for some region (this.sequenceIdAccounting.areAllLower(sequenceNums))
at T2 WAL2 gets rolled over
At T2 all the entries for WAL1 should have been flushed and we should be good to clean it up when cleanOldLogs is called
(this.sequenceIdAccounting.areAllLower(sequenceNums))
but we won't be able to clean it up as we have not marked it close
T2 - T1 > 10 mins

kiran-maturi · 2024-09-02T02:45:05Z

@Apache9 can you please review

Apache9 · 2024-09-04T13:48:02Z

So this is a problem when refactoring? IIRC we abstract some common logic to AbstractFSWAL on branch-2, but the implementation is still a bit different from master and branch-3.

We need to check the refactoring patch to see if it changed some semantics...

Apache9 · 2024-09-04T14:20:13Z

OK, in the old time, if there are unflushed entries, we will throw IOException out and abort the region server...

When optimizing the close logic, we finally changed the code to the current situation.

I think the intention here is that, if there are still unflushed entries after closing writer, there should be an exception thrown out which aborts the region server.

But looking at the implementation of getUnflushedEntriesCount, since we do not block others threads from adding new entries to the ring buffer, the getUnflushedEntriesCount could be greater than 0 even if there are no errors, so I think we need to take a look at the whole logic again, the isUnflushedEntries is not what we want now I'm afraid...

Apache9 · 2024-09-05T14:40:23Z

hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java

@@ -390,10 +390,10 @@ protected void doReplaceWriter(Path oldPath, Path newPath, Writer nextWriter) th
          try {
            closeWriter(this.writer, oldPath, true);
          } finally {
+            // closing this as there is no other chance we can set close to true
+            // during clean up we check for unflushed entries
+            markClosedAndClean(oldPath);


Take a look at the code again, the isUnflushedEntries should work as expected. The highestUnsyncedTxid will only be increased in appendEntry method, so in the doReplaceWriter, after arriving the safe point, the highestUnsyncedTxid will not be changed.

I think there is another problem that in closeWriter, we will also check isUnflushedEntries and also increase the closeErrorCount, which may affect the logic here. But for normal case, closeWriter is executed in a background thread pool, so it is not safe to call isUnflushedEntries as it does not reflect the real state before closing... And why we put the close in background is that, even if it fails, it is not a big deal as we can make sure that all entries have already been flushed, a close failure only means we may fail to write the trailer.

I think here, we should only increase the closeErrorCount when closing in foreground, and also, when calling closeWriter, we should pass the result of isUnflushedEntries in, instead of calling it everytime when we want to check this state.

@Apache9 Thanks for reviewing.

The highestUnsyncedTxid will only be increased in appendEntry method, so in the doReplaceWriter, after arriving the safe point, the highestUnsyncedTxid will not be changed.

Is it safe to assume that the other threads (calling appendWrite) won't increment highestUnsyncedTxid till the close happens

I think here, we should only increase the closeErrorCount when closing in foreground, and also, when calling closeWriter, we should pass the result of isUnflushedEntries in, instead of calling it everytime when we want to check this state.

Yes that is correct accessing in the isUnflushedEntries in the background will cause issues

Do you plan to fix the background close issue or you just want to add some logs in this issue and file new issues for addressing the background close issue?

Apache-HBase · 2024-09-09T06:44:19Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 43s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 0s		codespell was not available.
+0 🆗	detsecrets	0m 0s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	hbaseanti	0m 0s		Patch does not have any anti-patterns.
			_ branch-2 Compile Tests _
+1 💚	mvninstall	3m 21s		branch-2 passed
+1 💚	compile	3m 1s		branch-2 passed
+1 💚	checkstyle	0m 39s		branch-2 passed
+1 💚	spotbugs	1m 38s		branch-2 passed
+1 💚	spotless	0m 50s		branch has no errors when running spotless:check.
			_ Patch Compile Tests _
+1 💚	mvninstall	3m 11s		the patch passed
+1 💚	compile	3m 2s		the patch passed
+1 💚	javac	3m 2s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	0m 37s		the patch passed
+1 💚	spotbugs	1m 44s		the patch passed
+1 💚	hadoopcheck	17m 0s		Patch does not cause any errors with Hadoop 2.10.2 or 3.3.6 3.4.0.
+1 💚	spotless	0m 46s		patch has no errors when running spotless:check.
			_ Other Tests _
+1 💚	asflicense	0m 11s		The patch does not generate ASF License warnings.
		38m 32s

Subsystem	Report/Notes
Docker	ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR	#6179
Optional Tests	dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname	Linux 07e61a1813b3 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	branch-2 / `17ce08c`
Default Java	Eclipse Adoptium-11.0.23+9
Max. process+thread count	78 (vs. ulimit of 30000)
modules	C: hbase-server U: hbase-server
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/3/console
versions	git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by	Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2024-09-09T10:34:49Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	4m 5s		Docker mode activated.
-0 ⚠️	yetus	0m 5s		Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
			_ Prechecks _
			_ branch-2 Compile Tests _
+1 💚	mvninstall	3m 0s		branch-2 passed
+1 💚	compile	0m 53s		branch-2 passed
+1 💚	javadoc	0m 28s		branch-2 passed
+1 💚	shadedjars	5m 30s		branch has no errors when building our shaded downstream artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	3m 2s		the patch passed
+1 💚	compile	0m 52s		the patch passed
+1 💚	javac	0m 52s		the patch passed
+1 💚	javadoc	0m 28s		the patch passed
+1 💚	shadedjars	5m 28s		patch has no errors when building our shaded downstream artifacts.
			_ Other Tests _
-1 ❌	unit	240m 36s	/patch-unit-hbase-server.txt	hbase-server in the patch failed.
		269m 3s

Subsystem	Report/Notes
Docker	ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/3/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR	#6179
Optional Tests	javac javadoc unit compile shadedjars
uname	Linux ac319b866957 5.4.0-186-generic #206-Ubuntu SMP Fri Apr 26 12:31:10 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	branch-2 / `17ce08c`
Default Java	Eclipse Adoptium-11.0.23+9
Test Results	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/3/testReport/
Max. process+thread count	4375 (vs. ulimit of 30000)
modules	C: hbase-server U: hbase-server
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/3/console
versions	git=2.34.1 maven=3.9.8
Powered by	Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2024-09-09T10:36:41Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	3m 31s		Docker mode activated.
-0 ⚠️	yetus	0m 5s		Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
			_ Prechecks _
			_ branch-2 Compile Tests _
+1 💚	mvninstall	3m 0s		branch-2 passed
+1 💚	compile	0m 58s		branch-2 passed
+1 💚	javadoc	0m 30s		branch-2 passed
+1 💚	shadedjars	5m 27s		branch has no errors when building our shaded downstream artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	3m 5s		the patch passed
+1 💚	compile	0m 57s		the patch passed
+1 💚	javac	0m 57s		the patch passed
+1 💚	javadoc	0m 29s		the patch passed
+1 💚	shadedjars	5m 26s		patch has no errors when building our shaded downstream artifacts.
			_ Other Tests _
+1 💚	unit	242m 45s		hbase-server in the patch passed.
		270m 58s

Subsystem	Report/Notes
Docker	ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/3/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR	#6179
Optional Tests	javac javadoc unit compile shadedjars
uname	Linux b715a7143e2a 5.4.0-192-generic #212-Ubuntu SMP Fri Jul 5 09:47:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	branch-2 / `17ce08c`
Default Java	Eclipse Adoptium-17.0.11+9
Test Results	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/3/testReport/
Max. process+thread count	4164 (vs. ulimit of 30000)
modules	C: hbase-server U: hbase-server
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6179/3/console
versions	git=2.34.1 maven=3.9.8
Powered by	Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

ranganathg · 2024-09-10T06:26:01Z

hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java

@@ -393,6 +393,8 @@ protected void doReplaceWriter(Path oldPath, Path newPath, Writer nextWriter) th
            inflightWALClosures.remove(oldPath.getName());
            if (!isUnflushedEntries()) {
              markClosedAndClean(oldPath);
+            } else {
+              LOG.debug("WAL has unflushed entries path: " + oldPath);


Should you add some metrics to track this event? Is this PR about adding logs to help debugging? is there more to the fix?

HBASE-28801 WALs are not cleaned even after all entries are flushed

2a1f223

spotless fixes

69cdb54

virajjasani approved these changes Aug 28, 2024

View reviewed changes

virajjasani requested a review from Apache9 August 28, 2024 15:33

Apache9 reviewed Sep 5, 2024

View reviewed changes

added logging for files with unflushed entries

17ce08c

ranganathg reviewed Sep 10, 2024

View reviewed changes

HBASE-28801 WALs are not cleaned even after all entries are flushed #6179

Are you sure you want to change the base?

HBASE-28801 WALs are not cleaned even after all entries are flushed #6179

Conversation

kiran-maturi commented Aug 27, 2024

Uh oh!

Apache-HBase commented Aug 27, 2024

Uh oh!

Apache-HBase commented Aug 27, 2024

Uh oh!

Apache-HBase commented Aug 27, 2024

Uh oh!

Apache-HBase commented Aug 27, 2024

Uh oh!

Apache-HBase commented Aug 27, 2024

Uh oh!

Apache-HBase commented Aug 27, 2024

Uh oh!

Apache-HBase commented Aug 27, 2024

Uh oh!

Apache-HBase commented Aug 27, 2024

Uh oh!

virajjasani commented Aug 28, 2024

Uh oh!

Apache9 commented Aug 29, 2024

Uh oh!

kiran-maturi commented Aug 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Apache9 commented Aug 29, 2024

Uh oh!

kiran-maturi commented Aug 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kiran-maturi commented Sep 2, 2024

Uh oh!

Apache9 commented Sep 4, 2024

Uh oh!

Apache9 commented Sep 4, 2024

Uh oh!

Apache9 Sep 5, 2024

Choose a reason for hiding this comment

Uh oh!

kiran-maturi Sep 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Apache9 Sep 14, 2024

Choose a reason for hiding this comment

Uh oh!

Apache-HBase commented Sep 9, 2024

Uh oh!

Apache-HBase commented Sep 9, 2024

Uh oh!

Apache-HBase commented Sep 9, 2024

Uh oh!

ranganathg Sep 10, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kiran-maturi commented Aug 29, 2024 •

edited

Loading

kiran-maturi commented Aug 29, 2024 •

edited

Loading

kiran-maturi Sep 9, 2024 •

edited

Loading