HBASE-27354 EOF thrown by WALEntryStream causes replication blocking #4756
base: master
Conversation
🎊 +1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
@@ -255,15 +255,21 @@ private void dequeueCurrentLog() throws IOException {
   * Returns whether the file is opened for writing.
   */
  private boolean readNextEntryAndRecordReaderPosition() throws IOException {
    long prePos = reader.getPosition();
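For orientation, a rough sketch of the early-return shape being discussed in this thread; the `Reader` interface, the `committedLength` field, and the class itself are hypothetical stand-ins, not the committed patch:

```java
import java.io.IOException;

class EarlyReturnSketch {
  Reader reader;
  long committedLength; // hypothetical: the WAL length known to be safely readable

  // If we already sit at the committed length, report "no entry" instead of
  // reading into uncommitted bytes.
  boolean readNextEntryAndRecordReaderPosition() throws IOException {
    long prePos = reader.getPosition();
    if (prePos >= committedLength) {
      // Bail out before touching uncommitted data, avoiding the seek-back
      // EOF described later in this thread.
      return false;
    }
    return reader.next() != null;
  }

  interface Reader {
    long getPosition() throws IOException;
    Object next() throws IOException;
  }
}
```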
I think this is a nice improvement, in that we can return earlier when there is no data to be read. But I do not get why trying to read more data here can lead to a problem. Couldn't we still hit an EOFException when the remaining data is not enough for a whole entry?
Thanks for reviewing.
I haven't yet found a case where we read an incomplete entry. The EOFException is thrown by org.apache.hadoop.fs.FSDataInputStream#seek.
After reading uncommitted data, we reopen the FSDataInputStream and seek back. If the position we seek back to is exactly the synced length, it may cause an EOF. Since the three replicas of the HDFS block being written can be inconsistent, even if the data at a given position can be read from one replica, it may not be readable from another.
It's not a big problem if it just throws an EOF, as we'll retry. The big problem is that if we have already read some entries into the WALEntryBatch and increased totalBufferUsed, and totalBufferUsed is not subtracted after the EOF is thrown, all peers will eventually block completely.
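A minimal, self-contained sketch of that accounting leak, using simplified stand-ins rather than the real HBase classes: each entry read into the batch bumps the shared quota, but an EOF thrown mid-read skips the matching release.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class BufferLeakSketch {
  // Shared quota across all replication sources; once it fills up and is
  // never released, every peer's reader stops accepting new entries.
  static final AtomicLong totalBufferUsed = new AtomicLong();

  static void readAndShipBatch(List<byte[]> entries) throws IOException {
    List<byte[]> batch = new ArrayList<>();
    long batchSize = 0;
    for (byte[] entry : entries) {
      if (entry == null) {
        // Simulate hitting the end of readable data mid-batch.
        throw new EOFException("EOF while reading WAL entry");
      }
      batch.add(entry);
      batchSize += entry.length;
      totalBufferUsed.addAndGet(entry.length); // counted per entry as it is read...
    }
    ship(batch);
    totalBufferUsed.addAndGet(-batchSize); // ...but only released on the happy path
  }

  static void ship(List<byte[]> batch) { /* hand off to the shipper thread */ }

  public static void main(String[] args) {
    try {
      readAndShipBatch(Arrays.asList(new byte[128], null));
    } catch (IOException e) {
      // The EOF escaped: 128 bytes stay counted in totalBufferUsed forever.
      System.out.println("leaked bytes: " + totalBufferUsed.get());
    }
  }
}
```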
It's not a big problem if it just throws an EOF, as we'll retry. The big problem is that if we have already read some entries into the WALEntryBatch and increased totalBufferUsed, and totalBufferUsed is not subtracted after the EOF is thrown, all peers will eventually block completely.
One of our customers seems to be consistently reaching this EOF problem, per the exception trace below.
WARN org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader: Encountered a malformed edit, seeking back to last good position in file, from 31912404 to 31912265
java.io.EOFException: EOF while reading 106 WAL KVs; started reading at 31912338 and read up to 31912404
  at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:397)
  at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:98)
  at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:86)
  at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:262)
  at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:176)
  at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:101)
Then I do not think the fix here is enough? We should decrease the totalBufferUsed when calling resetReader?
Checking the code, we have special logic in ReplicationSourceWALReader for dealing with EOFException. Please see ReplicationSourceWALReader.handleEofException.
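Roughly, such a special-case path has to look something like the following hedged sketch (hypothetical interfaces, not the actual handleEofException source); the point raised in this thread is that it must also release the quota the partial batch already consumed:

```java
import java.io.EOFException;
import java.io.IOException;

class EofHandlingSketch {
  void readLoop(WalReader reader, Batch batch) {
    try {
      reader.readInto(batch);
    } catch (EOFException e) {
      // Release whatever the partial batch already added to totalBufferUsed,
      // then decide whether the truncated file can be skipped before retrying.
      batch.releaseQuota();
    } catch (IOException e) {
      // Other IO errors: retry with backoff.
    }
  }

  interface WalReader { void readInto(Batch b) throws IOException; }
  interface Batch { void releaseQuota(); }
}
```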
Thanks @wchevreuil for reviewing and sharing the case. This looks like it reads an incomplete entry?
Then I do not think the fix here is enough?
As @Apache9 said, this fix may not be enough; it may not cover reading an incomplete entry caused by, for example, a regionserver going down. I'm digging deeper into the root cause.
@wchevreuil for the above stacktrace you mentioned, which version is the cluster running on?
FYI @apurtell
One of our customers seems to be consistently reaching this EOF problem, per the exception trace below.
WARN org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader: Encountered a malformed edit, seeking back to last good position in file, from 31912404 to 31912265
java.io.EOFException: EOF while reading 106 WAL KVs; started reading at 31912338 and read up to 31912404
  at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:397)
  at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:98)
  at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:86)
  at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:262)
  at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:176)
  at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:101)
@wchevreuil, this problem does not seem to throw the EOFException in WALEntryStream.readNextEntryAndRecordReaderPosition; it may be a different problem?
@ddupg curious if you have any workaround for this on a live cluster? Do you restart regionservers?
Any updates here @ddupg? If not, I think we could open a new issue to land the improvement here first, no harm. Thanks.
@virajjasani Previously we had to restart regionservers. After this patch, the problem still exists, but the frequency has slowed to an acceptable level.
In WALEntryStream#readNextEntryAndRecordReaderPosition, it is possible that we read uncommitted data. If we read beyond the committed file length, we reopen the inputStream and seek back.
In our usage, we found that the position we seek back to may be exactly the length of the file being written, which may cause an EOF. The thrown EOF is finally caught by ReplicationSourceWALReader.run, but totalBufferUsed is not cleaned up.
After a long run, all peers will slow down and eventually block completely.
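To make the failure mode concrete, here is a simplified sketch of that reopen-and-seek-back flow, with hypothetical interfaces standing in for the real stream types (the real code goes through org.apache.hadoop.fs.FSDataInputStream):

```java
import java.io.IOException;

class SeekBackSketch {
  long lastGoodPosition; // last position known to end a complete entry

  void resetReader(WalFile file) throws IOException {
    SeekableStream in = file.reopen(); // an FSDataInputStream in the real code
    // If lastGoodPosition equals the length currently visible on the replica we
    // reopened against (the synced length), this seek can throw EOFException,
    // even though another replica may already have bytes at that offset.
    in.seek(lastGoodPosition);
  }

  interface WalFile { SeekableStream reopen() throws IOException; }
  interface SeekableStream { void seek(long pos) throws IOException; }
}
```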