@krconv krconv commented Nov 24, 2025

The HFiles generated by incremental backups cannot be read correctly by tooling such as the ClientSideRequestScanner, because the generated HFiles do not include the MAX_SEQ_ID metadata. Without it, the scanner ignores cell-level sequence IDs and sorts the HFiles arbitrarily. This produces incorrect results when a scan covers overwrites to cells with the same timestamp.

This PR adds a new option to HFileOutputFormat2 that calculates and sets the required metadata. This only really affects the ClientSideRequestScanner, as the sequence ID is recalculated when the files are bulk-loaded anyway.
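
To illustrate why the missing metadata matters, here is a minimal, self-contained sketch in plain Java. It is not HBase code: `FileCell`, `SeqIdTieBreakDemo`, and the sample values are hypothetical, and `-1` is an assumed placeholder for "no sequence ID available". It only models the idea that when two versions of a cell share a timestamp, a file-level sequence ID is the tiebreaker that lets the later write win.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of one cell version read from an HFile; not an HBase class. */
final class FileCell {
  final String value;
  final long timestamp;
  // Stands in for the MAX_SEQ_ID of the file the cell came from; -1 if the file has none.
  final long fileMaxSeqId;

  FileCell(String value, long timestamp, long fileMaxSeqId) {
    this.value = value;
    this.timestamp = timestamp;
    this.fileMaxSeqId = fileMaxSeqId;
  }
}

final class SeqIdTieBreakDemo {
  /** Newest first: higher timestamp wins, and a higher sequence ID breaks timestamp ties. */
  static final Comparator<FileCell> NEWEST_FIRST =
    Comparator.comparingLong((FileCell c) -> c.timestamp).reversed()
      .thenComparing(Comparator.comparingLong((FileCell c) -> c.fileMaxSeqId).reversed());

  /** Returns the value a reader would see for a single row and column. */
  static String visibleValue(List<FileCell> versions) {
    List<FileCell> sorted = new ArrayList<>(versions);
    sorted.sort(NEWEST_FIRST); // stable sort: full ties keep their input order
    return sorted.get(0).value;
  }
}
```

With distinct sequence IDs the overwrite wins regardless of the order in which the files are listed. If both files report `-1`, the comparator sees a full tie and the result depends only on input order, which is the arbitrary behaviour the description refers to.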

Part of https://issues.apache.org/jira/browse/HBASE-29716

@krconv krconv force-pushed the HBASE-29716-set-sequence-id-option branch from 7317c4d to 94750f1 Compare November 25, 2025 02:10
@krconv krconv force-pushed the HBASE-29716-set-sequence-id-option branch from 94750f1 to 45234b1 Compare November 25, 2025 02:13

}

- private void close(final StoreFileWriter w) throws IOException {
+ private void close(final StoreFileWriter w, final WriterInfo wl) throws IOException {
Contributor

Small thing, mind changing wl to wi here to match the rest of the patch?

- wl.written += length;
+ wi.writer.append((ExtendedCell) kv);
+ wi.written += length;
+ wi.maxSequenceId = Math.max(kv.getSequenceId(), wi.maxSequenceId);
Contributor

Any concerns that Cell#getSequenceId is removed in HBase 3? Any plans for how we should handle that?

Contributor

Ah, as long as this is an ExtendedCell, it looks like this should be possible in branch-3.
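
For readers following the thread, the bookkeeping under discussion can be sketched in plain Java. This is not the patch itself: `SeqCell`, `SimpleSeqCell`, and `WriterInfoSketch` are hypothetical stand-ins with no HBase dependency, the `-1` initial value and the `setMaxSeqId` flag are assumptions, and the map stands in for the file's metadata. The sketch tracks the largest sequence ID seen while appending and records it under `MAX_SEQ_ID` at close, only when the option is enabled.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a cell that exposes a sequence ID; not an HBase type. */
interface SeqCell {
  long getSequenceId();
  int length();
}

/** Trivial SeqCell implementation used only for illustration. */
final class SimpleSeqCell implements SeqCell {
  private final long sequenceId;
  private final int length;

  SimpleSeqCell(long sequenceId, int length) {
    this.sequenceId = sequenceId;
    this.length = length;
  }

  @Override public long getSequenceId() { return sequenceId; }
  @Override public int length() { return length; }
}

/** Hypothetical per-writer bookkeeping, loosely modelled on the WriterInfo in the diff. */
final class WriterInfoSketch {
  long written = 0;
  long maxSequenceId = -1; // assumed sentinel: no cell appended yet
  final Map<String, Long> fileInfo = new HashMap<>(); // stands in for HFile metadata

  /** Mirrors the append path: count bytes and keep the running maximum sequence ID. */
  void append(SeqCell cell) {
    written += cell.length();
    maxSequenceId = Math.max(cell.getSequenceId(), maxSequenceId);
  }

  /** Mirrors the close path: record the maximum only when the option is enabled. */
  void close(boolean setMaxSeqId) {
    if (setMaxSeqId) {
      fileInfo.put("MAX_SEQ_ID", maxSequenceId);
    }
  }
}
```

Keeping the maximum on the per-writer object means each output file gets its own value, which is why `close` needs access to the writer's bookkeeping as well as the writer.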

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 31s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for branch
+1 💚 mvninstall 3m 8s master passed
+1 💚 compile 1m 7s master passed
+1 💚 checkstyle 0m 24s master passed
+1 💚 spotbugs 0m 59s master passed
+1 💚 spotless 0m 46s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for patch
+1 💚 mvninstall 3m 7s the patch passed
+1 💚 compile 1m 6s the patch passed
+1 💚 javac 1m 6s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 24s the patch passed
+1 💚 spotbugs 1m 13s the patch passed
+1 💚 hadoopcheck 12m 3s Patch does not cause any errors with Hadoop 3.3.6 3.4.1.
-1 ❌ spotless 0m 41s patch has 21 errors when running spotless:check, run spotless:apply to fix.
_ Other Tests _
+1 💚 asflicense 0m 17s The patch does not generate ASF License warnings.
34m 5s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7480/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7480
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 40d18c1db355 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / d440969
Default Java Eclipse Adoptium-17.0.11+9
spotless https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7480/3/artifact/yetus-general-check/output/patch-spotless.txt
Max. process+thread count 85 (vs. ulimit of 30000)
modules C: hbase-mapreduce hbase-backup U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7480/3/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 51s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 18s Maven dependency ordering for branch
+1 💚 mvninstall 3m 39s master passed
+1 💚 compile 0m 40s master passed
+1 💚 javadoc 0m 30s master passed
+1 💚 shadedjars 6m 19s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 14s Maven dependency ordering for patch
+1 💚 mvninstall 3m 11s the patch passed
+1 💚 compile 0m 40s the patch passed
+1 💚 javac 0m 40s the patch passed
+1 💚 javadoc 0m 27s the patch passed
+1 💚 shadedjars 6m 13s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 17m 43s hbase-mapreduce in the patch passed.
+1 💚 unit 10m 27s hbase-backup in the patch passed.
52m 31s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7480/3/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7480
Optional Tests javac javadoc unit compile shadedjars
uname Linux 9d35dab9ffa4 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / d440969
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7480/3/testReport/
Max. process+thread count 3260 (vs. ulimit of 30000)
modules C: hbase-mapreduce hbase-backup U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7480/3/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.
