@krconv krconv commented Nov 25, 2025

The HFiles generated by incremental backups cannot be read correctly by tooling such as the ClientSideRequestScanner, because they do not include the MAX_SEQ_ID metadata. Without that metadata, the scanner ignores cell-level sequence IDs and orders the HFiles arbitrarily, which produces incorrect results when a scan covers cells that were overwritten at the same timestamp.
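For anyone less familiar with the read path, here is a rough sketch of the failure mode. It is a toy model, not HBase code: `ToyCell`, `ToyFile`, and `resolve` are hypothetical names, and the rule "newest timestamp wins, ties broken by the file's max sequence ID" is an assumption made for illustration.

```java
import java.util.List;
import java.util.Comparator;
import java.util.OptionalLong;

final class SeqIdTieBreakSketch {
    /** One version of a single cell: its timestamp and value (hypothetical type). */
    record ToyCell(long timestamp, String value) {}

    /** A file holding one cell version, with optional max-sequence-ID metadata (hypothetical type). */
    record ToyFile(String name, OptionalLong maxSeqId, ToyCell cell) {}

    /**
     * Returns the value a reader would see in this toy model. The newest
     * timestamp wins; ties are broken by the file's max sequence ID. A file
     * without the metadata is treated as -1 (assumption), so a tie between
     * two such files is decided by listing order alone.
     */
    static String resolve(List<ToyFile> files) {
        Comparator<ToyFile> order = Comparator
            .comparingLong((ToyFile f) -> f.cell().timestamp())
            .thenComparingLong(f -> f.maxSeqId().orElse(-1L));
        ToyFile winner = files.get(0);
        for (ToyFile f : files) {
            // Strictly greater only, so ties keep the first file listed.
            if (order.compare(f, winner) > 0) {
                winner = f;
            }
        }
        return winner.cell().value();
    }

    public static void main(String[] args) {
        ToyCell original = new ToyCell(100L, "old");
        ToyCell overwrite = new ToyCell(100L, "new"); // same timestamp as the original

        // Without metadata, the result depends on how the files happen to be listed.
        ToyFile a = new ToyFile("a", OptionalLong.empty(), original);
        ToyFile b = new ToyFile("b", OptionalLong.empty(), overwrite);
        System.out.println(resolve(List.of(a, b)));
        System.out.println(resolve(List.of(b, a)));

        // With metadata, the overwrite wins regardless of listing order.
        ToyFile a2 = new ToyFile("a", OptionalLong.of(5L), original);
        ToyFile b2 = new ToyFile("b", OptionalLong.of(7L), overwrite);
        System.out.println(resolve(List.of(a2, b2)));
        System.out.println(resolve(List.of(b2, a2)));
    }
}
```

In this model the two metadata-free files yield a different answer for each listing order, while the files carrying a max sequence ID always resolve to the overwrite.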

This PR adds a new option to HFileOutputFormat2 that calculates and sets the required metadata. This only really affects the ClientSideRequestScanner, as the sequence ID is recalculated when the files are bulk-loaded anyway.
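The idea behind the option can be sketched roughly as follows. This is not the code in the PR and does not use the HBase API: the class `MaxSeqIdWriterSketch`, the `setMaxSeqId` flag, and the `Map`-based metadata are all hypothetical stand-ins. Only the metadata name `MAX_SEQ_ID` comes from the description above.

```java
import java.util.HashMap;
import java.util.Map;

final class MaxSeqIdWriterSketch {
    // Metadata name taken from the PR description; everything else is illustrative.
    static final String MAX_SEQ_ID = "MAX_SEQ_ID";

    private final boolean setMaxSeqId; // hypothetical option flag
    private final Map<String, Long> metadata = new HashMap<>();
    private long maxSeqId = -1L;       // -1 means no cell has been appended yet

    MaxSeqIdWriterSketch(boolean setMaxSeqId) {
        this.setMaxSeqId = setMaxSeqId;
    }

    /** Tracks the largest sequence ID among the cells written so far. */
    void append(long cellSeqId) {
        maxSeqId = Math.max(maxSeqId, cellSeqId);
    }

    /** Records the max sequence ID as file metadata, but only when the option is enabled. */
    Map<String, Long> close() {
        if (setMaxSeqId && maxSeqId >= 0) {
            metadata.put(MAX_SEQ_ID, maxSeqId);
        }
        return Map.copyOf(metadata);
    }

    public static void main(String[] args) {
        MaxSeqIdWriterSketch writer = new MaxSeqIdWriterSketch(true);
        writer.append(3L);
        writer.append(9L);
        writer.append(4L);
        System.out.println(writer.close());
    }
}
```

Keeping the behavior behind a flag mirrors the description: the metadata matters to readers that open the files directly, and is redundant for files that go through a bulk load.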

Part of https://issues.apache.org/jira/browse/HBASE-29716

Upstream PR: apache#7480

@krconv krconv force-pushed the HBASE-29716-set-sequence-id-option-2.6 branch from daf2dff to 046cb69 Compare November 25, 2025 02:13
@krconv krconv requested review from hgromer and sidkhillon November 25, 2025 10:31
hgromer commented Nov 25, 2025

Looks good for HS, aside from a minor comment I left here

@krconv krconv merged commit ad62e7f into hubspot-2.6 Nov 25, 2025
1 check passed
@krconv krconv deleted the HBASE-29716-set-sequence-id-option-2.6 branch November 25, 2025 19:01