HBASE-29716 Include sequence ID metadata for HFiles generated by incremental backups #7480
base: master
  }

- private void close(final StoreFileWriter w) throws IOException {
+ private void close(final StoreFileWriter w, final WriterInfo wl) throws IOException {
Small thing, mind changing wl to wi here to match the rest of the patch?
- wl.written += length;
+ wi.writer.append((ExtendedCell) kv);
+ wi.written += length;
+ wi.maxSequenceId = Math.max(kv.getSequenceId(), wi.maxSequenceId);
Any concerns that Cell#getSequenceId is removed in HBase 3? Any plans for how we should handle that?
Ah, as long as this is an ExtendedCell, it looks like this should be possible in branch-3.
The HFiles generated by incremental backups cannot be properly read by tooling such as the ClientSideRequestScanner, because the generated HFiles do not include the MAX_SEQ_ID metadata. The scanner will ignore cell-level sequence IDs and instead sort the HFiles arbitrarily. This causes incorrect results when scanning overwrites to cells with the same timestamp.
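To make the failure mode concrete, here is a minimal sketch of the tie-break, not HBase's actual scanner code. The `Version` class, the `NO_SEQ_ID` sentinel, and the `winner` method are hypothetical stand-ins. The sketch assumes a reader that prefers the higher timestamp and, on a timestamp tie, the version from the file with the higher max sequence ID.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the tie-break; none of these names are HBase APIs.
final class SameTimestampSketch {
  // Assumed sentinel for "file carries no max sequence ID metadata".
  static final long NO_SEQ_ID = -1L;

  /** One version of a single cell, tagged with the max sequence ID of the file that holds it. */
  static final class Version {
    final long timestamp;
    final long fileMaxSeqId;
    final String value;

    Version(long timestamp, long fileMaxSeqId, String value) {
      this.timestamp = timestamp;
      this.fileMaxSeqId = fileMaxSeqId;
      this.value = value;
    }
  }

  /** Orders by timestamp first, then by the owning file's max sequence ID. */
  static int compare(Version a, Version b) {
    int byTimestamp = Long.compare(a.timestamp, b.timestamp);
    return byTimestamp != 0 ? byTimestamp : Long.compare(a.fileMaxSeqId, b.fileMaxSeqId);
  }

  /** Returns the value a reader would surface; on a full tie the first listed version wins. */
  static String winner(List<Version> versions) {
    Version best = versions.get(0);
    for (Version v : versions) {
      if (compare(v, best) > 0) {
        best = v;
      }
    }
    return best.value;
  }

  public static void main(String[] args) {
    // With file-level sequence IDs, the overwrite wins regardless of file order.
    Version older = new Version(100L, 5L, "old");
    Version newer = new Version(100L, 9L, "new");
    System.out.println(winner(Arrays.asList(older, newer))); // prints "new"
    System.out.println(winner(Arrays.asList(newer, older))); // prints "new"

    // Without the metadata every file ties, so the result follows file order.
    Version a = new Version(100L, NO_SEQ_ID, "old");
    Version b = new Version(100L, NO_SEQ_ID, "new");
    System.out.println(winner(Arrays.asList(a, b))); // prints "old"
    System.out.println(winner(Arrays.asList(b, a))); // prints "new"
  }
}
```

In this model the missing metadata turns a deterministic tie-break into one that depends on the order in which files happen to be listed, which matches the symptom described above.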
This PR adds a new option to HFileOutputFormat2 that calculates and sets the required metadata. This only really affects the ClientSideRequestScanner, as the sequence ID will be recalculated when the files are bulk-loaded anyway.
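The bookkeeping can be sketched in isolation as follows. This is a simplified model rather than the patch itself: `WriterInfo` here is a hypothetical stand-in that holds only counters and a metadata map, the `"MAX_SEQ_ID"` key string is taken from the description above rather than from HBase constants, and the initial value of `-1` is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of per-writer sequence ID tracking; not the HBase classes.
final class MaxSeqIdTrackingSketch {
  // Stand-in metadata key, named after the description above.
  static final String MAX_SEQ_ID = "MAX_SEQ_ID";

  /** Per-output-file state, loosely mirroring the fields touched in the diff. */
  static final class WriterInfo {
    long written = 0L;
    long maxSequenceId = -1L; // assumed starting value before any cell is seen
    final Map<String, Long> fileInfo = new LinkedHashMap<>();
  }

  /** Records one appended cell: bytes written plus the running max sequence ID. */
  static void append(WriterInfo wi, long cellSequenceId, int length) {
    wi.written += length;
    wi.maxSequenceId = Math.max(cellSequenceId, wi.maxSequenceId);
  }

  /** On close, stores the running max as file-level metadata. */
  static void close(WriterInfo wi) {
    wi.fileInfo.put(MAX_SEQ_ID, wi.maxSequenceId);
  }

  public static void main(String[] args) {
    WriterInfo wi = new WriterInfo();
    append(wi, 3L, 10);
    append(wi, 7L, 20);
    append(wi, 5L, 30);
    close(wi);
    System.out.println(wi.fileInfo.get(MAX_SEQ_ID)); // prints 7
    System.out.println(wi.written);                  // prints 60
  }
}
```

Tracking the maximum during the append loop avoids a second pass over the cells when the file is closed.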
Part of https://issues.apache.org/jira/browse/HBASE-29716