HBASE-28697 Don't clean bulk load system entries until backup is complete #6089

Merged 1 commit on Sep 2, 2024
@@ -103,13 +103,14 @@ protected static int getIndex(TableName tbl, List<TableName> sTableList) {

  /*
   * Reads bulk load records from backup table, iterates through the records and forms the paths for
-  * bulk loaded hfiles. Copies the bulk loaded hfiles to backup destination
+  * bulk loaded hfiles. Copies the bulk loaded hfiles to backup destination. This method does NOT
+  * clean up the entries in the bulk load system table. Those entries should not be cleaned until
+  * the backup is marked as complete.
   * @param sTableList list of tables to be backed up
-  * @return map of table to List of files
+  * @return the rowkeys of bulk loaded files
   */
  @SuppressWarnings("unchecked")
-  protected Map<byte[], List<Path>>[] handleBulkLoad(List<TableName> sTableList)
-    throws IOException {
+  protected List<byte[]> handleBulkLoad(List<TableName> sTableList) throws IOException {
Member

Why is it okay to drop the table context of the rowkeys in the returned value? A rowkey is only meaningful in the context of its table (or region).

Contributor Author

We talked about this a bit offline. We can purge these rowkeys because they are only returned by handleBulkLoad if we have bulk loaded those keys in this backup.

Right now an inopportune failure would result in us missing bulk load data on subsequent incremental backups, but with this change an inopportune failure would result in us backing up duplicative files, which should be just a little bit wasteful but otherwise innocuous.
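To illustrate the reordering, here is a rough sketch of the flow shown in the execute() hunk further down (not the exact code; error handling and surrounding context omitted):

    // Sketch only: bulk load system table rows are purged only after the backup is marked complete.
    List<byte[]> bulkLoadedRows = handleBulkLoad(backupInfo.getTableNames()); // copies hfiles, leaves system table rows in place
    completeBackup(conn, backupInfo, BackupType.INCREMENTAL, conf);           // backup is durably marked complete first
    // A crash before the next line leaves the rows behind, so the next incremental backup
    // re-copies the same hfiles: duplicated data in the backup destination, but nothing lost.
    backupManager.deleteBulkLoadedRows(bulkLoadedRows);                       // only now purge the bulk load rows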

Member

Is it worth having some backup consistency check that can detect and purge extra files? Or do we think that backups will cycle out and the redundancy will be dropped the next time a full backup is taken?
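If such a check were added, one hypothetical shape it could take (nothing below is an existing HBase API; BulkLoadBackupAudit, findUnreferencedHFiles, bulkLoadDir and expectedHFiles are made-up names for illustration) would be to list the hfiles actually present under the backup destination's bulk-load directory and report any that the backup metadata does not reference:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Set;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocatedFileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;

    // Hypothetical sketch only: "expectedHFiles" (file names recorded in the backup metadata)
    // and "bulkLoadDir" (bulk-load output dir under the backup destination) are assumed inputs.
    public final class BulkLoadBackupAudit {
      public static List<Path> findUnreferencedHFiles(FileSystem fs, Path bulkLoadDir,
        Set<String> expectedHFiles) throws IOException {
        List<Path> extras = new ArrayList<>();
        // Walk every file under the bulk-load directory of the backup destination.
        RemoteIterator<LocatedFileStatus> it = fs.listFiles(bulkLoadDir, true);
        while (it.hasNext()) {
          Path file = it.next().getPath();
          if (!expectedHFiles.contains(file.getName())) {
            extras.add(file); // candidate for purging, or at least for logging
          }
        }
        return extras;
      }
    }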

Map<byte[], List<Path>>[] mapForSrc = new Map[sTableList.size()];
List<String> activeFiles = new ArrayList<>();
List<String> archiveFiles = new ArrayList<>();
@@ -191,8 +192,8 @@ protected Map<byte[], List<Path>>[] handleBulkLoad(List<TableName> sTableList)
    }

    copyBulkLoadedFiles(activeFiles, archiveFiles);
-   backupManager.deleteBulkLoadedRows(pair.getSecond());
-   return mapForSrc;
+
+   return pair.getSecond();
  }

private void copyBulkLoadedFiles(List<String> activeFiles, List<String> archiveFiles)
@@ -308,10 +309,12 @@ public void execute() throws IOException {
        BackupUtils.getMinValue(BackupUtils.getRSLogTimestampMins(newTableSetTimestampMap));
      backupManager.writeBackupStartCode(newStartCode);

-     handleBulkLoad(backupInfo.getTableNames());
+     List<byte[]> bulkLoadedRows = handleBulkLoad(backupInfo.getTableNames());

      // backup complete
      completeBackup(conn, backupInfo, BackupType.INCREMENTAL, conf);
+
+     backupManager.deleteBulkLoadedRows(bulkLoadedRows);
    } catch (IOException e) {
      failBackup(conn, backupInfo, backupManager, e, "Unexpected Exception : ",
        BackupType.INCREMENTAL, conf);