-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store DB identity and DB session ID in SST files #6983
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gg814 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@gg814 has updated the pull request. Re-import the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gg814 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@gg814 has updated the pull request. Re-import the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gg814 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@gg814 has updated the pull request. Re-import the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gg814 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Thanks for the work! Did you run tools/check_format_compatible.sh to check the SST format compatibility? |
@@ -237,12 +237,17 @@ Status SstFileWriter::Open(const std::string& file_path) { | |||
r->column_family_name = ""; | |||
cf_id = TablePropertiesCollectorFactory::Context::kUnknownColumnFamily; | |||
} | |||
|
|||
std::string db_session_id = r->ioptions.env->GenerateUniqueId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very clear here. Assume I'm an application C, C opens a DB and the DB get a session ID. After some time, C calls SstFileWriter and dump out an SST file from the DB. Based on this line std::string db_session_id = r->ioptions.env->GenerateUniqueId();
, the generated SST file's db_session_id will be different from the DB that C opens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I think there is no way for SstFileWriter to reference back any DB because SstFileWriter is used to create SST files to be added to some DB later. (Please correct me if I'm misunderstood something :)
So here, in this way, we ensure that every time one opens SstFileWriter, it behaves like one opens a "DB".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to explain this special case in the comment for user to understand.
@@ -193,6 +195,15 @@ struct TableProperties { | |||
// Actual SST file creation time. 0 means unknown. | |||
uint64_t file_creation_time = 0; | |||
|
|||
// DB identity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add one or two sentences to explain the identity and session id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
Will run tools/check_format_compatible.sh to check the SST format compatibility. (did not know it exists :) |
HISTORY.md
Outdated
* `DB::GetDbSessionId(std::string& session_id)` is added. `session_id` stores a unique identifier that gets reset every time the DB is opened. This DB session ID should be unique among all open DB instances on all hosts, and should be unique among re-openings of the same or other DBs. This identifier is recorded in the `LOG` file on the line starting with `DB Session ID:`. | ||
|
||
### New Features | ||
* DB identity (`db_id`) and DB session identity (`db_session_id`) are added to table properties and stored in SST files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should also mention the special case of SstFileWriter in HISTORY.md, so people can be aware of the db_session_id of files being generated from DB is different from those generated from SstFileWriter.
Summary: Added DB::GetDbSessionId by using the same format and machinery as DB::GetDbIdentity. The DB Session ID is generated (and therefore, updated) each time a DB object is opened. It is written to the LOG file right after the line of “DB SUMMARY”. A test for the uniqueness, for different openings and during the same opening, is also added. Test plan: Passed make check
Summary: `db_id` and `db_session_id` are now part of the table properties for all formats and stored in SST files. The `TablePropertiesNames` for these two identifiers are `rocksdb.creating.db.identity` and `rocksdb.creating.session.identity`. In addition, SST files generated from SstFileWriter and Repairer have DB identity “SST Writer” and “DB Repairer”, respectively. Their DB session IDs are generated in the same way as `DB::GetDbSessionId`. A table property test is added. Test plan: make check and some manual tests, also ran make check with COMPILE_WITH_ASAN=1 and /tool/check_format_compatible.sh
@gg814 has updated the pull request. Re-import the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gg814 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Added a few more comments. |
Awesome. The only other thing I would like to see is how much additional space we have added to SST files with this metadata. I'd be satisfied with one simple before-and-after comparison when both a full DB id and session id are stored. |
The difference is 99 bytes. The details are as below. I created two DBs (db_noids and db_ids) with identical contents:
|
You can also add the space overhead conclusion to the PR summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the contribution!
Great, for 64MB default target_file_size_base, that's one-to-two millionths additional space usage. :) |
Summary:
db_id
anddb_session_id
are now part of the table properties for all formats and stored in SST files.The
TablePropertiesNames
for these two identifiers arerocksdb.creating.db.identity
androcksdb.creating.session.identity
.In addition, SST files generated from SstFileWriter and Repairer have DB identity “SST Writer” and “DB Repairer”, respectively. Their DB session IDs are generated in the same way as
DB::GetDbSessionId
.A table property test is added.
The storage overhead of these two identities in an SST file is 99 bytes: 36 bytes for each of the two identifiers and 27 bytes for "DB identity: " and "DB session identity: ".
Test plan:
make check and some manual tests, also ran make check with COMPILE_WITH_ASAN=1 and /tool/check_format_compatible.sh