Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store DB identity and DB session ID in SST files #6983

Closed
wants to merge 2 commits into from

Conversation

gg814
Copy link
Contributor

@gg814 gg814 commented Jun 15, 2020

Summary:
db_id and db_session_id are now part of the table properties for all formats and stored in SST files.

The TablePropertiesNames for these two identifiers are rocksdb.creating.db.identity and rocksdb.creating.session.identity.

In addition, SST files generated from SstFileWriter and Repairer have DB identity “SST Writer” and “DB Repairer”, respectively. Their DB session IDs are generated in the same way as DB::GetDbSessionId.

A table property test is added.

The storage overhead of these two identities in an SST file is 99 bytes: 36 bytes for each of the two identifiers and 27 bytes for "DB identity: " and "DB session identity: ".

Test plan:
make check and some manual tests, also ran make check with COMPILE_WITH_ASAN=1 and /tool/check_format_compatible.sh

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gg814 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@gg814 gg814 force-pushed the store_ids_in_sst branch from a9b52b9 to 2602cbc Compare June 15, 2020 18:56
@facebook-github-bot
Copy link
Contributor

@gg814 has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gg814 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@gg814 gg814 force-pushed the store_ids_in_sst branch from 2602cbc to bc2bd47 Compare June 15, 2020 21:47
@facebook-github-bot
Copy link
Contributor

@gg814 has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gg814 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@gg814 gg814 force-pushed the store_ids_in_sst branch from bc2bd47 to edd061d Compare June 16, 2020 12:39
@facebook-github-bot
Copy link
Contributor

@gg814 has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gg814 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zhichao-cao
Copy link
Contributor

Thanks for the work! Did you run tools/check_format_compatible.sh to check the SST format compatibility?

@@ -237,12 +237,17 @@ Status SstFileWriter::Open(const std::string& file_path) {
r->column_family_name = "";
cf_id = TablePropertiesCollectorFactory::Context::kUnknownColumnFamily;
}

std::string db_session_id = r->ioptions.env->GenerateUniqueId();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very clear here. Assume I'm an application C, C opens a DB and the DB get a session ID. After some time, C calls SstFileWriter and dump out an SST file from the DB. Based on this line std::string db_session_id = r->ioptions.env->GenerateUniqueId();, the generated SST file's db_session_id will be different from the DB that C opens?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I think there is no way for SstFileWriter to reference back any DB because SstFileWriter is used to create SST files to be added to some DB later. (Please correct me if I'm misunderstood something :)

So here, in this way, we ensure that every time one opens SstFileWriter, it behaves like one opens a "DB".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to explain this special case in the comment for user to understand.

@@ -193,6 +195,15 @@ struct TableProperties {
// Actual SST file creation time. 0 means unknown.
uint64_t file_creation_time = 0;

// DB identity
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add one or two sentences to explain the identity and session id.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

@gg814
Copy link
Contributor Author

gg814 commented Jun 16, 2020

Will run tools/check_format_compatible.sh to check the SST format compatibility. (did not know it exists :)

HISTORY.md Outdated
* `DB::GetDbSessionId(std::string& session_id)` is added. `session_id` stores a unique identifier that gets reset every time the DB is opened. This DB session ID should be unique among all open DB instances on all hosts, and should be unique among re-openings of the same or other DBs. This identifier is recorded in the `LOG` file on the line starting with `DB Session ID:`.

### New Features
* DB identity (`db_id`) and DB session identity (`db_session_id`) are added to table properties and stored in SST files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should also mention the special case of SstFileWriter in HISTORY.md, so people can be aware of the db_session_id of files being generated from DB is different from those generated from SstFileWriter.

gg814 added 2 commits June 17, 2020 05:47
Summary:
Added DB::GetDbSessionId by using the same format and machinery as DB::GetDbIdentity.
The DB Session ID is generated (and therefore, updated) each time a DB object is opened. It is written to the LOG file right after the line of “DB SUMMARY”.
A test for the uniqueness, for different openings and during the same opening, is also added.

Test plan:
Passed make check
Summary:
`db_id` and `db_session_id` are now part of the table properties for all formats and stored in SST files.

The `TablePropertiesNames` for these two identifiers are `rocksdb.creating.db.identity` and `rocksdb.creating.session.identity`.

In addition, SST files generated from SstFileWriter and Repairer have DB identity “SST Writer” and “DB Repairer”, respectively. Their DB session IDs are generated in the same way as `DB::GetDbSessionId`.

A table property test is added.

Test plan:
make check and some manual tests, also ran make check with COMPILE_WITH_ASAN=1 and /tool/check_format_compatible.sh
@gg814 gg814 force-pushed the store_ids_in_sst branch from edd061d to 89a6b81 Compare June 17, 2020 12:51
@facebook-github-bot
Copy link
Contributor

@gg814 has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gg814 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@gg814
Copy link
Contributor Author

gg814 commented Jun 17, 2020

Added a few more comments.

@pdillinger
Copy link
Contributor

Awesome. The only other thing I would like to see is how much additional space we have added to SST files with this metadata. I'd be satisfied with one simple before-and-after comparison when both a full DB id and session id are stored.

@gg814
Copy link
Contributor Author

gg814 commented Jun 17, 2020

The difference is 99 bytes. The details are as below.

I created two DBs (db_noids and db_ids) with identical contents:
ldb --db=/tmp/db_noids put key val --create_if_missing
ldb --db=/tmp/db_noids put foo bar
ldb --db=/tmp/db_noids put a b
ldb --db=/tmp/db_noids put 1 2
ls -al /tmp/db_noids
total 556
drwxr-xr-x 1 zitan users 354 Jun 17 09:33 .
drwxrwxrwt 1 root root 319474 Jun 17 09:33 ..
-rw-r--r-- 1 zitan users 793 Jun 17 09:32 000004.sst
-rw-r--r-- 1 zitan users 793 Jun 17 09:33 000007.sst
-rw-r--r-- 1 zitan users 787 Jun 17 09:33 000010.sst
-rw-r--r-- 1 zitan users 24 Jun 17 09:33 000012.log
(omitted)
sst_dump --file=/tmp/db_noids/000004.sst --command=scan --show_properties
from [] to []
Process /tmp/db_noids/000004.sst
Sst file format: block-based
'key' seq:1, type:1 => val
Table Properties:
(omitted)
creation time: 1592411576
time stamp of earliest key: 3
file creation time: 0
Raw user collected properties
(omitted)

ls -al /tmp/db_ids
total 556
drwxr-xr-x 1 zitan users 354 Jun 17 09:46 .
drwxrwxrwt 1 root root 320326 Jun 17 09:46 ..
-rw-r--r-- 1 zitan users 892 Jun 17 09:45 000004.sst
-rw-r--r-- 1 zitan users 892 Jun 17 09:45 000007.sst
-rw-r--r-- 1 zitan users 886 Jun 17 09:46 000010.sst
-rw-r--r-- 1 zitan users 24 Jun 17 09:46 000012.log
(omitted)
./sst_dump --file=/tmp/db_ids/000004.sst --command=scan --show_properties
Process /tmp/db_ids/000004.sst
Sst file format: block-based
from [] to []
'key' seq:1, type:1 => val
Table Properties:
(omitted)
creation time: 1592412337
time stamp of earliest key: 0
file creation time: 0
DB identity: e7c5d5f0-d400-4118-b3a8-cf82d813fd1d
DB session identity: f132d07c-d281-4657-a656-3b412a79b280
Raw user collected properties
(omitted)

@zhichao-cao
Copy link
Contributor

You can also add the space overhead conclusion to the PR summary

Copy link
Contributor

@zhichao-cao zhichao-cao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the contribution!

@pdillinger
Copy link
Contributor

Great, for 64MB default target_file_size_base, that's one-to-two millionths additional space usage. :)

@facebook-github-bot
Copy link
Contributor

@gg814 merged this pull request in 94d0452.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants