-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid lots of calls to Env::GetFileSize() in SstFileManagerImpl when opening DB #6363
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@al13n321 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
db/db_impl/db_impl_open.cc
Outdated
} | ||
} | ||
} | ||
|
||
for (const auto& kv : known_file_sizes) { | ||
ROCKS_LOG_WARN(impl->immutable_db_options_.info_log, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a misleading message. Since the calls to sst_file_manager are happening outside the DB mutex, a compaction happening in parallel could have deleted files after we retrieved the list of live files from the current version. In any case, missing files would have been caught by the consistency check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly a sanity check that this code handles file names correctly. E.g. if I missed that I need to strip the leading '/' from md.name
a few lines above, that could go unnoticed for a long time without this warning.
file/sst_file_manager_impl.h
Outdated
// Parameter `file_size` is for optimization: if the caller knows the size of | ||
// the file, they can provide it, otherwise OnAddFile() will query it from | ||
// the filesystem. | ||
Status OnAddFile(const std::string& file_path, bool compaction = false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to just overload OnAddFile() rather than add another default parameter.
@al13n321 has updated the pull request. Re-import the pull request |
Removed the log message and added the overload. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@al13n321 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for contributing this optimization!
Summary: The most expensive operations during DB opening in GHOST mode is checking the size of all SST files for bookkeeping in SSTFileManager. This patch adds an option to use a constant size for SSTFileManager instead. Changes in `SSTFileManagerImpl` are already made upstream in facebook#6363, so it shouldn't be too bad when we merge. Test Plan: Add a unit test. Reviewers: dhruba, igor Differential Revision: https://rockset.phacility.com/D5613
Summary: The most expensive operations during DB opening in GHOST mode is checking the size of all SST files for bookkeeping in SSTFileManager. In our cases, with a DB of > 5000 files, there would be 5000 `GetFileSize` calls to S3. This is simply too expensive. In some cases, with 5000 SST files, it would take up to 50 seconds just to open a DB to compact just 2 SST files! This patch adds an option to use a constant size for SSTFileManager instead. Changes in `SSTFileManagerImpl` are already made upstream in facebook#6363, so it shouldn't be too bad when we merge. Test Plan: Add a unit test. Reviewers: dhruba, igor Reviewed By: igor Differential Revision: https://rockset.phacility.com/D5613
…opening DB (#6363) Summary: Before this PR it calls GetFileSize() once for each sst file in the DB. This can take a long time if there are be tens of thousands of sst files (e.g. in thousands of column families), and even longer if Env is talking to some remote service rather than local filesystem. This PR makes DB::Open() use sst file sizes that are already known from manifest (typically almost all files in the DB) and only call GetFileSize() for non-sst or obsolete files. Note that GetFileSize() is also called and checked against manifest in CheckConsistency(), so the calls in SstFileManagerImpl were completely redundant. Pull Request resolved: facebook/rocksdb#6363 Test Plan: deployed to a test cluster, looked at a dump of Env calls (from a custom instrumented Env) - no more thousands of GetFileSize()s. Differential Revision: D19702509 Pulled By: al13n321 fbshipit-source-id: 99f8110620cb2e9d0c092dfcdbb11f3af4ff8b73 Signed-off-by: Changlong Chen <levisonchen@live.cn>
…opening DB (#6363) Summary: Before this PR it calls GetFileSize() once for each sst file in the DB. This can take a long time if there are be tens of thousands of sst files (e.g. in thousands of column families), and even longer if Env is talking to some remote service rather than local filesystem. This PR makes DB::Open() use sst file sizes that are already known from manifest (typically almost all files in the DB) and only call GetFileSize() for non-sst or obsolete files. Note that GetFileSize() is also called and checked against manifest in CheckConsistency(), so the calls in SstFileManagerImpl were completely redundant. Pull Request resolved: facebook/rocksdb#6363 Test Plan: deployed to a test cluster, looked at a dump of Env calls (from a custom instrumented Env) - no more thousands of GetFileSize()s. Differential Revision: D19702509 Pulled By: al13n321 fbshipit-source-id: 99f8110620cb2e9d0c092dfcdbb11f3af4ff8b73 Signed-off-by: Changlong Chen <levisonchen@live.cn>
Before this PR it calls GetFileSize() once for each sst file in the DB. This can take a long time if there are be tens of thousands of sst files (e.g. in thousands of column families), and even longer if Env is talking to some remote service rather than local filesystem. This PR makes DB::Open() use sst file sizes that are already known from manifest (typically almost all files in the DB) and only call GetFileSize() for non-sst or obsolete files. Note that GetFileSize() is also called and checked against manifest in CheckConsistency(), so the calls in SstFileManagerImpl were completely redundant.
Test plan: deployed to a test cluster, looked at a dump of Env calls (from a custom instrumented Env) - no more thousands of GetFileSize()s.