Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a blob file reader class #7461

Closed
wants to merge 60 commits into from

Conversation

ltamasi
Copy link
Contributor

@ltamasi ltamasi commented Sep 29, 2020

Summary:
The patch adds a class called BlobFileReader that can be used to retrieve blobs
using the information available in blob references (e.g. blob file number, offset, and
size). This will come in handy when implementing blob support for Get, MultiGet,
and iterators, and also for compaction/garbage collection.

When a BlobFileReader object is created (using the factory method Create),
it first checks whether the specified file is potentially valid by comparing the file
size against the combined size of the blob file header and footer (files smaller than
the threshold are considered malformed). Then, it opens the file, and reads and verifies
the header and footer. The verification involves magic number/CRC checks
as well as checking for unexpected header/footer fields, e.g. incorrect column family ID
or TTL blob files.

Blobs can be retrieved using GetBlob. GetBlob validates the offset and compression
type passed by the caller (because of the presence of the header and footer, the
specified offset cannot be too close to the start/end of the file; also, the compression type
has to match the one in the blob file header), and retrieves and potentially verifies and
uncompresses the blob. In particular, when ReadOptions::verify_checksums is set,
BlobFileReader reads the blob record header as well (as opposed to just the blob itself)
and verifies the key/value size, the key itself, as well as the CRC of the blob record header
and the key/value pair.

In addition, the patch exposes the compression type from BlobIndex (both using an
accessor and via DebugString), and adds a blob file read latency histogram to
InternalStats that can be used with BlobFileReader.

Test Plan:
make check

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ltamasi for the PR, and I left a few minor comments.

db/blob/blob_file_reader.h Outdated Show resolved Hide resolved
db/blob/blob_file_reader.cc Show resolved Hide resolved
db/blob/blob_file_reader.cc Show resolved Hide resolved
db/blob/blob_file_reader.cc Show resolved Hide resolved
db/blob/blob_file_reader.cc Outdated Show resolved Hide resolved
db/blob/blob_file_reader.h Show resolved Hide resolved
db/blob/blob_file_reader_test.cc Outdated Show resolved Hide resolved
db/blob/blob_file_reader_test.cc Show resolved Hide resolved
@ltamasi
Copy link
Contributor Author

ltamasi commented Oct 6, 2020

Thanks for the review @riversand963 !

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for the unresolved minors.

@facebook-github-bot
Copy link
Contributor

@ltamasi has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ltamasi merged this pull request in 22655a3.

@mrambacher
Copy link
Contributor

This change fails to build on MacOS:

db/blob/blob_file_reader_test.cc:332:29: error: constexpr variable 'expiration_range_header' must be initialized by a
constant expression
constexpr ExpirationRange expiration_range_header(1, 2);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
db/blob/blob_file_reader_test.cc:332:29: note: non-constexpr constructor 'pair<int, int, false>' cannot be used in a
constant expression
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/utility:447:5: note: declared here
pair(_U1&& __u1, _U2&& __u2)
^
db/blob/blob_file_reader_test.cc:370:29: error: constexpr variable 'expiration_range_footer' must be initialized by a
constant expression
constexpr ExpirationRange expiration_range_footer(1, 2);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
db/blob/blob_file_reader_test.cc:370:29: note: non-constexpr constructor 'pair<int, int, false>' cannot be used in a
constant expression
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/utility:447:5: note: declared here
pair(_U1&& __u1, _U2&& __u2)

@ltamasi
Copy link
Contributor Author

ltamasi commented Oct 8, 2020

Thanks for letting me know. #7519 should fix this.

@mrambacher
Copy link
Contributor

This appears to have introduced a regression in the stats: #7664

ltamasi added a commit to ltamasi/rocksdb that referenced this pull request Nov 12, 2020
Summary:
facebook#7461 accidentally broke
`InternalStats::DumpCFStats` by making `DumpCFFileHistogram` overwrite
the output of `DumpCFStatsNoFileHistogram` instead of appending to it,
resulting in only the file histogram related information getting logged.
The patch fixes this by reverting to appending in `DumpCFFileHistogram`.

Fixes facebook#7664 .

Test Plan:
Ran `make check` and checked the info log of `db_bench`.
facebook-github-bot pushed a commit that referenced this pull request Nov 13, 2020
Summary:
#7461 accidentally broke
`InternalStats::DumpCFStats` by making `DumpCFFileHistogram` overwrite
the output of `DumpCFStatsNoFileHistogram` instead of appending to it,
resulting in only the file histogram related information getting logged.
The patch fixes this by reverting to appending in `DumpCFFileHistogram`.

Fixes #7664 .

Pull Request resolved: #7666

Test Plan: Ran `make check` and checked the info log of `db_bench`.

Reviewed By: riversand963

Differential Revision: D24929051

Pulled By: ltamasi

fbshipit-source-id: 636a3d5ebb5ce23de4f3fe4f03ad3f16cb2858f8
codingrhythm pushed a commit to SafetyCulture/rocksdb that referenced this pull request Mar 5, 2021
Summary:
The patch adds a class called `BlobFileReader` that can be used to retrieve blobs
using the information available in blob references (e.g. blob file number, offset, and
size). This will come in handy when implementing blob support for `Get`, `MultiGet`,
and iterators, and also for compaction/garbage collection.

When a `BlobFileReader` object is created (using the factory method `Create`),
it first checks whether the specified file is potentially valid by comparing the file
size against the combined size of the blob file header and footer (files smaller than
the threshold are considered malformed). Then, it opens the file, and reads and verifies
the header and footer. The verification involves magic number/CRC checks
as well as checking for unexpected header/footer fields, e.g. incorrect column family ID
or TTL blob files.

Blobs can be retrieved using `GetBlob`. `GetBlob` validates the offset and compression
type passed by the caller (because of the presence of the header and footer, the
specified offset cannot be too close to the start/end of the file; also, the compression type
has to match the one in the blob file header), and retrieves and potentially verifies and
uncompresses the blob. In particular, when `ReadOptions::verify_checksums` is set,
`BlobFileReader` reads the blob record header as well (as opposed to just the blob itself)
and verifies the key/value size, the key itself, as well as the CRC of the blob record header
and the key/value pair.

In addition, the patch exposes the compression type from `BlobIndex` (both using an
accessor and via `DebugString`), and adds a blob file read latency histogram to
`InternalStats` that can be used with `BlobFileReader`.

Pull Request resolved: facebook#7461

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D23999219

Pulled By: ltamasi

fbshipit-source-id: deb6b1160d251258b308d5156e2ec063c3e12e5e
codingrhythm pushed a commit to SafetyCulture/rocksdb that referenced this pull request Mar 5, 2021
Summary:
facebook#7461 accidentally broke
`InternalStats::DumpCFStats` by making `DumpCFFileHistogram` overwrite
the output of `DumpCFStatsNoFileHistogram` instead of appending to it,
resulting in only the file histogram related information getting logged.
The patch fixes this by reverting to appending in `DumpCFFileHistogram`.

Fixes facebook#7664 .

Pull Request resolved: facebook#7666

Test Plan: Ran `make check` and checked the info log of `db_bench`.

Reviewed By: riversand963

Differential Revision: D24929051

Pulled By: ltamasi

fbshipit-source-id: 636a3d5ebb5ce23de4f3fe4f03ad3f16cb2858f8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants