Add support for compressor plugins #6717

Open
wants to merge 2 commits into
base: main

Conversation

Contributor

@lucagiac81 lucagiac81 commented Apr 16, 2020

This is a follow-up to PR #7650, adding support for external compressor plugins.
The first commit in this PR will match PR #7650 until that is merged.

The Compressor and related classes are made part of the public API to allow development of plugins and to expose compressors in options.

Compressor plugins can be used to easily integrate new compression algorithms into RocksDB. They can also implement compression techniques tailored to specific types of data. For example, if the values in a database are of numeric type (e.g., arrays of integers) with particular distributions (e.g., limited range within each block), the values could be compressed using a lightweight compression algorithm implemented as a plugin (such as frame-of-reference or delta encoding).
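
To make the frame-of-reference idea concrete, here is a minimal standalone sketch. It is not code from this PR and does not use the Compressor interface; `ForBlock`, `ForEncode`, and `ForDecode` are hypothetical names, and the one-byte offset width is an assumption chosen for brevity. Each value is stored as an offset from the block minimum, so a block whose values span a narrow range shrinks from four bytes per value to one.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper types for illustration; not part of the RocksDB API.
struct ForBlock {
  int32_t reference = 0;         // minimum value in the block
  std::vector<uint8_t> offsets;  // (value - reference), one byte per value
};

// Encodes `values` as one-byte offsets from the block minimum.
// Returns false if the block is empty or its range does not fit in 8 bits.
bool ForEncode(const std::vector<int32_t>& values, ForBlock* out) {
  if (values.empty()) return false;
  auto mm = std::minmax_element(values.begin(), values.end());
  const int64_t lo = *mm.first;
  const int64_t hi = *mm.second;
  if (hi - lo > 255) return false;  // range too wide for this sketch
  out->reference = static_cast<int32_t>(lo);
  out->offsets.clear();
  out->offsets.reserve(values.size());
  for (int32_t v : values) {
    out->offsets.push_back(static_cast<uint8_t>(static_cast<int64_t>(v) - lo));
  }
  return true;
}

// Reverses ForEncode by adding the reference back to every offset.
std::vector<int32_t> ForDecode(const ForBlock& block) {
  std::vector<int32_t> values;
  values.reserve(block.offsets.size());
  for (uint8_t off : block.offsets) {
    values.push_back(block.reference + static_cast<int32_t>(off));
  }
  return values;
}
```

A real plugin would also need to serialize the reference and the offset width into the output buffer and fall back to another encoding when the range is too wide.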

Options for Compressors

New options are added to support plugin compressors.
For example, compression was previously configured by compression (of type CompressionType) and compression_opts (of type CompressionOptions). This PR adds a compressor option (pointer to Compressor) to specify a compressor object (which includes type and options).
The same approach is applied to the following options (the new compressor option is shown in parentheses):

  • ColumnFamilyOptions: compression (compressor), bottommost_compression (bottommost_compressor)
  • AdvancedColumnFamilyOptions: compression_per_level (compressor_per_level), blob_compression (blob_compressor)
  • CompactionOptions: compression (compressor)

The existing CompressionType/CompressionOptions options are preserved for backward compatibility.
If the user doesn't specify compressors (leaving them null), the CompressionType/CompressionOptions options are used as before. Otherwise, compressors override the existing options.
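
The precedence rule can be sketched as follows. This is a simplified illustration, not the code in this PR: `SketchOptions`, `SketchCompressor`, `SketchCompressionType`, and `ResolveCompressor` are hypothetical stand-ins for the real types, and the use of `std::shared_ptr` is an assumption.

```cpp
#include <memory>
#include <string>

// Simplified stand-ins for illustration; these are NOT the RocksDB types.
enum class SketchCompressionType { kNone, kSnappy, kZSTD };

struct SketchCompressor {
  std::string id;
  int level;
};

struct SketchOptions {
  SketchCompressionType compression = SketchCompressionType::kNone;
  int compression_level = 0;  // stands in for compression_opts.level
  std::shared_ptr<SketchCompressor> compressor;  // null unless the user sets it
};

// Maps the legacy enum value to a compressor id string.
std::string TypeToId(SketchCompressionType type) {
  switch (type) {
    case SketchCompressionType::kSnappy:
      return "Snappy";
    case SketchCompressionType::kZSTD:
      return "ZSTD";
    case SketchCompressionType::kNone:
      break;
  }
  return "NoCompression";
}

// An explicitly set compressor overrides the legacy options; otherwise a
// compressor is built from the legacy type and level, as before.
std::shared_ptr<SketchCompressor> ResolveCompressor(const SketchOptions& o) {
  if (o.compressor) return o.compressor;
  return std::make_shared<SketchCompressor>(
      SketchCompressor{TypeToId(o.compression), o.compression_level});
}
```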

A new constant kPluginCompression is defined in CompressionType for plugin compressors. The SST properties block stores information about the specific compressor in the compression_name field. This is used to instantiate a suitable compressor when opening the SST.
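
As a rough illustration of that lookup, the sketch below maps a stored compression_name to a factory. It is an assumption about the general shape only: `DemoCompressor`, `DemoRegistry`, and `MyCompressor` are hypothetical names, and the PR itself may resolve names through a different mechanism.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface for illustration; not the Compressor class in this PR.
class DemoCompressor {
 public:
  virtual ~DemoCompressor() = default;
  virtual std::string Name() const = 0;
};

// Hypothetical registry that maps a compression_name to a factory.
class DemoRegistry {
 public:
  using Factory = std::function<std::unique_ptr<DemoCompressor>()>;

  void Register(const std::string& name, Factory factory) {
    factories_[name] = std::move(factory);
  }

  // Returns nullptr when no compressor is registered under the given name,
  // e.g. when the plugin that wrote the file is not loaded.
  std::unique_ptr<DemoCompressor> Create(const std::string& name) const {
    auto it = factories_.find(name);
    if (it == factories_.end()) return nullptr;
    return it->second();
  }

 private:
  std::map<std::string, Factory> factories_;
};

// Example plugin compressor identified by the name stored in the file.
class MyCompressor : public DemoCompressor {
 public:
  std::string Name() const override { return "my_compressor"; }
};
```

A reader would call `Create` with the name read from the properties block and report an error if the result is null.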

Option String Examples

Built-in compressor (existing options)
compression=kZSTD;compression_opts={level=1}

Built-in compressor (new options)
compressor={id=ZSTD;level=1}

Plugin compressor (new options)
compressor={id=my_compressor;my_option1=value1;my_option2=value2}
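
For readers unfamiliar with the format, the sketch below shows how the text inside the braces splits into an id and a set of key/value options. It is a simplified illustration and not the RocksDB option parser: `ParsedCompressorSpec` and `ParseCompressorSpec` are hypothetical names, and nested braces and escaping are deliberately not handled.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical result type for illustration only.
struct ParsedCompressorSpec {
  std::string id;
  std::map<std::string, std::string> options;
};

// Splits a flat "key=value;key=value" string and separates out the "id" key.
// Returns false on a malformed pair or when no id is present.
bool ParseCompressorSpec(const std::string& spec, ParsedCompressorSpec* out) {
  out->id.clear();
  out->options.clear();
  std::istringstream stream(spec);
  std::string pair;
  while (std::getline(stream, pair, ';')) {
    if (pair.empty()) continue;  // tolerate a trailing ';'
    const std::size_t eq = pair.find('=');
    if (eq == std::string::npos || eq == 0) return false;
    const std::string key = pair.substr(0, eq);
    const std::string value = pair.substr(eq + 1);
    if (key == "id") {
      out->id = value;
    } else {
      out->options[key] = value;
    }
  }
  return !out->id.empty();
}
```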

Options Object Example

Built-in compressor (existing options)

Options options;
options.compression = kZSTD;
options.compression_opts.level = 1;

Built-in compressor (new options)

Options options;
ConfigOptions config_options;
Status s = Compressor::CreateFromString(
    config_options,
    "id=ZSTD;level=1",
    &options.compressor);

Plugin compressor (new options)

Options options;
ConfigOptions config_options;
Status s = Compressor::CreateFromString(
    config_options,
    "id=my_compressor;my_option1=value1;my_option2=value2",
    &options.compressor);

db_bench

For db_bench, compression_type and individual compression options (such as compression_level) were left unchanged for backward compatibility.
compression_type is still used with plugin compressors to specify their name. Other compressor options can be passed using compressor_options.

Built-in compressor (existing options)
--compression_type=zstd --compression_level=1

Built-in compressor (new options)
--compression_type=zstd --compressor_options="level=1"

Plugin compressor (new options)
--compression_type=my_compressor --compressor_options="my_option1=value1;my_option2=value2"

Limitations/Future Work

Compressor plugins are currently not supported for

  • WAL compression: it requires adding streaming compression to the Compressor interface, as described in PR #7650 (Add Compressor interface)
  • Compressed secondary cache
  • BlobDB: the blob_compressor option allows passing options for built-in compressors, but plugins are not supported. Supporting plugins would require additional metadata to be stored with blob files.

These limitations will be addressed by future PRs.

@adamretter
Collaborator

@lucagiac81 This is very interesting. I am wondering if this is technically limited to just "compression"? It looks like I could implement a Compressor whose compress/decompress methods actually perform encryption/decryption instead. Perhaps there is scope to generalise your approach a little further?

@lucagiac81
Contributor Author

@adamretter That's a great point. In general, this should work with any reversible data transformation. For encryption specifically, RocksDB already supports pluggable providers, but there could be other uses.

@lucagiac81 lucagiac81 force-pushed the pluggable_compression branch 2 times, most recently from 25e9471 to 51d169d Compare May 23, 2020 17:10
Contributor

@mrambacher mrambacher left a comment


Generally, I like this change but need to give this review some more thought in the details. Some general comments:

  • Can we split out the "custom compression" changes into a new PR? That might simplify things slightly and make it easier to review
  • I was wondering if it would be possible to change some of the logic/object construction slightly:
    -> Rather than pass/store a CompressionType in the Table code, what if the input argument were a Compressor? That would reduce the number of places that need to call into NewInstance and would make things easier if NewInstance were not static.
    -> Does it make sense to move more stuff into the Compressor itself -- like the dictionary and the allocator -- rather than passing them in via the context objects? In other words, should those elements be part of the configuration for the Compressor or should they be specified on a per-method call basis?

I will try to go thru the specific code paths in more detail shortly.

include/rocksdb/compressor.h (outdated, resolved)
@@ -132,6 +133,9 @@ struct CompressionOptions {
// Default: 1.
uint32_t parallel_threads;

// Options for custom compressors
std::unordered_map<std::string, std::string> custom_options;

Contributor


If possible, it would be better to use the Options/OptionTypeInfo system to define specific options for the extended types. I can work with you to get that set up.

db/db_impl/db_impl.cc (outdated, resolved)
@lucagiac81
Contributor Author

@mrambacher Thanks for your feedback! I started by splitting the PR into multiple commits. I'll work on the remaining items.

@lucagiac81 lucagiac81 force-pushed the pluggable_compression branch 2 times, most recently from 615d8b7 to 9c10d24 Compare August 13, 2020 21:25
@lucagiac81 lucagiac81 force-pushed the pluggable_compression branch 2 times, most recently from 6e60a47 to 107130b Compare October 3, 2020 01:27
@lucagiac81 lucagiac81 force-pushed the pluggable_compression branch 2 times, most recently from 5d4737f to 5fb11cf Compare November 4, 2020 03:13
@lucagiac81 lucagiac81 force-pushed the pluggable_compression branch 3 times, most recently from 2d126db to 34bd404 Compare November 18, 2020 17:41
@lucagiac81 changed the title from "Pluggable Compression for RocksDB" to "Pluggable Compression" on May 21, 2022
@lucagiac81 changed the title from "Pluggable Compression" to "Add support for compressor plugins" on May 21, 2022
@lucagiac81 lucagiac81 force-pushed the pluggable_compression branch 2 times, most recently from 4437cb1 to dde1fc3 Compare May 24, 2022 05:08
@lucagiac81 lucagiac81 force-pushed the pluggable_compression branch 2 times, most recently from a608471 to 925f379 Compare June 3, 2022 18:40
4 participants