This project is a collection of RemoteStorageManagers
for Apache Kafka tiered storage.
The project follows the API specification of the latest version of KIP-405: Kafka Tiered Storage.
Currently, support for S3 storage is implemented.
We intend to add support for GCS
storage in the future, along with other cloud provider storage solutions.
Conceptually, the RemoteStorageManager is very simple: it exposes a CRUD interface for uploading, (ranged) fetching, and deleting Kafka log segment files (sketched after the list below). This implementation was done with a few additional requirements:
- Compression. It must provide optional (configurable) compression. The compression must be conditional: log segments already compressed by Kafka must not be compressed a second time.
- Encryption. It must provide optional (configurable) encryption with key rotation support.
- Small download overhead. It must avoid downloading more data from the remote storage than is needed to serve a fetch.
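For reference, the KIP-405 interface that this project implements looks roughly as follows. This is an abridged sketch; the authoritative definition and its companion types (RemoteLogSegmentMetadata, LogSegmentData, etc.) live in org.apache.kafka.server.log.remote.storage in the Apache Kafka sources:

```java
import java.io.Closeable;
import java.io.InputStream;
import org.apache.kafka.common.Configurable;

// Abridged from org.apache.kafka.server.log.remote.storage.RemoteStorageManager (KIP-405).
public interface RemoteStorageManager extends Configurable, Closeable {

    // Upload a log segment together with its auxiliary files (indices etc.).
    void copyLogSegmentData(RemoteLogSegmentMetadata metadata,
                            LogSegmentData segmentData) throws RemoteStorageException;

    // Ranged fetches of the segment file.
    InputStream fetchLogSegment(RemoteLogSegmentMetadata metadata,
                                int startPosition) throws RemoteStorageException;

    InputStream fetchLogSegment(RemoteLogSegmentMetadata metadata,
                                int startPosition,
                                int endPosition) throws RemoteStorageException;

    // Fetch one of the auxiliary index files.
    InputStream fetchIndex(RemoteLogSegmentMetadata metadata,
                           IndexType indexType) throws RemoteStorageException;

    // Delete the segment and its auxiliary files.
    void deleteLogSegmentData(RemoteLogSegmentMetadata metadata) throws RemoteStorageException;
}
```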
Compression and encryption make ranged queries difficult. To read a compressed or encrypted file from the middle, one needs to decompress or decrypt, and therefore download, the whole prefix. As files can be large, this may lead to huge download overhead.
One way to combat this is chunking. With chunking, the original file is split into chunks, which are transformed (compressed, encrypted) independently.
```
Original file:                Transformed file:
+-----------+ -------\
|           |         \
|  Chunk 1  |          \----- +-----------+
|           |                 |  Chunk 1  |
+-----------+ --------------- +-----------+
|           |                 |  Chunk 2  |
|  Chunk 2  |        /------- +-----------+
|           |       /         |  Chunk 3  |
+-----------+ -----/    /---- +-----------+
|  Chunk 3  |          /
+-----------+ --------/
```
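As a hedged sketch of the write path (hypothetical code, not the project's actual transform pipeline): each fixed-size original chunk is transformed independently, the transformed chunk sizes are recorded, and the results are concatenated into a single blob (why concatenation matters is explained further below).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DeflaterOutputStream;

// Hypothetical sketch: split the input into fixed-size chunks, transform
// (here: compress) each chunk independently, concatenate the results into
// one blob, and record the transformed sizes for later position mapping.
final class ChunkedTransform {

    static List<Integer> transform(final InputStream in,
                                   final int chunkSize,
                                   final ByteArrayOutputStream blob) throws IOException {
        final List<Integer> transformedSizes = new ArrayList<>();
        final byte[] chunk = new byte[chunkSize];
        int read;
        while ((read = in.readNBytes(chunk, 0, chunkSize)) > 0) {
            final ByteArrayOutputStream transformed = new ByteArrayOutputStream();
            try (DeflaterOutputStream deflating = new DeflaterOutputStream(transformed)) {
                deflating.write(chunk, 0, read);
            }
            transformedSizes.add(transformed.size());  // one index entry per chunk
            transformed.writeTo(blob);                 // concatenate into one blob
        }
        return transformedSizes;
    }
}
```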
Now, knowing the sizes of the original and transformed chunks, it's possible to map a position Po in the original file to a position Pt in the transformed file with the accuracy of one chunk. Some download overhead remains, but it is now bounded by the chunk size rather than the whole file size.
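To make the mapping concrete, here is a minimal sketch (hypothetical names, assuming fixed-size original chunks as discussed below): the position Po selects a chunk, and the cumulative transformed sizes give the byte range of the transformed file that needs to be downloaded.

```java
// Hypothetical sketch: map a position Po in the original file to the
// transformed-file byte range of the chunk that contains it, assuming
// fixed-size original chunks and known transformed chunk sizes.
final class ChunkIndex {
    private final int originalChunkSize;
    private final int[] transformedChunkSizes;
    private final long[] transformedChunkOffsets;  // cumulative start offsets

    ChunkIndex(final int originalChunkSize, final int[] transformedChunkSizes) {
        this.originalChunkSize = originalChunkSize;
        this.transformedChunkSizes = transformedChunkSizes;
        this.transformedChunkOffsets = new long[transformedChunkSizes.length];
        long offset = 0;
        for (int i = 0; i < transformedChunkSizes.length; i++) {
            transformedChunkOffsets[i] = offset;
            offset += transformedChunkSizes[i];
        }
    }

    // Returns {start inclusive, end exclusive} in the transformed file for
    // the chunk containing original position po (po must be within the file).
    long[] transformedRangeFor(final long po) {
        final int chunkIdx = (int) (po / originalChunkSize);
        final long start = transformedChunkOffsets[chunkIdx];
        return new long[] {start, start + transformedChunkSizes[chunkIdx]};
    }
}
```

The fetch path then downloads only that range, decompresses/decrypts the single chunk, and skips `po % originalChunkSize` bytes inside it.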
Splitting a file into multiple chunks may significantly increase the number of remote storage operations, most notably writes and lists, which are usually billed per request. To avoid this, the transformed chunks can be concatenated back into a single blob and uploaded as one object.
To maintain the ability to map offsets/chunks between original and transformed files, an index is needed.
The index must be stored on the remote storage along with other files related to a log segment to keep these uploads self-sufficient.
It's important to keep the size of these indices small. One technique is to keep the size of the original chunks fixed, so that only the transformed chunk sizes need to be listed explicitly. Furthermore, if compression is not used, the size of the transformed chunks also remains fixed.
Binary encoding can reduce the index size further: values are encoded as differences from a common base, using the minimum number of bytes per value. See the javadoc of ChunkSizesBinaryCodec for details. On average, 1-2 bytes per value can be expected.
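The following is only a hypothetical illustration of the general idea, not the actual format of ChunkSizesBinaryCodec (that is documented in its javadoc): sizes are stored as fixed-width differences from a common base, using the smallest byte width that fits every difference.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of delta-from-base encoding with a minimal
// per-value byte width; not the actual ChunkSizesBinaryCodec format.
final class DeltaEncodedChunkSizes {

    static ByteBuffer encode(final int[] sizes) {
        int base = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (final int s : sizes) {
            base = Math.min(base, s);
            max = Math.max(max, s);
        }
        final long range = (long) max - base;
        // Smallest width (1, 2, or 4 bytes) that can hold every difference.
        final int width = range <= 0xFF ? 1 : (range <= 0xFFFF ? 2 : 4);

        final ByteBuffer buf = ByteBuffer.allocate(4 + 4 + 1 + sizes.length * width);
        buf.putInt(sizes.length).putInt(base).put((byte) width);
        for (final int s : sizes) {
            final int delta = s - base;
            if (width == 1) {
                buf.put((byte) delta);
            } else if (width == 2) {
                buf.putShort((short) delta);
            } else {
                buf.putInt(delta);
            }
        }
        buf.flip();
        return buf;
    }
}
```

With roughly uniform compressed chunk sizes, the differences stay small, which is why 1-2 bytes per value is a reasonable expectation.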
A 2 GB file chunked into 1 MB chunks produces about 2000 index entries, i.e. 2-4 KB. The index can then be compressed with Zstd and Base64-encoded (about 30% overhead), resulting in 2.7-3.7 KB.
This encoding scheme allows indices to be stored (cached) in memory and accessed with little overhead.
Rejected alternative: write the values as plain integers (4 bytes per value) and rely on Zstd to compress them significantly. Although this produces similar results size-wise (3.6-4.2 KB, i.e. a little worse), it makes it difficult to keep indices in memory in their encoded form, because each access would require decompression.
TBD
The project is licensed under the Apache License, version 2.0. The full license text is available in the LICENSE file.
For a detailed guide on how to contribute to the project, please check CONTRIBUTING.md.
This project uses the Contributor Covenant code of conduct. Please check CODE_OF_CONDUCT.md for further details.
Apache Kafka is either a registered trademark or trademark of the Apache Software Foundation in the United States and/or other countries. All product and service names used in this page are for identification purposes only and do not imply endorsement.
This project is maintained by Aiven open source developers.
Recent contributors are listed on the GitHub project page, https://github.com/aiven/tiered-storage-for-apache-kafka/graphs/contributors.
Copyright (c) 2022 Aiven Oy and project contributors.