feat: chunked log segment manager #218

Closed
wants to merge 85 commits into from

Conversation

jeqo
Contributor

@jeqo jeqo commented May 11, 2023

Following up #217

Expands `ChunkManager` into `ChunkedLogSegmentManager` and absorbs `transform/FetchChunkTransform`. This consolidates the interactions between the `TransformPipeline` (see #217) and the storage backend that are needed to implement URSM requests.

Dependencies would flow:

  • URSM
    • ChunkedLogSegmentManager
      • Transform pipeline
        • Encryption
        • Compression
      • Storage backend
        • FileSystem
        • S3
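
The layering above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: every type and method name (`StorageBackend`, `ChunkTransform`, `uploadChunk`, `fetchChunk`, and so on) is an assumption made for the example, the in-memory storage stands in for FileSystem or S3, and the XOR step is only a toy placeholder for a reversible transform such as encryption or compression.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical storage backend contract (FileSystem, S3, ...). */
interface StorageBackend {
    void upload(String key, byte[] data);
    byte[] fetch(String key);
}

/** In-memory stand-in for a real backend; for illustration only. */
final class InMemoryStorage implements StorageBackend {
    private final Map<String, byte[]> objects = new HashMap<>();

    public void upload(String key, byte[] data) {
        objects.put(key, data.clone());
    }

    public byte[] fetch(String key) {
        byte[] data = objects.get(key);
        if (data == null) {
            throw new IllegalArgumentException("No such object: " + key);
        }
        return data.clone();
    }
}

/** Hypothetical reversible per-chunk step (encryption, compression, ...). */
interface ChunkTransform {
    byte[] apply(byte[] chunk);
    byte[] revert(byte[] chunk);
}

/** Toy placeholder: XOR with a one-byte key. NOT real cryptography. */
final class XorTransform implements ChunkTransform {
    private final byte key;

    XorTransform(byte key) {
        this.key = key;
    }

    public byte[] apply(byte[] chunk) {
        byte[] out = new byte[chunk.length];
        for (int i = 0; i < chunk.length; i++) {
            out[i] = (byte) (chunk[i] ^ key);
        }
        return out;
    }

    public byte[] revert(byte[] chunk) {
        return apply(chunk); // XOR is its own inverse
    }
}

/** Applies the steps in order on upload and in reverse order on fetch. */
final class TransformPipeline {
    private final List<ChunkTransform> steps;

    TransformPipeline(List<ChunkTransform> steps) {
        this.steps = new ArrayList<>(steps);
    }

    byte[] transform(byte[] chunk) {
        byte[] result = chunk;
        for (ChunkTransform step : steps) {
            result = step.apply(result);
        }
        return result;
    }

    byte[] detransform(byte[] chunk) {
        byte[] result = chunk;
        for (int i = steps.size() - 1; i >= 0; i--) {
            result = steps.get(i).revert(result);
        }
        return result;
    }
}

/** Sits between the URSM and the two lower layers (pipeline and storage). */
final class ChunkedLogSegmentManager {
    private final TransformPipeline pipeline;
    private final StorageBackend storage;

    ChunkedLogSegmentManager(TransformPipeline pipeline, StorageBackend storage) {
        this.pipeline = pipeline;
        this.storage = storage;
    }

    void uploadChunk(String key, byte[] chunk) {
        storage.upload(key, pipeline.transform(chunk));
    }

    byte[] fetchChunk(String key) {
        return pipeline.detransform(storage.fetch(key));
    }
}
```

In this sketch the URSM would hold only a `ChunkedLogSegmentManager` and never touch the pipeline or the storage backend directly, which is the consolidation the description is aiming for.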

ivanyu and others added 30 commits March 31, 2023 14:26
Switch to published version of Kafka deps
This commit introduces chunk transformations, which are the foundation of the chunking itself (both for upload and download) and also of the optional encryption and compression.
feat: add custom S3 endpoint URL config
This commit adds JSON (Jackson-based) (de-)serialization to the chunk index classes and everything that is needed for this. Most notably, it adds a compact binary codec for the chunk lists present in variable-size indices.
They are added mostly for tests. In any case, these objects are not supposed to be compared on a hot path, so a simple implementation that produces more garbage was selected.
These interfaces are mainly supposed to be implemented with S3, GCS, and other object storage implementations, which are yet to be done. The file system implementation is mostly for testing.

To put this in the future context, the plugin will instantiate concrete implementations of `ObjectStorageFactory`, which will be S3, GCS, and others.
Add object storage interfaces and file system implementation
This also includes secret encryption/decryption on serialization/deserialization.
Add segment manifest and its (de-)serialization
changes: rename tieredstorage, move chunk to root, rename index module, metadata moved to security
refactor: reorg commons module
Streamline encryption key and AAD generation
Add config for UniversalRemoteStorageManager
To clarify how the config will pass from the framework to the plugin
AnatolyPopov and others added 27 commits May 4, 2023 11:32
fix: remove parent directories to mimic S3 behaviour
…ed-fields

Require fields in JSON deserialization of chunk indices
fix: chunk manager: return input stream instead of future
This doesn't make sense and also causes division by 0 in `FixedSizeChunkIndex.chunkCount`
…lChunkSize-positive

Don't allow originalChunkSize to be 0
feat: adding ChunkManager implementation
Simplifying chunk fetch to avoid unnecessary off-by-one error.
feat: add object range for fetcher
refactor: remove overwrite flag for FileSystem storage
UniversalRemoteStorageManager implementation
@jeqo jeqo force-pushed the new-implementation branch from 061eda8 to fae0128 Compare May 17, 2023 15:44
@jeqo
Contributor Author

jeqo commented May 17, 2023

Will reopen if #217 is merged.

@jeqo jeqo closed this May 17, 2023

3 participants