Enhance Support for Larger Datasets and Buckets in Encoding #11
Open
EladGabay wants to merge 1 commit into linvon:main from
This commit extends the encoding to handle item and bucket counts that exceed max(uint32). Previously, the encoding stored counts as uint32, even though the filter structure already supported larger values through uint. Until now, the filter only partially supported larger datasets: not all buckets were utilized (see the changes to `generateIndexTagHash`, `altIndex` and `indexHash`). Now, all references to bucket indices and item counts explicitly use uint64. A new encoding format accommodates larger filters. To distinguish the legacy format (up to max(uint32) items) from the new format, a prefix marker is introduced. Decoding supports both formats transparently. The encode method takes a legacy boolean parameter for gradual adoption.
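For reviewers who want a picture of the prefix-marker idea, here is a minimal, self-contained sketch. It is not the code in this PR: `header`, `encodeHeader`, `decodeHeader` and the `formatMarker` value are hypothetical names chosen for illustration, and the byte layout (little-endian, counts only) is an assumption. In particular, the sketch reserves max(uint32) in the slot where the legacy layout would store its item count, so its legacy path accepts strictly fewer than max(uint32) items.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// formatMarker is a hypothetical 4-byte prefix (assumption, not the PR's value).
// It occupies the position where the legacy layout stores its item count.
const formatMarker uint32 = math.MaxUint32

// header carries the two counts named in the description (illustrative type).
type header struct {
	NumItems   uint64
	NumBuckets uint64
}

// encodeHeader writes two uint32 counts when legacy is true,
// otherwise the marker followed by two uint64 counts.
func encodeHeader(h header, legacy bool) ([]byte, error) {
	if legacy {
		if h.NumItems >= uint64(formatMarker) || h.NumBuckets > math.MaxUint32 {
			return nil, errors.New("counts do not fit the legacy uint32 layout")
		}
		buf := make([]byte, 8)
		binary.LittleEndian.PutUint32(buf[0:4], uint32(h.NumItems))
		binary.LittleEndian.PutUint32(buf[4:8], uint32(h.NumBuckets))
		return buf, nil
	}
	buf := make([]byte, 20)
	binary.LittleEndian.PutUint32(buf[0:4], formatMarker)
	binary.LittleEndian.PutUint64(buf[4:12], h.NumItems)
	binary.LittleEndian.PutUint64(buf[12:20], h.NumBuckets)
	return buf, nil
}

// decodeHeader inspects the first four bytes to pick the layout,
// so callers do not need to know which format produced the bytes.
func decodeHeader(b []byte) (header, error) {
	if len(b) < 8 {
		return header{}, errors.New("buffer too short")
	}
	first := binary.LittleEndian.Uint32(b[0:4])
	if first != formatMarker {
		// Legacy layout: the first field is the item count itself.
		buckets := binary.LittleEndian.Uint32(b[4:8])
		return header{NumItems: uint64(first), NumBuckets: uint64(buckets)}, nil
	}
	if len(b) < 20 {
		return header{}, errors.New("buffer too short for the new format")
	}
	return header{
		NumItems:   binary.LittleEndian.Uint64(b[4:12]),
		NumBuckets: binary.LittleEndian.Uint64(b[12:20]),
	}, nil
}

func main() {
	big := header{NumItems: 1 << 33, NumBuckets: 1 << 31}
	enc, _ := encodeHeader(big, false)
	dec, _ := decodeHeader(enc)
	fmt.Println(dec == big) // round trip through the new format

	_, err := encodeHeader(big, true)
	fmt.Println(err != nil) // legacy layout rejects counts above uint32
}
```

The point of the sketch is that the legacy flag only affects the writer; the reader branches on the prefix, which is what lets old and new encodings coexist during a gradual rollout.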
Contributor
Author
@linvon would you like to take a look? 😊
Owner
Sorry, busy with work, but I will find some time to handle this.
Contributor
Author
Hi @linvon, let me know if you need any help :)
Contributor
Author
@linvon gentle ping
Contributor
Author
Hi @linvon, do you think it's going to be merged soon? 🙏