Introduce IdSet and add IdSetAggregationFunction #5926
private final Type _type;
private final RoaringBitmap _bitmap;
private final Roaring64NavigableMap _longBitmap;
Do you think this constructor is the only thing we need? If not, we should make it private and add static builders.
/**
 * The {@code IdSet} represents a collection of ids. It can be used to optimize the query with huge IN clause.
 */
public class IdSet implements Comparable<IdSet> {
Let’s make this an interface with multiple implementations?
456de34 to fab464f
@kishoreg Refactored the PR and introduced the |
Codecov Report
@@ Coverage Diff @@
## master #5926 +/- ##
==========================================
+ Coverage 66.44% 67.35% +0.90%
==========================================
Files 1075 1191 +116
Lines 54773 62438 +7665
Branches 8168 9533 +1365
==========================================
+ Hits 36396 42054 +5658
- Misses 15700 17286 +1586
- Partials 2677 3098 +421
LGTM.
Description
For issue #5925

Introduce `IdSet` which represents a collection of ids and can be used to replace the large IN clause to optimize the query.

4 types of `IdSet` are introduced:
- `EmptyIdSet`: Used as a place holder for empty id set
- `RoaringBitmapIdSet`: Based on `RoaringBitmap` and can be used to store INT ids
- `Roaring64NavigableMapIdSet`: Based on `Roaring64NavigableMap` and can be used to store LONG ids
- `BloomFilterIdSet`: Based on `BloomFilter` and can be used to store any type of ids (contains won't be 100% accurate because of the nature of `BloomFilter`)

Add `IdSetAggregationFunction` to create `IdSet` for a column.

3 configurable parameters for the function:
- `sizeThresholdInBytes`: Once the size of the `IdSet` reaches this threshold, convert it to `BloomFilterIdSet` to save space
- `expectedInsertions`: Number of expected insertions for the BloomFilter, must be positive
- `fpp`: Desired false positive probability for the BloomFilter, must be positive and less than 1.0

The parameters can be passed to the function as the second argument, e.g.:
SELECT IDSET(intColumn, 'sizeThresholdInBytes=1000;expectedInsertions=1000;fpp=0.03') FROM testTable
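
For readers who want to see how a parameter string of this shape could be handled, here is a minimal sketch using only the Java standard library. It is not the PR's parsing code: the class `IdSetParamsSketch`, the `Params` holder, the placeholder defaults, and the case-sensitive key matching are all assumptions made for illustration. Only the three parameter names and the constraints on `expectedInsertions` and `fpp` come from the description above.

```java
// Hypothetical sketch, not the PR's implementation.
final class IdSetParamsSketch {

  // Hypothetical holder. The default values are placeholders, not the function's real defaults.
  static final class Params {
    int sizeThresholdInBytes = 1 << 20;
    int expectedInsertions = 1_000_000;
    double fpp = 0.03;
  }

  // Parses a string such as "sizeThresholdInBytes=1000;expectedInsertions=1000;fpp=0.03".
  static Params parse(String raw) {
    Params params = new Params();
    for (String pair : raw.split(";")) {
      String trimmed = pair.trim();
      if (trimmed.isEmpty()) {
        continue; // tolerate a trailing ';'
      }
      String[] keyValue = trimmed.split("=", 2);
      if (keyValue.length != 2) {
        throw new IllegalArgumentException("Invalid parameter: " + trimmed);
      }
      String key = keyValue[0].trim();
      String value = keyValue[1].trim();
      switch (key) {
        case "sizeThresholdInBytes":
          params.sizeThresholdInBytes = Integer.parseInt(value);
          break;
        case "expectedInsertions":
          params.expectedInsertions = Integer.parseInt(value);
          if (params.expectedInsertions <= 0) {
            throw new IllegalArgumentException("expectedInsertions must be positive");
          }
          break;
        case "fpp":
          params.fpp = Double.parseDouble(value);
          if (params.fpp <= 0.0 || params.fpp >= 1.0) {
            throw new IllegalArgumentException("fpp must be positive and less than 1.0");
          }
          break;
        default:
          throw new IllegalArgumentException("Unknown parameter: " + key);
      }
    }
    return params;
  }
}
```

Calling `IdSetParamsSketch.parse(...)` with the string from the query above yields a threshold of 1000 bytes, 1000 expected insertions, and an `fpp` of 0.03, while a value such as `fpp=1.5` is rejected.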
Bump up the version of `RoaringBitmap` to `0.9.0` to include some bug fixes for `Roaring64NavigableMap`.
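
To illustrate the `sizeThresholdInBytes` behavior and why `contains` on a Bloom-filter-backed set is not 100% accurate, here is a rough, self-contained sketch. It deliberately avoids the real libraries: a `HashSet` stands in for the bitmap-based set, a `BitSet` with a hand-rolled hash stands in for `BloomFilter`, and the 4-bytes-per-id size estimate is an assumption. The class `ThresholdIdSetSketch` and all of its members are hypothetical and are not the classes introduced by this PR.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: exact storage that falls back to a Bloom filter past a size threshold.
final class ThresholdIdSetSketch {
  private final int sizeThresholdInBytes;
  private final int numBits;
  private final int numHashes;
  private Set<Integer> exact = new HashSet<>(); // stand-in for the exact (bitmap-based) id set
  private BitSet bloom;                         // non-null once converted

  ThresholdIdSetSketch(int sizeThresholdInBytes, int expectedInsertions, double fpp) {
    this.sizeThresholdInBytes = sizeThresholdInBytes;
    // Textbook Bloom filter sizing: m = -n * ln(p) / (ln 2)^2, k = (m / n) * ln 2.
    double ln2 = Math.log(2);
    this.numBits = Math.max(1, (int) Math.ceil(-expectedInsertions * Math.log(fpp) / (ln2 * ln2)));
    this.numHashes = Math.max(1, (int) Math.round((double) numBits / expectedInsertions * ln2));
  }

  void add(int id) {
    if (bloom != null) {
      setBits(id);
      return;
    }
    exact.add(id);
    // Crude size estimate (assumption): 4 bytes per stored id.
    if ((long) exact.size() * Integer.BYTES >= sizeThresholdInBytes) {
      bloom = new BitSet(numBits);
      for (int existing : exact) {
        setBits(existing);
      }
      exact = null; // drop the exact representation to save space
    }
  }

  // Exact before conversion. After conversion: never a false negative, possibly a false positive.
  boolean contains(int id) {
    if (bloom == null) {
      return exact.contains(id);
    }
    for (int i = 0; i < numHashes; i++) {
      if (!bloom.get(index(id, i))) {
        return false;
      }
    }
    return true;
  }

  boolean isApproximate() {
    return bloom != null;
  }

  private void setBits(int id) {
    for (int i = 0; i < numHashes; i++) {
      bloom.set(index(id, i));
    }
  }

  // Double hashing: the i-th probe position is h1 + i * h2 (mod numBits).
  private int index(int id, int i) {
    int h1 = mix(id);
    int h2 = mix(h1 ^ 0x9E3779B9) | 1;
    return Math.floorMod(h1 + i * h2, numBits);
  }

  // Integer bit mixer (murmur3-style finalizer).
  private static int mix(int x) {
    x ^= x >>> 16;
    x *= 0x85EBCA6B;
    x ^= x >>> 13;
    x *= 0xC2B2AE35;
    x ^= x >>> 16;
    return x;
  }
}
```

With a 40-byte threshold, for example, the sketch stays exact for the first nine ids and converts on the tenth. After that, every added id still reports `true` from `contains`, but an id that was never added may occasionally report `true` as well, which is the accuracy trade-off the description mentions for `BloomFilterIdSet`.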