-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] support Quantile Sketche #54569
base: main
Are you sure you want to change the base?
[Feature] support Quantile Sketche #54569
Conversation
… Sketches. Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
… Sketches. Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
…cks into support_dataSketches
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
fields.add(new StructField("upper_bound", Type.BIGINT)); | ||
return new ArrayType(new StructType(fields, true)); | ||
}; | ||
|
||
public List<Function> getBuiltinFunctions() { | ||
List<Function> builtinFunctions = Lists.newArrayList(); | ||
for (Map.Entry<String, List<Function>> entry : vectorizedFunctions.entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
Duplicate registration of DS_HLL_COUNT_DISTINCT
functions, which can lead to errors or unexpected behavior during function lookup.
You can modify the code like this:
private void registerBuiltinDsFunction() {
for (Type t : Type.getSupportedTypes()) {
if (t.isFunctionType() || t.isNull() || t.isChar() || t.isPseudoType()) {
continue;
}
// Avoid re-registering ds_hll_count_distinct functions.
if (!vectorizedFunctions.containsKey(DS_HLL_COUNT_DISTINCT)) {
addBuiltin(AggregateFunction.createBuiltin(DS_HLL_COUNT_DISTINCT,
Lists.newArrayList(t), Type.BIGINT, Type.VARBINARY,
true, false, true));
addBuiltin(AggregateFunction.createBuiltin(DS_HLL_COUNT_DISTINCT,
Lists.newArrayList(t, Type.INT), Type.BIGINT, Type.VARBINARY,
true, false, true));
addBuiltin(AggregateFunction.createBuiltin(DS_HLL_COUNT_DISTINCT,
Lists.newArrayList(t, Type.INT, Type.VARCHAR), Type.BIGINT, Type.VARBINARY,
true, false, true));
}
}
// Rest of the function registrations...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function parameters are different and need to be registered multiple times.
} | ||
}; | ||
|
||
} // namespace starrocks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
Typographical error in type name SketchWarapperType
.
You can modify the code like this:
+ using SketchWrapperType = DataSketchesQuantile<CppType>;
...
+ std::unique_ptr <SketchWrapperType> ds_sketch_wrapper = nullptr;
...
+ ds_sketch_wrapper = std::make_unique<SketchWrapperType>(k, &memory_usage);
...
+ *ranks_prt = other_state.ranks.get()[i];
+ ranks_prt++;
+ }
+ ds_sketch_wrapper =
+ std::make_unique<SketchWrapperType>(other_state.ds_sketch_wrapper->get_k(), &memory_usage);
...
+ ds_sketch_wrapper = std::make_unique<SketchWrapperType>(sketch_data_slice, memory_usage);
...
This modification corrects the misspelled type name from SketchWarapperType
to SketchWrapperType
, ensuring consistency and preventing potential compilation errors.
} | ||
}; | ||
|
||
} // namespace starrocks No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
Possible null pointer dereference because ds_sketch_wrapper
may not be initialized before calling update
.
You can modify the code like this:
void update(const Column* data_column, size_t row_num) const {
if (!is_inited()) {
// Handle uninitialized ds_sketch_wrapper scenario, e.g., log an error or throw an exception.
return;
}
uint64_t value = 0;
const ColumnType* column = down_cast<const ColumnType*>(data_column);
if constexpr (lt_is_string<LT>) {
Slice s = column->get_slice(row_num);
value = HashUtil::murmur_hash64A(s.data, s.size, HashUtil::MURMUR_SEED);
} else {
const auto& v = column->get_data();
value = HashUtil::murmur_hash64A(&v[row_num], sizeof(v[row_num]), HashUtil::MURMUR_SEED);
}
ds_sketch_wrapper->update(value);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to check whether it has been initialized, because the method calling it has already done the corresponding processing.
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Signed-off-by: chenminghua8 <cmptmn@126.com>
Quality Gate passedIssues Measures |
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]✅ pass : 91 / 95 (95.79%) file detail
|
[BE Incremental Coverage Report]✅ pass : 266 / 297 (89.56%) file detail
|
Why I'm doing:
What I'm doing:
[Feature] support Quantile Theta Sketche
Implement the aggregation function that supports DataSketches Quantile:
Fixes #50671
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: