[feat](skew & kurt) New aggregate function skew & kurt #40945
HappenLee merged 8 commits into apache:master from
Thank you for your contribution to Apache Doris. Since 2024-03-18, the documentation has been moved to doris-website.

run buildall
TPC-H: Total hot run time: 41559 ms
TPC-DS: Total hot run time: 193943 ms
ClickBench: Total hot run time: 31.75 s

TeamCity be ut coverage result:

4ae074e to c5fa11c

run buildall
TPC-H: Total hot run time: 41954 ms
TPC-DS: Total hot run time: 194990 ms
ClickBench: Total hot run time: 32.71 s

TeamCity be ut coverage result:

namespace doris::vectorized {

enum class StatisticsFunctionKind : uint8_t { skewPop, kurtPop };

renamed to STATISTICS_FUNCTION_KIND

template <typename T, std::size_t _level>
struct StatFuncOneArg {
    using Type1 = T;

Same type; there is no need for two type aliases.
}
}

void reset() { return; }

This function is useful; it should reset all `m` to their initial values.
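The reviewer's point about `reset()` can be illustrated with a minimal sketch, assuming the aggregate state keeps raw power sums `m[0..level]` in the style of the `VarMoments` type the diff uses (borrowed from ClickHouse, per the PR description). `MomentsSketch` and its members are hypothetical names, not the Doris code; the sketch only shows that `reset()` has to return every `m[k]` to zero.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a VarMoments-style state: m[k] accumulates the
// sum of x^k over all added rows, so m[0] is the row count.
template <typename T, std::size_t level>
struct MomentsSketch {
    std::array<T, level + 1> m{};  // value-initialized to zero

    void add(T x) {
        T power = 1;
        for (std::size_t k = 0; k <= level; ++k) {
            m[k] += power;
            power *= x;
        }
    }

    void merge(const MomentsSketch& rhs) {
        for (std::size_t k = 0; k <= level; ++k) {
            m[k] += rhs.m[k];
        }
    }

    // Unlike a no-op `return;`, put every m[k] back to its initial value so a
    // reused state does not carry rows over from a previous group.
    void reset() { m.fill(T{}); }
};
```

For example, after `add(1.0)` and `add(2.0)` on a `MomentsSketch<double, 4>`, `m[0]` is 2 and `m[1]` is 3; after `reset()` every entry is 0 again, exactly like a freshly constructed state.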
using ResultType = Float64;
using Data = VarMoments<ResultType, _level>;

static constexpr UInt32 num_args = 1;

This variable seems unused?

using ColVecT1 = ColumnVectorOrDecimal<T1>;
using ColVecT2 = ColumnVectorOrDecimal<T2>;
using ResultType = typename StatFunc::ResultType;
using ColVecResult = ColumnVector<ResultType>;

This code could be simpler, since both functions return ColumnFloat64.
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/agg/Skew.java
        implements UnaryExpression, ExplicitlyCastableSignature, AlwaysNullable {

    public static final List<FunctionSignature> SIGNATURES = ImmutableList.of(
            FunctionSignature.ret(DoubleType.INSTANCE).args(FloatType.INSTANCE),

Could FE members check the argument order? See #39352.

Now the same as #39352.

void add(AggregateDataPtr __restrict place, const IColumn** columns, ssize_t row_num,
         Arena*) const override {
    if constexpr (NullableInput) {

Should skip null values.

This function uses creator_without_type::create_ignore_nullable, so aggregate_function_null will not be used, since the return type is always nullable.
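Because `create_ignore_nullable` bypasses the generic null wrapper, null handling presumably has to live inside `add()` itself, in the `if constexpr (NullableInput)` branch. A minimal sketch of that pattern, assuming a nullable column is modeled as a value vector plus a null map; `NullableColumnSketch`, `CountSumSketch` and `add_row` are hypothetical names, not Doris APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a nullable column: values plus a null map
// where null_map[i] != 0 marks row i as NULL.
struct NullableColumnSketch {
    std::vector<double> values;
    std::vector<std::uint8_t> null_map;
};

// Hypothetical aggregate state, just enough to show which rows were counted.
struct CountSumSketch {
    double count = 0;
    double sum = 0;
};

template <bool NullableInput>
void add_row(CountSumSketch& state, const NullableColumnSketch& column, std::size_t row_num) {
    if constexpr (NullableInput) {
        // NULL rows must not contribute to the moments.
        if (column.null_map[row_num]) {
            return;
        }
    }
    state.count += 1;
    state.sum += column.values[row_num];
}
```

The check is resolved at compile time, so the non-nullable instantiation pays nothing for it; with the nullable instantiation, a column holding `1.0, NULL, 2.0` contributes two rows with a sum of 3.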

"'getPopulation' method");
}
}

T getPopulation() const {

run buildall

TeamCity be ut coverage result:
TPC-H: Total hot run time: 41903 ms
TPC-DS: Total hot run time: 195972 ms
42228ba

run buildall

TeamCity be ut coverage result:
TPC-H: Total hot run time: 41536 ms
TPC-DS: Total hot run time: 191403 ms
ClickBench: Total hot run time: 33.05 s

run p0

PR approved by at least one committer and no changes requested.
`skew`, `skew_pop` and `skewness` calculate the [skewness](https://en.wikipedia.org/wiki/Skewness#Pearson.27s_moment_coefficient_of_skewness) of a data distribution. `kurt`, `kurt_pop` and `kurtosis` calculate the [kurtosis](https://en.wikipedia.org/wiki/Kurtosis) of a data distribution. The implementation references ClickHouse/ClickHouse#5200, but changes the result type to AlwaysNullable because Doris does not support NaN.

The formula used to calculate skew is `third central moment / (variance^{1.5})`.
The formula used to calculate kurt is `fourth central moment / (variance^{2}) - 3`.
When a result would be NaN, Doris returns NULL.

doc: apache/doris-website#1127
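The two formulas can be checked with a small standalone sketch. It assumes, as in the ClickHouse approach the PR references, that the state holds raw power sums `s[k]` (the sum of x^k, with `s[0]` the row count) and derives the population central moments from them. `PowerSums`, `skew_pop_sketch` and `kurt_pop_sketch` are hypothetical names, and returning `std::nullopt` stands in for Doris returning NULL when the result is NaN.

```cpp
#include <cassert>
#include <cmath>
#include <initializer_list>
#include <optional>

// Raw power sums: s[k] = sum of x^k, so s[0] is the row count n.
struct PowerSums {
    double s[5] = {0, 0, 0, 0, 0};

    void add(double x) {
        double p = 1.0;
        for (double& sk : s) {
            sk += p;
            p *= x;
        }
    }
};

// Population central moments derived from the power sums.
struct CentralMoments {
    double m2, m3, m4;
};

CentralMoments central_moments(const PowerSums& ps) {
    const double n = ps.s[0];
    const double mu = ps.s[1] / n;  // NaN when n == 0
    const double e2 = ps.s[2] / n;
    const double e3 = ps.s[3] / n;
    const double e4 = ps.s[4] / n;
    return {
        e2 - mu * mu,                                                 // variance
        e3 - 3 * mu * e2 + 2 * mu * mu * mu,                          // 3rd central moment
        e4 - 4 * mu * e3 + 6 * mu * mu * e2 - 3 * mu * mu * mu * mu,  // 4th central moment
    };
}

// skew = m3 / variance^1.5; a NaN result is reported as NULL (std::nullopt).
std::optional<double> skew_pop_sketch(const PowerSums& ps) {
    const CentralMoments c = central_moments(ps);
    const double r = c.m3 / std::pow(c.m2, 1.5);
    if (std::isnan(r)) return std::nullopt;
    return r;
}

// kurt = m4 / variance^2 - 3 (excess kurtosis); NaN is reported as NULL.
std::optional<double> kurt_pop_sketch(const PowerSums& ps) {
    const CentralMoments c = central_moments(ps);
    const double r = c.m4 / (c.m2 * c.m2) - 3.0;
    if (std::isnan(r)) return std::nullopt;
    return r;
}
```

Feeding `0, 0, 0, 1` gives a skewness of about 1.1547 (2/√3) and an excess kurtosis of about -0.667; two identical values give a variance of 0, so both ratios are 0/0 and both helpers return `std::nullopt`. Power sums make partial states mergeable by plain addition, at the cost of some precision loss when large, nearly equal terms are subtracted.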

# Versions
- [x] dev
- [x] 3.0
- [ ] 2.1
- [ ] 2.0

# Languages
- [x] Chinese
- [x] English

ref apache/doris#40945