[Feature] Support Coalesce for Column Names#9327
Conversation
Codecov Report
@@ Coverage Diff @@
## master #9327 +/- ##
============================================
+ Coverage 63.40% 66.97% +3.56%
- Complexity 4762 4894 +132
============================================
Files 1832 1403 -429
Lines 98146 73108 -25038
Branches 15020 11722 -3298
============================================
- Hits 62231 48964 -13267
+ Misses 31321 20582 -10739
+ Partials 4594 3562 -1032
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
...c/main/java/org/apache/pinot/core/operator/transform/function/CoalesceTransformFunction.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
I think implicitly treating everything as STRING will give confusing semantics imo. If we don't want to support this function on few data types, then throwing exception is better.
Reference - https://docs.snowflake.com/en/sql-reference/functions/coalesce.html
There was a problem hiding this comment.
+1. For the first version, we can check if all the input are numbers or strings
There was a problem hiding this comment.
I added a check for argument type. but the return value has to be a string, otherwise we are not able to represent null?
There was a problem hiding this comment.
Good point. Currently we don't support null values for transform function yet. You may read TransformBlockValSet.getNullBitmap(), which count the result as null when any argument is null. This won't work for coalesce.
For now, we may use NullValueUtils.getDefaultNullValue() for each data type to present the null. Then we should figure out a way to support the real null in transform function.
There was a problem hiding this comment.
(edited) represented using default nullvalue probably is not a good idea since all numeric data type uses zero as default null.
There was a problem hiding this comment.
@walterddr Not really. We use min value as the default for numeric types except for big decimal, which doesn't have a min value
There was a problem hiding this comment.
Updated to use NullValueUtils.getDefaultNullValue() and only supports numeric and string types. Also it requires the type to be same for all arguments.
adbc410 to
21d21ae
Compare
...c/main/java/org/apache/pinot/core/operator/transform/function/CoalesceTransformFunction.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
I don't think we need this check. We may use getResultMetadata() to rule out the unsupported types
There was a problem hiding this comment.
Removed the second check but we need the type to be identifier to get the null bit map?
...c/main/java/org/apache/pinot/core/operator/transform/function/CoalesceTransformFunction.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/apache/pinot/core/operator/transform/function/CoalesceTransformFunction.java
Outdated
Show resolved
Hide resolved
Jackie-Jiang
left a comment
There was a problem hiding this comment.
LGTM with minor comments
There was a problem hiding this comment.
(minor) Let's change the name to storedType to be more explicit
There was a problem hiding this comment.
(minor) We usually follow the sequence of INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING (same order as the enum for easier tracking
There was a problem hiding this comment.
(also use intellij autocomplete to generate the branches will automatically be in that order :-) )
There was a problem hiding this comment.
(minor) Let's calculate the result metadata in the init() and store it in a member variable. It has 2 benefits:
- Fail fast when the type cannot be supported
- Prevent calculating result metadata multiple times
There was a problem hiding this comment.
(minor) Follow the same sequence (INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING) as the interface for easier tracking
There was a problem hiding this comment.
(minor) Put this method before getDoublelTransformResults() for easier tracking
Coalesce takes a list of arguments and returns the first not null value. If all arguments are null, return a null.
This implementations transform all arguments into string and return the first non-null string value. Null is represented as string value "null"