2595: fix: re-enable Comet abs #7
Conversation
Walkthrough

Adds comprehensive support for the Abs mathematical expression in Comet. Changes include documentation updates, a Rust implementation with ANSI overflow handling across multiple data types, Scala serialization support, removal of the dedicated protobuf definition, and expanded test coverage for regular and overflow scenarios.
Review completed. 1 suggestion posted.
```rust
            }
        }
    })
    .unwrap(),
```
Using a.map(...).unwrap() will panic for typed NULL scalars like ScalarValue::Int8(None). Consider handling None explicitly and returning the corresponding NULL scalar; this also applies to the Int16/Int32/Int64/Decimal128/Decimal256 branches.
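To make the failure mode concrete, here is a minimal, self-contained repro of the Option pattern in question (plain Rust, detached from the Comet code; not part of the PR):

```rust
fn main() {
    // Models a typed NULL scalar such as ScalarValue::Int32(None),
    // e.g. the result of ABS(CAST(NULL AS INT)).
    let a: Option<i32> = None;

    // Mirrors the `a.map(|v| ...).unwrap()` shape used in the abs branches:
    // mapping over None yields None, so unwrap() panics instead of
    // propagating NULL.
    let result = a.map(|v| v.checked_abs());
    let _ = result.unwrap(); // panics: called `Option::unwrap()` on a `None` value
}
```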
value: useful; category: bug; feedback: The AI reviewer is correct that a NULL value for signed integers and decimals should not lead to panics. Instead it should return the argument, e.g. ColumnarValue::Scalar(ScalarValue::Int8(None)).
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- docs/source/user-guide/latest/configs.md (1 hunk)
- docs/source/user-guide/latest/expressions.md (1 hunk)
- native/core/src/execution/planner.rs (0 hunks)
- native/proto/src/proto/expr.proto (0 hunks)
- native/spark-expr/src/comet_scalar_funcs.rs (2 hunks)
- native/spark-expr/src/math_funcs/abs.rs (1 hunk)
- native/spark-expr/src/math_funcs/mod.rs (1 hunk)
- spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (1 hunk)
- spark/src/main/scala/org/apache/comet/serde/math.scala (2 hunks)
- spark/src/test/scala/org/apache/comet/CometBitwiseExpressionSuite.scala (1 hunk)
- spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (2 hunks)
- spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala (1 hunk)
💤 Files with no reviewable changes (2)
- native/proto/src/proto/expr.proto
- native/core/src/execution/planner.rs
🧰 Additional context used
🧬 Code graph analysis (6)
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (1)
spark/src/main/scala/org/apache/comet/serde/math.scala (2)
- CometUnhex (128-145)
- CometAbs (147-164)
spark/src/main/scala/org/apache/comet/serde/math.scala (1)
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (5)
- convert (659-697)
- convert (1883-1887)
- exprToProtoInternal (654-834)
- scalarFunctionExprToProtoWithReturnType (907-919)
- optExprWithInfo (1602-1613)
native/spark-expr/src/math_funcs/abs.rs (1)
native/spark-expr/src/lib.rs (1)
- arithmetic_overflow_error (110-114)
native/spark-expr/src/comet_scalar_funcs.rs (1)
native/spark-expr/src/math_funcs/abs.rs (1)
- abs (67-304)
native/spark-expr/src/math_funcs/mod.rs (1)
native/spark-expr/src/math_funcs/abs.rs (1)
- abs (67-304)
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (1)
spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala (8)
- makeParquetFileAllPrimitiveTypes (459-461)
- makeParquetFileAllPrimitiveTypes (526-615)
- withParquetTable (390-398)
- withParquetTable (400-403)
- withParquetTable (405-408)
- checkSparkAnswerAndOperator (169-171)
- checkSparkAnswerAndOperator (173-177)
- checkSparkAnswerAndOperator (179-186)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cursor Bugbot
🔇 Additional comments (10)
spark/src/test/scala/org/apache/comet/CometBitwiseExpressionSuite.scala (1)
137-137: LGTM! The change from longBitPosition to intBitPosition correctly aligns with the data type of column _11, which is INT_32 (32-bit integer) in the Parquet schema.

docs/source/user-guide/latest/expressions.md (1)
121-121: LGTM! Documentation correctly reflects the new Abs expression support with Spark compatibility.
docs/source/user-guide/latest/configs.md (1)
153-153: LGTM! Configuration entry follows the established pattern for expression toggles and correctly defaults to true.

spark/src/main/scala/org/apache/comet/serde/math.scala (2)
22-22: LGTM! Import addition is necessary for the CometAbs implementation below.
147-164: LGTM! The CometAbs implementation correctly follows the established pattern for math expressions with failOnError support, similar to CometUnhex. The serialization logic properly handles the child expression and return type.
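As a rough sketch only, a serde object in this style might look as follows; the helper signatures (exprToProtoInternal, scalarFunctionExprToProtoWithReturnType, optExprWithInfo) and the CometExpressionSerde trait shape are assumptions based on the code-graph references above, not verbatim PR code:

```scala
import org.apache.spark.sql.catalyst.expressions.{Abs, Attribute, Expression, Literal}

// Hypothetical sketch modeled on the CometUnhex pattern; helper
// signatures are assumed, not copied from the PR.
object CometAbs extends CometExpressionSerde {
  override def convert(
      expr: Expression,
      inputs: Seq[Attribute],
      binding: Boolean): Option[ExprOuterClass.Expr] = {
    val abs = expr.asInstanceOf[Abs]
    // Serialize the child and the ANSI flag, then go through the
    // generic scalar-function path with the Spark return type.
    val childExpr = exprToProtoInternal(abs.child, inputs, binding)
    val failOnError = exprToProtoInternal(Literal(abs.failOnError), inputs, binding)
    val optExpr = scalarFunctionExprToProtoWithReturnType(
      "abs", abs.dataType, childExpr, failOnError)
    optExprWithInfo(optExpr, expr, abs.child)
  }
}
```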
native/spark-expr/src/comet_scalar_funcs.rs (2)

19-19: LGTM! Import statement correctly brings in the abs function from the newly added math_funcs module.
184-187: LGTM! The abs function registration follows the established pattern for scalar UDFs without explicit data type specification, consistent with similar functions like spark_modulo.
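A hedged sketch of what such a registration arm might look like, modeled on the spark_modulo pattern named above (the macro name and its `without data_type` form are assumptions, not verbatim PR code):

```rust
// Hypothetical arm in create_comet_physical_fun; registers abs as a
// scalar UDF without an explicit return data type, mirroring spark_modulo.
"abs" => {
    let func = Arc::new(abs);
    make_comet_scalar_udf!("abs", func, without data_type)
}
```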
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (1)

140-141: LGTM! The mapping correctly wires the Abs expression to the CometAbs serialization handler, following the established pattern for math expressions.

native/spark-expr/src/math_funcs/mod.rs (1)
18-18: LGTM! Module declaration correctly exposes the abs module with appropriate crate-level visibility, consistent with other math function modules.

spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala (1)
507-510: The original review comment is incorrect and should be disregarded. The schema changes are intentional and correct. The context shows this is in the getPrimitiveTypesParquetSchema function (line 463), which has two conditional branches:

- TRUE branch (incompatible types enabled): uses UINT_32 for fields _9-_11
- FALSE branch (incompatible types disabled): uses INT_8, INT_16, INT_32, INT_64 for fields _9-_12

The INT schema accurately reflects the actual data types generated in makeParquetFileAllPrimitiveTypes: the code generates (-i).toByte, (-i).toShort, -i, and (-i).toLong (lines 574-577), which are 8-bit, 16-bit, 32-bit, and 64-bit signed values respectively. The schema correctly maps these to INT_8, INT_16, INT_32, and INT_64. The comments at lines 464-468 document that this distinction addresses incompatible type behavior differences in Parquet spec handling across implementations. The "width reduction" is not a truncation problem but rather proper bitwidth assignment matching the actual data types.

Likely an incorrect or invalid review comment.
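For illustration, a minimal sketch of the value generation described above and the signed widths it implies (the loop variable name is assumed; this is not the verbatim test code):

```scala
// Per-row values as described for makeParquetFileAllPrimitiveTypes
// (lines 574-577); each maps to the matching signed Parquet logical type.
val i = 5
val f9: Byte   = (-i).toByte  // 8-bit signed  -> INT_8
val f10: Short = (-i).toShort // 16-bit signed -> INT_16
val f11: Int   = -i           // 32-bit signed -> INT_32
val f12: Long  = (-i).toLong  // 64-bit signed -> INT_64
```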
```rust
ScalarValue::Int8(a) => a
    .map(|v| match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Int8(Some(abs_val)))),
        None => {
            if !fail_on_error {
                // return the original value
                Ok(ColumnarValue::Scalar(ScalarValue::Int8(Some(v))))
            } else {
                Err(arithmetic_overflow_error("Int8").into())
            }
        }
    })
    .unwrap(),
ScalarValue::Int16(a) => a
    .map(|v| match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Int16(Some(abs_val)))),
        None => {
            if !fail_on_error {
                // return the original value
                Ok(ColumnarValue::Scalar(ScalarValue::Int16(Some(v))))
            } else {
                Err(arithmetic_overflow_error("Int16").into())
            }
        }
    })
    .unwrap(),
ScalarValue::Int32(a) => a
    .map(|v| match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(abs_val)))),
        None => {
            if !fail_on_error {
                // return the original value
                Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(v))))
            } else {
                Err(arithmetic_overflow_error("Int32").into())
            }
        }
    })
    .unwrap(),
ScalarValue::Int64(a) => a
    .map(|v| match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Int64(Some(abs_val)))),
        None => {
            if !fail_on_error {
                // return the original value
                Ok(ColumnarValue::Scalar(ScalarValue::Int64(Some(v))))
            } else {
                Err(arithmetic_overflow_error("Int64").into())
            }
        }
    })
    .unwrap(),
ScalarValue::Float32(a) => Ok(ColumnarValue::Scalar(ScalarValue::Float32(
    a.map(|x| x.abs()),
))),
ScalarValue::Float64(a) => Ok(ColumnarValue::Scalar(ScalarValue::Float64(
    a.map(|x| x.abs()),
))),
ScalarValue::Decimal128(a, precision, scale) => a
    .map(|v| match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Decimal128(
            Some(abs_val),
            *precision,
            *scale,
        ))),
        None => {
            if !fail_on_error {
                // return the original value
                Ok(ColumnarValue::Scalar(ScalarValue::Decimal128(
                    Some(v),
                    *precision,
                    *scale,
                )))
            } else {
                Err(arithmetic_overflow_error("Decimal128").into())
            }
        }
    })
    .unwrap(),
ScalarValue::Decimal256(a, precision, scale) => a
    .map(|v| match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Decimal256(
            Some(abs_val),
            *precision,
            *scale,
        ))),
        None => {
            if !fail_on_error {
                // return the original value
                Ok(ColumnarValue::Scalar(ScalarValue::Decimal256(
                    Some(v),
                    *precision,
                    *scale,
                )))
            } else {
                Err(arithmetic_overflow_error("Decimal256").into())
            }
        }
    })
    .unwrap(),
dt => exec_err!("Not supported datatype for ABS: {dt}"),
```
Prevent NULL scalar panic in abs
ScalarValue::Int* and decimal branches call .map(...).unwrap(). When the scalar is NULL (e.g., ScalarValue::Int32(None) from ABS(CAST(NULL AS INT))), map returns None and the unwrap() panics, crashing the query instead of propagating NULL. Please handle the None case explicitly and return a NULL ScalarValue, keeping the overflow logic for concrete values.
```diff
-ScalarValue::Int32(a) => a
-    .map(|v| match v.checked_abs() {
-        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(abs_val)))),
-        None => {
-            if !fail_on_error {
-                Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(v))))
-            } else {
-                Err(arithmetic_overflow_error("Int32").into())
-            }
-        }
-    })
-    .unwrap(),
+ScalarValue::Int32(a) => match a {
+    Some(v) => match v.checked_abs() {
+        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(abs_val)))),
+        None if !fail_on_error => Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(v)))),
+        None => Err(arithmetic_overflow_error("Int32").into()),
+    },
+    None => Ok(ColumnarValue::Scalar(ScalarValue::Int32(None))),
+},
```

Apply the same pattern to the Int8/Int16/Int64 and Decimal128/Decimal256 branches.
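For concreteness, a hedged sketch of the same rewrite applied to the Decimal128 branch (following the suggestion above; not a committed change):

```rust
// Sketch: propagate typed NULL instead of panicking; keep the overflow logic.
ScalarValue::Decimal128(a, precision, scale) => match a {
    Some(v) => match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Decimal128(
            Some(abs_val),
            *precision,
            *scale,
        ))),
        // Overflow in legacy mode: return the original value.
        None if !fail_on_error => Ok(ColumnarValue::Scalar(ScalarValue::Decimal128(
            Some(v),
            *precision,
            *scale,
        ))),
        None => Err(arithmetic_overflow_error("Decimal128").into()),
    },
    // Typed NULL input: return NULL with the original precision/scale.
    None => Ok(ColumnarValue::Scalar(ScalarValue::Decimal128(
        None, *precision, *scale,
    ))),
},
```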
🤖 Prompt for AI Agents
In native/spark-expr/src/math_funcs/abs.rs around lines 201 to 302, the
Int8/Int16/Int32/Int64 and Decimal128/Decimal256 scalar branches call
.map(...).unwrap(), which panics when the ScalarValue is NULL; change each
branch to explicitly handle the None case by returning
Ok(ColumnarValue::Scalar(ScalarValue::<Type>(None))) for NULL inputs, and only
run the checked_abs / overflow logic for Some(v) (preserving the existing
fail_on_error behavior and returning the original Some(v) when not failing).
Ensure Decimal branches include the original precision and scale when returning
None or the original value.
value: useful; category: bug; feedback: The AI reviewer is correct that a NULL value for signed integers and decimals should not lead to panics. Instead it should return the argument, e.g. ColumnarValue::Scalar(ScalarValue::Int8(None)).
@martin-augment, thank you for confirming the issue and clarifying the expected behavior! I've noted that NULL values for signed integers and decimals should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking. This will help ensure correct NULL handling in future reviews.
🐰✨
✏️ Learnings added
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.731Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().
Force-pushed from 1ddff98 to 400d60f.
bugbot review

value: annoying; category: bug; feedback: The reviewed file is not part of the changed files in this PR. I've rebased the PR and re-requested a new review by Bugbot.
2595: To review by AI

Note

Re-enables Comet abs via a native UDF, adds serde/wiring, updates docs, and introduces comprehensive tests including ANSI overflow; also tweaks related tests.

- Implements the abs UDF in Rust (math_funcs/abs.rs) with ANSI overflow handling and decimals support; registers it in create_comet_physical_fun.
- Adds CometAbs and maps Abs in QueryPlanSerde to the scalar function abs with a failOnError arg.
- Removes the dedicated Abs protobuf message/field; relies on the generic scalar function path.
- Adds Abs to the supported Math expressions and enablement table in configs.md and expressions.md.
- Expands abs tests, including ANSI overflow and legacy modes.
- Updates bit_get index usage; adjusts test parquet schema types for numeric columns.

Written by Cursor Bugbot for commit 400d60f. This will update automatically on new commits.
absvia a native UDF, adds serde/wiring, updates docs, and introduces comprehensive tests including ANSI overflow; also tweaks related tests.absUDF in Rust (math_funcs/abs.rs) with ANSI overflow handling and decimals support; register increate_comet_physical_fun.CometAbsand mapAbsinQueryPlanSerdeto scalar funcabswithfailOnErrorarg.Absmessage/field; rely on generic scalar function path.Absto supported Math expressions and enablement table inconfigs.mdandexpressions.md.abstests, including ANSI overflow and legacy modes.bit_getindex usage; adjust test parquet schema types for numeric columns.Written by Cursor Bugbot for commit 400d60f. This will update automatically on new commits. Configure here.