
Conversation

@martin-augment (Owner) commented Nov 4, 2025

2595: To review by AI


Note

Re-enables Comet abs via a native UDF, adds serde/wiring, updates docs, and introduces comprehensive tests including ANSI overflow; also tweaks related tests.

  • Expressions / Engine:
    • Implement a native abs UDF in Rust (math_funcs/abs.rs) with ANSI overflow handling and decimal support; register it in create_comet_physical_fun (a minimal sketch of the overflow rule follows this list).
    • Add Scala serde CometAbs and map Abs in QueryPlanSerde to scalar func abs with failOnError arg.
    • Remove obsolete protobuf Abs message/field; rely on generic scalar function path.
  • Docs:
    • Add Abs to supported Math expressions and enablement table in configs.md and expressions.md.
  • Tests:
    • Enable and expand abs tests, including ANSI overflow and legacy modes.
    • Minor test adjustments: fix bit_get index usage; adjust test parquet schema types for numeric columns.
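
A minimal, self-contained sketch of that overflow rule, modeled on i8 (the narrowest type); the function name and error text here are illustrative, not Comet's actual API:

// i8::MIN (-128) has no positive counterpart in i8, so checked_abs reports overflow.
fn spark_abs_i8(v: i8, fail_on_error: bool) -> Result<i8, String> {
    match v.checked_abs() {
        Some(a) => Ok(a),                // normal case: |v|
        None if !fail_on_error => Ok(v), // legacy mode: pass the original value through
        None => Err("arithmetic overflow: Int8".to_string()), // ANSI mode: fail the query
    }
}

fn main() {
    assert_eq!(spark_abs_i8(-5, true), Ok(5));
    assert_eq!(spark_abs_i8(i8::MIN, false), Ok(i8::MIN)); // legacy keeps -128
    assert!(spark_abs_i8(i8::MIN, true).is_err());         // ANSI raises an error
}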

Written by Cursor Bugbot for commit 400d60f. This will update automatically on new commits.

@coderabbitai bot commented Nov 4, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds comprehensive support for the Abs mathematical expression in Comet. Changes include documentation updates, Rust implementation with ANSI overflow handling across multiple data types, Scala serialization support, protobuf definition removal, and expanded test coverage for regular and overflow scenarios.

Changes

Cohort / File(s) Summary
Documentation Updates
docs/source/user-guide/latest/configs.md, docs/source/user-guide/latest/expressions.md
Added spark.comet.expression.Abs.enabled configuration option with default true. Added Abs to Math Expressions table marked as Spark-Compatible.
Protobuf Definition Removal
native/proto/src/proto/expr.proto
Removed Abs message type and removed abs = 49 field from Expr oneof expr_struct.
Native Rust Core
native/core/src/execution/planner.rs
Removed dedicated Abs expression handling from the Spark-to-DataFusion expression mapping; Abs now falls back to the generic "not implemented" path.
Rust Abs Implementation and Integration
native/spark-expr/src/math_funcs/abs.rs
Added comprehensive Abs function implementation with support for multiple data types, ANSI overflow handling, legacy semantics, and extensive test coverage.
Rust Module Declarations
native/spark-expr/src/math_funcs/mod.rs, native/spark-expr/src/comet_scalar_funcs.rs
Declared new pub(crate) abs module. Added abs function to scalar function registry via make_comet_scalar_udf.
Scala Serialization Support
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala, spark/src/main/scala/org/apache/comet/serde/math.scala
Extended mathExpressions map to include Abs mapping. Imported Abs and implemented CometAbs serializer transforming child expression and failOnError to proto.
Test Updates
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala, spark/src/test/scala/org/apache/comet/CometBitwiseExpressionSuite.scala, spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala
Converted ignored abs tests to active tests with ANSI and dictionary configuration coverage. Added overflow behavior tests for ANSI and legacy modes. Changed bit_get position reference and updated Parquet schema to use signed integer types (INT_8, INT_16, INT_32, INT_64) instead of unsigned variants.

cursor[bot]

This comment was marked as outdated.

@augmentcode bot left a comment

Review completed. 1 suggestion posted.


Using a.map(...).unwrap() will panic for typed NULL scalars like ScalarValue::Int8(None). Consider handling None explicitly and returning the corresponding NULL scalar; this also applies to the Int16/Int32/Int64/Decimal128/Decimal256 branches.
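
A hedged sketch of the NULL-safe shape this comment asks for; the helper name and Option-based signature are illustrative stand-ins for the real ColumnarValue plumbing:

fn abs_i8_scalar(a: Option<i8>, fail_on_error: bool) -> Result<Option<i8>, String> {
    match a {
        None => Ok(None), // typed NULL in, typed NULL out; no unwrap(), no panic
        Some(v) => match v.checked_abs() {
            Some(abs_val) => Ok(Some(abs_val)),
            None if !fail_on_error => Ok(Some(v)), // legacy: return the original value
            None => Err("arithmetic overflow: Int8".to_string()),
        },
    }
}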


@martin-augment (Owner, Author) replied:

value:useful; category:bug; feedback:The AI reviewer is correct that a NULL value for signed integers and decimals should not lead to panics. Instead it should return the argument, e.g. ColumnarValue::Scalar(ScalarValue::Int8(None))

@coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a5b13c and 1ddff98.

📒 Files selected for processing (12)
  • docs/source/user-guide/latest/configs.md (1 hunks)
  • docs/source/user-guide/latest/expressions.md (1 hunks)
  • native/core/src/execution/planner.rs (0 hunks)
  • native/proto/src/proto/expr.proto (0 hunks)
  • native/spark-expr/src/comet_scalar_funcs.rs (2 hunks)
  • native/spark-expr/src/math_funcs/abs.rs (1 hunks)
  • native/spark-expr/src/math_funcs/mod.rs (1 hunks)
  • spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (1 hunks)
  • spark/src/main/scala/org/apache/comet/serde/math.scala (2 hunks)
  • spark/src/test/scala/org/apache/comet/CometBitwiseExpressionSuite.scala (1 hunks)
  • spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (2 hunks)
  • spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala (1 hunks)
💤 Files with no reviewable changes (2)
  • native/proto/src/proto/expr.proto
  • native/core/src/execution/planner.rs
🧰 Additional context used
🧬 Code graph analysis (6)
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (1)
spark/src/main/scala/org/apache/comet/serde/math.scala (2)
  • CometUnhex (128-145)
  • CometAbs (147-164)
spark/src/main/scala/org/apache/comet/serde/math.scala (1)
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (5)
  • convert (659-697)
  • convert (1883-1887)
  • exprToProtoInternal (654-834)
  • scalarFunctionExprToProtoWithReturnType (907-919)
  • optExprWithInfo (1602-1613)
native/spark-expr/src/math_funcs/abs.rs (1)
native/spark-expr/src/lib.rs (1)
  • arithmetic_overflow_error (110-114)
native/spark-expr/src/comet_scalar_funcs.rs (1)
native/spark-expr/src/math_funcs/abs.rs (1)
  • abs (67-304)
native/spark-expr/src/math_funcs/mod.rs (1)
native/spark-expr/src/math_funcs/abs.rs (1)
  • abs (67-304)
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (1)
spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala (8)
  • makeParquetFileAllPrimitiveTypes (459-461)
  • makeParquetFileAllPrimitiveTypes (526-615)
  • withParquetTable (390-398)
  • withParquetTable (400-403)
  • withParquetTable (405-408)
  • checkSparkAnswerAndOperator (169-171)
  • checkSparkAnswerAndOperator (173-177)
  • checkSparkAnswerAndOperator (179-186)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cursor Bugbot
🔇 Additional comments (10)
spark/src/test/scala/org/apache/comet/CometBitwiseExpressionSuite.scala (1)

137-137: LGTM!

The change from longBitPosition to intBitPosition correctly aligns with the data type of column _11, which is INT_32 (32-bit integer) in the Parquet schema.
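
As a quick model of why the position must fit the column's width (a toy Rust stand-in for Spark's bit_get semantics, not Comet code):

fn bit_get(v: i32, pos: u32) -> Result<i32, String> {
    if pos >= 32 {
        // a 32-bit column only has bit positions 0..=31
        return Err(format!("bit position {pos} is out of range for a 32-bit value"));
    }
    Ok((v >> pos) & 1)
}

fn main() {
    assert_eq!(bit_get(5, 0), Ok(1)); // 5 is 0b101
    assert_eq!(bit_get(5, 1), Ok(0));
    assert!(bit_get(5, 40).is_err()); // a long-only position fails for an INT_32 column
}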

docs/source/user-guide/latest/expressions.md (1)

121-121: LGTM!

Documentation correctly reflects the new Abs expression support with Spark compatibility.

docs/source/user-guide/latest/configs.md (1)

153-153: LGTM!

Configuration entry follows the established pattern for expression toggles and correctly defaults to true.

spark/src/main/scala/org/apache/comet/serde/math.scala (2)

22-22: LGTM!

Import addition is necessary for the CometAbs implementation below.


147-164: LGTM!

The CometAbs implementation correctly follows the established pattern for math expressions with failOnError support, similar to CometUnhex. The serialization logic properly handles the child expression and return type.

native/spark-expr/src/comet_scalar_funcs.rs (2)

19-19: LGTM!

Import statement correctly brings in the abs function from the newly added math_funcs module.


184-187: LGTM!

The abs function registration follows the established pattern for scalar UDFs without explicit data type specification, consistent with similar functions like spark_modulo.
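
As a toy illustration of that registry pattern (this models the name-to-kernel lookup only; it is not the actual make_comet_scalar_udf macro or its signature):

use std::collections::HashMap;

type ScalarKernel = fn(i64, bool) -> Result<i64, String>;

fn abs_kernel(v: i64, fail_on_error: bool) -> Result<i64, String> {
    match v.checked_abs() {
        Some(a) => Ok(a),
        None if !fail_on_error => Ok(v), // legacy: pass through on overflow
        None => Err("arithmetic overflow: Int64".to_string()),
    }
}

// Kernels are looked up by name at plan time, mirroring create_comet_physical_fun.
fn registry() -> HashMap<&'static str, ScalarKernel> {
    HashMap::from([("abs", abs_kernel as ScalarKernel)])
}

fn main() {
    let funcs = registry();
    let abs = funcs["abs"];
    assert_eq!(abs(-9, true), Ok(9));
}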

spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (1)

140-141: LGTM!

The mapping correctly wires Abs expression to the CometAbs serialization handler, following the established pattern for math expressions.

native/spark-expr/src/math_funcs/mod.rs (1)

18-18: LGTM!

Module declaration correctly exposes the abs module with appropriate crate-level visibility, consistent with other math function modules.

spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala (1)

507-510: The original review comment is incorrect and should be disregarded.

The schema changes are intentional and correct. The context shows this is in the getPrimitiveTypesParquetSchema function (line 463), which has two conditional branches:

  • TRUE branch (incompatible types enabled): Uses UINT_32 for fields _9-_11
  • FALSE branch (incompatible types disabled): Uses INT_8, INT_16, INT_32, INT_64 for fields _9-_12

The INT schema accurately reflects the actual data types generated in makeParquetFileAllPrimitiveTypes: the code generates (-i).toByte, (-i).toShort, -i, and (-i).toLong (lines 574-577), which are 8-bit, 16-bit, 32-bit, and 64-bit signed values respectively. The schema correctly maps these to INT_8, INT_16, INT_32, and INT_64.

The comments at lines 464-468 document that this distinction addresses incompatible type behavior differences in Parquet spec handling across implementations. The "width reduction" is not a truncation problem but rather proper bitwidth assignment matching the actual data types.

Likely an incorrect or invalid review comment.
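
A Rust analogue of the cited Scala generators (illustrative only), showing the four signed widths that map onto INT_8, INT_16, INT_32, and INT_64:

fn main() {
    let i: i32 = 7;
    // One logical value at four signed widths, as in (-i).toByte / (-i).toShort / -i / (-i).toLong.
    let (b, s, w, l): (i8, i16, i32, i64) = (-(i as i8), -(i as i16), -i, -(i as i64));
    assert_eq!((b as i64, s as i64, w as i64, l), (-7, -7, -7, -7));
}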

Comment on lines +201 to +302
ScalarValue::Int8(a) => a
    .map(|v| match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Int8(Some(abs_val)))),
        None => {
            if !fail_on_error {
                // return the original value
                Ok(ColumnarValue::Scalar(ScalarValue::Int8(Some(v))))
            } else {
                Err(arithmetic_overflow_error("Int8").into())
            }
        }
    })
    .unwrap(),
ScalarValue::Int16(a) => a
    .map(|v| match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Int16(Some(abs_val)))),
        None => {
            if !fail_on_error {
                // return the original value
                Ok(ColumnarValue::Scalar(ScalarValue::Int16(Some(v))))
            } else {
                Err(arithmetic_overflow_error("Int16").into())
            }
        }
    })
    .unwrap(),
ScalarValue::Int32(a) => a
    .map(|v| match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(abs_val)))),
        None => {
            if !fail_on_error {
                // return the original value
                Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(v))))
            } else {
                Err(arithmetic_overflow_error("Int32").into())
            }
        }
    })
    .unwrap(),
ScalarValue::Int64(a) => a
    .map(|v| match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Int64(Some(abs_val)))),
        None => {
            if !fail_on_error {
                // return the original value
                Ok(ColumnarValue::Scalar(ScalarValue::Int64(Some(v))))
            } else {
                Err(arithmetic_overflow_error("Int64").into())
            }
        }
    })
    .unwrap(),
ScalarValue::Float32(a) => Ok(ColumnarValue::Scalar(ScalarValue::Float32(
    a.map(|x| x.abs()),
))),
ScalarValue::Float64(a) => Ok(ColumnarValue::Scalar(ScalarValue::Float64(
    a.map(|x| x.abs()),
))),
ScalarValue::Decimal128(a, precision, scale) => a
    .map(|v| match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Decimal128(
            Some(abs_val),
            *precision,
            *scale,
        ))),
        None => {
            if !fail_on_error {
                // return the original value
                Ok(ColumnarValue::Scalar(ScalarValue::Decimal128(
                    Some(v),
                    *precision,
                    *scale,
                )))
            } else {
                Err(arithmetic_overflow_error("Decimal128").into())
            }
        }
    })
    .unwrap(),
ScalarValue::Decimal256(a, precision, scale) => a
    .map(|v| match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Decimal256(
            Some(abs_val),
            *precision,
            *scale,
        ))),
        None => {
            if !fail_on_error {
                // return the original value
                Ok(ColumnarValue::Scalar(ScalarValue::Decimal256(
                    Some(v),
                    *precision,
                    *scale,
                )))
            } else {
                Err(arithmetic_overflow_error("Decimal256").into())
            }
        }
    })
    .unwrap(),
dt => exec_err!("Not supported datatype for ABS: {dt}"),
},
@coderabbitai bot commented Nov 4, 2025

⚠️ Potential issue | 🔴 Critical

Prevent NULL scalar panic in abs
ScalarValue::Int* and decimal branches call .map(...).unwrap(). When the scalar is NULL (e.g., ScalarValue::Int32(None) from ABS(CAST(NULL AS INT))), map returns None and the unwrap() panics, crashing the query instead of propagating NULL. Please handle the None case explicitly and return a NULL ScalarValue, keeping the overflow logic for concrete values.

-            ScalarValue::Int32(a) => a
-                .map(|v| match v.checked_abs() {
-                    Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(abs_val)))),
-                    None => {
-                        if !fail_on_error {
-                            Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(v))))
-                        } else {
-                            Err(arithmetic_overflow_error("Int32").into())
-                        }
-                    }
-                })
-                .unwrap(),
+            ScalarValue::Int32(a) => match a {
+                Some(v) => match v.checked_abs() {
+                    Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(abs_val)))),
+                    None if !fail_on_error => Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(v)))),
+                    None => Err(arithmetic_overflow_error("Int32").into()),
+                },
+                None => Ok(ColumnarValue::Scalar(ScalarValue::Int32(None))),
+            },

Apply the same pattern to the Int8/Int16/Int64 and Decimal128/Decimal256 branches.
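
For instance, the Decimal128 arm under that same pattern could look like the following sketch (precision and scale are carried through on every path, including the NULL one):

ScalarValue::Decimal128(a, precision, scale) => match a {
    Some(v) => match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Decimal128(
            Some(abs_val),
            *precision,
            *scale,
        ))),
        // legacy mode: return the original value on overflow
        None if !fail_on_error => Ok(ColumnarValue::Scalar(ScalarValue::Decimal128(
            Some(v),
            *precision,
            *scale,
        ))),
        None => Err(arithmetic_overflow_error("Decimal128").into()),
    },
    // typed NULL in, typed NULL out
    None => Ok(ColumnarValue::Scalar(ScalarValue::Decimal128(
        None,
        *precision,
        *scale,
    ))),
},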

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In native/spark-expr/src/math_funcs/abs.rs around lines 201 to 302, the
Int8/Int16/Int32/Int64 and Decimal128/Decimal256 scalar branches call
.map(...).unwrap(), which panics when the ScalarValue is NULL; change each
branch to explicitly handle the None case by returning
Ok(ColumnarValue::Scalar(ScalarValue::<Type>(None))) for NULL inputs, and only
run the checked_abs / overflow logic for Some(v) (preserving the existing
fail_on_error behavior and returning the original Some(v) when not failing).
Ensure Decimal branches include the original precision and scale when returning
None or the original value.

@martin-augment (Owner, Author) replied:

value:useful; category:bug; feedback:The AI reviewer is correct that a NULL value for signed integers and decimals should not lead to panics. Instead it should return the argument, e.g. ColumnarValue::Scalar(ScalarValue::Int8(None))

@coderabbitai bot replied:

@martin-augment, thank you for confirming the issue and clarifying the expected behavior! I've noted that NULL values for signed integers and decimals should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking. This will help ensure correct NULL handling in future reviews.

🐰✨


✏️ Learnings added
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.731Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@martin-augment force-pushed the pr-2595-2025-11-04-14-04-12 branch from 1ddff98 to 400d60f on November 4, 2025 14:13
@martin-augment (Owner, Author) commented:

bugbot review

@martin-augment (Owner, Author) commented:

The getSupportLevel methods for CometTruncDate and CometTruncTimestamp no longer validate truncation format strings. This allows unsupported formats to reach the native layer, potentially causing runtime errors instead of being caught during query planning.

spark/src/main/scala/org/apache/comet/serde/datetime.scala#L257-L286

value:annoying; category:bug; feedback:The reviewed file is not part of the changed files in this PR.

I've rebased the PR and re-requested a new review from Bugbot.

cursor[bot]

This comment was marked as outdated.

