
Conversation

@martin-augment (Owner) commented Nov 4, 2025

2595: To review by AI


Note

Re-enables Comet abs via a native UDF, adds serde/wiring, updates docs, and introduces comprehensive tests including ANSI overflow; also tweaks related tests.

  • Expressions / Engine:
    • Implement a native abs UDF in Rust (math_funcs/abs.rs) with ANSI overflow handling and decimal support; register it in create_comet_physical_fun (a minimal sketch of the overflow rule follows this list).
    • Add Scala serde CometAbs and map Abs in QueryPlanSerde to scalar func abs with failOnError arg.
    • Remove obsolete protobuf Abs message/field; rely on generic scalar function path.
  • Docs:
    • Add Abs to supported Math expressions and enablement table in configs.md and expressions.md.
  • Tests:
    • Enable and expand abs tests, including ANSI overflow and legacy modes.
    • Minor test adjustments: fix bit_get index usage; adjust test parquet schema types for numeric columns.
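
A minimal, self-contained sketch of that overflow rule, modeled on i8 (the narrowest type); the function name and error text here are illustrative, not Comet's actual API:

// i8::MIN (-128) has no positive counterpart in i8, so checked_abs reports overflow.
fn spark_abs_i8(v: i8, fail_on_error: bool) -> Result<i8, String> {
    match v.checked_abs() {
        Some(a) => Ok(a),                // normal case: |v|
        None if !fail_on_error => Ok(v), // legacy mode: pass the original value through
        None => Err("arithmetic overflow: Int8".to_string()), // ANSI mode: fail the query
    }
}

fn main() {
    assert_eq!(spark_abs_i8(-5, true), Ok(5));
    assert_eq!(spark_abs_i8(i8::MIN, false), Ok(i8::MIN)); // legacy keeps -128
    assert!(spark_abs_i8(i8::MIN, true).is_err());         // ANSI raises an error
}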

Written by Cursor Bugbot for commit 400d60f. This will update automatically on new commits.

@coderabbitai bot commented Nov 4, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds comprehensive support for the Abs mathematical expression in Comet. Changes include documentation updates, Rust implementation with ANSI overflow handling across multiple data types, Scala serialization support, protobuf definition removal, and expanded test coverage for regular and overflow scenarios.

Changes

Cohort / File(s) Summary
Documentation Updates
docs/source/user-guide/latest/configs.md, docs/source/user-guide/latest/expressions.md
Added spark.comet.expression.Abs.enabled configuration option with default true. Added Abs to Math Expressions table marked as Spark-Compatible.
Protobuf Definition Removal
native/proto/src/proto/expr.proto
Removed Abs message type and removed abs = 49 field from Expr oneof expr_struct.
Native Rust Core
native/core/src/execution/planner.rs
Removed dedicated Abs expression handling from the Spark-to-DataFusion expression mapping; Abs now falls back to the generic "not implemented" path.
Rust Abs Implementation and Integration
native/spark-expr/src/math_funcs/abs.rs
Added comprehensive Abs function implementation with support for multiple data types, ANSI overflow handling, legacy semantics, and extensive test coverage.
Rust Module Declarations
native/spark-expr/src/math_funcs/mod.rs, native/spark-expr/src/comet_scalar_funcs.rs
Declared new pub(crate) abs module. Added abs function to scalar function registry via make_comet_scalar_udf.
Scala Serialization Support
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala, spark/src/main/scala/org/apache/comet/serde/math.scala
Extended mathExpressions map to include Abs mapping. Imported Abs and implemented CometAbs serializer transforming child expression and failOnError to proto.
Test Updates
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala, spark/src/test/scala/org/apache/comet/CometBitwiseExpressionSuite.scala, spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala
Converted ignored abs tests to active tests with ANSI and dictionary configuration coverage. Added overflow behavior tests for ANSI and legacy modes. Changed bit_get position reference and updated Parquet schema to use signed integer types (INT_8, INT_16, INT_32, INT_64) instead of unsigned variants.

cursor[bot]

This comment was marked as outdated.

@augmentcode bot left a comment

Review completed. 1 suggestion posted.


Using a.map(...).unwrap() will panic for typed NULL scalars like ScalarValue::Int8(None). Consider handling None explicitly and returning the corresponding NULL scalar; this also applies to the Int16/Int32/Int64/Decimal128/Decimal256 branches.
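
A hedged sketch of the NULL-safe shape this comment asks for; the helper name and Option-based signature are illustrative stand-ins for the real ColumnarValue plumbing:

fn abs_i8_scalar(a: Option<i8>, fail_on_error: bool) -> Result<Option<i8>, String> {
    match a {
        None => Ok(None), // typed NULL in, typed NULL out; no unwrap(), no panic
        Some(v) => match v.checked_abs() {
            Some(abs_val) => Ok(Some(abs_val)),
            None if !fail_on_error => Ok(Some(v)), // legacy: return the original value
            None => Err("arithmetic overflow: Int8".to_string()),
        },
    }
}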


@martin-augment (Owner, Author) replied:

value:useful; category:bug; feedback:The AI reviewer is correct that a NULL value for signed integers and decimals should not lead to panics. Instead it should return the argument, e.g. ColumnarValue::Scalar(ScalarValue::Int8(None))

@coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a5b13c and 1ddff98.

📒 Files selected for processing (12)
  • docs/source/user-guide/latest/configs.md (1 hunks)
  • docs/source/user-guide/latest/expressions.md (1 hunks)
  • native/core/src/execution/planner.rs (0 hunks)
  • native/proto/src/proto/expr.proto (0 hunks)
  • native/spark-expr/src/comet_scalar_funcs.rs (2 hunks)
  • native/spark-expr/src/math_funcs/abs.rs (1 hunks)
  • native/spark-expr/src/math_funcs/mod.rs (1 hunks)
  • spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (1 hunks)
  • spark/src/main/scala/org/apache/comet/serde/math.scala (2 hunks)
  • spark/src/test/scala/org/apache/comet/CometBitwiseExpressionSuite.scala (1 hunks)
  • spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (2 hunks)
  • spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala (1 hunks)
💤 Files with no reviewable changes (2)
  • native/proto/src/proto/expr.proto
  • native/core/src/execution/planner.rs
🧰 Additional context used
🧬 Code graph analysis (6)
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (1)
spark/src/main/scala/org/apache/comet/serde/math.scala (2)
  • CometUnhex (128-145)
  • CometAbs (147-164)
spark/src/main/scala/org/apache/comet/serde/math.scala (1)
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (5)
  • convert (659-697)
  • convert (1883-1887)
  • exprToProtoInternal (654-834)
  • scalarFunctionExprToProtoWithReturnType (907-919)
  • optExprWithInfo (1602-1613)
native/spark-expr/src/math_funcs/abs.rs (1)
native/spark-expr/src/lib.rs (1)
  • arithmetic_overflow_error (110-114)
native/spark-expr/src/comet_scalar_funcs.rs (1)
native/spark-expr/src/math_funcs/abs.rs (1)
  • abs (67-304)
native/spark-expr/src/math_funcs/mod.rs (1)
native/spark-expr/src/math_funcs/abs.rs (1)
  • abs (67-304)
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (1)
spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala (8)
  • makeParquetFileAllPrimitiveTypes (459-461)
  • makeParquetFileAllPrimitiveTypes (526-615)
  • withParquetTable (390-398)
  • withParquetTable (400-403)
  • withParquetTable (405-408)
  • checkSparkAnswerAndOperator (169-171)
  • checkSparkAnswerAndOperator (173-177)
  • checkSparkAnswerAndOperator (179-186)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cursor Bugbot
🔇 Additional comments (10)
spark/src/test/scala/org/apache/comet/CometBitwiseExpressionSuite.scala (1)

137-137: LGTM!

The change from longBitPosition to intBitPosition correctly aligns with the data type of column _11, which is INT_32 (32-bit integer) in the Parquet schema.
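
As a quick model of why the position must fit the column's width (a toy Rust stand-in for Spark's bit_get semantics, not Comet code):

fn bit_get(v: i32, pos: u32) -> Result<i32, String> {
    if pos >= 32 {
        // a 32-bit column only has bit positions 0..=31
        return Err(format!("bit position {pos} is out of range for a 32-bit value"));
    }
    Ok((v >> pos) & 1)
}

fn main() {
    assert_eq!(bit_get(5, 0), Ok(1)); // 5 is 0b101
    assert_eq!(bit_get(5, 1), Ok(0));
    assert!(bit_get(5, 40).is_err()); // a long-only position fails for an INT_32 column
}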

docs/source/user-guide/latest/expressions.md (1)

121-121: LGTM!

Documentation correctly reflects the new Abs expression support with Spark compatibility.

docs/source/user-guide/latest/configs.md (1)

153-153: LGTM!

Configuration entry follows the established pattern for expression toggles and correctly defaults to true.

spark/src/main/scala/org/apache/comet/serde/math.scala (2)

22-22: LGTM!

Import addition is necessary for the CometAbs implementation below.


147-164: LGTM!

The CometAbs implementation correctly follows the established pattern for math expressions with failOnError support, similar to CometUnhex. The serialization logic properly handles the child expression and return type.

native/spark-expr/src/comet_scalar_funcs.rs (2)

19-19: LGTM!

Import statement correctly brings in the abs function from the newly added math_funcs module.


184-187: LGTM!

The abs function registration follows the established pattern for scalar UDFs without explicit data type specification, consistent with similar functions like spark_modulo.
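
As a toy illustration of that registry pattern (this models the name-to-kernel lookup only; it is not the actual make_comet_scalar_udf macro or its signature):

use std::collections::HashMap;

type ScalarKernel = fn(i64, bool) -> Result<i64, String>;

fn abs_kernel(v: i64, fail_on_error: bool) -> Result<i64, String> {
    match v.checked_abs() {
        Some(a) => Ok(a),
        None if !fail_on_error => Ok(v), // legacy: pass through on overflow
        None => Err("arithmetic overflow: Int64".to_string()),
    }
}

// Kernels are looked up by name at plan time, mirroring create_comet_physical_fun.
fn registry() -> HashMap<&'static str, ScalarKernel> {
    HashMap::from([("abs", abs_kernel as ScalarKernel)])
}

fn main() {
    let funcs = registry();
    let abs = funcs["abs"];
    assert_eq!(abs(-9, true), Ok(9));
}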

spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (1)

140-141: LGTM!

The mapping correctly wires Abs expression to the CometAbs serialization handler, following the established pattern for math expressions.

native/spark-expr/src/math_funcs/mod.rs (1)

18-18: LGTM!

Module declaration correctly exposes the abs module with appropriate crate-level visibility, consistent with other math function modules.

spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala (1)

507-510: The original review comment is incorrect and should be disregarded.

The schema changes are intentional and correct. The context shows this is in the getPrimitiveTypesParquetSchema function (line 463), which has two conditional branches:

  • TRUE branch (incompatible types enabled): Uses UINT_32 for fields _9-_11
  • FALSE branch (incompatible types disabled): Uses INT_8, INT_16, INT_32, INT_64 for fields _9-_12

The INT schema accurately reflects the actual data types generated in makeParquetFileAllPrimitiveTypes: the code generates (-i).toByte, (-i).toShort, -i, and (-i).toLong (lines 574-577), which are 8-bit, 16-bit, 32-bit, and 64-bit signed values respectively. The schema correctly maps these to INT_8, INT_16, INT_32, and INT_64.

The comments at lines 464-468 document that this distinction addresses incompatible type behavior differences in Parquet spec handling across implementations. The "width reduction" is not a truncation problem but rather proper bitwidth assignment matching the actual data types.

Likely an incorrect or invalid review comment.
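
A Rust analogue of the cited Scala generators (illustrative only), showing the four signed widths that map onto INT_8, INT_16, INT_32, and INT_64:

fn main() {
    let i: i32 = 7;
    // One logical value at four signed widths, as in (-i).toByte / (-i).toShort / -i / (-i).toLong.
    let (b, s, w, l): (i8, i16, i32, i64) = (-(i as i8), -(i as i16), -i, -(i as i64));
    assert_eq!((b as i64, s as i64, w as i64, l), (-7, -7, -7, -7));
}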

Comment on lines +201 to +302
ScalarValue::Int8(a) => a
    .map(|v| match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Int8(Some(abs_val)))),
        None => {
            if !fail_on_error {
                // return the original value
                Ok(ColumnarValue::Scalar(ScalarValue::Int8(Some(v))))
            } else {
                Err(arithmetic_overflow_error("Int8").into())
            }
        }
    })
    .unwrap(),
ScalarValue::Int16(a) => a
    .map(|v| match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Int16(Some(abs_val)))),
        None => {
            if !fail_on_error {
                // return the original value
                Ok(ColumnarValue::Scalar(ScalarValue::Int16(Some(v))))
            } else {
                Err(arithmetic_overflow_error("Int16").into())
            }
        }
    })
    .unwrap(),
ScalarValue::Int32(a) => a
    .map(|v| match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(abs_val)))),
        None => {
            if !fail_on_error {
                // return the original value
                Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(v))))
            } else {
                Err(arithmetic_overflow_error("Int32").into())
            }
        }
    })
    .unwrap(),
ScalarValue::Int64(a) => a
    .map(|v| match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Int64(Some(abs_val)))),
        None => {
            if !fail_on_error {
                // return the original value
                Ok(ColumnarValue::Scalar(ScalarValue::Int64(Some(v))))
            } else {
                Err(arithmetic_overflow_error("Int64").into())
            }
        }
    })
    .unwrap(),
ScalarValue::Float32(a) => Ok(ColumnarValue::Scalar(ScalarValue::Float32(
    a.map(|x| x.abs()),
))),
ScalarValue::Float64(a) => Ok(ColumnarValue::Scalar(ScalarValue::Float64(
    a.map(|x| x.abs()),
))),
ScalarValue::Decimal128(a, precision, scale) => a
    .map(|v| match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Decimal128(
            Some(abs_val),
            *precision,
            *scale,
        ))),
        None => {
            if !fail_on_error {
                // return the original value
                Ok(ColumnarValue::Scalar(ScalarValue::Decimal128(
                    Some(v),
                    *precision,
                    *scale,
                )))
            } else {
                Err(arithmetic_overflow_error("Decimal128").into())
            }
        }
    })
    .unwrap(),
ScalarValue::Decimal256(a, precision, scale) => a
    .map(|v| match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Decimal256(
            Some(abs_val),
            *precision,
            *scale,
        ))),
        None => {
            if !fail_on_error {
                // return the original value
                Ok(ColumnarValue::Scalar(ScalarValue::Decimal256(
                    Some(v),
                    *precision,
                    *scale,
                )))
            } else {
                Err(arithmetic_overflow_error("Decimal256").into())
            }
        }
    })
    .unwrap(),
dt => exec_err!("Not supported datatype for ABS: {dt}"),
},
@coderabbitai bot commented Nov 4, 2025

⚠️ Potential issue | 🔴 Critical

Prevent NULL scalar panic in abs
ScalarValue::Int* and decimal branches call .map(...).unwrap(). When the scalar is NULL (e.g., ScalarValue::Int32(None) from ABS(CAST(NULL AS INT))), map returns None and the unwrap() panics, crashing the query instead of propagating NULL. Please handle the None case explicitly and return a NULL ScalarValue, keeping the overflow logic for concrete values.

-            ScalarValue::Int32(a) => a
-                .map(|v| match v.checked_abs() {
-                    Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(abs_val)))),
-                    None => {
-                        if !fail_on_error {
-                            Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(v))))
-                        } else {
-                            Err(arithmetic_overflow_error("Int32").into())
-                        }
-                    }
-                })
-                .unwrap(),
+            ScalarValue::Int32(a) => match a {
+                Some(v) => match v.checked_abs() {
+                    Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(abs_val)))),
+                    None if !fail_on_error => Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(v)))),
+                    None => Err(arithmetic_overflow_error("Int32").into()),
+                },
+                None => Ok(ColumnarValue::Scalar(ScalarValue::Int32(None))),
+            },

Apply the same pattern to the Int8/Int16/Int64 and Decimal128/Decimal256 branches.
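
For instance, the Decimal128 arm under that same pattern could look like the following sketch (precision and scale are carried through on every path, including the NULL one):

ScalarValue::Decimal128(a, precision, scale) => match a {
    Some(v) => match v.checked_abs() {
        Some(abs_val) => Ok(ColumnarValue::Scalar(ScalarValue::Decimal128(
            Some(abs_val),
            *precision,
            *scale,
        ))),
        // legacy mode: return the original value on overflow
        None if !fail_on_error => Ok(ColumnarValue::Scalar(ScalarValue::Decimal128(
            Some(v),
            *precision,
            *scale,
        ))),
        None => Err(arithmetic_overflow_error("Decimal128").into()),
    },
    // typed NULL in, typed NULL out
    None => Ok(ColumnarValue::Scalar(ScalarValue::Decimal128(
        None,
        *precision,
        *scale,
    ))),
},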

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In native/spark-expr/src/math_funcs/abs.rs around lines 201 to 302, the
Int8/Int16/Int32/Int64 and Decimal128/Decimal256 scalar branches call
.map(...).unwrap(), which panics when the ScalarValue is NULL; change each
branch to explicitly handle the None case by returning
Ok(ColumnarValue::Scalar(ScalarValue::<Type>(None))) for NULL inputs, and only
run the checked_abs / overflow logic for Some(v) (preserving the existing
fail_on_error behavior and returning the original Some(v) when not failing).
Ensure Decimal branches include the original precision and scale when returning
None or the original value.

@martin-augment (Owner, Author) replied:

value:useful; category:bug; feedback:The AI reviewer is correct that a NULL value for signed integers and decimals should not lead to panics. Instead it should return the argument, e.g. ColumnarValue::Scalar(ScalarValue::Int8(None))

@coderabbitai bot replied:

@martin-augment, thank you for confirming the issue and clarifying the expected behavior! I've noted that NULL values for signed integers and decimals should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking. This will help ensure correct NULL handling in future reviews.

🐰✨


✏️ Learnings added
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.731Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@martin-augment force-pushed the pr-2595-2025-11-04-14-04-12 branch from 1ddff98 to 400d60f on November 4, 2025 14:13
@martin-augment (Owner, Author) commented:

bugbot review

@martin-augment (Owner, Author) commented:

The getSupportLevel methods for CometTruncDate and CometTruncTimestamp no longer validate truncation format strings. This allows unsupported formats to reach the native layer, potentially causing runtime errors instead of being caught during query planning.

spark/src/main/scala/org/apache/comet/serde/datetime.scala#L257-L286

value:annoying; category:bug; feedback:The reviewed file is not part of the changed files in this PR.

I've rebased the PR and re-requested a new review from Bugbot.

cursor[bot]

This comment was marked as outdated.

