
Conversation

Owner

@martin-augment martin-augment commented Nov 5, 2025

2689: To review by AI


Note

Implements Spark-compatible abs across JVM serde and native execution with ANSI/legacy behavior (see the sketch after the list below), updates docs/configs, and adds a dedicated test suite and CI entries.

  • Expressions/Serde (Scala):
    • Add CometAbs and map Spark Abs -> native scalar func, carrying failOnError for ANSI vs legacy.
    • Register Abs in QueryPlanSerde math expressions.
  • Native (Rust):
    • Implement abs UDF (native/spark-expr/src/math_funcs/abs.rs) supporting ints/floats/decimals with overflow handling (ANSI) and wrapping behavior (legacy).
    • Expose via create_comet_physical_fun and module math_funcs::abs.
    • Clean up unused proto/planner bits (remove Abs message; delete commented planner code).
  • Docs:
    • Add Abs to supported math expressions and enable/disable config table.
  • Tests/CI:
    • New CometMathExpressionSuite covering abs (including negative zero, ANSI mode); enable suite in Linux/macOS workflows.
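
A minimal, self-contained sketch of the ANSI/legacy split summarized above (plain Rust std; the real code in native/spark-expr/src/math_funcs/abs.rs operates on Arrow arrays, and the function name and error text here are illustrative only):

// Sketch only: ANSI mode surfaces overflow, legacy mode wraps like Spark's non-ANSI abs.
fn spark_like_abs_i32(values: &[i32], fail_on_error: bool) -> Result<Vec<i32>, String> {
    values
        .iter()
        .map(|&v| {
            if fail_on_error {
                // ANSI: abs(i32::MIN) cannot be represented, so report an overflow.
                v.checked_abs()
                    .ok_or_else(|| "ARITHMETIC_OVERFLOW: abs(i32::MIN)".to_string())
            } else {
                // Legacy: wrap in two's complement, so abs(i32::MIN) stays i32::MIN.
                Ok(v.wrapping_abs())
            }
        })
        .collect()
}

fn main() {
    assert_eq!(spark_like_abs_i32(&[-5, 7], true), Ok(vec![5, 7]));
    assert_eq!(spark_like_abs_i32(&[i32::MIN], false), Ok(vec![i32::MIN]));
    assert!(spark_like_abs_i32(&[i32::MIN], true).is_err());
}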

Written by Cursor Bugbot for commit 1149438. This will update automatically on new commits. Configure here.


coderabbitai bot commented Nov 5, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The PR adds comprehensive support for the Abs (absolute value) expression in Comet, including a native DataFusion implementation, Spark expression serialization, configuration documentation, and dedicated test coverage, while reorganizing the test suite structure.

Changes

Cohort / File(s) — Summary

  • CI/CD Workflow Updates (.github/workflows/pr_build_linux.yml, .github/workflows/pr_build_macos.yml): Added org.apache.comet.CometMathExpressionSuite to the Exec test suite list in the Linux and macOS PR build workflows.
  • Documentation (docs/source/user-guide/latest/configs.md, docs/source/user-guide/latest/expressions.md): Added the new configuration entry spark.comet.expression.Abs.enabled with description and default value true; added an Abs row to the Math Expressions table with SQL representation abs.
  • Fuzz Testing (fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala): Made the numFiles option required for the generateQueries and run subcommands; added a new boolean option showFailedSparkQueries to the run subcommand and updated the QueryRunner.runQueries call to pass it.
  • Native Proto (native/proto/src/proto/expr.proto): Removed the Abs oneof variant and message definition from the Expr proto.
  • Native Expression Planner (native/core/src/execution/planner.rs): Removed the commented-out Abs expression handling from create_expr.
  • Native Scalar Function Registry (native/spark-expr/src/comet_scalar_funcs.rs): Added the "abs" function to the Comet UDF registry with the corresponding ABS function dispatch.
  • Native Abs Implementation (native/spark-expr/src/math_funcs/abs.rs): New module implementing a comprehensive Abs function for DataFusion with support for signed/unsigned integers, floats, and decimals; includes ANSI and legacy overflow behavior, null handling, and extensive unit tests.
  • Native Math Modules (native/spark-expr/src/math_funcs/mod.rs): Added a new crate-level module declaration for abs.
  • Spark Expression Serialization (spark/src/main/scala/org/apache/comet/serde/math.scala): Added a CometAbs object extending CometExpressionSerde[Abs] with support-level checking for numeric types and proto conversion with failOnError handling.
  • Spark Serde Mapping (spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala): Added Abs to the mathExpressions mapping in the expression serde registry.
  • Spark Test Suite Reorganization (spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala): Removed three abs-related test cases and two unused imports.
  • New Math Expression Test Suite (spark/src/test/scala/org/apache/comet/CometMathExpressionSuite.scala): New test suite covering abs functionality across numeric types, negative-zero handling, and ANSI-mode overflow behavior; includes a private data generation utility.



claude bot commented Nov 5, 2025

Pull Request Review: Add support for abs expression

Summary

This PR adds native support for the abs (absolute value) expression in DataFusion Comet, including proper handling of ANSI mode overflow behavior, comprehensive tests, and documentation updates. Overall, this is a well-structured and high-quality implementation that follows the project's conventions.


✅ Strengths

Code Quality

  1. Comprehensive Implementation: The Rust implementation in abs.rs handles all numeric types (Int8/16/32/64, UInt*, Float32/64, Decimal128/256) with proper ANSI and legacy mode support.

  2. Proper Error Handling: Correctly implements overflow detection for ANSI mode using checked_abs() for scalars and try_unary for arrays.

  3. Well-Tested: Extensive unit tests cover:

    • All numeric types (signed, unsigned, floats, decimals)
    • Both scalar and array inputs
    • NULL handling
    • Edge cases (MIN values, negative zero)
    • Both ANSI and legacy modes
  4. Follows Project Conventions:

    • Uses the scalar UDF registration pattern (create_comet_physical_fun_with_eval_mode)
    • Follows the CometExpressionSerde pattern in Scala
    • Properly removes the unused protobuf definition
  5. Documentation: Updates both user-facing docs (expressions.md, configs.md) and CI configuration.

Architecture

  • Clean Separation: Properly separates Spark serde logic (Scala) from execution logic (Rust)
  • Reusable Macros: Uses legacy_compute_op and ansi_compute_op macros to reduce code duplication
  • Integration: Adds CometMathExpressionSuite as a dedicated test suite, improving organization

🔍 Potential Issues & Suggestions

1. Legacy Mode Behavior May Be Unintuitive ⚠️

Location: native/spark-expr/src/math_funcs/abs.rs:206-208

In legacy mode, when abs(i8::MIN) overflows, the code returns the original negative value unchanged:

if !fail_on_error {
    // return the original value
    Ok(ColumnarValue::Scalar(ScalarValue::Int8(Some(*v))))
}

Question: Is this the correct Spark behavior? This means abs(-128) would return -128 in legacy mode. While this matches the wrapping behavior, it might be worth:

  • Adding a comment explaining this is Spark's intended behavior
  • Verifying this matches Spark 3.x/4.x behavior with a reference

2. Inconsistent Use of wrapping_abs vs abs

Location: abs.rs:95, 148, 178

  • For arrays: Uses wrapping_abs for Int types but abs for Decimal128
  • For Decimal256: Uses wrapping_abs with a comment "i256 doesn't define abs() method"

Recommendation: Add a comment explaining when to use wrapping_abs vs abs to help future maintainers understand the distinction.
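
For reference, the standard-library semantics the reviewer is contrasting (plain Rust, runnable as-is):

fn main() {
    // Legacy-style wrapping: abs(i8::MIN) wraps back to i8::MIN in two's complement.
    assert_eq!(i8::MIN.wrapping_abs(), i8::MIN);
    // ANSI-style checking: the overflow is reported instead of wrapped.
    assert_eq!(i8::MIN.checked_abs(), None);
    // For every other value the two methods agree.
    assert_eq!((-5i8).wrapping_abs(), 5);
    assert_eq!((-5i8).checked_abs(), Some(5));
}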

3. Test Coverage: Missing Regular Positive/Negative Values

Location: abs.rs tests

The unit tests excellently cover edge cases (MIN, MAX, NULL) but don't have simple sanity checks like:

  • abs(-5) → 5
  • abs(5) → 5
  • abs(-123.45) → 123.45

Recommendation: Add a few basic smoke tests to complement the edge case tests.

4. Documentation: Missing DayTimeInterval/YearMonthInterval

Location: spark/src/main/scala/org/apache/comet/serde/math.scala:154-155

The comment says:

// Spark supports NumericType, DayTimeIntervalType, and YearMonthIntervalType
Unsupported(Some("Only integral, floating-point, and decimal types are supported"))

Recommendation: Consider documenting in the PR description or docs why interval types are not supported yet, or if they're planned for future work.

5. Fuzz Testing Changes Are Unrelated

Location: fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala

The changes to make numFiles required and add showFailedSparkQueries flag are unrelated to the abs feature.

Recommendation: These should ideally be in a separate PR. However, if they're needed for testing, that should be mentioned in the PR description.

6. Minor: Old Commented Code Removal

Location: native/core/src/execution/planner.rs:677-688

Good job removing the old commented-out Abs implementation! This keeps the codebase clean.


🔒 Security Considerations

✅ No security concerns identified:

  • No unsafe code blocks
  • Proper bounds checking via checked_abs()
  • No external dependencies added
  • Overflow behavior is well-defined and tested

⚡ Performance Considerations

✅ Performance looks good:

  • Uses Arrow's optimized unary and try_unary kernels for vectorized operations
  • Zero-copy for unsigned types (returns clone of input)
  • Minimal branching in hot paths

Potential Optimization: For unsigned types, could return a reference instead of cloning, but this is a micro-optimization and may not be worth the complexity.
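
As an illustration of the two kernels named above, this is how unary and try_unary are typically used on a plain Int32Array (standalone arrow-rs sketch, not the project's actual code):

use arrow::array::Int32Array;
use arrow::compute::kernels::arity::{try_unary, unary};
use arrow::error::ArrowError;

fn main() -> Result<(), ArrowError> {
    let input = Int32Array::from(vec![Some(-5), None, Some(i32::MIN)]);

    // Legacy path: infallible element-wise op; the null buffer is carried over.
    let legacy: Int32Array = unary(&input, |v| v.wrapping_abs());
    assert_eq!(legacy.value(0), 5);
    assert_eq!(legacy.value(2), i32::MIN);

    // ANSI path: fallible op; the first overflow aborts the kernel with an error.
    let ansi: Result<Int32Array, ArrowError> = try_unary(&input, |v| {
        v.checked_abs()
            .ok_or_else(|| ArrowError::ComputeError("integer overflow in abs".to_string()))
    });
    assert!(ansi.is_err());
    Ok(())
}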


🧪 Test Coverage

Excellent test coverage:

  • ✅ Rust unit tests (12 test functions covering all types and modes)
  • ✅ Scala integration tests (CometMathExpressionSuite)
  • ✅ CI integration (added to Linux and macOS workflows)
  • ✅ Both ANSI and legacy modes tested
  • ✅ Edge cases: MIN values, negative zero, NULLs

Minor Gap: As mentioned above, add simple positive/negative value tests.


📋 Code Style & Best Practices

✅ Follows project conventions:

  • Apache 2.0 license headers present
  • Consistent formatting
  • Proper use of Result types
  • Good error messages (includes type name in overflow errors)

🎯 Recommendations Summary

Must Fix (Before Merge)

None - the PR is production-ready as-is.

Should Consider

  1. Add comment explaining legacy mode overflow behavior matches Spark
  2. Add simple positive/negative value sanity tests
  3. Document why interval types are unsupported (or create a follow-up issue)

Nice to Have

  1. Explain fuzz testing changes in PR description or split to separate PR
  2. Add comments about wrapping_abs vs abs usage

✅ Final Verdict

LGTM with minor suggestions! This is a solid implementation that demonstrates:

  • Deep understanding of both Spark and DataFusion Comet internals
  • Attention to edge cases and testing
  • Proper handling of ANSI compliance
  • Good software engineering practices

The implementation correctly removes the old protobuf-based approach and adopts the newer scalar UDF pattern, which is the right direction for the project.

Great work! 🎉


Review conducted with focus on code quality, correctness, performance, security, and test coverage per repository standards.


@augmentcode augmentcode bot left a comment


Review completed. 2 suggestions posted.

f.dataType == DataTypes.FloatType || f.dataType == DataTypes.DoubleType)) {
val col = field.name
checkSparkAnswerAndOperator(
s"SELECT $col, abs($col) FROM tbl WHERE signum($col) < 0 ORDER BY $col")

The filter signum($col) < 0 will not include negative zero (−0.0), so this test may miss verifying abs behavior for negative zero. Consider adjusting the predicate to actually select negative zeros so the intended case is exercised.


Owner Author


value:useful; category:bug; feedback:The AI reviewer is correct that signum(-0.0) will return 0 and thus the WHERE clause won't include the negative zero. The test name mentions "negative zero", so it looks like there is a bug in the test.

Ok(())
}
Err(e) => {
if fail_on_error {

For unsigned types, abs cannot overflow; accepting an error when fail_on_error is true can mask regressions. It would be safer for this test to assert success in both modes for UInt8.


Owner Author


value:useful; category:bug; feedback:The AI reviewer is correct that abs(unsigned) should not fail, so it is better to always panic the test if it fails for any reason. The test would be more correct this way.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
spark/src/test/scala/org/apache/comet/CometMathExpressionSuite.scala (1)

45-49: Fix the negative-zero predicate

signum(-0.0) evaluates to -0.0, and the comparison -0.0 < 0 is false, so this filter excludes every negative-zero row. As a result the “negative zero” test never exercises the intended inputs. Please change the predicate to one that positively identifies negative zero (for example, constrain to zero and check the reciprocal’s sign, such as $col = 0 AND 1 / $col < 0), or otherwise inject deterministic -0.0 rows that you validate explicitly.

A concrete option:

checkSparkAnswerAndOperator(
  s"SELECT $col, abs($col) FROM tbl " +
    s"WHERE $col = 0 AND 1 / $col < 0 ORDER BY $col")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a5b13c and 7fcfdd3.

📒 Files selected for processing (14)
  • .github/workflows/pr_build_linux.yml (1 hunks)
  • .github/workflows/pr_build_macos.yml (1 hunks)
  • docs/source/user-guide/latest/configs.md (1 hunks)
  • docs/source/user-guide/latest/expressions.md (1 hunks)
  • fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala (3 hunks)
  • native/core/src/execution/planner.rs (0 hunks)
  • native/proto/src/proto/expr.proto (0 hunks)
  • native/spark-expr/src/comet_scalar_funcs.rs (2 hunks)
  • native/spark-expr/src/math_funcs/abs.rs (1 hunks)
  • native/spark-expr/src/math_funcs/mod.rs (1 hunks)
  • spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (1 hunks)
  • spark/src/main/scala/org/apache/comet/serde/math.scala (2 hunks)
  • spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (0 hunks)
  • spark/src/test/scala/org/apache/comet/CometMathExpressionSuite.scala (1 hunks)
💤 Files with no reviewable changes (3)
  • native/core/src/execution/planner.rs
  • native/proto/src/proto/expr.proto
  • spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.750Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().
📚 Learning: 2025-11-04T14:26:48.750Z
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.750Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().

Applied to files:

  • native/spark-expr/src/math_funcs/abs.rs
  • spark/src/main/scala/org/apache/comet/serde/math.scala
  • spark/src/test/scala/org/apache/comet/CometMathExpressionSuite.scala
  • native/spark-expr/src/math_funcs/mod.rs
  • native/spark-expr/src/comet_scalar_funcs.rs
🧬 Code graph analysis (7)
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (1)
spark/src/main/scala/org/apache/comet/serde/math.scala (2)
  • CometUnhex (128-145)
  • CometAbs (147-175)
native/spark-expr/src/math_funcs/abs.rs (1)
native/spark-expr/src/lib.rs (1)
  • arithmetic_overflow_error (110-114)
fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala (1)
fuzz-testing/src/main/scala/org/apache/comet/fuzz/QueryRunner.scala (2)
  • QueryRunner (31-145)
  • runQueries (42-144)
spark/src/main/scala/org/apache/comet/serde/math.scala (1)
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (8)
  • getSupportLevel (1783-1783)
  • Compatible (1714-1714)
  • Unsupported (1725-1725)
  • convert (661-699)
  • convert (1848-1852)
  • exprToProtoInternal (656-809)
  • scalarFunctionExprToProtoWithReturnType (882-894)
  • optExprWithInfo (1567-1578)
spark/src/test/scala/org/apache/comet/CometMathExpressionSuite.scala (2)
spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala (6)
  • spark (422-422)
  • CometTestBase (55-1268)
  • checkSparkAnswerAndOperator (189-193)
  • checkSparkAnswerAndOperator (199-206)
  • checkSparkAnswerAndOperator (214-223)
  • checkSparkMaybeThrows (302-318)
spark/src/main/scala/org/apache/comet/testing/FuzzDataGenerator.scala (3)
  • DataGenOptions (257-262)
  • FuzzDataGenerator (35-231)
  • generateDataFrame (88-105)
native/spark-expr/src/math_funcs/mod.rs (1)
native/spark-expr/src/math_funcs/abs.rs (1)
  • abs (67-310)
native/spark-expr/src/comet_scalar_funcs.rs (1)
native/spark-expr/src/math_funcs/abs.rs (1)
  • abs (67-310)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: claude-review
  • GitHub Check: codex
🔇 Additional comments (11)
fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala (1)

50-50: LGTM!

Making numFiles required is appropriate since it's always used without default handling (line 109). This improves API clarity.

.github/workflows/pr_build_macos.yml (1)

102-102: LGTM! Test suite properly added to CI.

The addition of CometMathExpressionSuite to the exec test suite ensures the new Abs functionality will be tested on macOS builds.

docs/source/user-guide/latest/expressions.md (1)

121-121: LGTM! Documentation entry is complete and consistent.

The Abs entry correctly documents it as a Spark-compatible expression with the abs SQL function, matching the format of other math expression entries.

docs/source/user-guide/latest/configs.md (1)

167-167: LGTM! Configuration entry is properly documented.

The spark.comet.expression.Abs.enabled configuration follows the standard pattern and is correctly positioned alphabetically in the scalar expressions table.

native/spark-expr/src/math_funcs/mod.rs (1)

18-18: LGTM! Module declaration follows established patterns.

The abs module is correctly declared with pub(crate) visibility, consistent with other math function modules like hex and unhex.

spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (1)

140-141: LGTM! Expression mapping is correctly configured.

The mapping from classOf[Abs] to CometAbs follows the established pattern and is appropriately positioned in the mathExpressions map.

native/spark-expr/src/comet_scalar_funcs.rs (2)

19-19: LGTM! Import correctly references the abs function.

The import brings in the abs function from the new math_funcs::abs module for registration in the UDF system.


184-187: LGTM! Function registration follows the correct pattern.

The abs function is properly registered using the without data_type macro pattern, consistent with other dynamic-type functions like unhex. This allows the function to handle multiple numeric types at runtime.

.github/workflows/pr_build_linux.yml (1)

137-137: LGTM! Test coverage added to Linux CI.

The CometMathExpressionSuite is correctly added to the exec test suite, ensuring Abs functionality is tested across all CI platforms.

spark/src/main/scala/org/apache/comet/serde/math.scala (2)

22-23: LGTM! Imports are appropriate and complete.

The updated imports correctly add Abs for the new expression type and NumericType for type checking in getSupportLevel.


147-175: LGTM! CometAbs implementation is well-structured and consistent.

The implementation follows established patterns perfectly:

  1. Type safety: getSupportLevel properly validates that only NumericType is supported, providing a clear error message for unsupported interval types
  2. Serialization pattern: The convert method mirrors CometUnhex exactly, using failOnErrorExpr as a function argument while setting the builder's failOnError to false
  3. Error tracking: Uses optExprWithInfo to ensure proper fallback logging

This maintains consistency with the codebase's approach to ANSI-mode-aware math operations.

Based on learnings

Comment on lines 61 to 63
opt[Int](required = true, descr = "Number of input files to use")
val showFailedSparkQueries: ScallopOption[Boolean] =
opt[Boolean](required = false, descr = "Whether to show failed Spark queries")

@coderabbitai coderabbitai bot Nov 5, 2025


⚠️ Potential issue | 🔴 Critical

Critical: Optional parameter will cause runtime exception if not provided.

Making numFiles required is good (line 61). However, since showFailedSparkQueries is marked required = false, calling it with () at line 116 will throw an exception when the user doesn't provide the flag.

Apply this diff to handle the optional parameter correctly:

       case Some(conf.runQueries) =>
         QueryRunner.runQueries(
           spark,
           conf.runQueries.numFiles(),
           conf.runQueries.filename(),
-          conf.runQueries.showFailedSparkQueries())
+          conf.runQueries.showFailedSparkQueries.getOrElse(false))

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala around lines
61-63 and usage at line 116, the optional ScallopOption showFailedSparkQueries
is defined with required = false but later invoked like a required option which
throws at runtime if the flag isn't provided; change the usage to read the
option safely (e.g., use getOrElse/ toOption with a default false) so the code
supplies a default boolean when the flag is absent, ensuring no exception is
thrown.

Owner Author


value:annoying; category:bug; feedback:This file is not part of the Pull Request. It should not be reviewed.


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the file has changes in the current PR branch compared to main
git diff main...HEAD -- fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala

Unrecoverable error: Failed to clone repository


@martin-augment, I apologize for reviewing this file! You're right that these changes to fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala appear unrelated to adding abs support and shouldn't be part of this PR.

To clean this up, you can:

  1. Revert the changes to this file in your PR branch
  2. If you need these showFailedSparkQueries changes, consider creating a separate PR for them

I'll skip reviewing this file going forward. Thank you for the clarification!


🧠 Learnings used
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.750Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().

Comment on lines +320 to +324
fn with_fail_on_error<F: Fn(bool) -> Result<()>>(test_fn: F) {
for fail_on_error in [true, false] {
let _ = test_fn(fail_on_error);
}
}

@coderabbitai coderabbitai bot Nov 5, 2025


⚠️ Potential issue | 🟠 Major

Ensure the helper propagates failures

with_fail_on_error currently drops the Result from each invocation, so any Err produced by the closure (for example via ? when an assertion fails) is silently ignored and the test still passes. Please make the helper surface the failure—either by returning a Result<()> and propagating with ?, or by explicitly unwrapping/expecting the outcome—so these tests fail when the inner assertions fail.

A minimal fix could look like:

-fn with_fail_on_error<F: Fn(bool) -> Result<()>>(test_fn: F) {
-    for fail_on_error in [true, false] {
-        let _ = test_fn(fail_on_error);
-    }
-}
+fn with_fail_on_error<F: Fn(bool) -> Result<()>>(test_fn: F) -> Result<()> {
+    for fail_on_error in [true, false] {
+        test_fn(fail_on_error)?;
+    }
+    Ok(())
+}

(then update the call sites to handle the returned Result).

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In native/spark-expr/src/math_funcs/abs.rs around lines 320 to 324,
with_fail_on_error currently discards the Result from each closure call so any
Err is ignored; change the helper to return Result<()> and propagate errors
(call test_fn(fail_on_error)? inside the loop and return Ok(()) at the end) or
alternatively assert by calling .unwrap()/.expect() on each call, and update all
call sites to handle the returned Result (use ? in test functions or
unwrap/expect) so failing assertions surface correctly.

Owner Author


value:useful; category:bug; feedback:The AI reviewer is correct that the Result is ignored and the test always passes. The result of test_fn() execution should be checked for Err to fix this bug.


@martin-augment, thank you for confirming! You're absolutely right that the Result being ignored means these tests can silently pass even when assertions fail, which defeats the purpose of testing both the fail_on_error = true and fail_on_error = false code paths.

Would you like me to generate the code changes to fix this issue? I can update the with_fail_on_error helper to properly propagate errors and modify all the test functions that use it to handle the returned Result. Just let me know!


🧠 Learnings used
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.750Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().


github-actions bot commented Nov 5, 2025

Findings:

  • None blocking. The Rust implementation and Spark integration for abs look consistent with Spark’s semantics in both legacy and ANSI modes.

Suggestions:

  • native/spark-expr/src/math_funcs/abs.rs:67 – Consider short-circuiting the zero-argument case before indexing into args[0] so we surface a friendly “abs expects at least one argument” error instead of panicking if the function were ever invoked incorrectly.
  • spark/src/test/scala/org/apache/comet/CometMathExpressionSuite.scala:42 – The filter signum($col) < 0 skips IEEE -0.0, so the negative-zero test never exercises that edge case. Maybe filter on signbit($col) (Spark 4.0 has it) or compare CAST(1/ $col AS ...) < 0 to ensure the dataset really contains -0.0 rows.
  • .github/workflows/claude-code-review.yml / .github/workflows/claude.yml / .github/workflows/codex-action.yaml – Since these automation workflows disappear in the diff, just double-check that this removal is intentional; otherwise we lose the existing code-review bots.


claude bot commented Nov 5, 2025

PR Review: Add support for abs function

I've completed a comprehensive review of this PR. Overall, this is a well-implemented feature with excellent test coverage and proper handling of Spark compatibility concerns. Here's my detailed feedback:


Strengths

1. Code Quality & Best Practices

  • Excellent dual-mode implementation: The code properly handles both ANSI and legacy modes, matching Spark's behavior
  • Consistent error handling: Overflow behavior is correctly implemented with wrapping_abs() for legacy mode and checked_abs() for ANSI mode
  • Good use of macros: The legacy_compute_op! and ansi_compute_op! macros reduce code duplication effectively
  • Clean separation of concerns: Scala serde logic is separate from Rust execution logic

2. Test Coverage

  • Comprehensive unit tests: The Rust implementation includes thorough unit tests for all numeric types (Int8, Int16, Int32, Int64, Float32, Float64, Decimal128, Decimal256)
  • Edge cases covered: Tests include MIN values, MAX values, null handling, and negative zero (special floating-point case)
  • Both modes tested: Each test runs in both fail_on_error=true (ANSI) and fail_on_error=false (legacy) modes
  • Integration tests: The Scala test suite uses fuzz testing with 1000 rows across all data types

3. Documentation

  • Well-documented: Both the user guide expressions table and config table have been updated
  • Clear comments: The Rust code includes helpful comments explaining Spark compatibility

🔍 Areas for Improvement

1. Potential Bug: Inconsistent Behavior in Legacy Mode (Minor)

Location: native/spark-expr/src/math_funcs/abs.rs:95-99, 206-208

In legacy mode, when abs(MIN_VALUE) is called:

  • For arrays: The code uses wrapping_abs(), which returns MIN_VALUE (correct)
  • For scalars: The code returns the original negative value when checked_abs() fails

Issue: In the scalar path at lines 206-208:

if !fail_on_error {
    // return the original value
    Ok(ColumnarValue::Scalar(ScalarValue::Int8(Some(*v))))
}

This returns the original negative value, whereas the array path reaches the same result through wrapping_abs() (abs(MIN_VALUE) wraps back to MIN_VALUE in two's complement). The observable result is identical for signed integers, but the two paths express the intent differently.

Recommendation: For consistency, consider using v.wrapping_abs() in the scalar path to match the array implementation semantics.
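
A sketch of what that scalar branch could look like, reusing the names from the snippet above (hypothetical helper; the surrounding match arm and the ANSI error path in abs.rs are omitted):

use datafusion::logical_expr::ColumnarValue;
use datafusion::scalar::ScalarValue;

// Sketch only: mirror the array path by wrapping on overflow in legacy mode.
// wrapping_abs(i8::MIN) == i8::MIN, the same value as returning v unchanged,
// but the wrapping intent is explicit and matches the array kernel.
fn legacy_abs_i8_scalar(v: i8) -> ColumnarValue {
    ColumnarValue::Scalar(ScalarValue::Int8(Some(v.wrapping_abs())))
}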


2. Code Duplication: Repetitive Pattern Across Integer Types

Location: native/spark-expr/src/math_funcs/abs.rs:201-255

The scalar handling code for Int8, Int16, Int32, and Int64 is nearly identical (54 lines repeated 4 times). This could be refactored using a macro similar to the array handling.

Recommendation: Consider creating a macro for scalar integer handling:

macro_rules! handle_int_scalar {
    ($variant:ident, $type:ty, $type_name:expr) => {
        // ... common logic
    };
}

This is a minor issue and not blocking, but would improve maintainability.


3. Documentation: Missing Details on Interval Types

Location: spark/src/main/scala/org/apache/comet/serde/math.scala:154-156

The code correctly notes that Spark supports DayTimeIntervalType and YearMonthIntervalType, but these are unsupported in Comet.

Recommendation: Consider adding a note in the documentation (docs/source/user-guide/latest/expressions.md) about this limitation in the "Compatibility Notes" column if such columns exist, or in a separate compatibility document.


4. Test Coverage: Unsigned Integer Edge Cases

Location: native/spark-expr/src/math_funcs/abs.rs:328-352

The test test_abs_u8_scalar tests u8::MAX, but this is already positive and abs() is a no-op for unsigned types. The test doesn't actually verify that no transformation occurs—it just checks the value is returned.

Recommendation: Add a comment clarifying this test verifies pass-through behavior, or test with both 0 and MAX values to make the intent clearer.


🛡️ Security Considerations

No security concerns identified

  • Overflow handling is properly implemented with explicit checks
  • No unsafe code blocks
  • No potential for buffer overflows or memory safety issues
  • Error messages don't leak sensitive information

Performance Considerations

Good performance characteristics

  • Array operations: Use Arrow's optimized kernels (unary and try_unary)
  • Scalar operations: Direct computation without unnecessary allocations
  • Branch prediction: The fail_on_error check is done once per call, not per element

Minor optimization opportunity: The second argument parsing (lines 72-84) happens on every call. If performance becomes critical, consider using a static function variant selection, but this is likely premature optimization.


📊 Cleanup: Old Code Removal

Good cleanup

  • Old proto definitions removed (native/proto/src/proto/expr.proto)
  • Commented-out planner code removed (native/core/src/execution/planner.rs)
  • Old test suite properly replaced (CometExpressionSuite.scala → CometMathExpressionSuite.scala)

Overall Assessment

This is a high-quality PR that:

  1. ✅ Correctly implements Spark-compatible abs with both ANSI and legacy modes
  2. ✅ Has excellent test coverage including edge cases
  3. ✅ Properly updates documentation
  4. ✅ Follows the codebase conventions
  5. ✅ Cleans up old implementations
  6. ⚠️ Has one minor behavioral inconsistency (scalar vs array wrapping) that should be addressed

🎯 Recommendation

Approve with minor suggestions. The one potential bug with scalar wrapping behavior should be verified/fixed, but the core functionality is solid and well-tested. The other suggestions are minor improvements for code quality and maintainability.

Great work on this implementation! 🚀

_ => unreachable!(),
}
});
}

Bug: Unsigned absolute value never overflows for unsigned types

The test test_abs_u8_scalar incorrectly expects that unsigned types might throw an ARITHMETIC_OVERFLOW error. However, the implementation for unsigned types (lines 88-92) simply returns the input unchanged without any error checking, since unsigned integers cannot overflow when taking absolute value. The test should not check for errors at all for unsigned types, as they will always succeed.


Owner Author


value:useful; category:bug; feedback:The AI reviewer is correct that abs(unsigned) should not fail, so it is better to always panic the test if it fails for any reason. The test would be more correct this way.

_ => unreachable!(),
}
});
}

Bug: Unsigned abs never errors; test is wrong

The test test_abs_u64_array incorrectly expects that unsigned types might throw an ARITHMETIC_OVERFLOW error. However, the implementation for unsigned types (lines 88-92) simply returns the input unchanged without any error checking, since unsigned integers cannot overflow when taking absolute value. The test should not check for errors at all for unsigned types, as they will always succeed.


Owner Author


value:useful; category:bug; feedback:The AI reviewer is correct that abs(unsigned) should not fail, so it is better to always panic the test if it fails for any reason. The test would be more correct this way.

@martin-augment
Owner Author

  • native/spark-expr/src/math_funcs/abs.rs:67 – Consider short-circuiting the zero-argument case before indexing into args[0] so we surface a friendly “abs expects at least one argument” error instead of panicking if the function were ever invoked incorrectly.

value:useful; category:bug; feedback:The Codex AI reviewer is correct that there is no check that the abs function is called with at least one argument. Prevents a runtime error when trying to read the first argument.
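
A minimal sketch of the accepted guard, assuming the entry point receives its arguments as a slice of ColumnarValue (the actual signature in abs.rs may differ):

use datafusion::error::{DataFusionError, Result};
use datafusion::logical_expr::ColumnarValue;

// Sketch only: fail fast with a descriptive error instead of panicking on args[0].
fn first_arg(args: &[ColumnarValue]) -> Result<&ColumnarValue> {
    args.first().ok_or_else(|| {
        DataFusionError::Internal("abs expects at least one argument".to_string())
    })
}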

@martin-augment
Owner Author

2. Code Duplication: Repetitive Pattern Across Integer Types

Location: native/spark-expr/src/math_funcs/abs.rs:201-255

The scalar handling code for Int8, Int16, Int32, and Int64 is nearly identical (54 lines repeated 4 times). This could be refactored using a macro similar to the array handling.

Recommendation: Consider creating a macro for scalar integer handling:

macro_rules! handle_int_scalar {
    ($variant:ident, $type:ty, $type_name:expr) => {
        // ... common logic
    };
}

This is a minor issue and not blocking, but would improve maintainability.

value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct that there is code duplication that could be reduced with declarative macros. There is no bug/regression here. It would improve the maintainability.

@martin-augment
Owner Author

  • spark/src/test/scala/org/apache/comet/CometMathExpressionSuite.scala:42 – The filter signum($col) < 0 skips IEEE -0.0, so the negative-zero test never exercises that edge case. Maybe filter on signbit($col) (Spark 4.0 has it) or compare CAST(1/ $col AS ...) < 0 to ensure the dataset really contains -0.0 rows.

value:useful; category:bug; feedback:The Codex AI reviewer is correct that signum(-0.0) will return 0 and thus the WHERE clause won't include the negative zero. The test name mentions "negative zero", so it looks like there is a bug in the test.
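
For reference, why an ordinary sign test misses IEEE-754 negative zero (plain Rust, but the same IEEE-754 rules apply to Spark doubles):

fn main() {
    let neg_zero = -0.0_f64;
    // -0.0 compares equal to 0.0 and is not less than 0.0, so a `signum(x) < 0`-style filter skips it.
    assert!(neg_zero == 0.0);
    assert!(!(neg_zero < 0.0));
    // The sign bit (or the 1/x trick suggested above) does detect it.
    assert!(neg_zero.is_sign_negative());
    assert_eq!(1.0 / neg_zero, f64::NEG_INFINITY);
    // And abs(-0.0) really is +0.0, which is what the test wants to verify.
    assert_eq!(neg_zero.abs().to_bits(), 0.0_f64.to_bits());
}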

@martin-augment
Owner Author

  • .github/workflows/claude-code-review.yml / .github/workflows/claude.yml / .github/workflows/codex-action.yaml – Since these automation workflows disappear in the diff, just double-check that this removal is intentional; otherwise we lose the existing code-review bots.

value:annoying; category:bug; feedback:The Codex AI reviewer is wrong! Those files are not removed in this Pull Request and they would stay even if it is merged.
