
Conversation

Owner

@martin-augment martin-augment commented Nov 5, 2025

2689: To review by AI


Note

Implements Spark-compatible abs across JVM serde and native execution with ANSI/legacy behavior (see the sketch after the list below), updates docs/configs, and adds a dedicated test suite and CI entries.

  • Expressions/Serde (Scala):
    • Add CometAbs and map Spark Abs -> native scalar func, carrying failOnError for ANSI vs legacy.
    • Register Abs in QueryPlanSerde math expressions.
  • Native (Rust):
    • Implement abs UDF (native/spark-expr/src/math_funcs/abs.rs) supporting ints/floats/decimals with overflow handling (ANSI) and wrapping behavior (legacy).
    • Expose via create_comet_physical_fun and module math_funcs::abs.
    • Clean up unused proto/planner bits (remove Abs message; delete commented planner code).
  • Docs:
    • Add Abs to supported math expressions and enable/disable config table.
  • Tests/CI:
    • New CometMathExpressionSuite covering abs (including negative zero, ANSI mode); enable suite in Linux/macOS workflows.
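
A minimal, self-contained sketch of the ANSI/legacy split summarized above (plain Rust std; the real code in native/spark-expr/src/math_funcs/abs.rs operates on Arrow arrays, and the function name and error text here are illustrative only):

// Sketch only: ANSI mode surfaces overflow, legacy mode wraps like Spark's non-ANSI abs.
fn spark_like_abs_i32(values: &[i32], fail_on_error: bool) -> Result<Vec<i32>, String> {
    values
        .iter()
        .map(|&v| {
            if fail_on_error {
                // ANSI: abs(i32::MIN) cannot be represented, so report an overflow.
                v.checked_abs()
                    .ok_or_else(|| "ARITHMETIC_OVERFLOW: abs(i32::MIN)".to_string())
            } else {
                // Legacy: wrap in two's complement, so abs(i32::MIN) stays i32::MIN.
                Ok(v.wrapping_abs())
            }
        })
        .collect()
}

fn main() {
    assert_eq!(spark_like_abs_i32(&[-5, 7], true), Ok(vec![5, 7]));
    assert_eq!(spark_like_abs_i32(&[i32::MIN], false), Ok(vec![i32::MIN]));
    assert!(spark_like_abs_i32(&[i32::MIN], true).is_err());
}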

Written by Cursor Bugbot for commit 1149438. This will update automatically on new commits. Configure here.


coderabbitai bot commented Nov 5, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The PR adds comprehensive support for the Abs (absolute value) expression in Comet, including a native DataFusion implementation, Spark expression serialization, configuration documentation, and dedicated test coverage, while reorganizing the test suite structure.

Changes

Cohort / File(s) — Summary

  • CI/CD Workflow Updates (.github/workflows/pr_build_linux.yml, .github/workflows/pr_build_macos.yml): Added org.apache.comet.CometMathExpressionSuite to the Exec test suite list in the Linux and macOS PR build workflows.
  • Documentation (docs/source/user-guide/latest/configs.md, docs/source/user-guide/latest/expressions.md): Added the new configuration entry spark.comet.expression.Abs.enabled with description and default value true; added an Abs row to the Math Expressions table with SQL representation abs.
  • Fuzz Testing (fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala): Made the numFiles option required for the generateQueries and run subcommands; added a new boolean option showFailedSparkQueries to the run subcommand and updated the QueryRunner.runQueries call to pass it.
  • Native Proto (native/proto/src/proto/expr.proto): Removed the Abs oneof variant and message definition from the Expr proto.
  • Native Expression Planner (native/core/src/execution/planner.rs): Removed the commented-out Abs expression handling from create_expr.
  • Native Scalar Function Registry (native/spark-expr/src/comet_scalar_funcs.rs): Added the "abs" function to the Comet UDF registry with the corresponding ABS function dispatch.
  • Native Abs Implementation (native/spark-expr/src/math_funcs/abs.rs): New module implementing a comprehensive Abs function for DataFusion with support for signed/unsigned integers, floats, and decimals; includes ANSI and legacy overflow behavior, null handling, and extensive unit tests.
  • Native Math Modules (native/spark-expr/src/math_funcs/mod.rs): Added a new crate-level module declaration for abs.
  • Spark Expression Serialization (spark/src/main/scala/org/apache/comet/serde/math.scala): Added a CometAbs object extending CometExpressionSerde[Abs] with support-level checking for numeric types and proto conversion with failOnError handling.
  • Spark Serde Mapping (spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala): Added Abs to the mathExpressions mapping in the expression serde registry.
  • Spark Test Suite Reorganization (spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala): Removed three abs-related test cases and two unused imports.
  • New Math Expression Test Suite (spark/src/test/scala/org/apache/comet/CometMathExpressionSuite.scala): New test suite covering abs functionality across numeric types, negative-zero handling, and ANSI-mode overflow behavior; includes a private data generation utility.



claude bot commented Nov 5, 2025

Pull Request Review: Add support for abs expression

Summary

This PR adds native support for the abs (absolute value) expression in DataFusion Comet, including proper handling of ANSI mode overflow behavior, comprehensive tests, and documentation updates. Overall, this is a well-structured and high-quality implementation that follows the project's conventions.


✅ Strengths

Code Quality

  1. Comprehensive Implementation: The Rust implementation in abs.rs handles all numeric types (Int8/16/32/64, UInt*, Float32/64, Decimal128/256) with proper ANSI and legacy mode support.

  2. Proper Error Handling: Correctly implements overflow detection for ANSI mode using checked_abs() for scalars and try_unary for arrays.

  3. Well-Tested: Extensive unit tests cover:

    • All numeric types (signed, unsigned, floats, decimals)
    • Both scalar and array inputs
    • NULL handling
    • Edge cases (MIN values, negative zero)
    • Both ANSI and legacy modes
  4. Follows Project Conventions:

    • Uses the scalar UDF registration pattern (create_comet_physical_fun_with_eval_mode)
    • Follows the CometExpressionSerde pattern in Scala
    • Properly removes the unused protobuf definition
  5. Documentation: Updates both user-facing docs (expressions.md, configs.md) and CI configuration.

Architecture

  • Clean Separation: Properly separates Spark serde logic (Scala) from execution logic (Rust)
  • Reusable Macros: Uses legacy_compute_op and ansi_compute_op macros to reduce code duplication
  • Integration: Adds CometMathExpressionSuite as a dedicated test suite, improving organization

🔍 Potential Issues & Suggestions

1. Legacy Mode Behavior May Be Unintuitive ⚠️

Location: native/spark-expr/src/math_funcs/abs.rs:206-208

In legacy mode, when abs(i8::MIN) overflows, the code returns the original negative value unchanged:

if !fail_on_error {
    // return the original value
    Ok(ColumnarValue::Scalar(ScalarValue::Int8(Some(*v))))
}

Question: Is this the correct Spark behavior? This means abs(-128) would return -128 in legacy mode. While this matches the wrapping behavior, it might be worth:

  • Adding a comment explaining this is Spark's intended behavior
  • Verifying this matches Spark 3.x/4.x behavior with a reference

2. Inconsistent Use of wrapping_abs vs abs

Location: abs.rs:95, 148, 178

  • For arrays: Uses wrapping_abs for Int types but abs for Decimal128
  • For Decimal256: Uses wrapping_abs with a comment "i256 doesn't define abs() method"

Recommendation: Add a comment explaining when to use wrapping_abs vs abs to help future maintainers understand the distinction.
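
For reference, the standard-library semantics the reviewer is contrasting (plain Rust, runnable as-is):

fn main() {
    // Legacy-style wrapping: abs(i8::MIN) wraps back to i8::MIN in two's complement.
    assert_eq!(i8::MIN.wrapping_abs(), i8::MIN);
    // ANSI-style checking: the overflow is reported instead of wrapped.
    assert_eq!(i8::MIN.checked_abs(), None);
    // For every other value the two methods agree.
    assert_eq!((-5i8).wrapping_abs(), 5);
    assert_eq!((-5i8).checked_abs(), Some(5));
}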

3. Test Coverage: Missing Regular Positive/Negative Values

Location: abs.rs tests

The unit tests excellently cover edge cases (MIN, MAX, NULL) but don't have simple sanity checks like:

  • abs(-5) → 5
  • abs(5) → 5
  • abs(-123.45) → 123.45

Recommendation: Add a few basic smoke tests to complement the edge case tests.

4. Documentation: Missing DayTimeInterval/YearMonthInterval

Location: spark/src/main/scala/org/apache/comet/serde/math.scala:154-155

The comment says:

// Spark supports NumericType, DayTimeIntervalType, and YearMonthIntervalType
Unsupported(Some("Only integral, floating-point, and decimal types are supported"))

Recommendation: Consider documenting in the PR description or docs why interval types are not supported yet, or if they're planned for future work.

5. Fuzz Testing Changes Are Unrelated

Location: fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala

The changes to make numFiles required and add showFailedSparkQueries flag are unrelated to the abs feature.

Recommendation: These should ideally be in a separate PR. However, if they're needed for testing, that should be mentioned in the PR description.

6. Minor: Old Commented Code Removal

Location: native/core/src/execution/planner.rs:677-688

Good job removing the old commented-out Abs implementation! This keeps the codebase clean.


🔒 Security Considerations

✅ No security concerns identified:

  • No unsafe code blocks
  • Proper bounds checking via checked_abs()
  • No external dependencies added
  • Overflow behavior is well-defined and tested

⚡ Performance Considerations

✅ Performance looks good:

  • Uses Arrow's optimized unary and try_unary kernels for vectorized operations
  • Zero-copy for unsigned types (returns clone of input)
  • Minimal branching in hot paths

Potential Optimization: For unsigned types, could return a reference instead of cloning, but this is a micro-optimization and may not be worth the complexity.
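
As an illustration of the two kernels named above, this is how unary and try_unary are typically used on a plain Int32Array (standalone arrow-rs sketch, not the project's actual code):

use arrow::array::Int32Array;
use arrow::compute::kernels::arity::{try_unary, unary};
use arrow::error::ArrowError;

fn main() -> Result<(), ArrowError> {
    let input = Int32Array::from(vec![Some(-5), None, Some(i32::MIN)]);

    // Legacy path: infallible element-wise op; the null buffer is carried over.
    let legacy: Int32Array = unary(&input, |v| v.wrapping_abs());
    assert_eq!(legacy.value(0), 5);
    assert_eq!(legacy.value(2), i32::MIN);

    // ANSI path: fallible op; the first overflow aborts the kernel with an error.
    let ansi: Result<Int32Array, ArrowError> = try_unary(&input, |v| {
        v.checked_abs()
            .ok_or_else(|| ArrowError::ComputeError("integer overflow in abs".to_string()))
    });
    assert!(ansi.is_err());
    Ok(())
}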


🧪 Test Coverage

Excellent test coverage:

  • ✅ Rust unit tests (12 test functions covering all types and modes)
  • ✅ Scala integration tests (CometMathExpressionSuite)
  • ✅ CI integration (added to Linux and macOS workflows)
  • ✅ Both ANSI and legacy modes tested
  • ✅ Edge cases: MIN values, negative zero, NULLs

Minor Gap: As mentioned above, add simple positive/negative value tests.


📋 Code Style & Best Practices

✅ Follows project conventions:

  • Apache 2.0 license headers present
  • Consistent formatting
  • Proper use of Result types
  • Good error messages (includes type name in overflow errors)

🎯 Recommendations Summary

Must Fix (Before Merge)

None - the PR is production-ready as-is.

Should Consider

  1. Add comment explaining legacy mode overflow behavior matches Spark
  2. Add simple positive/negative value sanity tests
  3. Document why interval types are unsupported (or create a follow-up issue)

Nice to Have

  1. Explain fuzz testing changes in PR description or split to separate PR
  2. Add comments about wrapping_abs vs abs usage

✅ Final Verdict

LGTM with minor suggestions! This is a solid implementation that demonstrates:

  • Deep understanding of both Spark and DataFusion Comet internals
  • Attention to edge cases and testing
  • Proper handling of ANSI compliance
  • Good software engineering practices

The implementation correctly removes the old protobuf-based approach and adopts the newer scalar UDF pattern, which is the right direction for the project.

Great work! 🎉


Review conducted with focus on code quality, correctness, performance, security, and test coverage per repository standards.


@augmentcode augmentcode bot left a comment


Review completed. 2 suggestions posted.

f.dataType == DataTypes.FloatType || f.dataType == DataTypes.DoubleType)) {
val col = field.name
checkSparkAnswerAndOperator(
s"SELECT $col, abs($col) FROM tbl WHERE signum($col) < 0 ORDER BY $col")

The filter signum($col) < 0 will not include negative zero (−0.0), so this test may miss verifying abs behavior for negative zero. Consider adjusting the predicate to actually select negative zeros so the intended case is exercised.


Owner Author


value:useful; category:bug; feedback:The AI reviewer is correct that signum(-0.0) will return 0 and thus the WHERE clause won't include the negative zero. The test name mentions "negative zero", so it looks like there is a bug in the test.

Ok(())
}
Err(e) => {
if fail_on_error {

For unsigned types, abs cannot overflow; accepting an error when fail_on_error is true can mask regressions. It would be safer for this test to assert success in both modes for UInt8.


Owner Author


value:useful; category:bug; feedback:The AI reviewer is correct that abs(unsigned) should not fail, so it is better to always panic the test if it fails for any reason. The test would be more correct this way.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
spark/src/test/scala/org/apache/comet/CometMathExpressionSuite.scala (1)

45-49: Fix the negative-zero predicate

signum(-0.0) evaluates to -0.0, and the comparison -0.0 < 0 is false, so this filter excludes every negative-zero row. As a result the “negative zero” test never exercises the intended inputs. Please change the predicate to one that positively identifies negative zero (for example, constrain to zero and check the reciprocal’s sign, such as $col = 0 AND 1 / $col < 0), or otherwise inject deterministic -0.0 rows that you validate explicitly.

A concrete option:

checkSparkAnswerAndOperator(
  s"SELECT $col, abs($col) FROM tbl " +
    s"WHERE $col = 0 AND 1 / $col < 0 ORDER BY $col")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a5b13c and 7fcfdd3.

📒 Files selected for processing (14)
  • .github/workflows/pr_build_linux.yml (1 hunks)
  • .github/workflows/pr_build_macos.yml (1 hunks)
  • docs/source/user-guide/latest/configs.md (1 hunks)
  • docs/source/user-guide/latest/expressions.md (1 hunks)
  • fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala (3 hunks)
  • native/core/src/execution/planner.rs (0 hunks)
  • native/proto/src/proto/expr.proto (0 hunks)
  • native/spark-expr/src/comet_scalar_funcs.rs (2 hunks)
  • native/spark-expr/src/math_funcs/abs.rs (1 hunks)
  • native/spark-expr/src/math_funcs/mod.rs (1 hunks)
  • spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (1 hunks)
  • spark/src/main/scala/org/apache/comet/serde/math.scala (2 hunks)
  • spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (0 hunks)
  • spark/src/test/scala/org/apache/comet/CometMathExpressionSuite.scala (1 hunks)
💤 Files with no reviewable changes (3)
  • native/core/src/execution/planner.rs
  • native/proto/src/proto/expr.proto
  • spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.750Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().
📚 Learning: 2025-11-04T14:26:48.750Z
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.750Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().

Applied to files:

  • native/spark-expr/src/math_funcs/abs.rs
  • spark/src/main/scala/org/apache/comet/serde/math.scala
  • spark/src/test/scala/org/apache/comet/CometMathExpressionSuite.scala
  • native/spark-expr/src/math_funcs/mod.rs
  • native/spark-expr/src/comet_scalar_funcs.rs
🧬 Code graph analysis (7)
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (1)
spark/src/main/scala/org/apache/comet/serde/math.scala (2)
  • CometUnhex (128-145)
  • CometAbs (147-175)
native/spark-expr/src/math_funcs/abs.rs (1)
native/spark-expr/src/lib.rs (1)
  • arithmetic_overflow_error (110-114)
fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala (1)
fuzz-testing/src/main/scala/org/apache/comet/fuzz/QueryRunner.scala (2)
  • QueryRunner (31-145)
  • runQueries (42-144)
spark/src/main/scala/org/apache/comet/serde/math.scala (1)
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (8)
  • getSupportLevel (1783-1783)
  • Compatible (1714-1714)
  • Unsupported (1725-1725)
  • convert (661-699)
  • convert (1848-1852)
  • exprToProtoInternal (656-809)
  • scalarFunctionExprToProtoWithReturnType (882-894)
  • optExprWithInfo (1567-1578)
spark/src/test/scala/org/apache/comet/CometMathExpressionSuite.scala (2)
spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala (6)
  • spark (422-422)
  • CometTestBase (55-1268)
  • checkSparkAnswerAndOperator (189-193)
  • checkSparkAnswerAndOperator (199-206)
  • checkSparkAnswerAndOperator (214-223)
  • checkSparkMaybeThrows (302-318)
spark/src/main/scala/org/apache/comet/testing/FuzzDataGenerator.scala (3)
  • DataGenOptions (257-262)
  • FuzzDataGenerator (35-231)
  • generateDataFrame (88-105)
native/spark-expr/src/math_funcs/mod.rs (1)
native/spark-expr/src/math_funcs/abs.rs (1)
  • abs (67-310)
native/spark-expr/src/comet_scalar_funcs.rs (1)
native/spark-expr/src/math_funcs/abs.rs (1)
  • abs (67-310)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: claude-review
  • GitHub Check: codex
🔇 Additional comments (11)
fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala (1)

50-50: LGTM!

Making numFiles required is appropriate since it's always used without default handling (line 109). This improves API clarity.

.github/workflows/pr_build_macos.yml (1)

102-102: LGTM! Test suite properly added to CI.

The addition of CometMathExpressionSuite to the exec test suite ensures the new Abs functionality will be tested on macOS builds.

docs/source/user-guide/latest/expressions.md (1)

121-121: LGTM! Documentation entry is complete and consistent.

The Abs entry correctly documents it as a Spark-compatible expression with the abs SQL function, matching the format of other math expression entries.

docs/source/user-guide/latest/configs.md (1)

167-167: LGTM! Configuration entry is properly documented.

The spark.comet.expression.Abs.enabled configuration follows the standard pattern and is correctly positioned alphabetically in the scalar expressions table.

native/spark-expr/src/math_funcs/mod.rs (1)

18-18: LGTM! Module declaration follows established patterns.

The abs module is correctly declared with pub(crate) visibility, consistent with other math function modules like hex and unhex.

spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (1)

140-141: LGTM! Expression mapping is correctly configured.

The mapping from classOf[Abs] to CometAbs follows the established pattern and is appropriately positioned in the mathExpressions map.

native/spark-expr/src/comet_scalar_funcs.rs (2)

19-19: LGTM! Import correctly references the abs function.

The import brings in the abs function from the new math_funcs::abs module for registration in the UDF system.


184-187: LGTM! Function registration follows the correct pattern.

The abs function is properly registered using the without data_type macro pattern, consistent with other dynamic-type functions like unhex. This allows the function to handle multiple numeric types at runtime.

.github/workflows/pr_build_linux.yml (1)

137-137: LGTM! Test coverage added to Linux CI.

The CometMathExpressionSuite is correctly added to the exec test suite, ensuring Abs functionality is tested across all CI platforms.

spark/src/main/scala/org/apache/comet/serde/math.scala (2)

22-23: LGTM! Imports are appropriate and complete.

The updated imports correctly add Abs for the new expression type and NumericType for type checking in getSupportLevel.


147-175: LGTM! CometAbs implementation is well-structured and consistent.

The implementation follows established patterns perfectly:

  1. Type safety: getSupportLevel properly validates that only NumericType is supported, providing a clear error message for unsupported interval types
  2. Serialization pattern: The convert method mirrors CometUnhex exactly, using failOnErrorExpr as a function argument while setting the builder's failOnError to false
  3. Error tracking: Uses optExprWithInfo to ensure proper fallback logging

This maintains consistency with the codebase's approach to ANSI-mode-aware math operations.

Based on learnings

Comment on lines 61 to 63
opt[Int](required = true, descr = "Number of input files to use")
val showFailedSparkQueries: ScallopOption[Boolean] =
opt[Boolean](required = false, descr = "Whether to show failed Spark queries")

@coderabbitai coderabbitai bot Nov 5, 2025


⚠️ Potential issue | 🔴 Critical

Critical: Optional parameter will cause runtime exception if not provided.

Making numFiles required is good (line 61). However, since showFailedSparkQueries is marked required = false, calling it with () at line 116 will throw an exception when the user doesn't provide the flag.

Apply this diff to handle the optional parameter correctly:

       case Some(conf.runQueries) =>
         QueryRunner.runQueries(
           spark,
           conf.runQueries.numFiles(),
           conf.runQueries.filename(),
-          conf.runQueries.showFailedSparkQueries())
+          conf.runQueries.showFailedSparkQueries.getOrElse(false))

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala around lines
61-63 and usage at line 116, the optional ScallopOption showFailedSparkQueries
is defined with required = false but later invoked like a required option which
throws at runtime if the flag isn't provided; change the usage to read the
option safely (e.g., use getOrElse/ toOption with a default false) so the code
supplies a default boolean when the flag is absent, ensuring no exception is
thrown.

Owner Author


value:annoying; category:bug; feedback:This file is not part of the Pull Request. It should not be reviewed.


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the file has changes in the current PR branch compared to main
git diff main...HEAD -- fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala

Unrecoverable error: Failed to clone repository


@martin-augment, I apologize for reviewing this file! You're right that these changes to fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala appear unrelated to adding abs support and shouldn't be part of this PR.

To clean this up, you can:

  1. Revert the changes to this file in your PR branch
  2. If you need these showFailedSparkQueries changes, consider creating a separate PR for them

I'll skip reviewing this file going forward. Thank you for the clarification!


🧠 Learnings used
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.750Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().

Comment on lines +320 to +324
fn with_fail_on_error<F: Fn(bool) -> Result<()>>(test_fn: F) {
for fail_on_error in [true, false] {
let _ = test_fn(fail_on_error);
}
}

@coderabbitai coderabbitai bot Nov 5, 2025


⚠️ Potential issue | 🟠 Major

Ensure the helper propagates failures

with_fail_on_error currently drops the Result from each invocation, so any Err produced by the closure (for example via ? when an assertion fails) is silently ignored and the test still passes. Please make the helper surface the failure—either by returning a Result<()> and propagating with ?, or by explicitly unwrapping/expecting the outcome—so these tests fail when the inner assertions fail.

A minimal fix could look like:

-fn with_fail_on_error<F: Fn(bool) -> Result<()>>(test_fn: F) {
-    for fail_on_error in [true, false] {
-        let _ = test_fn(fail_on_error);
-    }
-}
+fn with_fail_on_error<F: Fn(bool) -> Result<()>>(test_fn: F) -> Result<()> {
+    for fail_on_error in [true, false] {
+        test_fn(fail_on_error)?;
+    }
+    Ok(())
+}

(then update the call sites to handle the returned Result).

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In native/spark-expr/src/math_funcs/abs.rs around lines 320 to 324,
with_fail_on_error currently discards the Result from each closure call so any
Err is ignored; change the helper to return Result<()> and propagate errors
(call test_fn(fail_on_error)? inside the loop and return Ok(()) at the end) or
alternatively assert by calling .unwrap()/.expect() on each call, and update all
call sites to handle the returned Result (use ? in test functions or
unwrap/expect) so failing assertions surface correctly.

Owner Author


value:useful; category:bug; feedback:The AI reviewer is correct that the Result is ignored and the test always passes. The result of test_fn() execution should be checked for Err to fix this bug.


@martin-augment, thank you for confirming! You're absolutely right that the Result being ignored means these tests can silently pass even when assertions fail, which defeats the purpose of testing both the fail_on_error = true and fail_on_error = false code paths.

Would you like me to generate the code changes to fix this issue? I can update the with_fail_on_error helper to properly propagate errors and modify all the test functions that use it to handle the returned Result. Just let me know!


🧠 Learnings used
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.750Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().


github-actions bot commented Nov 5, 2025

Findings:

  • None blocking. The Rust implementation and Spark integration for abs look consistent with Spark’s semantics in both legacy and ANSI modes.

Suggestions:

  • native/spark-expr/src/math_funcs/abs.rs:67 – Consider short-circuiting the zero-argument case before indexing into args[0] so we surface a friendly “abs expects at least one argument” error instead of panicking if the function were ever invoked incorrectly.
  • spark/src/test/scala/org/apache/comet/CometMathExpressionSuite.scala:42 – The filter signum($col) < 0 skips IEEE -0.0, so the negative-zero test never exercises that edge case. Maybe filter on signbit($col) (Spark 4.0 has it) or compare CAST(1/ $col AS ...) < 0 to ensure the dataset really contains -0.0 rows.
  • .github/workflows/claude-code-review.yml / .github/workflows/claude.yml / .github/workflows/codex-action.yaml – Since these automation workflows disappear in the diff, just double-check that this removal is intentional; otherwise we lose the existing code-review bots.


claude bot commented Nov 5, 2025

PR Review: Add support for abs function

I've completed a comprehensive review of this PR. Overall, this is a well-implemented feature with excellent test coverage and proper handling of Spark compatibility concerns. Here's my detailed feedback:


Strengths

1. Code Quality & Best Practices

  • Excellent dual-mode implementation: The code properly handles both ANSI and legacy modes, matching Spark's behavior
  • Consistent error handling: Overflow behavior is correctly implemented with wrapping_abs() for legacy mode and checked_abs() for ANSI mode
  • Good use of macros: The legacy_compute_op! and ansi_compute_op! macros reduce code duplication effectively
  • Clean separation of concerns: Scala serde logic is separate from Rust execution logic

2. Test Coverage

  • Comprehensive unit tests: The Rust implementation includes thorough unit tests for all numeric types (Int8, Int16, Int32, Int64, Float32, Float64, Decimal128, Decimal256)
  • Edge cases covered: Tests include MIN values, MAX values, null handling, and negative zero (special floating-point case)
  • Both modes tested: Each test runs in both fail_on_error=true (ANSI) and fail_on_error=false (legacy) modes
  • Integration tests: The Scala test suite uses fuzz testing with 1000 rows across all data types

3. Documentation

  • Well-documented: Both the user guide expressions table and config table have been updated
  • Clear comments: The Rust code includes helpful comments explaining Spark compatibility

🔍 Areas for Improvement

1. Potential Bug: Inconsistent Behavior in Legacy Mode (Minor)

Location: native/spark-expr/src/math_funcs/abs.rs:95-99, 206-208

In legacy mode, when abs(MIN_VALUE) is called:

  • For arrays: The code uses wrapping_abs(), which returns MIN_VALUE (correct)
  • For scalars: The code returns the original negative value when checked_abs() fails

Issue: In the scalar path at lines 206-208:

if !fail_on_error {
    // return the original value
    Ok(ColumnarValue::Scalar(ScalarValue::Int8(Some(*v))))
}

This returns the original negative value, whereas the array path reaches the same result through wrapping_abs() (abs(MIN_VALUE) wraps back to MIN_VALUE in two's complement). The observable result is identical for signed integers, but the two paths express the intent differently.

Recommendation: For consistency, consider using v.wrapping_abs() in the scalar path to match the array implementation semantics.
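
A sketch of what that scalar branch could look like, reusing the names from the snippet above (hypothetical helper; the surrounding match arm and the ANSI error path in abs.rs are omitted):

use datafusion::logical_expr::ColumnarValue;
use datafusion::scalar::ScalarValue;

// Sketch only: mirror the array path by wrapping on overflow in legacy mode.
// wrapping_abs(i8::MIN) == i8::MIN, the same value as returning v unchanged,
// but the wrapping intent is explicit and matches the array kernel.
fn legacy_abs_i8_scalar(v: i8) -> ColumnarValue {
    ColumnarValue::Scalar(ScalarValue::Int8(Some(v.wrapping_abs())))
}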


2. Code Duplication: Repetitive Pattern Across Integer Types

Location: native/spark-expr/src/math_funcs/abs.rs:201-255

The scalar handling code for Int8, Int16, Int32, and Int64 is nearly identical (54 lines repeated 4 times). This could be refactored using a macro similar to the array handling.

Recommendation: Consider creating a macro for scalar integer handling:

macro_rules! handle_int_scalar {
    ($variant:ident, $type:ty, $type_name:expr) => {
        // ... common logic
    };
}

This is a minor issue and not blocking, but would improve maintainability.


3. Documentation: Missing Details on Interval Types

Location: spark/src/main/scala/org/apache/comet/serde/math.scala:154-156

The code correctly notes that Spark supports DayTimeIntervalType and YearMonthIntervalType, but these are unsupported in Comet.

Recommendation: Consider adding a note in the documentation (docs/source/user-guide/latest/expressions.md) about this limitation in the "Compatibility Notes" column if such columns exist, or in a separate compatibility document.


4. Test Coverage: Unsigned Integer Edge Cases

Location: native/spark-expr/src/math_funcs/abs.rs:328-352

The test test_abs_u8_scalar tests u8::MAX, but this is already positive and abs() is a no-op for unsigned types. The test doesn't actually verify that no transformation occurs—it just checks the value is returned.

Recommendation: Add a comment clarifying this test verifies pass-through behavior, or test with both 0 and MAX values to make the intent clearer.


🛡️ Security Considerations

No security concerns identified

  • Overflow handling is properly implemented with explicit checks
  • No unsafe code blocks
  • No potential for buffer overflows or memory safety issues
  • Error messages don't leak sensitive information

Performance Considerations

Good performance characteristics

  • Array operations: Use Arrow's optimized kernels (unary and try_unary)
  • Scalar operations: Direct computation without unnecessary allocations
  • Branch prediction: The fail_on_error check is done once per call, not per element

Minor optimization opportunity: The second argument parsing (lines 72-84) happens on every call. If performance becomes critical, consider using a static function variant selection, but this is likely premature optimization.


📊 Cleanup: Old Code Removal

Good cleanup

  • Old proto definitions removed (native/proto/src/proto/expr.proto)
  • Commented-out planner code removed (native/core/src/execution/planner.rs)
  • Old test suite properly replaced (CometExpressionSuite.scala → CometMathExpressionSuite.scala)

Overall Assessment

This is a high-quality PR that:

  1. ✅ Correctly implements Spark-compatible abs with both ANSI and legacy modes
  2. ✅ Has excellent test coverage including edge cases
  3. ✅ Properly updates documentation
  4. ✅ Follows the codebase conventions
  5. ✅ Cleans up old implementations
  6. ⚠️ Has one minor behavioral inconsistency (scalar vs array wrapping) that should be addressed

🎯 Recommendation

Approve with minor suggestions. The one potential bug with scalar wrapping behavior should be verified/fixed, but the core functionality is solid and well-tested. The other suggestions are minor improvements for code quality and maintainability.

Great work on this implementation! 🚀

_ => unreachable!(),
}
});
}

Bug: Unsigned absolute value never overflows for unsigned types

The test test_abs_u8_scalar incorrectly expects that unsigned types might throw an ARITHMETIC_OVERFLOW error. However, the implementation for unsigned types (lines 88-92) simply returns the input unchanged without any error checking, since unsigned integers cannot overflow when taking absolute value. The test should not check for errors at all for unsigned types, as they will always succeed.


Owner Author


value:useful; category:bug; feedback:The AI reviewer is correct that abs(unsigned) should not fail, so it is better to always panic the test if it fails for any reason. The test would be more correct this way.

_ => unreachable!(),
}
});
}

Bug: Unsigned abs never errors; test is wrong

The test test_abs_u64_array incorrectly expects that unsigned types might throw an ARITHMETIC_OVERFLOW error. However, the implementation for unsigned types (lines 88-92) simply returns the input unchanged without any error checking, since unsigned integers cannot overflow when taking absolute value. The test should not check for errors at all for unsigned types, as they will always succeed.


Owner Author


value:useful; category:bug; feedback:The AI reviewer is correct that abs(unsigned) should not fail, so it is better to always panic the test if it fails for any reason. The test would be more correct this way.

@martin-augment
Owner Author

  • native/spark-expr/src/math_funcs/abs.rs:67 – Consider short-circuiting the zero-argument case before indexing into args[0] so we surface a friendly “abs expects at least one argument” error instead of panicking if the function were ever invoked incorrectly.

value:useful; category:bug; feedback:The Codex AI reviewer is correct that there is no check that the abs function is called with at least one argument. Prevents a runtime error when trying to read the first argument.
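
A minimal sketch of the accepted guard, assuming the entry point receives its arguments as a slice of ColumnarValue (the actual signature in abs.rs may differ):

use datafusion::error::{DataFusionError, Result};
use datafusion::logical_expr::ColumnarValue;

// Sketch only: fail fast with a descriptive error instead of panicking on args[0].
fn first_arg(args: &[ColumnarValue]) -> Result<&ColumnarValue> {
    args.first().ok_or_else(|| {
        DataFusionError::Internal("abs expects at least one argument".to_string())
    })
}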

@martin-augment
Owner Author

2. Code Duplication: Repetitive Pattern Across Integer Types

Location: native/spark-expr/src/math_funcs/abs.rs:201-255

The scalar handling code for Int8, Int16, Int32, and Int64 is nearly identical (54 lines repeated 4 times). This could be refactored using a macro similar to the array handling.

Recommendation: Consider creating a macro for scalar integer handling:

macro_rules! handle_int_scalar {
    ($variant:ident, $type:ty, $type_name:expr) => {
        // ... common logic
    };
}

This is a minor issue and not blocking, but would improve maintainability.

value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct that there is code duplication that could be reduced with declarative macros. There is no bug/regression here. It would improve the maintainability.

@martin-augment
Owner Author

  • spark/src/test/scala/org/apache/comet/CometMathExpressionSuite.scala:42 – The filter signum($col) < 0 skips IEEE -0.0, so the negative-zero test never exercises that edge case. Maybe filter on signbit($col) (Spark 4.0 has it) or compare CAST(1/ $col AS ...) < 0 to ensure the dataset really contains -0.0 rows.

value:useful; category:bug; feedback:The Codex AI reviewer is correct that signum(-0.0) will return 0 and thus the WHERE clause won't include the negative zero. The test name mentions "negative zero", so it looks like there is a bug in the test.
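
For reference, why an ordinary sign test misses IEEE-754 negative zero (plain Rust, but the same IEEE-754 rules apply to Spark doubles):

fn main() {
    let neg_zero = -0.0_f64;
    // -0.0 compares equal to 0.0 and is not less than 0.0, so a `signum(x) < 0`-style filter skips it.
    assert!(neg_zero == 0.0);
    assert!(!(neg_zero < 0.0));
    // The sign bit (or the 1/x trick suggested above) does detect it.
    assert!(neg_zero.is_sign_negative());
    assert_eq!(1.0 / neg_zero, f64::NEG_INFINITY);
    // And abs(-0.0) really is +0.0, which is what the test wants to verify.
    assert_eq!(neg_zero.abs().to_bits(), 0.0_f64.to_bits());
}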

@martin-augment
Owner Author

  • .github/workflows/claude-code-review.yml / .github/workflows/claude.yml / .github/workflows/codex-action.yaml – Since these automation workflows disappear in the diff, just double-check that this removal is intentional; otherwise we lose the existing code-review bots.

value:annoying; category:bug; feedback:The Codex AI reviewer is wrong! Those files are not removed in this Pull Request and they would stay even if it is merged.
