2689: feat: Add support for abs (#8)

Conversation
Important: Review skipped. Auto incremental reviews are disabled on this repository; check the settings in the CodeRabbit UI.

Walkthrough: The PR adds comprehensive support for the `abs` expression.
Pull Request Review: Add support for `abs`
Review completed. 2 suggestions posted.
```scala
f.dataType == DataTypes.FloatType || f.dataType == DataTypes.DoubleType)) {
  val col = field.name
  checkSparkAnswerAndOperator(
    s"SELECT $col, abs($col) FROM tbl WHERE signum($col) < 0 ORDER BY $col")
```
The filter signum($col) < 0 will not include negative zero (−0.0), so this test may miss verifying abs behavior for negative zero. Consider adjusting the predicate to actually select negative zeros so the intended case is exercised.
value:useful; category:bug; feedback:The AI reviewer is correct that signum(-0.0) will return 0 and thus the WHERE clause won't include the negative zero. The test name mentions "negative zero", so it looks like there is a bug in the test.
```rust
        Ok(())
    }
    Err(e) => {
        if fail_on_error {
```
For unsigned types, abs cannot overflow; accepting an error when fail_on_error is true can mask regressions. It would be safer for this test to assert success in both modes for UInt8.
value:useful; category:bug; feedback:The AI reviewer is correct that abs(unsigned) should not fail, so it is better to always panic the test if it fails for any reason. The test would be more correct this way.
Actionable comments posted: 2
🧹 Nitpick comments (1)
spark/src/test/scala/org/apache/comet/CometMathExpressionSuite.scala (1)
45-49: Fix the negative-zero predicate
`signum(-0.0)` evaluates to `-0.0`, and the comparison `-0.0 < 0` is false, so this filter excludes every negative-zero row. As a result the "negative zero" test never exercises the intended inputs. Please change the predicate to one that positively identifies negative zero (for example, constrain to zero and check the reciprocal's sign, such as `$col = 0 AND 1 / $col < 0`), or otherwise inject deterministic `-0.0` rows that you validate explicitly. A concrete option:
checkSparkAnswerAndOperator( s"SELECT $col, abs($col) FROM tbl " + s"WHERE $col = 0 AND 1 / $col < 0 ORDER BY $col")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
- .github/workflows/pr_build_linux.yml (1 hunks)
- .github/workflows/pr_build_macos.yml (1 hunks)
- docs/source/user-guide/latest/configs.md (1 hunks)
- docs/source/user-guide/latest/expressions.md (1 hunks)
- fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala (3 hunks)
- native/core/src/execution/planner.rs (0 hunks)
- native/proto/src/proto/expr.proto (0 hunks)
- native/spark-expr/src/comet_scalar_funcs.rs (2 hunks)
- native/spark-expr/src/math_funcs/abs.rs (1 hunks)
- native/spark-expr/src/math_funcs/mod.rs (1 hunks)
- spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (1 hunks)
- spark/src/main/scala/org/apache/comet/serde/math.scala (2 hunks)
- spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (0 hunks)
- spark/src/test/scala/org/apache/comet/CometMathExpressionSuite.scala (1 hunks)
💤 Files with no reviewable changes (3)
- native/core/src/execution/planner.rs
- native/proto/src/proto/expr.proto
- spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.750Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().
📚 Learning: 2025-11-04T14:26:48.750Z (same learning as above), applied to files:
- native/spark-expr/src/math_funcs/abs.rs
- spark/src/main/scala/org/apache/comet/serde/math.scala
- spark/src/test/scala/org/apache/comet/CometMathExpressionSuite.scala
- native/spark-expr/src/math_funcs/mod.rs
- native/spark-expr/src/comet_scalar_funcs.rs
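The convention this learning records can be sketched like so (a hypothetical single-type excerpt; the real `abs` covers many more types and its exact signature is assumed here):

```rust
use datafusion::common::ScalarValue;
use datafusion::logical_expr::ColumnarValue;

// NULL in, NULL out: return the typed null as-is instead of unwrapping.
fn abs_scalar_i8(v: Option<i8>) -> ColumnarValue {
    match v {
        None => ColumnarValue::Scalar(ScalarValue::Int8(None)),
        Some(x) => ColumnarValue::Scalar(ScalarValue::Int8(Some(x.wrapping_abs()))),
    }
}
```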
🧬 Code graph analysis (7)

spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (1)
- spark/src/main/scala/org/apache/comet/serde/math.scala (2): CometUnhex (128-145), CometAbs (147-175)

native/spark-expr/src/math_funcs/abs.rs (1)
- native/spark-expr/src/lib.rs (1): arithmetic_overflow_error (110-114)

fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala (1)
- fuzz-testing/src/main/scala/org/apache/comet/fuzz/QueryRunner.scala (2): QueryRunner (31-145), runQueries (42-144)

spark/src/main/scala/org/apache/comet/serde/math.scala (1)
- spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (8): getSupportLevel (1783-1783), Compatible (1714-1714), Unsupported (1725-1725), convert (661-699), convert (1848-1852), exprToProtoInternal (656-809), scalarFunctionExprToProtoWithReturnType (882-894), optExprWithInfo (1567-1578)

spark/src/test/scala/org/apache/comet/CometMathExpressionSuite.scala (2)
- spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala (6): spark (422-422), CometTestBase (55-1268), checkSparkAnswerAndOperator (189-193), checkSparkAnswerAndOperator (199-206), checkSparkAnswerAndOperator (214-223), checkSparkMaybeThrows (302-318)
- spark/src/main/scala/org/apache/comet/testing/FuzzDataGenerator.scala (3): DataGenOptions (257-262), FuzzDataGenerator (35-231), generateDataFrame (88-105)

native/spark-expr/src/math_funcs/mod.rs (1)
- native/spark-expr/src/math_funcs/abs.rs (1): abs (67-310)

native/spark-expr/src/comet_scalar_funcs.rs (1)
- native/spark-expr/src/math_funcs/abs.rs (1): abs (67-310)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: claude-review
- GitHub Check: codex
🔇 Additional comments (11)
fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala (1)
50-50: LGTM! Making `numFiles` required is appropriate since it's always used without default handling (line 109). This improves API clarity.

.github/workflows/pr_build_macos.yml (1)

102-102: LGTM! Test suite properly added to CI. The addition of `CometMathExpressionSuite` to the exec test suite ensures the new Abs functionality will be tested on macOS builds.

docs/source/user-guide/latest/expressions.md (1)

121-121: LGTM! Documentation entry is complete and consistent. The Abs entry correctly documents it as a Spark-compatible expression with the `abs` SQL function, matching the format of other math expression entries.

docs/source/user-guide/latest/configs.md (1)

167-167: LGTM! Configuration entry is properly documented. The `spark.comet.expression.Abs.enabled` configuration follows the standard pattern and is correctly positioned alphabetically in the scalar expressions table.

native/spark-expr/src/math_funcs/mod.rs (1)

18-18: LGTM! Module declaration follows established patterns. The `abs` module is correctly declared with `pub(crate)` visibility, consistent with other math function modules like `hex` and `unhex`.

spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (1)

140-141: LGTM! Expression mapping is correctly configured. The mapping from `classOf[Abs]` to `CometAbs` follows the established pattern and is appropriately positioned in the `mathExpressions` map.

native/spark-expr/src/comet_scalar_funcs.rs (2)

19-19: LGTM! Import correctly references the abs function. The import brings in the `abs` function from the new `math_funcs::abs` module for registration in the UDF system.

184-187: LGTM! Function registration follows the correct pattern. The abs function is properly registered using the `without data_type` macro pattern, consistent with other dynamic-type functions like `unhex`. This allows the function to handle multiple numeric types at runtime.

.github/workflows/pr_build_linux.yml (1)

137-137: LGTM! Test coverage added to Linux CI. The `CometMathExpressionSuite` is correctly added to the exec test suite, ensuring Abs functionality is tested across all CI platforms.

spark/src/main/scala/org/apache/comet/serde/math.scala (2)

22-23: LGTM! Imports are appropriate and complete. The updated imports correctly add `Abs` for the new expression type and `NumericType` for type checking in `getSupportLevel`.

147-175: LGTM! `CometAbs` implementation is well-structured and consistent. The implementation follows established patterns:
- Type safety: `getSupportLevel` properly validates that only `NumericType` is supported, providing a clear error message for unsupported interval types.
- Serialization pattern: The `convert` method mirrors `CometUnhex` exactly, using `failOnErrorExpr` as a function argument while setting the builder's `failOnError` to `false`.
- Error tracking: Uses `optExprWithInfo` to ensure proper fallback logging.

This maintains consistency with the codebase's approach to ANSI-mode-aware math operations. Based on learnings.
```scala
opt[Int](required = true, descr = "Number of input files to use")
val showFailedSparkQueries: ScallopOption[Boolean] =
  opt[Boolean](required = false, descr = "Whether to show failed Spark queries")
```
Critical: Optional parameter will cause runtime exception if not provided.
Making `numFiles` required is good (line 61). However, since `showFailedSparkQueries` is marked `required = false`, calling it with `()` at line 116 will throw an exception when the user doesn't provide the flag.
Apply this diff to handle the optional parameter correctly:
```diff
 case Some(conf.runQueries) =>
   QueryRunner.runQueries(
     spark,
     conf.runQueries.numFiles(),
     conf.runQueries.filename(),
-    conf.runQueries.showFailedSparkQueries())
+    conf.runQueries.showFailedSparkQueries.getOrElse(false))
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala around lines 61-63 and usage at line 116, the optional ScallopOption `showFailedSparkQueries` is defined with `required = false` but later invoked like a required option, which throws at runtime if the flag isn't provided. Change the usage to read the option safely (e.g., use `getOrElse`/`toOption` with a default of `false`) so the code supplies a default boolean when the flag is absent, ensuring no exception is thrown.
value:annoying; category:bug; feedback:This file is not part of the Pull Request. It should not be reviewed.
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Check if the file has changes in the current PR branch compared to main
git diff main...HEAD -- fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala
```
Unrecoverable error: Failed to clone repository
@martin-augment, I apologize for reviewing this file! You're right that these changes to fuzz-testing/src/main/scala/org/apache/comet/fuzz/Main.scala appear unrelated to adding abs support and shouldn't be part of this PR.
To clean this up, you can:
- Revert the changes to this file in your PR branch
- If you need these `showFailedSparkQueries` changes, consider creating a separate PR for them
I'll skip reviewing this file going forward. Thank you for the clarification!
🧠 Learnings used: the same abs NULL-handling learning quoted above (martin-augment, datafusion-comet PR 7, 2025-11-04).
```rust
fn with_fail_on_error<F: Fn(bool) -> Result<()>>(test_fn: F) {
    for fail_on_error in [true, false] {
        let _ = test_fn(fail_on_error);
    }
}
```
Ensure the helper propagates failures
`with_fail_on_error` currently drops the `Result` from each invocation, so any `Err` produced by the closure (for example via `?` when an assertion fails) is silently ignored and the test still passes. Please make the helper surface the failure, either by returning a `Result<()>` and propagating with `?`, or by explicitly unwrapping/expecting the outcome, so these tests fail when the inner assertions fail.
A minimal fix could look like:
```diff
-fn with_fail_on_error<F: Fn(bool) -> Result<()>>(test_fn: F) {
-    for fail_on_error in [true, false] {
-        let _ = test_fn(fail_on_error);
-    }
-}
+fn with_fail_on_error<F: Fn(bool) -> Result<()>>(test_fn: F) -> Result<()> {
+    for fail_on_error in [true, false] {
+        test_fn(fail_on_error)?;
+    }
+    Ok(())
+}
```

(then update the call sites to handle the returned `Result`).
Committable suggestion skipped: line range outside the PR's diff.
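A minimal sketch of what an updated call site could look like, assuming DataFusion's `Result` alias; the test name and closure body are illustrative, not the PR's actual code:

```rust
use datafusion::common::Result; // assumed error alias used by these tests

// Revised helper: propagates the closure's Result instead of discarding it.
fn with_fail_on_error<F: Fn(bool) -> Result<()>>(test_fn: F) -> Result<()> {
    for fail_on_error in [true, false] {
        test_fn(fail_on_error)?; // any Err now fails the caller
    }
    Ok(())
}

// Hypothetical call site: returning Result from the #[test] fn lets `?`
// inside the closure surface assertion failures to the test harness.
#[test]
fn abs_handles_both_modes() -> Result<()> {
    with_fail_on_error(|fail_on_error| {
        // ... invoke abs with `fail_on_error` and assert on the outcome ...
        let _ = fail_on_error;
        Ok(())
    })
}
```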
🤖 Prompt for AI Agents
In native/spark-expr/src/math_funcs/abs.rs around lines 320 to 324, `with_fail_on_error` currently discards the `Result` from each closure call, so any `Err` is ignored. Change the helper to return `Result<()>` and propagate errors (call `test_fn(fail_on_error)?` inside the loop and return `Ok(())` at the end), or alternatively assert by calling `.unwrap()`/`.expect()` on each call, and update all call sites to handle the returned `Result` (use `?` in test functions or unwrap/expect) so failing assertions surface correctly.
value:useful; category:bug; feedback:The AI reviewer is correct that the Result is ignored and the test always passes. The result of test_fn() execution should be checked for Err to fix this bug.
@martin-augment, thank you for confirming! You're absolutely right that the Result being ignored means these tests can silently pass even when assertions fail, which defeats the purpose of testing both the fail_on_error = true and fail_on_error = false code paths.
Would you like me to generate the code changes to fix this issue? I can update the with_fail_on_error helper to properly propagate errors and modify all the test functions that use it to handle the returned Result. Just let me know!
🧠 Learnings used: the same abs NULL-handling learning quoted above (martin-augment, datafusion-comet PR 7, 2025-11-04).
PR Review: Add support for `abs`
```rust
            _ => unreachable!(),
        }
    });
}
```
Bug: absolute value never overflows for unsigned types
The test `test_abs_u8_scalar` incorrectly expects that unsigned types might throw an `ARITHMETIC_OVERFLOW` error. However, the implementation for unsigned types (lines 88-92) simply returns the input unchanged without any error checking, since unsigned integers cannot overflow when taking the absolute value. The test should not check for errors at all for unsigned types; they will always succeed.
value:useful; category:bug; feedback:The AI reviewer is correct that abs(unsigned) should not fail, so it is better to always panic the test if it fails for any reason. The test would be more correct this way.
```rust
            _ => unreachable!(),
        }
    });
}
```
Bug: unsigned abs never errors; the test is wrong
The test `test_abs_u64_array` incorrectly expects that unsigned types might throw an `ARITHMETIC_OVERFLOW` error. However, the implementation for unsigned types (lines 88-92) simply returns the input unchanged without any error checking, since unsigned integers cannot overflow when taking the absolute value. The test should not check for errors at all for unsigned types; they will always succeed.
value:useful; category:bug; feedback:The AI reviewer is correct that abs(unsigned) should not fail, so it is better to always panic the test if it fails for any reason. The test would be more correct this way.
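Both comments point at the same fix; a minimal sketch of the idea, with hypothetical names standing in for the PR's actual test helpers:

```rust
// For unsigned integers, |x| == x, so abs is the identity and can never
// overflow; the tests can therefore require success in both modes.
fn abs_unsigned<T: Copy>(values: &[T]) -> Vec<T> {
    values.to_vec() // identity: nothing is negated, nothing can overflow
}

#[test]
fn abs_unsigned_always_succeeds() {
    for fail_on_error in [true, false] {
        // Scalar case (cf. test_abs_u8_scalar): u8::MAX maps to itself.
        assert_eq!(abs_unsigned(&[u8::MAX]), vec![u8::MAX], "mode={fail_on_error}");
        // Array case (cf. test_abs_u64_array): every element is unchanged.
        let input = [0u64, 1, u64::MAX];
        assert_eq!(abs_unsigned(&input), input.to_vec(), "mode={fail_on_error}");
    }
}
```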
value:useful; category:bug; feedback:The Codex AI reviewer is correct that there is no check that the abs function is called with at least one argument. Prevents a runtime error when trying to read the first argument.
value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct that there is code duplication that could be reduced with declarative macros. There is no bug/regression here. It would improve the maintainability.
value:useful; category:bug; feedback:The Codex AI reviewer is correct that signum(-0.0) will return 0 and thus the WHERE clause won't include the negative zero. The test name mentions "negative zero", so it looks like there is a bug in the test.
value:annoying; category:bug; feedback:The Codex AI reviewer is wrong! Those files are not removed in this Pull Request and they would stay even if it is merged.
2689: To review by AI
Note
Implements Spark-compatible `abs` across JVM serde and native execution with ANSI/legacy behavior, updates docs/configs, and adds a dedicated test suite and CI entries.
- Add `CometAbs` and map Spark `Abs` to the native scalar function, carrying `failOnError` for ANSI vs legacy.
- Register `Abs` in the `QueryPlanSerde` math expressions.
- New `abs` UDF (native/spark-expr/src/math_funcs/abs.rs) supporting ints/floats/decimals with overflow handling (ANSI) and wrapping behavior (legacy).
- Register in `create_comet_physical_fun` and module `math_funcs::abs`.
- Protocol/planner cleanup (drop the `Abs` message; delete commented planner code).
- Add `Abs` to the supported math expressions and the enable/disable config table.
- New `CometMathExpressionSuite` covering `abs` (including negative zero, ANSI mode); enable the suite in Linux/macOS workflows.

Written by Cursor Bugbot for commit 1149438. This will update automatically on new commits. Configure here.
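The ANSI vs legacy split described above can be sketched as follows (assumed behavior with illustrative names, not the PR's exact code):

```rust
// ANSI mode errors on the one signed value whose abs overflows (i32::MIN);
// legacy mode wraps instead, matching Spark's non-ANSI semantics.
fn abs_i32(x: i32, fail_on_error: bool) -> Result<i32, String> {
    if fail_on_error {
        x.checked_abs()
            .ok_or_else(|| "ARITHMETIC_OVERFLOW: abs(i32::MIN)".to_string())
    } else {
        Ok(x.wrapping_abs()) // legacy: abs(i32::MIN) wraps back to i32::MIN
    }
}

fn main() {
    assert_eq!(abs_i32(-5, true), Ok(5));
    assert!(abs_i32(i32::MIN, true).is_err()); // ANSI: overflow surfaces as an error
    assert_eq!(abs_i32(i32::MIN, false), Ok(i32::MIN)); // legacy: wrapping
}
```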