
Conversation

@martin-augment
Owner

2775: To review by AI

@coderabbitai

coderabbitai bot commented Nov 14, 2025

Walkthrough

This pull request introduces a new dynamic per-expression incompatibility configuration mechanism and updates the codebase to use it. A new method getExprAllowIncompatConfigKey(clazz: Class[_]) is added to CometConf to retrieve expression-specific configuration keys. Multiple test files are refactored to replace static COMET_EXPR_ALLOW_INCOMPATIBLE references with dynamic calls to this new method for specific expression classes (Cast, FromUnixTime, TruncDate, TruncTimestamp, and various array expressions). Some tests remove incompatibility configuration entirely. Additionally, the documentation default for spark.comet.logFallbackReasons.enabled is changed from false to true.
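For illustration, a test migrated to the new mechanism would look roughly like the sketch below. The config helper and test utilities (getExprAllowIncompatConfigKey, withSQLConf, checkSparkAnswerAndOperator) come from the PR; the test name and query are made up for the example.

```scala
// Hypothetical usage sketch of the per-expression incompatibility key.
// The test name, table name, and query are illustrative only.
import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.comet.CometConf

test("cast with per-expression incompat key (sketch)") {
  withSQLConf(CometConf.getExprAllowIncompatConfigKey(classOf[Cast]) -> "true") {
    // Previously this would have used the global key:
    //   CometConf.COMET_EXPR_ALLOW_INCOMPATIBLE.key -> "true"
    checkSparkAnswerAndOperator("SELECT cast(c0 AS decimal(10, 2)) FROM tbl")
  }
}
```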


sql(s"create table $table(c9 int, c4 int) using parquet")
sql(s"insert into $table values(0, 0), (66, null), (null, 70), (null, null)")
val query = s"SELECT chr(c9), chr(c4) FROM $table"
checkSparkAnswerAndOperator(query)

Bug: Comet: Missing compatibility config breaks native test.

The COMET_EXPR_ALLOW_INCOMPATIBLE configuration was removed from the Chr with null character test, but the test still calls checkSparkAnswerAndOperator(), which verifies that the query is natively accelerated by Comet. If chr() remains incompatible with Spark, removing this config without corresponding code changes to mark the expression as compatible will cause the test to fail, as checkSparkAnswerAndOperator() expects native acceleration to succeed.
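If chr() were in fact still registered as incompatible, a minimal, hypothetical way to keep the test natively accelerated under the new scheme would be to scope the flag to the Chr expression rather than restoring the global key (whether this is actually needed depends on how chr() is registered in Comet):

```scala
// Hypothetical sketch only: scope the incompat flag to Chr instead of the
// removed global COMET_EXPR_ALLOW_INCOMPATIBLE key.
import org.apache.spark.sql.catalyst.expressions.Chr
import org.apache.comet.CometConf

withSQLConf(CometConf.getExprAllowIncompatConfigKey(classOf[Chr]) -> "true") {
  val query = s"SELECT chr(c9), chr(c4) FROM $table"
  checkSparkAnswerAndOperator(query)
}
```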


Owner Author


value:annoying; category:bug; feedback:The Bugbot AI reviewer is not correct! I wasn't able to find any evidence supporting this finding, and all the tests pass without any issues.

@claude

claude bot commented Nov 14, 2025

PR Review: Remove references to COMET_EXPR_ALLOW_INCOMPATIBLE

Summary

This PR refactors test suites to replace the global COMET_EXPR_ALLOW_INCOMPATIBLE configuration with more granular, expression-specific configuration keys. This is a positive change that improves maintainability and allows finer control over compatibility settings.

Code Quality and Best Practices

Strengths

  1. Better Configuration Granularity: The migration from the global COMET_EXPR_ALLOW_INCOMPATIBLE to expression-specific configs (e.g., CometConf.getExprAllowIncompatConfigKey(classOf[ArrayInsert])) is a significant improvement. This allows users to enable incompatible behavior for specific expressions rather than globally.

  2. Consistent Pattern: The changes follow a consistent pattern across all test files, making the codebase more uniform and easier to understand.

  3. Version-Aware Logic: The code in CometArrayExpressionSuite.scala:139-143 correctly handles Spark version differences by conditionally using ArrayInsert for Spark 4.0+ and ArrayAppend for earlier versions.

Areas for Improvement

  1. Documentation Update Needed: The documentation change in docs/source/user-guide/latest/configs.md changes the default value from false to true for spark.comet.logFallbackReasons.enabled. However:

    • The actual code in CometConf.scala:502 still shows createWithEnvVarOrDefault("ENABLE_COMET_LOG_FALLBACK_REASONS", false)
    • This creates a discrepancy between documentation and code
    • Recommendation: Either update the code to match the docs or correct the documentation (see the sketch after this list)
  2. Incomplete Migration: The global COMET_EXPR_ALLOW_INCOMPATIBLE config still exists in CometConf.scala:669 and is still referenced in production code at QueryPlanSerde.scala:669. This means:

    • The refactoring is incomplete from a deprecation perspective
    • Users could still rely on the global config
    • Recommendation: Consider adding deprecation warnings or updating the error messages to prioritize expression-specific configs
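
For the first recommendation, a hypothetical sketch of aligning the code default with the documented value is shown below. The builder chain is assumed from the existing createWithEnvVarOrDefault call; only the final default value changes (false to true), and the docs-first alternative would instead revert the documented default to false.

```scala
// Hypothetical sketch (assumed builder API): align the code default for
// spark.comet.logFallbackReasons.enabled with the documented default.
val COMET_LOG_FALLBACK_REASONS: ConfigEntry[Boolean] =
  conf("spark.comet.logFallbackReasons.enabled")
    .doc(
      "When this setting is enabled, Comet will log warnings for all fallback reasons. " +
        "Can be overridden by environment variable ENABLE_COMET_LOG_FALLBACK_REASONS.")
    .booleanConf
    .createWithEnvVarOrDefault("ENABLE_COMET_LOG_FALLBACK_REASONS", true)
```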

Potential Bugs or Issues

  1. Import Statement Added: The import of expression classes in CometArrayExpressionSuite.scala:26 is correctly added to support classOf[...] calls. This is good.

  2. Test Removal Without Replacement: In CometArrayExpressionSuite.scala:520-540, the test "array_compact" was modified to remove the withSQLConf(CometConf.COMET_EXPR_ALLOW_INCOMPATIBLE.key -> "true") wrapper. Since the test comment says "TODO fix for Spark 4.0.0" and the test is skipped for Spark 4.0+, this suggests:

    • The expression may now be compatible by default for earlier Spark versions
    • Recommendation: Verify that array_compact works correctly without the incompatibility flag enabled
  3. Fallback Test Enhancement: The change in CometArrayExpressionSuite.scala:236-238 adds proper fallback reason validation with checkSparkAnswerAndFallbackReasons. This is an improvement that makes tests more robust.

Performance Considerations

No performance concerns identified. The changes are in test code and configuration handling only.

Security Concerns

No security issues identified.

Test Coverage

Good Coverage

  1. Tests properly exercise both dictionary-enabled and dictionary-disabled paths
  2. The ArrayInsertUnsupportedArgs test validates fallback behavior with expected error messages
  3. Tests cover multiple expression types: array operations, casts, aggregates, strings, etc.

Potential Gaps

  1. No test validates the fallback to global config: When an expression-specific config is not set, the system should fall back to COMET_EXPR_ALLOW_INCOMPATIBLE. It would be good to have a test ensuring this fallback behavior still works (see the sketch after this list).

  2. No negative tests for the new configs: Consider adding tests that verify expressions properly fall back to Spark when the expression-specific incompatibility flag is NOT set.
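
A hypothetical sketch of the first gap, assuming the global key still takes effect when no per-expression key is set (the query is illustrative; withSQLConf and checkSparkAnswerAndOperator come from CometTestBase):

```scala
// Hypothetical sketch: verify the global flag alone still enables an
// incompatible expression when its per-expression key is not set.
import org.apache.comet.CometConf

test("global incompat flag applies when per-expression key is unset (sketch)") {
  withSQLConf(CometConf.COMET_EXPR_ALLOW_INCOMPATIBLE.key -> "true") {
    // Intentionally no expression-specific getExprAllowIncompatConfigKey entry.
    checkSparkAnswerAndOperator("SELECT cast(c0 AS float) FROM tbl")
  }
}
```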

Commit Quality

The commit messages are quite terse:

  • "format", "save", "plan stability", "revert", "fix"

These don't provide much context about the changes. For a refactoring PR like this, more descriptive commit messages would be helpful.

Specific Recommendations

Critical

  1. Fix documentation/code mismatch for logFallbackReasons.enabled default value (see point 1 in Areas for Improvement)

Important

  1. Consider adding a deprecation notice to COMET_EXPR_ALLOW_INCOMPATIBLE in the config documentation
  2. Update the error message in QueryPlanSerde.scala:668-670 to recommend the expression-specific config first, with the global config as a fallback option

Nice to Have

  1. Add test coverage for global config fallback behavior
  2. Improve commit messages to be more descriptive
  3. Consider documenting the new expression-specific config pattern in the compatibility guide

Overall Assessment

This is a well-executed refactoring that improves the codebase structure. The changes are mostly mechanical and follow good patterns. The main issue is the documentation/code mismatch for the logFallbackReasons config, which should be resolved before merging.

Status: Approved with minor changes requested (fix the config documentation mismatch)


@augmentcode augmentcode bot left a comment


Review completed. 1 suggestion posted.


| `spark.comet.explain.rules` | When this setting is enabled, Comet will log all plan transformations performed in physical optimizer rules. Default: false | false |
| `spark.comet.explainFallback.enabled` | When this setting is enabled, Comet will provide logging explaining the reason(s) why a query stage cannot be executed natively. Set this to false to reduce the amount of logging. | false |
- | `spark.comet.logFallbackReasons.enabled` | When this setting is enabled, Comet will log warnings for all fallback reasons. Can be overridden by environment variable `ENABLE_COMET_LOG_FALLBACK_REASONS`. | false |
+ | `spark.comet.logFallbackReasons.enabled` | When this setting is enabled, Comet will log warnings for all fallback reasons. Can be overridden by environment variable `ENABLE_COMET_LOG_FALLBACK_REASONS`. | true |


The docs set the default to true, but CometConf.COMET_LOG_FALLBACK_REASONS is defined with createWithEnvVarOrDefault(..., false), so the code default is false (unless the env var is set). Consider aligning the documentation with the actual default or updating the default in code.


Owner Author


value:useful; category:documentation; feedback:The Augment AI reviewer is correct. The default value in the documentation has been changed, but the actual value in the implementation is still the old one. Fixing this prevents an inconsistency between the logic and its documentation.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
spark/src/test/scala/org/apache/comet/CometCastSuite.scala (1)

660-669: Replace global COMET_EXPR_ALLOW_INCOMPATIBLE with Cast-scoped key in partial-support tests

These three test methods still use the global configuration flag. Update them to use the Cast-specific key:

  test("cast StringType to FloatType (partial support)") {
    withSQLConf(
-     CometConf.COMET_EXPR_ALLOW_INCOMPATIBLE.key -> "true",
+     CometConf.getExprAllowIncompatConfigKey(classOf[Cast]) -> "true",
       SQLConf.ANSI_ENABLED.key -> "false") {

Apply the same change at lines 678 and 695.

Also applies to: 676-685, 693-703

🧹 Nitpick comments (3)
docs/source/user-guide/latest/configs.md (1)

84-84: Grammar: Add a subject to the second sentence in the description.

The description text reads "Can be overridden by environment variable..." but lacks a subject pronoun. Consider revising to "This can be overridden..." or "It can be overridden..." for grammatical completeness.

-| `spark.comet.logFallbackReasons.enabled` | When this setting is enabled, Comet will log warnings for all fallback reasons. Can be overridden by environment variable `ENABLE_COMET_LOG_FALLBACK_REASONS`. | true |
+| `spark.comet.logFallbackReasons.enabled` | When this setting is enabled, Comet will log warnings for all fallback reasons. This can be overridden by environment variable `ENABLE_COMET_LOG_FALLBACK_REASONS`. | true |
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (2)

188-193: Verify if these changes are intentional.

These lines are marked as changed but don't appear to relate to the PR objective of replacing COMET_EXPR_ALLOW_INCOMPATIBLE references. If these are formatting-only changes, consider separating them into a dedicated formatting commit for clearer history.


468-468: Minor incidental formatting changes noted.

These lines show formatting adjustments to withSQLConf blocks that don't relate to the PR's main objective. While not problematic, consider using a consistent auto-formatter across the codebase to minimize such incidental changes in focused PRs.

Also applies to: 1430-1430, 1981-1981, 2085-2085, 2781-2781

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c7aa041 and 8e97fd1.

📒 Files selected for processing (8)
  • docs/source/user-guide/latest/configs.md (1 hunks)
  • spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala (16 hunks)
  • spark/src/test/scala/org/apache/comet/CometCastSuite.scala (5 hunks)
  • spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (12 hunks)
  • spark/src/test/scala/org/apache/comet/CometStringExpressionSuite.scala (1 hunks)
  • spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala (1 hunks)
  • spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala (2 hunks)
  • spark/src/test/scala/org/apache/comet/exec/CometWindowExecSuite.scala (0 hunks)
💤 Files with no reviewable changes (1)
  • spark/src/test/scala/org/apache/comet/exec/CometWindowExecSuite.scala
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 17
File: docs/source/contributor-guide/adding_a_new_operator.md:349-354
Timestamp: 2025-11-11T15:01:48.203Z
Learning: For Apache DataFusion Comet debugging documentation, the correct configuration keys are `spark.comet.explain.format=verbose` for verbose explain plans and `spark.comet.logFallbackReasons.enabled=true` for logging fallback reasons (not `spark.comet.explain.verbose` or `spark.comet.logFallbackReasons` without `.enabled`).
📚 Learning: 2025-11-11T15:01:48.203Z
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 17
File: docs/source/contributor-guide/adding_a_new_operator.md:349-354
Timestamp: 2025-11-11T15:01:48.203Z
Learning: For Apache DataFusion Comet debugging documentation, the correct configuration keys are `spark.comet.explain.format=verbose` for verbose explain plans and `spark.comet.logFallbackReasons.enabled=true` for logging fallback reasons (not `spark.comet.explain.verbose` or `spark.comet.logFallbackReasons` without `.enabled`).

Applied to files:

  • docs/source/user-guide/latest/configs.md
  • spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala
  • spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
📚 Learning: 2025-11-11T20:44:05.014Z
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 19
File: native/spark-expr/src/array_funcs/array_insert.rs:221-238
Timestamp: 2025-11-11T20:44:05.014Z
Learning: In Rust code using Apache Arrow arrays, always check `is_null(index)` before calling `value(index)` on `PrimitiveArray` types (such as `Int32Array`, `Int64Array`, etc.), because `value()` does not check for nulls and returns arbitrary values for null slots. This applies to functions in `native/spark-expr/src` that process Arrow arrays.

Applied to files:

  • spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala
🧬 Code graph analysis (5)
spark/src/test/scala/org/apache/comet/CometStringExpressionSuite.scala (2)
spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala (3)
  • checkSparkAnswerAndOperator (189-193)
  • checkSparkAnswerAndOperator (199-206)
  • checkSparkAnswerAndOperator (214-223)
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (1)
  • test (48-55)
spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala (1)
common/src/main/scala/org/apache/comet/CometConf.scala (1)
  • CometConf (46-767)
spark/src/test/scala/org/apache/comet/CometCastSuite.scala (1)
common/src/main/scala/org/apache/comet/CometConf.scala (3)
  • CometConf (46-767)
  • getExprAllowIncompatConfigKey (744-746)
  • getExprAllowIncompatConfigKey (748-750)
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (1)
common/src/main/scala/org/apache/comet/CometConf.scala (3)
  • CometConf (46-767)
  • getExprAllowIncompatConfigKey (744-746)
  • getExprAllowIncompatConfigKey (748-750)
spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala (3)
spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala (8)
  • spark (429-429)
  • checkSparkAnswerAndFallbackReasons (275-279)
  • checkSparkAnswerAndFallbackReasons (282-301)
  • makeParquetFileAllPrimitiveTypes (583-585)
  • makeParquetFileAllPrimitiveTypes (650-739)
  • checkSparkAnswerAndOperator (189-193)
  • checkSparkAnswerAndOperator (199-206)
  • checkSparkAnswerAndOperator (214-223)
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala (1)
  • isSpark40Plus (228-230)
common/src/main/scala/org/apache/comet/CometConf.scala (3)
  • CometConf (46-767)
  • getExprAllowIncompatConfigKey (744-746)
  • getExprAllowIncompatConfigKey (748-750)
🪛 LanguageTool
docs/source/user-guide/latest/configs.md

[style] ~84-~84: To form a complete sentence, be sure to include a subject.
Context: ... log warnings for all fallback reasons. Can be overridden by environment variable `...

(MISSING_IT_THERE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: claude-review
🔇 Additional comments (31)
spark/src/test/scala/org/apache/comet/CometStringExpressionSuite.scala (2)

175-181: Chr test now runs without global incompat flag — LGTM

Behavior remains explicit and self-contained.


186-191: Chr(null/0) compatibility coverage — LGTM

Covers Spark’s chr(0)/null semantics while asserting native execution.

spark/src/test/scala/org/apache/comet/CometCastSuite.scala (5)

436-439: Switch to per‑expression key for Float→Decimal — LGTM

Uses CometConf.getExprAllowIncompatConfigKey(classOf[Cast]) as intended.


496-499: Switch to per‑expression key for Double→Decimal — LGTM


784-791: Timestamp cast (ignored test) toggles per‑expr key — LGTM


835-839: Timestamp subset test uses per‑expr key with timezone — LGTM


1268-1274: ANSI branch: per‑expr incompat key enabled — LGTM

Keeps exception parity checks intact.

spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala (2)

531-531: Cleanup of extraneous expr flag (shuffle mode retained) — LGTM


1440-1484: Typed-literals partition spec with timezone only — LGTM

Removing global incompat flag here reduces blast radius; test intent preserved.

spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala (1)

931-943: Decimal(38,2) overflow test no longer needs global flag — LGTM

Inputs exercise boundary/overflow; asserting two native aggregates matches expected plan shape.

spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala (16)

26-26: Add explicit array expression imports — LGTM

Clarifies per‑expr key mapping by class.


139-169: array_append: dynamic key (ArrayInsert vs ArrayAppend by Spark version) — LGTM


173-199: array_prepend uses ArrayInsert key (Spark 3.5+ mapping) — LGTM


202-222: ArrayInsert happy‑path coverage under per‑expr key — LGTM


227-241: ArrayInsert unsupported args: assert fallback reasons — LGTM

Note: reason strings can drift across versions; keep substring checks stable.


357-383: array_distinct under per‑expr key — LGTM


386-404: array_union under per‑expr key — LGTM


457-474: array_intersect under per‑expr key — LGTM


477-496: array_join under per‑expr key — LGTM


499-518: arrays_overlap under per‑expr key — LGTM


523-539: array_compact: broadened coverage; Spark 4 guard — LGTM


542-561: array_except basics under per‑expr key — LGTM


577-595: array_except all‑types path (native reader) uses per‑expr key — LGTM


615-616: Convert-from-Parquet path toggled — LGTM

Keeps conversion tests isolated from native scan.


634-659: array_repeat: per‑expr key + explain fallback enabled — LGTM


728-740: array literals: enable explain fallback for clearer diagnostics — LGTM

spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (5)

31-31: LGTM! Import addition supports per-expression configuration.

The FromUnixTime import is correctly added to support the dynamic configuration key usage at line 1749.


618-618: LGTM! Cast configuration correctly uses per-expression keys.

The replacement of the global COMET_EXPR_ALLOW_INCOMPATIBLE.key with CometConf.getExprAllowIncompatConfigKey(classOf[Cast]) correctly narrows the incompatibility configuration to Cast-specific operations in these timestamp casting tests.

Also applies to: 640-640, 662-662


704-704: LGTM! TruncDate configuration correctly uses per-expression key.

The use of classOf[TruncDate] appropriately targets the date truncation functionality being tested.


752-752: LGTM! Timestamp truncation tests correctly configure relevant expressions.

Line 752 appropriately enables Cast for timestamp_ntz operations, while lines 794-795 correctly configure both Cast and TruncTimestamp for a test that uses both casting and timestamp truncation operations.

Also applies to: 794-795


1749-1749: LGTM! FromUnixTime configuration correctly uses per-expression key.

The use of classOf[FromUnixTime] appropriately targets the unix timestamp conversion functionality being tested, consistent with the import addition at line 31.

@martin-augment
Owner Author

Documentation Update Needed: The documentation change in docs/source/user-guide/latest/configs.md changes the default value from false to true for spark.comet.logFallbackReasons.enabled. However:

  • The actual code in CometConf.scala:502 still shows createWithEnvVarOrDefault("ENABLE_COMET_LOG_FALLBACK_REASONS", false)
  • This creates a discrepancy between documentation and code
  • Recommendation: Either update the code to match the docs or correct the documentation

value:useful; category:documentation; feedback:The Claude AI reviewer is correct. The default value in the documentation has been changed, but the actual value in the implementation is still the old one. Fixing this prevents an inconsistency between the logic and its documentation.

@martin-augment
Owner Author

martin-augment commented Nov 14, 2025

Grammar: Add a subject to the second sentence in the description.

The description text reads "Can be overridden by environment variable..." but lacks a subject pronoun. Consider revising to "This can be overridden..." or "It can be overridden..." for grammatical completeness.

value:good-to-have; category:documentation; feedback:The CodeRabbit AI reviewer is correct that the grammar in the documentation could be improved for this config key.

@martin-augment
Owner Author

martin-augment commented Nov 14, 2025

Replace global COMET_EXPR_ALLOW_INCOMPATIBLE with Cast-scoped key in partial-support tests

These three test methods still use the global configuration flag. Update them to use the Cast-specific key:

  test("cast StringType to FloatType (partial support)") {
    withSQLConf(
-     CometConf.COMET_EXPR_ALLOW_INCOMPATIBLE.key -> "true",
+     CometConf.getExprAllowIncompatConfigKey(classOf[Cast]) -> "true",
       SQLConf.ANSI_ENABLED.key -> "false") {

value:good-to-have; category:bug; feedback:The CodeRabbit AI reviewer is correct that the config key could be narrowed to a more specific one that is responsible only for casting-related operations.
