Skip to content

Conversation

@martin-augment
Copy link
Owner

2760: To review by AI

@coderabbitai
Copy link

coderabbitai bot commented Nov 11, 2025

Walkthrough

Adds documentation section for Complex Type Casting under Unsupported Casts, covering support rules for casting complex types including arrays, structs, and maps with nested structures, recursive validation semantics, and example CAST expressions.

Changes

Cohort / File(s) Change Summary
Documentation Updates
docs/source/user-guide/latest/compatibility.md
Adds Complex Type Casting section documenting support rules for casting complex types, nested structures, recursive validation semantics, and special cases; notes map casts are unsupported and fall back to Spark
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pr-2760-2025-11-11-13-45-26

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1e90cce and 80b7298.

📒 Files selected for processing (1)
  • docs/source/user-guide/latest/compatibility.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: claude-review
🔇 Additional comments (1)
docs/source/user-guide/latest/compatibility.md (1)

195-247: Documentation of complex type casting appears premature given current Comet development status.

Based on the project's current stage, complex type support (maps, structs, and arrays) in Comet is still under development and does not yet support reading complex types. While the new documentation section is well-structured and clearly explains the intended casting behavior, it may document planned features rather than fully implemented functionality.

Verify the following before merging:

  1. Implementation completeness: Confirm that complex type casting for arrays and structs is actually implemented in the codebase (not just planned). The section describes detailed recursive validation semantics that require corresponding implementation.

  2. Null value handling: The documentation notes empty arrays with NullType elements can be cast to any array type, but doesn't address how nulls within array or struct elements are handled during casting operations.

  3. Behavior gaps: Verify error handling for edge cases (e.g., struct field count mismatches, incompatible nested type combinations) against the actual implementation.

If this documentation describes features not yet fully implemented, consider adding a note indicating these are planned features or moving the section to a roadmap section.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link

claude bot commented Nov 11, 2025

PR Review: Documentation for Supported Casts with Complex Types

Thank you for adding this documentation! This is a valuable addition to the compatibility guide. Here's my detailed review:


Strengths

  1. Clear Structure: The documentation is well-organized with distinct sections for each complex type (Arrays, Structs, Maps)
  2. Good Examples: The SQL examples help users understand the casting behavior concretely
  3. Recursive Validation Explanation: The documentation clearly explains how nested types are handled recursively
  4. Compatibility Level Propagation: The note about "worst" compatibility level inheritance is important and well-explained

🔍 Issues & Recommendations

1. Missing Struct Field Count Validation in Documentation

Issue: The documentation states that struct-to-struct casts require "the same number of fields," but I don't see this validation in the implementation at CometCast.scala:157-166.

case (from_struct: StructType, to_struct: StructType) =>
  from_struct.fields.zip(to_struct.fields).foreach { case (a, b) =>
    isSupported(a.dataType, b.dataType, timeZoneId, evalMode) match {
      case Compatible(_) =>
      // all good
      case other =>
        return other
    }
  }
  Compatible()

Problem: The .zip() operation will silently truncate to the shorter length if field counts differ. For example:

  • struct(1, 'a', true) zipped with struct<x: int, y: string> would only check the first 2 fields

Recommendation: Either:

  1. Add explicit field count validation to the implementation:
    if (from_struct.fields.length != to_struct.fields.length) {
      return Unsupported(Some("Struct field count mismatch"))
    }
  2. Or update the documentation to reflect the actual behavior (which may differ from Spark)

2. Missing Array-to-Array Cast Documentation

Issue: The documentation mentions "Array to Array" casting (docs/source/user-guide/latest/compatibility.md:205-209), but:

  • There are no tests for array-to-array casts in CometCastSuite.scala
  • Only ArrayType to StringType is tested (CometCastSuite.scala:1050)

Recommendation: Add test coverage for array-to-array casts:

test("cast ArrayType to ArrayType") {
  // Test: CAST(array(1, 2, 3) AS ARRAY<BIGINT>)
  // Test: CAST(array(1.0, 2.0) AS ARRAY<STRING>)
  // Test nested arrays: CAST(array(array(1, 2)) AS ARRAY<ARRAY<BIGINT>>)
}

3. Technical Accuracy: Struct Cast Compatibility

Current Implementation (CometCast.scala:234-244):

  • Struct-to-String: Returns Incompatible if any field returns Incompatible
  • Struct-to-Struct: Returns the first non-Compatible result ✅

Documentation Claim:

"If any field cast is Incompatible or Unsupported, the entire struct cast inherits that level"

Verdict: ✅ Accurate - The implementation matches the documentation.


4. Documentation Formatting & Clarity

Suggested Improvements:

  1. Add section anchor: For easier linking

    ### Complex Type Casting {#complex-type-casting}
  2. Clarify "binary array" limitation (compatibility.md:119-120):

    - **Array to String**: Arrays can be cast to string, except for arrays with binary elements.
      - Arrays containing `BinaryType` elements → Incompatible
  3. Add cross-reference: Link to the cast tracking issue for completeness

    See also the [cast implementation tracking issue](https://github.com/apache/datafusion-comet/issues/286).

🧪 Test Coverage Assessment

Current State:

  • ✅ Struct-to-String: Well tested (CometCastSuite.scala:959-985)
  • ✅ Struct-to-Struct: Tested (CometCastSuite.scala:987-1017)
  • ✅ Array-to-String: Tested (CometCastSuite.scala:1050-1073)
  • Array-to-Array: NOT tested (should be added)
  • ✅ Map types: Documented as unsupported (no false promises)

Recommendation: Before merging this documentation, add basic array-to-array cast tests to ensure the documented behavior is actually implemented and correct.


🔒 Security & Performance

  • Security: ✅ No security concerns - this is documentation only
  • Performance: ✅ No performance impact - this is documentation only
  • Compatibility: ⚠️ Minor concern about struct field count mismatch (see Issue Add Claude Code GitHub Workflow #1 above)

📋 Action Items

Before merging:

  1. ⚠️ HIGH: Verify struct field count validation behavior and update docs or code accordingly
  2. ⚠️ MEDIUM: Add tests for array-to-array casts to match documentation
  3. 💡 LOW: Consider the formatting improvements suggested above

Nice to have:

  • Add a code example showing the "worst compatibility" propagation behavior
  • Include a link to the implementation file for developers

🎯 Overall Assessment

This is a valuable documentation addition that improves user understanding of complex type casting. The main concerns are:

  1. Potential inconsistency in struct field count handling
  2. Missing test coverage for array-to-array casts

Once these are addressed, this PR will be ready to merge.

Rating: 7/10 (would be 9/10 with the suggested fixes)


Review conducted by Claude Code - generated based on code analysis and best practices

Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 1 suggestions posted.

Comment augment review to trigger a new review at any time.


- **Array to Array**: Arrays can be cast to other array types if the element types are compatible.
- Recursively validates that the source element type can be cast to the target element type
- Example: `CAST(array(1, 2, 3) AS ARRAY<BIGINT>)``array(1L, 2L, 3L)`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“NullType elements (empty arrays)” is misleading: Comet treats any Array (including arrays of nulls, not just empty arrays) as castable to any array type; consider clarifying the parenthetical.

🤖 Was this useful? React with 👍 or 👎

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:documentation; feedback:The Augment AI reviewer is correct that "empty arrays" here is not correct. The array is not empty but it is full with NullType elements only

@martin-augment
Copy link
Owner Author

Missing Struct Field Count Validation in Documentation

Issue: The documentation states that struct-to-struct casts require "the same number of fields," but I don't see this validation in the implementation at CometCast.scala:157-166.

case (from_struct: StructType, to_struct: StructType) =>
  from_struct.fields.zip(to_struct.fields).foreach { case (a, b) =>
    isSupported(a.dataType, b.dataType, timeZoneId, evalMode) match {
      case Compatible(_) =>
      // all good
      case other =>
        return other
    }
  }
  Compatible()

Problem: The .zip() operation will silently truncate to the shorter length if field counts differ. For example:

  • struct(1, 'a', true) zipped with struct<x: int, y: string> would only check the first 2 fields

value:useful; category:documentation; feedback:The Claude AI reviewer is correct that the implementation does not match the documentation. The implementation is not that strict about the same length of the struct fields. The finding prevents confusion in the reader

@martin-augment
Copy link
Owner Author

2. Missing Array-to-Array Cast Documentation

Issue: The documentation mentions "Array to Array" casting (docs/source/user-guide/latest/compatibility.md:205-209), but:

  • There are no tests for array-to-array casts in CometCastSuite.scala
  • Only ArrayType to StringType is tested (CometCastSuite.scala:1050)

value:useful; category:documentation; feedback:The Claude AI reviewer is correct that there are no tests validating the claim that the cast from ArrayType to ArrayType is actually supported. https://github.com/apache/datafusion-comet/blob/e77998ad2228b758d0a93588c57325e56bda83bc/spark/src/main/scala/org/apache/comet/expressions/CometCast.scala#L118-L124 shows that it is supposed to work but having tests will make sure there are no regressions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants