Skip to content

Conversation

ahkcs
Copy link
Contributor

@ahkcs ahkcs commented Oct 10, 2025

Description

  1. Type Conflict Handling
  • Changed behavior: Type conflicts now throw IllegalArgumentException instead of auto-renaming fields
  • Modified: SchemaUnifier.java - Removed automatic field renaming logic (e.g., age → age0)
  • Updated Tests:
    • CalciteMultisearchCommandIT.testMultisearchWithDirectTypeConflict - Now expects exception
    • CalcitePPLAppendCommandIT.testAppendWithConflictTypeColumn - Now expects exception
  • Documentation: Updated multisearch.rst and append.rst to reflect new behavior and add Limitations section
  1. Timestamp Interleaving
  • Modified: CalciteRelNodeVisitor.findTimestampField() - Now only detects @timestamp field for timestamp interleaving
  • Other timestamp fields (_time, timestamp, time) are no longer used for interleaving
  1. Documentation Fixes
  • multisearch.rst:
    • Added Limitations section
    • Removed redundant Example 4 (Handling Empty Results)
    • Removed Example 6 (type-conflict resolution)
  • append.rst:
    • Added Limitations section
    • Removed Example 3 (type conflict example no longer valid)

Related PRs

#4332 #4123

@ahkcs ahkcs changed the title Fixes for Multisearch and Append command Fixes for Multisearch and Append command Oct 10, 2025
ahkcs added 4 commits October 10, 2025 14:12
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Comment on lines 92 to 97
if (!typesForName.contains(fieldType)) {
// New field or same name with different type - add to schema
schema.add(new SchemaField(fieldName, fieldType));
typesForName.add(fieldType);
}
// If we've seen this exact (name, type) combination, skip it
Copy link
Collaborator

@ykmr1224 ykmr1224 Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should rather raise error from here. (Then we don't need to check again)
Later we would implement a logic to generalize types for the same field name here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! I think it makes sense, moved the location to raise error

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs requested a review from ykmr1224 October 10, 2025 22:19

if (!typesForName.contains(fieldType)) {
// New field or same name with different type - add to schema
RelDataType existingType = seenFields.get(fieldName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We recently find another issue of type conflicts here. RelDataType evaluates the hash equality by its digested string as well. For example, "INTEGER" is not equal to "INTEGER NOT NULL". A quick fix would be aligning the same SqlType to be nullable. Ideally it won't affect the data type resolution while execution. cc @xinyual

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can consider to allow same SqlTypeName but with different nullability to be merged here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! I have updated the implementation to allow same SqlTypeName but with different nullability to be merged

Limitations
===========

* **Schema Compatibility**: When fields with the same name exist between the main search and sub-search but have incompatible types, the query will fail with an error. To avoid type conflicts, ensure that fields with the same name have the same data type, or use different field names (e.g., by renaming with ``eval`` or using ``fields`` to select non-conflicting columns).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understand the intention here. Strong schema engine like SQL restricts the type to be the same. Some weak schema engine resolves types at runtime and doesn't care the data type. I think it's not easy to make it compatible.

Not sure what's better user experience and customer expectation here. Does user accept this behavior or expect to union anyway? cc @LantaoJin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are planning to enable permissive mode in the future: #4349 to support schema merging with type conflicts, in order to avoid breaking changes in the future, we are marking this as a limitation now instead of using a workaround. cc @penghuo

ykmr1224
ykmr1224 previously approved these changes Oct 15, 2025
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants