Describe the bug
Spark 4.0 widens many string-typed inputTypes on datetime expressions to StringTypeWithCollation(supportsTrimCollation = true). The affected datetime expressions include convert_timezone, date_format, date_trunc, from_unixtime, make_timestamp, next_day, to_unix_timestamp, trunc, and unix_timestamp.
Today the Comet serdes for these expressions accept those string inputs without distinguishing the collation, so non-default collations are silently treated as compatible. Per the audit-comet-expression skill (rule 11), a non-default collation on a string input should flip the support level to Incompatible(Some(...)). That makes the divergence visible in EXPLAIN and in the auto-generated compatibility guide, and it makes the projection fall back rather than produce potentially divergent results.
Steps to reproduce
On Spark 4.0, apply a non-default collation (for example UTF8_LCASE or UNICODE_CI) to a string argument of one of the datetime expressions above and observe that Comet still runs the expression natively without distinguishing the collation.
Expected behavior
Non-default collations on string inputs to these datetime expressions should report Incompatible(Some(...)) (falling back unless explicitly opted in), consistent with how other expressions gate collation.
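The gating described above could look roughly like the following. This is a minimal, self-contained sketch rather than Comet's actual serde code: `SupportLevel`, `Compatible`, `Incompatible`, `StringInput`, and `CollationGate` are stand-in definitions written for this illustration, and treating `UTF8_BINARY` as the default collation is an assumption.

```scala
// Stand-in types for illustration only; not Comet's or Spark's real classes.
sealed trait SupportLevel
final case class Compatible(notes: Option[String] = None) extends SupportLevel
final case class Incompatible(notes: Option[String] = None) extends SupportLevel

// Hypothetical model of a string-typed input; only the collation name matters here.
final case class StringInput(collation: String)

object CollationGate {
  // Assumption: UTF8_BINARY is the default collation.
  val DefaultCollation = "UTF8_BINARY"

  // Returns Incompatible as soon as any string input carries a non-default
  // collation, and Compatible otherwise.
  def supportLevel(exprName: String, inputs: Seq[StringInput]): SupportLevel =
    inputs.find(in => !in.collation.equalsIgnoreCase(DefaultCollation)) match {
      case Some(in) =>
        Incompatible(
          Some(s"$exprName: non-default collation ${in.collation} on a string input"))
      case None =>
        Compatible()
    }
}
```

With this shape, a call such as `CollationGate.supportLevel("date_format", Seq(StringInput("UTF8_LCASE")))` yields an `Incompatible` carrying a note that names the offending collation, which is the kind of message that would surface in EXPLAIN output.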
Additional context
Split out from the high-priority list in #4502 (item 5, originally tracked as medium priority) so that #4502 can be closed once the remaining fixes land. Cross-references #2190 and #4496.