
Regression: Invalid comparison operation: Utf8 == Utf8View error during LEFT ANTI JOIN #13510

Closed
@sergiimk


Describe the bug

A regression between versions 42.2.0 and 43.0.0 appears to have introduced the following error:

External(ArrowError(InvalidArgumentError("Invalid comparison operation: Utf8 == Utf8View"), None))

Note that the error happens during plan execution; plan validation passes successfully.

To Reproduce

A minimal repro:

  • Read data from CSV (Utf8 columns)
  • Read data from Parquet (Utf8View columns)
  • Do a LEFT ANTI JOIN
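The reproduction steps above can be illustrated with a toy, standard-library-only Rust model. It is not DataFusion's or arrow-rs's API: `ToyColumn`, `check_comparable`, and `left_anti_join` are hypothetical names, and only the logical type tag of each column is modeled, not the real Utf8/Utf8View memory layouts. The sketch shows how, when nothing reconciles the key types before execution, a strict comparison step rejects Utf8 against Utf8View even though both sides hold strings, producing an error of the same shape as the one reported.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a string column: only the logical type tag is modeled,
// not the actual Utf8 / Utf8View physical layouts.
#[derive(Debug, Clone)]
enum ToyColumn {
    Utf8(Vec<String>),
    Utf8View(Vec<String>),
}

impl ToyColumn {
    fn type_name(&self) -> &'static str {
        match self {
            ToyColumn::Utf8(_) => "Utf8",
            ToyColumn::Utf8View(_) => "Utf8View",
        }
    }

    fn values(&self) -> &[String] {
        match self {
            ToyColumn::Utf8(v) | ToyColumn::Utf8View(v) => v,
        }
    }
}

// Toy strict comparison check: refuses to compare columns whose declared types
// differ, even though both contain strings.
fn check_comparable(left: &ToyColumn, right: &ToyColumn) -> Result<(), String> {
    if left.type_name() != right.type_name() {
        return Err(format!(
            "Invalid comparison operation: {} == {}",
            left.type_name(),
            right.type_name()
        ));
    }
    Ok(())
}

// LEFT ANTI JOIN on a single key: keep the left rows whose key has no match on the right.
// Nothing checks the key types earlier, so a mismatch only surfaces here, at "execution".
fn left_anti_join(left: &ToyColumn, right: &ToyColumn) -> Result<Vec<String>, String> {
    check_comparable(left, right)?;
    let right_keys: HashSet<&String> = right.values().iter().collect();
    Ok(left
        .values()
        .iter()
        .filter(|k| !right_keys.contains(k))
        .cloned()
        .collect())
}

fn main() {
    // Left (CSV) side carries Utf8 keys; right (Parquet) side carries Utf8View keys.
    let csv_keys = ToyColumn::Utf8(vec!["a".into(), "b".into()]);
    let parquet_keys = ToyColumn::Utf8View(vec!["a".into()]);

    match left_anti_join(&csv_keys, &parquet_keys) {
        Ok(rows) => println!("rows: {:?}", rows),
        Err(e) => println!("execution error: {}", e),
    }
}
```

With matching key types on both sides the same function returns the unmatched left rows, which is the behavior that worked before the regression.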

A test project with sample data is attached: datafusion-13510.zip

Physical plan:

CoalesceBatchesExec: target_batch_size=8192
  HashJoinExec: mode=Partitioned, join_type=LeftAnti, on=[(date@0, date@0), (city@1, city@1)]
    CoalesceBatchesExec: target_batch_size=8192
      RepartitionExec: partitioning=Hash([date@0, city@1], 16), input_partitions=16
        RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1
          CsvExec: file_groups={1 group: [[home/.../datafusion-13510/data/data2.csv]]}, projection=[date, city, population], has_header=true
    CoalesceBatchesExec: target_batch_size=8192
      RepartitionExec: partitioning=Hash([date@0, city@1], 16), input_partitions=1
        ParquetExec: file_groups={1 group: [[home/.../datafusion-13510/data/data1.parquet]]}, projection=[date, city]

Expected behavior

No error, or an error during planning (rather than execution) if some operation is invalid.
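The expected behavior can be sketched by moving the type check into a planning step. In this toy model (hypothetical names, standard library only, not DataFusion's API), `coerce_join_keys` picks one common type for both join keys and the plan casts each side to it before any rows are compared, so incompatible key types fail before execution. Choosing Utf8View as the common string type is an assumption of this sketch, and the rule deliberately rejects everything outside Utf8/Utf8View, so it does not reflect DataFusion's actual coercion rules.

```rust
use std::collections::HashSet;

// Hypothetical key type tags (not DataFusion's DataType).
#[derive(Debug, Clone, Copy, PartialEq)]
enum KeyType {
    Utf8,
    Utf8View,
    Int64,
}

// Toy column: values are kept as strings; only the type tag matters here.
#[derive(Debug, Clone)]
struct Column {
    data_type: KeyType,
    values: Vec<String>,
}

impl Column {
    // Toy cast: only relabels the column. A real Utf8 -> Utf8View cast rebuilds buffers.
    fn cast(&self, to: KeyType) -> Column {
        Column { data_type: to, values: self.values.clone() }
    }
}

// Planning-time rule: pick one common type for both join keys, or fail early.
fn coerce_join_keys(left: KeyType, right: KeyType) -> Result<KeyType, String> {
    use KeyType::*;
    match (left, right) {
        (l, r) if l == r => Ok(l),
        // Both are string types; Utf8View as the target is an assumption of this sketch.
        (Utf8, Utf8View) | (Utf8View, Utf8) => Ok(Utf8View),
        (l, r) => Err(format!("planning error: cannot join {:?} with {:?}", l, r)),
    }
}

fn plan_and_run_left_anti_join(left: &Column, right: &Column) -> Result<Vec<String>, String> {
    // Planning phase: validate key types and insert casts before any data is compared.
    let target = coerce_join_keys(left.data_type, right.data_type)?;
    let (left, right) = (left.cast(target), right.cast(target));

    // Execution phase: key types now match, so the comparison cannot hit a type mismatch.
    debug_assert_eq!(left.data_type, right.data_type);
    let right_keys: HashSet<&str> = right.values.iter().map(String::as_str).collect();
    Ok(left
        .values
        .iter()
        .filter(|k| !right_keys.contains(k.as_str()))
        .cloned()
        .collect())
}

fn main() {
    let csv_keys = Column { data_type: KeyType::Utf8, values: vec!["a".into(), "b".into()] };
    let parquet_keys = Column { data_type: KeyType::Utf8View, values: vec!["a".into()] };
    println!("{:?}", plan_and_run_left_anti_join(&csv_keys, &parquet_keys));

    // An incompatible key type is rejected by the planning step, before execution.
    let int_keys = Column { data_type: KeyType::Int64, values: vec!["1".into()] };
    println!("{:?}", plan_and_run_left_anti_join(&csv_keys, &int_keys));
}
```

The design choice here is simply where the check lives: because the casts are decided once during planning, the execution loop never sees mismatched key types.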


Labels

bug, help wanted, regression
