join_where query normalisation doesn't run type-coercion pass #20935

@wence-

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

left = pl.LazyFrame({"a": pl.Series([1, 2, 3], dtype=pl.Int16)})
right = pl.LazyFrame({"a": pl.Series([11, 12], dtype=pl.Int64)})

q_one = left.join(right, left_on=pl.col("a") * 2, right_on=pl.col("a"))

q_two = left.join_where(right, pl.col("a") * 2 == pl.col("a_right"))

q_one.collect() # empty frame
q_two.collect() # ComputeError: datatypes of join keys don't match - `a`: i32 on left does not match `a`: i64 on right

Log output

join parallel: true
INNER join dataframes finished
join parallel: true
INNER join dataframes finished

Issue description

If a join is written using the join_where syntax, type coercion and normalisation do not appear to be run on the join keys, resulting in subtly different behaviour from normal (non-conditional) joins.

The above example shows two different ways of writing the same join, but the second (join_where) fails with a ComputeError about mismatched join key dtypes instead of returning an empty frame.

I noticed this while implementing conditional joins in the GPU engine, where we need to know the concrete dtype of every expression; in particular, we find that Literals are not given a dtype.

Expected behavior

I expected these two queries to produce the same results, and join_where to have type coercion run on the join keys.

Installed versions

--------Version info---------
Polars:              1.21.0
Index type:          UInt32
Platform:            Linux-6.8.0-51-generic-x86_64-with-glibc2.35
Python:              3.12.8 | packaged by conda-forge | (main, Dec  5 2024, 14:24:40) [GCC 13.3.0]
LTS CPU:             False

----Optional dependencies----
Azure CLI            <not installed>
adbc_driver_manager  <not installed>
altair               <not installed>
azure.identity       <not installed>
boto3                1.36.1
cloudpickle          3.1.1
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2024.12.0
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           <not installed>
numpy                2.0.2
openpyxl             3.1.5
pandas               2.2.3
pyarrow              19.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           2.0.37
torch                2.5.1.post303
xlsx2csv             <not installed>
xlsxwriter           <not installed>

Metadata

Labels

  • A-optimizer (Area: plan optimization)
  • P-medium (Priority: medium)
  • accepted (Ready for implementation)
  • bug (Something isn't working)
  • python (Related to Python Polars)

Projects

Status

Done
