Closed
Labels
- A-optimizer (Area: plan optimization)
- P-medium (Priority: medium)
- accepted (Ready for implementation)
- bug (Something isn't working)
- python (Related to Python Polars)
Description
Checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import polars as pl
left = pl.LazyFrame({"a": pl.Series([1, 2, 3], dtype=pl.Int16)})
right = pl.LazyFrame({"a": pl.Series([11, 12], dtype=pl.Int64)})
q_one = left.join(right, left_on=pl.col("a") * 2, right_on=pl.col("a"))
q_two = left.join_where(right, pl.col("a") * 2 == pl.col("a_right"))
q_one.collect() # empty frame
q_two.collect() # ComputeError: datatypes of join keys don't match - `a`: i32 on left does not match `a`: i64 on right
Log output
join parallel: true
INNER join dataframes finished
join parallel: true
INNER join dataframes finished
Issue description
If one writes a join using the join_where syntax, type coercion and normalisation do not appear to be run on the join keys, resulting in subtly different behaviour compared to normal (non-conditional) joins.
The above example shows two different ways of writing the same join, but the latter fails with a ComputeError instead of returning an empty frame.
I noticed this while implementing conditional joins in the GPU engine, where we need to know the concrete dtype of every expression. In particular, we find that Literals are not given a dtype.
Expected behavior
I expected these two queries to produce the same results, and join_where to have type coercion run on the join keys.
Installed versions
--------Version info---------
Polars: 1.21.0
Index type: UInt32
Platform: Linux-6.8.0-51-generic-x86_64-with-glibc2.35
Python: 3.12.8 | packaged by conda-forge | (main, Dec 5 2024, 14:24:40) [GCC 13.3.0]
LTS CPU: False
----Optional dependencies----
Azure CLI <not installed>
adbc_driver_manager <not installed>
altair <not installed>
azure.identity <not installed>
boto3 1.36.1
cloudpickle 3.1.1
connectorx <not installed>
deltalake <not installed>
fastexcel <not installed>
fsspec 2024.12.0
gevent <not installed>
google.auth <not installed>
great_tables <not installed>
matplotlib <not installed>
numpy 2.0.2
openpyxl 3.1.5
pandas 2.2.3
pyarrow 19.0.0
pydantic <not installed>
pyiceberg <not installed>
sqlalchemy 2.0.37
torch 2.5.1.post303
xlsx2csv <not installed>
xlsxwriter <not installed>
Status
Done