Streaming left join with multiple null categorical values raises PanicException #14933

Closed
2 tasks done
daviskirk opened this issue Mar 8, 2024 · 0 comments · Fixed by #14934
Assignees
Labels
accepted (Ready for implementation), bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

@daviskirk
Contributor

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df1 = pl.LazyFrame({"_original_ix": pl.Series([0], dtype=pl.UInt32)})
df2 = pl.LazyFrame([
    pl.Series("_original_ix", [0, 1], dtype=pl.UInt32),
    pl.Series("l", [None, None], dtype=pl.Categorical(ordering='physical')),
])
df1.join(df2, on="_original_ix", how="left").collect(streaming=True)

PanicException: assertion `left == right` failed: implementation error
  left: 1
 right: 2

Collecting without streaming works as expected.

shape: (1, 2)
┌──────────────┬──────┐
│ _original_ix ┆ l    │
│ ---          ┆ ---  │
│ u32          ┆ cat  │
╞══════════════╪══════╡
│ 0            ┆ null │
└──────────────┴──────┘
  • Modifying df1 to pl.LazyFrame({"_original_ix": [0]}) results in a different error instead: PanicException: not implemented: Null.

Log output

thread '<unnamed>' panicked at crates/polars-ops/src/chunked_array/gather/chunked.rs:84:5:
assertion `left == right` failed: implementation error
  left: 1
 right: 2
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

[The same panic message is printed by several threads, with their output interleaved.]

python3.11/site-packages/polars/lazyframe/frame.py:1934, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, no_optimization, streaming, background, _eager)
   1931 if background:
   1932     return InProcessQuery(ldf.collect_concurrently())
-> 1934 return wrap_df(ldf.collect())

PanicException: assertion `left == right` failed: implementation error
  left: 1
 right: 2

Issue description

A streaming left join raises a PanicException when the right-hand frame has a categorical column containing multiple null values.

Expected behavior

The result should be the same as when not using streaming.
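
As a rough illustration of what "the same result" means for the data in the reproducible example, the sketch below models a left join in plain Python. It is not how Polars implements joins, in either engine. The function name `reference_left_join` and the row-of-dicts representation are assumptions made for this example only, and `None` stands in for a null categorical value.

```python
from typing import Any

Row = dict[str, Any]


def reference_left_join(left: list[Row], right: list[Row], on: str) -> list[Row]:
    """Plain-Python left join: every left row is kept; unmatched rows get None."""
    # Index the right rows by join key so each left row can look up its matches.
    right_by_key: dict[Any, list[Row]] = {}
    for row in right:
        right_by_key.setdefault(row[on], []).append(row)

    # Columns contributed by the right side (everything except the join key).
    right_columns = [name for name in (right[0] if right else {}) if name != on]

    result: list[Row] = []
    for left_row in left:
        matches = right_by_key.get(left_row[on])
        if matches:
            # One output row per matching right row.
            for match in matches:
                result.append({**left_row, **{c: match[c] for c in right_columns}})
        else:
            # No match: keep the left row and fill right-side columns with None.
            result.append({**left_row, **{c: None for c in right_columns}})
    return result


# Same data as the reproducible example (hypothetical row-of-dicts form).
left_rows = [{"_original_ix": 0}]
right_rows = [
    {"_original_ix": 0, "l": None},
    {"_original_ix": 1, "l": None},
]

expected = reference_left_join(left_rows, right_rows, on="_original_ix")
print(expected)  # [{'_original_ix': 0, 'l': None}]
```

The single output row matches the non-streaming result shown above: key 0 with a null in column l. The right-hand row with key 1 has no partner on the left and is dropped.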

Installed versions

--------Version info---------
Polars:               0.20.14
Index type:           UInt32
Platform:             Linux-6.5.0-21-generic-x86_64-with-glibc2.38
Python:               3.11.8 (main, Feb 26 2024, 21:39:34) [GCC 11.2.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2023.10.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
numpy:                1.26.4
openpyxl:             3.1.2
pandas:               2.2.1
pyarrow:              15.0.0
pydantic:             1.10.13
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           2.0.25
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@daviskirk added the bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), and python (Related to Python Polars) labels Mar 8, 2024
@c-peters added the accepted (Ready for implementation) label Mar 11, 2024