LazyFrame::cross_join + concat_list error #18587

Open
2 tasks done
kgv opened this issue Sep 6, 2024 · 2 comments
Labels
A-panic Area: code that results in panic exceptions bug Something isn't working needs triage Awaiting prioritization by a maintainer rust Related to Rust Polars

Comments

@kgv
Contributor

kgv commented Sep 6, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

// Assumes `use polars::prelude::*;` and that both blocks run inside `fn main`.
// manual cartesian product (OK)
{
    let mut lazy_frame = df! {
        "1" => df! {
            "u32" => &[0u32, 0, 0, 0, 1, 1, 1, 1],
            "str" => &["a", "a", "a", "a", "b", "b", "b", "b"],
        }
        .unwrap()
        .into_struct(""),
        "2" => df! {
            "u32" => &[0u32, 0, 1, 1, 0, 0, 1, 1],
            "str" => &["a", "a", "b", "b", "a", "a", "b", "b"],
        }
        .unwrap()
        .into_struct(""),
        "3" => df! {
            "u32" => &[0u32, 1, 0, 1, 0, 1, 0, 1],
            "str" => &["a", "b", "a", "b", "a", "b", "a", "b"],
        }
        .unwrap()
        .into_struct(""),
    }
    .unwrap()
    .lazy();
    println!(
        "manual cartesian product data_frame: {}",
        lazy_frame.clone().collect().unwrap()
    );
    lazy_frame = lazy_frame.select([concat_list(["1", "2", "3"]).unwrap().alias("LIST")]);
    println!(
        "manual cartesian product concat_list data_frame: {}",
        lazy_frame.clone().collect().unwrap()
    );
}

// `cross_join` cartesian product (ERROR)
{
    let mut lazy_frame = df! {
        "u32" => &[0u32, 1],
        "str" => &["a", "b"],
    }
    .unwrap()
    .lazy();
    lazy_frame = lazy_frame
        .clone()
        .select([as_struct(vec![col("u32"), col("str")]).alias("1")])
        .cross_join(
            lazy_frame
                .clone()
                .select([as_struct(vec![col("u32"), col("str")]).alias("2")]),
            None,
        )
        .cross_join(
            lazy_frame.select([as_struct(vec![col("u32"), col("str")]).alias("3")]),
            None,
        );
    println!(
        "cross_join cartesian product data_frame: {}",
        lazy_frame.clone().collect().unwrap()
    );
    // The error occurs after this line.
    lazy_frame = lazy_frame.select([concat_list(["1", "2", "3"]).unwrap().alias("LIST")]);
    println!(
        "cross_join cartesian product concat_list data_frame: {}",
        lazy_frame.clone().collect().unwrap()
    );
}

Log output

manual cartesian product data_frame: shape: (8, 3)
┌───────────┬───────────┬───────────┐
│ 1         ┆ 2         ┆ 3         │
│ ---       ┆ ---       ┆ ---       │
│ struct[2] ┆ struct[2] ┆ struct[2] │
╞═══════════╪═══════════╪═══════════╡
│ {0,"a"}   ┆ {0,"a"}   ┆ {0,"a"}   │
│ {0,"a"}   ┆ {0,"a"}   ┆ {1,"b"}   │
│ {0,"a"}   ┆ {1,"b"}   ┆ {0,"a"}   │
│ {0,"a"}   ┆ {1,"b"}   ┆ {1,"b"}   │
│ {1,"b"}   ┆ {0,"a"}   ┆ {0,"a"}   │
│ {1,"b"}   ┆ {0,"a"}   ┆ {1,"b"}   │
│ {1,"b"}   ┆ {1,"b"}   ┆ {0,"a"}   │
│ {1,"b"}   ┆ {1,"b"}   ┆ {1,"b"}   │
└───────────┴───────────┴───────────┘
manual cartesian product concat_list data_frame: shape: (8, 1)
┌─────────────────────────────┐
│ LIST                        │
│ ---                         │
│ list[struct[2]]             │
╞═════════════════════════════╡
│ [{0,"a"}, {0,"a"}, {0,"a"}] │
│ [{0,"a"}, {0,"a"}, {1,"b"}] │
│ [{0,"a"}, {1,"b"}, {0,"a"}] │
│ [{0,"a"}, {1,"b"}, {1,"b"}] │
│ [{1,"b"}, {0,"a"}, {0,"a"}] │
│ [{1,"b"}, {0,"a"}, {1,"b"}] │
│ [{1,"b"}, {1,"b"}, {0,"a"}] │
│ [{1,"b"}, {1,"b"}, {1,"b"}] │
└─────────────────────────────┘
cross_join cartesian product data_frame: shape: (8, 3)
┌───────────┬───────────┬───────────┐
│ 1         ┆ 2         ┆ 3         │
│ ---       ┆ ---       ┆ ---       │
│ struct[2] ┆ struct[2] ┆ struct[2] │
╞═══════════╪═══════════╪═══════════╡
│ {0,"a"}   ┆ {0,"a"}   ┆ {0,"a"}   │
│ {0,"a"}   ┆ {0,"a"}   ┆ {1,"b"}   │
│ {0,"a"}   ┆ {1,"b"}   ┆ {0,"a"}   │
│ {0,"a"}   ┆ {1,"b"}   ┆ {1,"b"}   │
│ {1,"b"}   ┆ {0,"a"}   ┆ {0,"a"}   │
│ {1,"b"}   ┆ {0,"a"}   ┆ {1,"b"}   │
│ {1,"b"}   ┆ {1,"b"}   ┆ {0,"a"}   │
│ {1,"b"}   ┆ {1,"b"}   ┆ {1,"b"}   │
└───────────┴───────────┴───────────┘

called `Result::unwrap()` on an `Err` value: ShapeMismatch(ErrString("series length 2 does not match expected length of 8"))

Issue description

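concat_list(["1", "2"]) works.
concat_list(["1", "3"]) and concat_list(["2", "3"]) both fail with an error, as does concat_list(["1", "2", "3"]) in the example above. Every failing combination includes column "3", the one added by the second cross_join.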

Expected behavior

A cartesian product created with cross_join should behave the same as the "manual" cartesian product.
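
As a rough illustration of the expected rows, here is a plain-Rust sketch (no Polars involved) that builds the same three-way cartesian product. The helper name triple_product is hypothetical and only models the row order shown in the log output, with the last position varying fastest.

```rust
/// Cartesian product of `items` with itself three times over.
/// The last position varies fastest, matching the row order in the log output.
/// `triple_product` is a hypothetical helper, not part of Polars.
fn triple_product<T: Clone>(items: &[T]) -> Vec<[T; 3]> {
    let mut rows = Vec::with_capacity(items.len().pow(3));
    for a in items {
        for b in items {
            for c in items {
                rows.push([a.clone(), b.clone(), c.clone()]);
            }
        }
    }
    rows
}

fn main() {
    // The two (u32, str) pairs that make up each struct column in the issue.
    let items = [(0u32, "a"), (1u32, "b")];
    for row in triple_product(&items) {
        println!("{:?}", row);
    }
}
```

Each printed row corresponds to one expected entry of the LIST column: three structs per row, eight rows in total.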

Installed versions

polars = { version = "0.42.0", features = [ "abs", "concat_str", "cross_join", "cum_agg", "diagonal_concat", "dtype-array", "dtype-i8", "dtype-struct", "dtype-u8", "is_in", "lazy", "list_any_all", "list_count", "list_eval", "regex", "round_series", "serde", "strings", ] }
@kgv kgv added bug Something isn't working needs triage Awaiting prioritization by a maintainer rust Related to Rust Polars labels Sep 6, 2024
@cmdlineluser
Contributor

Python repro:

import polars as pl

df = pl.LazyFrame({
    "u32": [0, 1],
    "str": ["a", "b"],
})

(df.select(pl.struct(pl.all()).alias("1"))
   .join(
       df.select(pl.struct(pl.all()).alias("2")),
       how = "cross"
   )
   .join(
       df.select(pl.struct(pl.all()).alias("3")),
       how = "cross"
   )
   .select(pl.concat_list("1", "2", "3"))
   .collect()
)
# ShapeError: series length 2 does not match expected length of 8

@coastalwhite coastalwhite added the A-panic Area: code that results in panic exceptions label Sep 6, 2024
@kgv
Contributor Author

kgv commented Sep 20, 2024

Workaround for Rust: calling DataFrame::as_single_chunk_par after the last cross_join and before concat_list fixes this.
