Calling with_column twice generates an error when the second call uses a window function #12425

Closed
@Michael-J-Ward

Describe the bug

Calling with_column twice generates an error when the second column is a window expression.

    df
        .with_column("foo", <normal_expr>)
        .with_column("bar", <window_expr>)

Because "foo" does not have a qualifier, the second call to with_column aliases it as "bar" too, alongside the window expression, so two expressions in the projection end up with the same name.

    let mut fields: Vec<Expr> = plan
        .schema()
        .iter()
        .map(|(qualifier, field)| {
            if field.name() == name {
                col_exists = true;
                new_column.clone()
            } else if window_func && qualifier.is_none() {
                col(Column::from((qualifier, field))).alias(name)
            } else {
                col(Column::from((qualifier, field)))
            }
        })
        .collect();

Error: Plan("Projections require unique expression names but the expression \"s AS r\" at position 3 and \"row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS r\" at position 4 have the same name. Consider aliasing (\"AS\") one of them.")
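
To make the name collision concrete, here is a minimal std-only sketch that models the conditional above on plain strings instead of real DataFusion `Expr` values. The `Field` alias, the `"t"` qualifier, and the helpers `project_names`, `first_duplicate`, and `schema_after_window` are hypothetical names made up for this illustration; the assumed schema (three qualified columns, then an unqualified `s`, then the unaliased window column) is inferred from the positions in the error message.

```rust
/// Simplified stand-in for one schema entry: (optional qualifier, field name).
type Field = (Option<&'static str>, &'static str);

/// Display name of the unaliased window column, copied from the error message.
const WINDOW_COL: &str =
    "row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING";

/// Same branch structure as the quoted closure, but returning output names only.
fn project_names(schema: &[Field], name: &str, window_func: bool) -> Vec<String> {
    schema
        .iter()
        .map(|(qualifier, field)| {
            if *field == name {
                // An existing column with the target name is replaced.
                name.to_string()
            } else if window_func && qualifier.is_none() {
                // Every unqualified column gets the alias, not just the window column.
                name.to_string()
            } else {
                field.to_string()
            }
        })
        .collect()
}

/// Positions of the first two entries that share a name, if any.
fn first_duplicate(names: &[String]) -> Option<(usize, usize)> {
    for i in 0..names.len() {
        for j in (i + 1)..names.len() {
            if names[i] == names[j] {
                return Some((i, j));
            }
        }
    }
    None
}

/// Assumed schema seen by the second `with_column` call ("t" is a placeholder qualifier).
fn schema_after_window() -> Vec<Field> {
    vec![
        (Some("t"), "c1"),
        (Some("t"), "c2"),
        (Some("t"), "c3"),
        (None, "s"),
        (None, WINDOW_COL),
    ]
}

fn main() {
    let names = project_names(&schema_after_window(), "r", true);
    println!("{names:?}");
    // Both `s` (position 3) and the window column (position 4) were renamed to "r".
    assert_eq!(first_duplicate(&names), Some((3, 4)));
}
```

In this model the duplicate pair lands at positions 3 and 4, the same positions the error reports for `s AS r` and the aliased `row_number()` expression.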

To Reproduce

Update test_window_function_with_column to first call with_column with any expression.

For example:

    // Test issue: https://github.com/apache/datafusion/issues/11982
    // Window function was creating unwanted projection when using with_column() method.
    #[tokio::test]
    async fn test_window_function_with_column() -> Result<()> {
        let df = test_table().await?.select_columns(&["c1", "c2", "c3"])?;
        let ctx = SessionContext::new();
        let df_impl = DataFrame::new(ctx.state(), df.plan.clone());
        let func = row_number().alias("row_num");

        // This first `with_column` results in a column without a `qualifier`
        let df_impl = df_impl.with_column("s", col("c2") + col("c3"))?;

        // Bug: this second `with_column` assigns the alias `"r"` to both the column above and the window function.
        // Expected: it should add a single column aliased `"r"` that holds the window function results.
        let df = df_impl.with_column("r", func)?.limit(0, Some(2))?;
        assert_eq!(5, df.schema().fields().len());

        let df_results = df.clone().collect().await?;
        assert_batches_sorted_eq!(
            [
                "+----+----+-----+-----+---+",
                "| c1 | c2 | c3  | s   | r |",
                "+----+----+-----+-----+---+",
                "| c  | 2  | 1   | 3   | 1 |",
                "| d  | 5  | -40 | -35 | 2 |",
                "+----+----+-----+-----+---+",
            ],
            &df_results
        );

        Ok(())
    }

Expected behavior

I would expect the second call to succeed and the final DataFrame to have the columns c1, c2, c3, s, and r.
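
As a rough illustration of that expectation, the sketch below models one possible narrower rule on plain strings: only the column whose name matches the window expression's display name is renamed, and other unqualified columns are left alone. This is an assumption for discussion, not necessarily the change that resolved the issue; `Field`, `project_names_narrow`, the `"t"` qualifier, and the schema layout are all hypothetical.

```rust
/// Simplified stand-in for one schema entry: (optional qualifier, field name).
type Field = (Option<&'static str>, &'static str);

/// Display name of the unaliased window column, copied from the error message.
const WINDOW_COL: &str =
    "row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING";

/// Hypothetical narrower rule: rename a column only if it already carries the
/// target name or is the intermediate window column itself.
fn project_names_narrow(
    schema: &[Field],
    name: &str,
    window_display: Option<&str>,
) -> Vec<String> {
    schema
        .iter()
        .map(|(_qualifier, field)| {
            if *field == name || Some(*field) == window_display {
                name.to_string()
            } else {
                // Unqualified columns such as `s` keep their own name.
                field.to_string()
            }
        })
        .collect()
}

fn main() {
    // Assumed schema seen by the second `with_column` call ("t" is a placeholder qualifier).
    let schema: Vec<Field> = vec![
        (Some("t"), "c1"),
        (Some("t"), "c2"),
        (Some("t"), "c3"),
        (None, "s"),
        (None, WINDOW_COL),
    ];
    let names = project_names_narrow(&schema, "r", Some(WINDOW_COL));
    assert_eq!(names, vec!["c1", "c2", "c3", "s", "r"]);
    println!("{names:?}");
}
```

Matching on the window expression's name rather than on a missing qualifier keeps earlier computed columns intact, at the cost of relying on the display name being stable between planning steps.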

Additional context

#12000 introduced the `window_func && qualifier.is_none()` conditional shown above.

Labels: bug
