Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UNION ALL sets the wrong column alias if union arm has aggregate expression #6139

Closed
comphead opened this issue Apr 27, 2023 · 7 comments · Fixed by #6373
Closed

UNION ALL sets the wrong column alias if union arm has aggregate expression #6139

comphead opened this issue Apr 27, 2023 · 7 comments · Fixed by #6373
Labels
bug Something isn't working

Comments

@comphead
Copy link
Contributor

comphead commented Apr 27, 2023

Describe the bug

UNION ALL sets the wrong column alias if first union arm has aggregate expression, see query below.

To Reproduce

#[tokio::test]
async fn union_with_aggr_first() -> Result<()> {
    let ctx = create_union_context()?;
    let sql = "select count(1) x from (select 1 a) group by a union all select 1 b;";
    let df = ctx.sql(sql).await.unwrap().repartition(Partitioning::RoundRobinBatch(1)).unwrap();
    let actual = df.collect().await.unwrap();

    let expected = vec![
        "+---+",
        "| x |",
        "+---+",
        "| 1 |",
        "| 1 |",
        "+---+",
    ];
    assert_batches_eq!(expected, &actual);
    Ok(())
}

Without group by the query works fine
select count(1) x from (select 1 a) a union all select 1 b;

Expected behavior

The test should pass

Additional context

Related to #5695 #5747

@comphead comphead added the bug Something isn't working label Apr 27, 2023
@comphead
Copy link
Contributor Author

Working on it

@comphead
Copy link
Contributor Author

comphead commented May 5, 2023

@alamb @jackwener may I ask your help? I have found why projection is not correct. push_down_projection for union never called for query like that

select 1 a group by a union all select 2 b

However its called for

select 1 a group by a union select 2 b

The reason for that UNION has another logical plan
UNION

| logical_plan  | Aggregate: groupBy=[[a]], aggr=[[]]                                                                                  |
|               |   Union                                                                                                              |
|               |     Projection: Int64(1) AS a                                                                                        |
|               |       Aggregate: groupBy=[[Int64(1)]], aggr=[[]]                                                                     |
|               |         EmptyRelation                                                                                                |
|               |     Projection: Int64(2) AS a                                                                                        |
|               |       EmptyRelation   

UNION ALL

| logical_plan  | Union                                                                                                        |
|               |   Projection: Int64(1) AS a                                                                                  |
|               |     Aggregate: groupBy=[[Int64(1)]], aggr=[[]]                                                               |
|               |       EmptyRelation                                                                                          |
|               |   Projection: Int64(2) AS b                                                                                  |
|               |     EmptyRelation      

UNION have a weird top Aggregate node, which looks strange for me. as Aggregation happens twice in that case, but because of that UNION push_down_projection works.

I'm trying to understand the real rootcause if it in bad plan generated in the first place or bug in push_down_projection

UPD: Excessive aggregation for UNION is likely distinct values

@comphead
Copy link
Contributor Author

comphead commented May 6, 2023

the effect of that is another bug with simple union all table creation(without aggr)

❯ create table t as select 1 a union all select 1 b;
Error during planning: Mismatch between schema and batches

@jackwener
Copy link
Member

Union is Union All + Dedup, so in most DB, Union is combination of Agg + Union

@comphead
Copy link
Contributor Author

comphead commented May 6, 2023

Union is Union All + Dedup, so in most DB, Union is combination of Agg + Union

Thanks, I was expecting DISTINCT in plan, not a aggregation. But DISTINCT may be can considered as partial case of aggregation

@alamb
Copy link
Contributor

alamb commented May 8, 2023

Thanks, I was expecting DISTINCT in plan, not a aggregation. But DISTINCT may be can considered as partial case of aggregation

I agree that Distinct would be clearer in the LogicalPlan (though it gets rewritten to Aggregate pretty early on as I recall). Maybe it would be clearer to remove LogicalPlan::Distinct entirely and only use LogicalPlan::Aggregate 🤔

@jackwener
Copy link
Member

I agree that Distinct would be clearer in the LogicalPlan (though it gets rewritten to Aggregate pretty early on as I recall). Maybe it would be clearer to remove LogicalPlan::Distinct entirely and only use LogicalPlan::Aggregate 🤔

Yes, it's replaced in rule SingleDistinctToGroupBy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants