Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support unnest as subexpression #9592

Merged
merged 5 commits into from
Mar 14, 2024

Conversation

YjyJeff
Copy link
Contributor

@YjyJeff YjyJeff commented Mar 13, 2024

Which issue does this PR close?

Closees 9591 and 9589

Rationale for this change

Users could use unnest function with other expressions instead of writing a new wrapper query

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Mar 13, 2024

## Unnest multiple columns
query II
select unnest(column1), unnest(column2) from unnest_table;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result of this query should be incorrect. In DuckDB:

D CREATE TABLE unnest_table
AS VALUES
    ([1,2,3], [7], 1),
    ([4,5], [8,9,10], 2),
    ([6], [11,12], 3),
    ([12], [null, 42, null], null),
    -- null array to verify the `preserve_nulls` option
    (null, null, 4)
;
D select unnest(col0), unnest(col1) from unnest_table;
┌──────────────┬──────────────┐
│ unnest(col0) │ unnest(col1) │
│    int32     │    int32     │
├──────────────┼──────────────┤
│            1 │            7 │
│            2 │              │
│            3 │              │
│            4 │            8 │
│            5 │            9 │
│              │           10 │
│            6 │           11 │
│              │           12 │
│           12 │              │
│              │           42 │
│              │              │
├──────────────┴──────────────┤
│ 11 rows           2 columns │
└─────────────────────────────┘

For multiple unnest expressions, we might need to extend the functionality of UnnestExec to support it. I think we could implement it later, maybe in another PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonahgao Agreed, I will remove this feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonahgao Removed

@YjyJeff YjyJeff changed the title support unnest as subexpression and multiple unnests in projection support unnest as subexpression Mar 13, 2024
@@ -135,11 +135,6 @@ select array_remove(column1, 4), unnest(column2), column3 * 10 from unnest_table
query error DataFusion error: Error during planning: unnest\(\) can only be applied to array, struct and null
select unnest(column3) from unnest_table;

## Unnest multiple columns
query error DataFusion error: This feature is not implemented: Only support single unnest expression for now
select unnest(column1), unnest(column2) from unnest_table;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems multiple columns are not supported yet

Copy link
Contributor

@jayzhan211 jayzhan211 Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean multiple columns is actually multiple unnest expression, it seems misleading

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jayzhan211 I renamed it to "multiple unnest functions"

data: transformed_expr,
transformed,
tnr: _,
} = expr.transform_up_mut(&mut |expr: Expr| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the logic of unnest is split out of others two?
is doing if else directly without transformed more readable?

if unnest
else if column
else {
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jayzhan211 The transform is essential for this feature. We need to rewrite the expression in bottom-up recursion such that we can extract the child and parent expression of the unnest function. The reason to split out the Column variant is that: we need to retain the relation field

This comment was marked as outdated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I got it. First time seeing transform_up_mut, not familiar with it.

Ok(arg_data_types[0].clone())
let arg_data_type = arg_data_types[0].clone();
// Unnest's output type is the inner type of the list
match arg_data_type{
Copy link
Contributor

@jayzhan211 jayzhan211 Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @YjyJeff

data: transformed_expr,
transformed,
tnr: _,
} = expr.transform_up_mut(&mut |expr: Expr| {
let column_name = expr.display_name()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be moved inside the if statement, because it is not used in the else statement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member

@jonahgao jonahgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

My only slight concern is that there are two projections around the unnest plan, and there are always an extra copy of projection expressions. However, this seems hard to avoid unless UnnestExec can unnest an expression, rather than just a column. This could perhaps be a future work.

Thank you for your nice contribution @YjyJeff

@jayzhan211 jayzhan211 merged commit 8b4a8e6 into apache:main Mar 14, 2024
23 checks passed
@jayzhan211
Copy link
Contributor

Thanks @YjyJeff and @jonahgao .

@jonahgao jonahgao mentioned this pull request Apr 9, 2024
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants