Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the struct function return the correct data type. #6594

Merged
merged 2 commits into from
Jun 9, 2023

Conversation

jiangzhx
Copy link
Contributor

@jiangzhx jiangzhx commented Jun 8, 2023

Which issue does this PR close?

use datafusion::prelude::*;
use datafusion_expr::BuiltinScalarFunction;
​
#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();
    ctx.register_csv("example", "tests/data/example.csv", CsvReadOptions::new()).await?;
​
    let df = ctx.table("example").await?;
​
    let f = Expr::ScalarFunction(datafusion_expr::expr::ScalarFunction {
        fun: BuiltinScalarFunction::Struct,
        args: vec![col("a"), col("b")],
    });
​
    let df2 = df.select(vec![f])?;
    df2.show().await?;
    Ok(())
}

Error: ArrowError(InvalidArgumentError("column types must match schema types, expected Struct([]) but found Struct([Field { name: "c0", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "c1", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]) at column index 0"))

Closes #6597.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@jiangzhx jiangzhx marked this pull request as draft June 8, 2023 04:52
@github-actions github-actions bot added logical-expr Logical plan and expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jun 8, 2023
@jiangzhx jiangzhx marked this pull request as ready for review June 8, 2023 07:50
@jiangzhx jiangzhx marked this pull request as draft June 8, 2023 09:34
@jiangzhx jiangzhx force-pushed the issues/Struct_returntype branch 3 times, most recently from 7222c9f to e601bc6 Compare June 8, 2023 12:43
@jiangzhx jiangzhx marked this pull request as ready for review June 8, 2023 12:59
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me -- thank you @jiangzhx . I suggest one more test to show that the column names are always named "c1, c2, etc" even when they don't align with the table column names


# Scalar function struct
statement ok
create table simple_struct_test (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please also add an example test where the column names don't happen to line up with the column names?

Like

create table test(x boolean) as values (true);
select struct(x) from test

?

I expect the output is named c1 even though the column is named x

To make the output struct named x I think would be a significant change to the type resolution code as it would need to get a Schema rather than a [DataType]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your help.
i have updated the current test case:

create table simple_struct_test (
    c1 boolean,
    c2 INT,
    c3 FLOAT,
    c4 DOUBLE,
    a VARCHAR,
    b TEXT
) as select *
from (values
  (true, 1,3.1,3.14,'str','text')
);

it's should cover the scenario you mentioned.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @jiangzhx

@alamb alamb merged commit 34fa0b9 into apache:main Jun 9, 2023
@jiangzhx jiangzhx deleted the issues/Struct_returntype branch June 12, 2023 02:24
jayzhan211 pushed a commit to jayzhan211/arrow-datafusion that referenced this pull request Jun 12, 2023
* Make the struct function return the correct data type.

* add testcase
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Return type of the struct function is always a struct with an empty list of fields
2 participants