INSERT INTO SQL failing on CSV-backed table #10324

@singularsyntax

Describe the bug

Hello,

When I try to insert data using INSERT INTO SQL syntax (see the reproduction code below), I get the error "Inserting query must have the same schema with the table."

[2024-05-01T00:48:23Z INFO] TABLE SCHEMA: DFSchema { fields: [DFField { qualifier: Some(Bare { table: "test" }), field: Field { name: "k", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, DFField { qualifier: Some(Bare { table: "test" }), field: Field { name: "v", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {}, functional_dependencies: FunctionalDependencies { deps: [FunctionalDependence { source_indices: [0], target_indices: [0, 1], nullable: false, mode: Single }] } }
[2024-05-01T00:48:23Z INFO] DATAFRAME SCHEMA: DFSchema { fields: [DFField { qualifier: None, field: Field { name: "k", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, DFField { qualifier: None, field: Field { name: "v", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {}, functional_dependencies: FunctionalDependencies { deps: [] } }
thread 'main' panicked at src/main.rs:317:88:
called `Result::unwrap()` on an `Err` value: Plan("Inserting query must have the same schema with the table.")

As logged above, the problem seems to be the discrepancy between the table schema, whose fields are qualified with the table name, and the query schema, whose fields are unqualified.
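To make the suspected discrepancy concrete, here is a toy model of the two logged schemas. It is only a sketch: `Column`, `col`, `same_schema_strict`, and `same_names_and_types` are hypothetical names invented for illustration, not DataFusion types or functions, and it does not establish which comparison DataFusion actually performs or whether the qualifier is the real cause of the error.

```rust
// Toy model only: `Column` is hypothetical and is NOT a DataFusion type.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    qualifier: Option<String>, // Some("test") for a table-qualified field, None otherwise
    name: String,
    data_type: String,
    nullable: bool,
}

// Build a non-nullable Utf8 column, mirroring the fields shown in the logs.
fn col(qualifier: Option<&str>, name: &str) -> Column {
    Column {
        qualifier: qualifier.map(str::to_string),
        name: name.to_string(),
        data_type: "Utf8".to_string(),
        nullable: false,
    }
}

// Strict comparison: every attribute, including the qualifier, must match.
fn same_schema_strict(a: &[Column], b: &[Column]) -> bool {
    a == b
}

// Relaxed comparison: only name, data type, and nullability are compared.
fn same_names_and_types(a: &[Column], b: &[Column]) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(x, y)| {
            x.name == y.name && x.data_type == y.data_type && x.nullable == y.nullable
        })
}

fn main() {
    // Table schema as logged: fields qualified with "test".
    let table = vec![col(Some("test"), "k"), col(Some("test"), "v")];
    // Query schema as logged: same fields, no qualifier.
    let query = vec![col(None, "k"), col(None, "v")];

    println!("strict: {}", same_schema_strict(&table, &query));
    println!("names and types: {}", same_names_and_types(&table, &query));
}
```

In this model the two schemas differ under the strict comparison and match under the relaxed one, which is the kind of mismatch the logs appear to suggest.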

The code I'm using is about as simple as I can imagine. Am I missing something? Is there some example code that demonstrates how to use INSERT INTO SQL correctly? Or is this a bug?

To Reproduce

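use datafusion::dataframe::DataFrameWriteOptions;
use datafusion::prelude::SessionContext;
// Assumption: `info!` comes from the `log` crate; the snippet as posted showed no imports.
use log::info;

async fn df_test() {
    let ctx = SessionContext::new();

    // Register a CSV-backed external table with two non-nullable string columns.
    let sql = "CREATE EXTERNAL TABLE test (k VARCHAR PRIMARY KEY NOT NULL, v VARCHAR NOT NULL) STORED AS CSV LOCATION './store/test/'";
    let df = ctx.sql(sql).await.unwrap();
    df.collect().await.unwrap();

    // Log the schema of the registered table.
    let table_df = ctx.table("test").await.unwrap();
    info!("TABLE SCHEMA: {:?}", table_df.schema());

    // Plan the INSERT statement and log the schema of the resulting DataFrame.
    let sql = "INSERT INTO test (k, v) VALUES ('foo', 'bar')";
    let query_df = ctx.sql(sql).await.unwrap();
    info!("DATAFRAME SCHEMA: {:?}", query_df.schema());

    // This call fails with the Plan error shown above.
    let _result = query_df.write_table("test", DataFrameWriteOptions::default()).await.unwrap();
}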

Expected behavior

Insertion of the row ('foo', 'bar') is successful. DataFusion creates a CSV file in the filesystem corresponding to the inserted data.

Additional context

[dependencies]
datafusion = "37.1.0"

Labels: bug
