FilterPushDown optimization through UNION ALL results in SchemaError #3281

@jonmmease

Describe the bug
There seems to be an interaction between the FilterPushDown logical optimization and the UNION ALL operator that results in a SchemaError.

Here is the simplest example I was able to identify:

To Reproduce

use std::sync::Arc;
// Arrow types come from DataFusion's re-export, so no other crate is needed.
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::datasource::empty::EmptyTable;
use datafusion::prelude::SessionContext;

#[tokio::test]
async fn test_query() {
    let ctx = SessionContext::new();
    let schema = Schema::new(vec![
        Field::new("colA", DataType::Float64, true),
    ]);
    let table = EmptyTable::new(Arc::new(schema));
    ctx.register_table("tbl", Arc::new(table)).unwrap();

    let df = ctx.sql(r#"
WITH
    tbl_1 AS (SELECT "colA" AS "renamedA" FROM "tbl"),
    tbl_2 AS ((SELECT * FROM tbl_1) UNION ALL (SELECT * FROM tbl_1))
SELECT * FROM tbl_2 WHERE ("renamedA" IS NOT NULL)
    "#).await.unwrap();

    let result = df.collect().await.unwrap();
}

The test panics with:

called `Result::unwrap()` on an `Err` value: SchemaError(FieldNotFound { qualifier: None, name: "renamedA", valid_fields: Some(["tbl.colA"]) })

Expected behavior
This code block should evaluate successfully to an empty record batch containing the renamedA column.

Additional context
The example executes successfully if UNION ALL is replaced by UNION. It also executes successfully if I comment out the FilterPushDown rule in datafusion/core/src/execution/context.rs.
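
One plausible reading of the error, not confirmed anywhere in this report, is that the predicate on "renamedA" ends up below the projection that introduces that alias without being rewritten to "colA". The sketch below is a toy model of that rewrite step using only the standard library. The `Projection`, `IsNotNull`, `push_below`, and `validate` names are hypothetical and are not DataFusion APIs; the unqualified "colA" stands in for the "tbl.colA" field from the error message.

```rust
use std::collections::HashMap;

/// Toy projection: maps each output column name to the input column it reads.
/// Hypothetical type for illustration only, not part of DataFusion.
struct Projection {
    aliases: HashMap<String, String>, // output name -> input name
}

/// Toy predicate of the form `<column> IS NOT NULL`.
#[derive(Debug, Clone, PartialEq)]
struct IsNotNull {
    column: String,
}

/// Push a predicate below a projection by rewriting its column reference
/// from the projection's output name to the matching input name.
/// Returns None if the projection does not produce that column.
fn push_below(pred: &IsNotNull, proj: &Projection) -> Option<IsNotNull> {
    proj.aliases
        .get(&pred.column)
        .map(|input| IsNotNull { column: input.clone() })
}

/// Check that the column referenced by the predicate exists in the schema.
fn validate(pred: &IsNotNull, schema: &[&str]) -> Result<(), String> {
    if schema.contains(&pred.column.as_str()) {
        Ok(())
    } else {
        Err(format!("FieldNotFound: {}", pred.column))
    }
}
```

In this model, validating `IsNotNull { column: "renamedA" }` directly against the input schema `["colA"]` fails with a field-not-found error, while the predicate returned by `push_below` for the alias `renamedA -> colA` validates cleanly. Whether the actual optimizer skips an equivalent rewrite on the UNION ALL path is an assumption that would need to be checked against the FilterPushDown source.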

Labels: bug