Skip to content

Utilize PRIMARY KEY information better #6190

Closed
@mustafasrepo

Description

@mustafasrepo

Is your feature request related to a problem or challenge?

Consider the query below

SELECT s.sn, s.amount
            FROM sales_global AS s
            GROUP BY sn

When sn is PRIMARY KEY (each sn will have unique values) we know that, all the columns the table sales_global e.g s can be emitted after aggregation (since for each group all rows would have same value). Corresponding query can run in Postgre. However, datafusion can only emit s.sn after aggregation from the original table.

Describe the solution you'd like

I would like to have this support.

Describe alternatives you've considered

No response

Additional context

To reproduce the problem one can use the test below

#[tokio::test]
async fn test_primary_key_aggregation() -> Result<()> {
    let config = SessionConfig::new()
        .with_target_partitions(1);
    let ctx = SessionContext::with_config(config);
    ctx.sql("CREATE TABLE sales_global (
      sn INT PRIMARY KEY,
      ts TIMESTAMP,
      currency VARCHAR(3),
      amount INT
    ) as VALUES
      (1, '2022-01-01 08:00:00'::timestamp, 'EUR', 50.00),
      (2, '2022-01-01 11:30:00'::timestamp, 'EUR', 75.00),
      (3, '2022-01-02 12:00:00'::timestamp, 'EUR', 200.00),
      (4, '2022-01-03 10:00:00'::timestamp, 'EUR', 100.00)").await?;
    let sql = "SELECT s.sn, s.amount
        FROM sales_global AS s
        GROUP BY sn";

    let msg = format!("Creating logical plan for '{sql}'");
    let dataframe: DataFrame = ctx.sql(sql).await.expect(&msg);
    let physical_plan = dataframe.create_physical_plan().await?;
    let batches = collect(physical_plan, ctx.task_ctx()).await?;
    print_batches(&batches)?;
    Ok(())
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions