Closed
Description
Is your feature request related to a problem or challenge?
Consider the query below
SELECT s.sn, s.amount
FROM sales_global AS s
GROUP BY sn
When sn
is PRIMARY KEY
(each sn
will have unique values) we know that, all the columns the table sales_global
e.g s
can be emitted after aggregation (since for each group all rows would have same value). Corresponding query can run in Postgre. However, datafusion can only emit s.sn
after aggregation from the original table.
Describe the solution you'd like
I would like to have this support.
Describe alternatives you've considered
No response
Additional context
To reproduce the problem one can use the test below
#[tokio::test]
async fn test_primary_key_aggregation() -> Result<()> {
let config = SessionConfig::new()
.with_target_partitions(1);
let ctx = SessionContext::with_config(config);
ctx.sql("CREATE TABLE sales_global (
sn INT PRIMARY KEY,
ts TIMESTAMP,
currency VARCHAR(3),
amount INT
) as VALUES
(1, '2022-01-01 08:00:00'::timestamp, 'EUR', 50.00),
(2, '2022-01-01 11:30:00'::timestamp, 'EUR', 75.00),
(3, '2022-01-02 12:00:00'::timestamp, 'EUR', 200.00),
(4, '2022-01-03 10:00:00'::timestamp, 'EUR', 100.00)").await?;
let sql = "SELECT s.sn, s.amount
FROM sales_global AS s
GROUP BY sn";
let msg = format!("Creating logical plan for '{sql}'");
let dataframe: DataFrame = ctx.sql(sql).await.expect(&msg);
let physical_plan = dataframe.create_physical_plan().await?;
let batches = collect(physical_plan, ctx.task_ctx()).await?;
print_batches(&batches)?;
Ok(())
}