Distinct aggregates return incorrect results #1260

@andygrove

Describe the bug

When translating aggregate expressions to DataFusion, we ignore whether the aggregate is distinct, which produces incorrect results for distinct aggregates.

The existing tests seem to pass because the input data does not contain duplicates.
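To illustrate why duplicate-free input hides the problem, here is a minimal sketch in plain Python. It does not use Spark or DataFusion; the helper names `count_per_group` and `count_distinct_per_group` and the sample rows are hypothetical. It only shows that a per-group COUNT and COUNT(DISTINCT) agree when there are no duplicates and diverge once a duplicate appears.

```python
from collections import defaultdict


def count_per_group(rows):
    """Emulate COUNT(value) GROUP BY key: every non-null value counts."""
    counts = defaultdict(int)
    for key, value in rows:
        # Touch the key so a group with only nulls still reports 0.
        counts[key] += 0 if value is None else 1
    return dict(counts)


def count_distinct_per_group(rows):
    """Emulate COUNT(DISTINCT value) GROUP BY key: duplicates count once."""
    seen = defaultdict(set)
    for key, value in rows:
        values = seen[key]  # creates the group even if value is null
        if value is not None:
            values.add(value)
    return {key: len(values) for key, values in seen.items()}


# Hypothetical (key, value) rows standing in for the test input.
unique_rows = [(1, 10), (1, 11), (2, 20)]      # no duplicate values per group
duplicate_rows = [(1, 10), (1, 10), (2, 20)]   # value 10 repeats in group 1

# Without duplicates the two aggregates are indistinguishable.
agree_on_unique = (
    count_per_group(unique_rows) == count_distinct_per_group(unique_rows)
)

# With duplicates, dropping the distinct flag changes the answer.
differ_on_duplicates = (
    count_per_group(duplicate_rows) != count_distinct_per_group(duplicate_rows)
)
```

A test whose input looks like `unique_rows` cannot tell the two aggregates apart, so it passes even if the distinct flag is dropped during translation.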

Steps to reproduce

Modify test "single group-by column + aggregate column, multiple batches, no null" to use COUNT(DISTINCT _2) instead of COUNT(DISTINCT _1) and the test fails because the results do not match Spark.

Labels

bug: Something isn't working
