Aggregation fuzz testing #12114

Closed
@alamb

Is your feature request related to a problem or challenge?

While reviewing #11943 from @Rachelint, it is becoming clear to me that the hash aggregate code is now pretty sophisticated, and I am not sure our testing has kept up. In fact, I couldn't come up with a great way to systematically test the new code added in #11943.

Also, the code in #11627 from @korowa for skipping partial aggregates has a similar problem as it is not invoked. There is also code for streaming and partial streaming group by.

All this code has unit tests, but I am not confident that all the combinations are checked. For example the code paths are affected by:

  1. Sort order of the input
  2. Partitioning of the input
  3. The type of the group keys
  4. The number of groups
  5. The number of rows in each group
  6. The type of the aggregate
  7. The number of aggregates
  8. If the aggregate supports group aggregation
  9. If the groups aggregator supports partial aggregation skipping

Describe the solution you'd like

I would like a more systematic way to test this code, both to ensure our current code is correct and to ensure that future changes do not introduce subtle, hard-to-debug regressions or wrong results.

Describe alternatives you've considered

What I think would be good is a test framework that:

  1. Describes an input data set (e.g. RecordBatches)
  2. Runs the same query on the same input data set with different configurations (e.g. block size, input sort order, distribution of input blocks, etc)
  3. Compares the results and ensures they are the same in all cases
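
As a rough illustration of the three steps above, here is a minimal std-only sketch. It does not use DataFusion: `aggregate_single_pass`, `aggregate_partitioned`, and `find_mismatch` are hypothetical names, plain `(key, value)` tuples stand in for RecordBatches, and the "query" is assumed to be `SUM(value), COUNT(*) GROUP BY key`. The point is only the shape of the check: one input, many configurations, identical results.

```rust
use std::collections::{BTreeMap, HashMap};

/// Per-group state for `SUM(value), COUNT(*)`.
type Agg = (i64, u64);

/// Reference result: aggregate every row in a single pass.
fn aggregate_single_pass(rows: &[(u32, i64)]) -> BTreeMap<u32, Agg> {
    let mut out = BTreeMap::new();
    for &(key, value) in rows {
        let entry = out.entry(key).or_insert((0i64, 0u64));
        entry.0 += value;
        entry.1 += 1;
    }
    out
}

/// Same query, run as one partial aggregation per partition plus a final merge.
/// Rows are cut into batches of `batch_size` and dealt round-robin to
/// `num_partitions` partitions. Both parameters must be greater than zero.
fn aggregate_partitioned(
    rows: &[(u32, i64)],
    batch_size: usize,
    num_partitions: usize,
) -> BTreeMap<u32, Agg> {
    let mut partials: Vec<HashMap<u32, Agg>> = vec![HashMap::new(); num_partitions];
    for (i, batch) in rows.chunks(batch_size).enumerate() {
        let partial = &mut partials[i % num_partitions];
        for &(key, value) in batch {
            let entry = partial.entry(key).or_insert((0, 0));
            entry.0 += value;
            entry.1 += 1;
        }
    }
    // Final merge: combine the partial states of every partition.
    let mut out = BTreeMap::new();
    for partial in partials {
        for (key, (sum, count)) in partial {
            let entry = out.entry(key).or_insert((0i64, 0u64));
            entry.0 += sum;
            entry.1 += count;
        }
    }
    out
}

/// Returns the first `(batch_size, num_partitions)` configuration whose
/// result differs from the single-pass reference, or `None` if all agree.
fn find_mismatch(rows: &[(u32, i64)], configs: &[(usize, usize)]) -> Option<(usize, usize)> {
    let expected = aggregate_single_pass(rows);
    configs
        .iter()
        .copied()
        .find(|&(batch_size, partitions)| {
            aggregate_partitioned(rows, batch_size, partitions) != expected
        })
}
```

Returning the failing configuration, rather than a bare boolean, is a deliberate choice in this sketch: a fuzz failure is only useful if it can be reproduced from the reported parameters.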

Parameters to randomly vary for each input:

  1. Sort order of the input
  2. Target block size
  3. Number of input partitions
  4. Memory limit (to force spilling)
  5. Shuffled input row distribution across blocks
  6. Whether skipping partial aggregation is enabled
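
One way the parameters above could be sampled is sketched below. Everything here is an assumption for illustration: `FuzzConfig`, its field names, the value ranges, and the small `XorShift64` generator are hypothetical and are not DataFusion APIs. A seeded generator is used so that a failing configuration can be reproduced from its seed.

```rust
/// Tiny deterministic xorshift64 generator so the sketch needs no external crate.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in `lo..=hi` (slightly biased, which is acceptable for test setup).
    fn range(&mut self, lo: u64, hi: u64) -> u64 {
        lo + self.next_u64() % (hi - lo + 1)
    }

    fn coin(&mut self) -> bool {
        self.next_u64() & 1 == 1
    }
}

/// One randomly chosen execution setup (hypothetical field names and ranges).
#[derive(Debug, Clone, PartialEq)]
struct FuzzConfig {
    sorted_input: bool,
    batch_size: usize,
    num_partitions: usize,
    /// `None` means no limit; a small limit is meant to force spilling.
    memory_limit_bytes: Option<usize>,
    /// Whether rows are redistributed across batches before the run.
    shuffle_across_batches: bool,
    skip_partial_aggregation: bool,
}

fn random_config(rng: &mut XorShift64) -> FuzzConfig {
    FuzzConfig {
        sorted_input: rng.coin(),
        batch_size: rng.range(1, 8192) as usize,
        num_partitions: rng.range(1, 16) as usize,
        memory_limit_bytes: if rng.coin() {
            Some(rng.range(1 << 10, 1 << 20) as usize)
        } else {
            None
        },
        shuffle_across_batches: rng.coin(),
        skip_partial_aggregation: rng.coin(),
    }
}
```

Because `FuzzConfig` derives `Debug`, a failing run can print the exact configuration alongside the seed that produced it.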

Test cases:

  1. Types of the group keys
  2. Single/multiple column groups
  3. Number of groups (low/high cardinality)
  4. Different aggregates

Additional context

We also have some great SQL fuzz coverage in https://github.com/datafusion-contrib/datafusion-sqlancer from @2010YOUY01, but I think that focuses on the queries themselves rather than the setup (block size, input order, etc).

Existing aggregate coverage in datafusion core fuzz test (cargo test --test fuzz

Labels

enhancement (New feature or request), help wanted (Extra attention is needed)
