Description
Is your feature request related to a problem or challenge?
While reviewing #11943 from @Rachelint it is becoming clear to me that the hash aggregate code is now pretty sophisticated and I am not sure our testing has kept up. In fact, I couldn't come up with a great way to systematically test the new code added in #11943.
Also, the code in #11627 from @korowa for skipping partial aggregates has a similar problem, as it is not invoked. There is also code for streaming and partial streaming group by.
All this code has unit tests, but I am not confident that all the combinations are checked. For example the code paths are affected by:
- Sort order of the input
- Partitioning of the input
- The type of the group keys
- The number of groups
- The number of rows in each group
- The type of the aggregate
- The number of aggregates
- If the aggregate supports group aggregation
- If the groups aggregator supports partial aggregation skipping
Describe the solution you'd like
I would like a more systematic way to test this code, both to ensure our current code is correct and to ensure that future changes do not introduce subtle, hard-to-debug regressions or wrong results.
Describe alternatives you've considered
What I think would be good is a test framework that:
- Describes an input data set (e.g. RecordBatches)
- Runs the same query on the same input data set with different configurations (e.g. block size, input sort order, distribution of input blocks, etc.)
- Compares the results and ensures they are the same in all cases
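To make the idea concrete, here is a minimal std-only sketch of the "same data, many configurations, identical results" check. Everything in it is an assumption for illustration: `Config`, `run_group_by`, `all_configs` and `first_mismatch` are hypothetical names, and the toy `SUM`/`COUNT` group by merely stands in for running a real query through DataFusion on RecordBatches.

```rust
use std::collections::BTreeMap;

/// Hypothetical test configuration; the field names are illustrative only.
#[derive(Debug, Clone, Copy)]
struct Config {
    batch_size: usize,
    sort_input: bool,
    partitions: usize,
}

/// Toy stand-in for the query under test:
/// SELECT key, SUM(value), COUNT(*) ... GROUP BY key
fn run_group_by(rows: &[(u32, i64)], cfg: Config) -> BTreeMap<u32, (i64, u64)> {
    let mut input = rows.to_vec();
    if cfg.sort_input {
        input.sort();
    }

    // Distribute rows round-robin across the input partitions.
    let partitions = cfg.partitions.max(1);
    let mut parts: Vec<Vec<(u32, i64)>> = vec![Vec::new(); partitions];
    for (i, row) in input.into_iter().enumerate() {
        parts[i % partitions].push(row);
    }

    // Partial aggregation per batch, then merge into the final result.
    let mut result = BTreeMap::new();
    for part in &parts {
        for batch in part.chunks(cfg.batch_size.max(1)) {
            let mut partial: BTreeMap<u32, (i64, u64)> = BTreeMap::new();
            for &(key, value) in batch {
                let entry = partial.entry(key).or_insert((0, 0));
                entry.0 += value;
                entry.1 += 1;
            }
            for (key, (sum, count)) in partial {
                let entry = result.entry(key).or_insert((0, 0));
                entry.0 += sum;
                entry.1 += count;
            }
        }
    }
    result
}

/// Small grid of configurations to compare against each other.
fn all_configs() -> Vec<Config> {
    let mut configs = Vec::new();
    for &batch_size in &[1, 3, 8192] {
        for &sort_input in &[false, true] {
            for &partitions in &[1, 4] {
                configs.push(Config { batch_size, sort_input, partitions });
            }
        }
    }
    configs
}

/// Returns the first configuration whose result differs from the baseline
/// (the first configuration in the grid), or None if all results agree.
fn first_mismatch(rows: &[(u32, i64)]) -> Option<Config> {
    let configs = all_configs();
    let baseline = run_group_by(rows, configs[0]);
    configs
        .into_iter()
        .find(|&cfg| run_group_by(rows, cfg) != baseline)
}
```

A test would then assert that `first_mismatch(&rows)` is `None` for each generated data set. Returning the offending configuration rather than a bare boolean is deliberate: a failure should say which configuration to replay.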
Parameters to randomly vary for each input:
- Sort order of the input
- Target block size
- Number of input partitions
- Memory limit (to force spilling)
- Shuffled input row distribution across blocks
- Whether skipping partial aggregation is enabled
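One possible way to vary these parameters reproducibly is to derive them all from a single seed, so a failing run can be replayed exactly. The sketch below is only an assumption about how that could look: `FuzzParams`, `random_params`, the value ranges, and the small xorshift generator (used here just to avoid external crates) are all hypothetical and are not existing DataFusion APIs.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // The xorshift state must be non-zero.
        XorShift64(seed.max(1))
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in lo..=hi (slightly biased, which is fine for a sketch).
    fn range(&mut self, lo: u64, hi: u64) -> u64 {
        lo + self.next_u64() % (hi - lo + 1)
    }

    fn flip(&mut self) -> bool {
        self.next_u64() & 1 == 1
    }
}

/// Hypothetical bundle of the parameters listed above.
#[derive(Debug, Clone, PartialEq)]
struct FuzzParams {
    sorted_input: bool,
    target_batch_size: usize,
    input_partitions: usize,
    /// None means "no limit"; Some(bytes) is meant to force spilling.
    memory_limit_bytes: Option<usize>,
    shuffle_rows: bool,
    skip_partial_aggregation: bool,
}

/// Derives one parameter set from a seed; the ranges are arbitrary choices.
fn random_params(seed: u64) -> FuzzParams {
    let mut rng = XorShift64::new(seed);
    FuzzParams {
        sorted_input: rng.flip(),
        target_batch_size: rng.range(1, 8192) as usize,
        input_partitions: rng.range(1, 16) as usize,
        memory_limit_bytes: if rng.flip() {
            Some(rng.range(1 << 10, 1 << 20) as usize)
        } else {
            None
        },
        shuffle_rows: rng.flip(),
        skip_partial_aggregation: rng.flip(),
    }
}
```

Printing the seed with any failure report would let someone reproduce the exact parameter set via `random_params(seed)`.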
Test cases:
1. Types of the group keys
2. Single/multiple column groups
3. Number of groups (low/high cardinality)
4. Different aggregates
Additional context
We also have some great SQL fuzz coverage in https://github.com/datafusion-contrib/datafusion-sqlancer from @2010YOUY01, but I think that focuses on the queries themselves rather than the setup (block size, input order, etc.).
Existing aggregate coverage in the datafusion core fuzz test (cargo test --test fuzz):
datafusion/datafusion/core/tests/fuzz_cases/aggregate_fuzz.rs (lines 48 to 49 in e088945)
Subtasks