Closed
Description
Is your feature request related to a problem or challenge?
Currently, the benchmarks folder in DataFusion does not include dedicated benchmarks for TopK queries (i.e., queries formatted as SELECT ... ORDER BY a LIMIT n).
With ongoing work to optimize these types of queries, having dedicated benchmarks would be valuable for measuring progress.
Describe the solution you'd like
There are already sorting benchmarks based on the TPCH dataset. Since a TopK query is essentially a sort operation with an additional limit, we can extend the existing sort_tpch benchmarks by introducing an optional LIMIT n clause. This modification would effectively convert them into proper TopK benchmarks.
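To make the idea concrete, here is a minimal sketch of how an optional limit could be attached to an existing sort query. The helper name with_optional_limit and its signature are hypothetical and are not taken from the current sort_tpch code. The real benchmark may build its queries differently.

```rust
/// Hypothetical helper (not part of DataFusion): turn a sort query into a
/// TopK query by appending `LIMIT n` when a limit is supplied.
fn with_optional_limit(sort_query: &str, limit: Option<usize>) -> String {
    // Drop trailing whitespace and any trailing semicolon so the
    // LIMIT clause attaches cleanly to the ORDER BY clause.
    let base = sort_query.trim_end().trim_end_matches(';').trim_end();
    match limit {
        // With a limit, the sort query becomes a TopK query.
        Some(n) => format!("{base} LIMIT {n}"),
        // Without a limit, the query stays a plain sort benchmark.
        None => base.to_string(),
    }
}
```

With this shape, passing None would keep the existing sort benchmark behavior unchanged, while passing Some(10) would run the same query as a TopK query, so both variants can share one set of query definitions.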
Describe alternatives you've considered
No response
Additional context
Relevant recent issues: