Extend benchmarking to "TopK" queries #15559

Closed
@geoffreyclaude

Is your feature request related to a problem or challenge?

Currently, the benchmarks folder in DataFusion has no dedicated benchmarks for TopK queries (i.e., queries of the form SELECT ... ORDER BY a LIMIT n).

With ongoing work to optimize these types of queries, having dedicated benchmarks would be valuable for measuring progress.

Describe the solution you'd like

There are already sorting benchmarks based on the TPCH dataset. Since a TopK query is essentially a sort operation with an additional limit, we can extend the existing sort_tpch benchmarks by introducing an optional LIMIT n clause. This modification would effectively convert them into proper TopK benchmarks.
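
As a rough illustration of the idea, the sketch below shows how an optional limit could turn a sort query into a TopK query. It is not the actual sort_tpch code: the helper name `with_optional_limit` and the sample query are assumptions made for this example, and only the Rust standard library is used.

```rust
/// Hypothetical helper (not part of DataFusion): appends `LIMIT n` to a
/// sort query when a limit is requested, and leaves the query as a plain
/// sort otherwise.
fn with_optional_limit(base_query: &str, limit: Option<usize>) -> String {
    // Drop surrounding whitespace and any trailing semicolon so the
    // LIMIT clause can be appended cleanly.
    let trimmed = base_query.trim().trim_end_matches(';').trim_end();
    match limit {
        Some(n) => format!("{trimmed} LIMIT {n}"),
        None => trimmed.to_string(),
    }
}

fn main() {
    // Illustrative sort query over the TPC-H lineitem table.
    let sort_query =
        "SELECT l_linenumber, l_orderkey FROM lineitem ORDER BY l_linenumber";

    // Without a limit, the benchmark runs the full sort.
    println!("{}", with_optional_limit(sort_query, None));

    // With a limit, the same query becomes a TopK query.
    println!("{}", with_optional_limit(sort_query, Some(10)));
}
```

Keeping the limit as an `Option` means the existing sort benchmarks would run unchanged when no limit is given, while the same query set could double as TopK benchmarks when one is supplied.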

Describe alternatives you've considered

No response

Additional context

Relevant recent issues:

Metadata

Labels

enhancement: New feature or request
