Skip to content

Optimizer performance of APPROX_TOP_K function #51834

Open
@murphyatwork

Description

Enhancement

It can be used to calculate top-k from a large dataset quickly, which is expected to be much faster than plain TOP-K.

But actually it's not, and even slower than the GROUP-BY. Calculate top-k from ssb.lineorder:

  • GROUP-BY: 1.413s
  • APPROX_TOP_K: 54.462s
MySQL root@127.1:ssb> select lo_orderkey, count(*) from lineorder group by lo_orderkey order by count(*) limit 5;
+-------------+----------+
| lo_orderkey | count(*) |
+-------------+----------+
| 30182       | 4        |
| 25441       | 4        |
| 13888       | 4        |
| 27172       | 4        |
| 1990        | 4        |
+-------------+----------+
5 rows in set
Time: 1.413s
MySQL root@127.1:ssb> select APPROX_TOP_K(lo_orderkey) from lineorder;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| approx_top_k(lo_orderkey)                                                                                                                                    |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [{"item":5998592,"count":32568},{"item":5988930,"count":32568},{"item":5977440,"count":32568},{"item":5995557,"count":32564},{"item":5991074,"count":32564}] |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
Time: 54.462s

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions