Skip to content

MATCH_RECOGNIZE is slow with multiple repeat column values #18036

Open
@hackeryang

Description

Hi dear community, we reproduced a small case about this kind of low perfomance, which is shown below:

CREATE TABLE blackhole.default.nation ( nationkey bigint) WITH ( split_count = 1,
                                                                 pages_per_split = 2,
                                                                 rows_per_page = 2500);


select * from nation match_recognize(partition by nationkey order by nationkey
                                    pattern (A+ B+) define B as (nationkey > 0));

 nationkey 
-----------
(0 rows)

Query 20230625_025504_03792_wgwpg, FINISHED, 3 nodes
http://ta3:8080/ui/query.html?20230625_025504_03792_wgwpg
Splits: 21 total, 21 done (100.00%)
CPU Time: 45.4s total,   110 rows/s,   990B/s, 92% active
Per Node: 0.3 parallelism,    33 rows/s,   304B/s
Parallelism: 0.9
Peak Memory: 530KB
0:49 [5K rows, 43.9KB] [101 rows/s, 914B/s]

The execution time of the query above is long when the column nationkey has many same values i.e. 0.

Our test cluster has 3 nodes with below specifications:

Hardware Specification
CPU Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz(4 core with Hyper-threading)
Memory 62GB
Disk 1TB SATA, 143.15 MB/sec

And we also made a flame chart:
image

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions