MATCH_RECOGNIZE is slow with multiple repeat column values #18036
Open
Description
Hi dear community, we reproduced a small case about this kind of low perfomance, which is shown below:
CREATE TABLE blackhole.default.nation ( nationkey bigint) WITH ( split_count = 1,
pages_per_split = 2,
rows_per_page = 2500);
select * from nation match_recognize(partition by nationkey order by nationkey
pattern (A+ B+) define B as (nationkey > 0));
nationkey
-----------
(0 rows)
Query 20230625_025504_03792_wgwpg, FINISHED, 3 nodes
http://ta3:8080/ui/query.html?20230625_025504_03792_wgwpg
Splits: 21 total, 21 done (100.00%)
CPU Time: 45.4s total, 110 rows/s, 990B/s, 92% active
Per Node: 0.3 parallelism, 33 rows/s, 304B/s
Parallelism: 0.9
Peak Memory: 530KB
0:49 [5K rows, 43.9KB] [101 rows/s, 914B/s]
The execution time of the query above is long when the column nationkey
has many same values i.e. 0.
Our test cluster has 3 nodes with below specifications:
Hardware | Specification |
---|---|
CPU | Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz(4 core with Hyper-threading) |
Memory | 62GB |
Disk | 1TB SATA, 143.15 MB/sec |