Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

planner: NDV estimation formula is too naive #54812

Open
Tracked by #54816
qw4990 opened this issue Jul 22, 2024 · 1 comment
Open
Tracked by #54816

planner: NDV estimation formula is too naive #54812

qw4990 opened this issue Jul 22, 2024 · 1 comment
Labels
affects-5.4 This bug affects 5.4.x versions. affects-6.1 affects-6.5 affects-7.1 affects-7.5 affects-8.1 epic/cardinality-estimation the optimizer cardinality estimation report/customer Customers have encountered this bug. sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.

Comments

@qw4990
Copy link
Contributor

qw4990 commented Jul 22, 2024

Enhancement

Now our NDV estimation formula is too naive, for example, for the query select distinct(a) from t where b=1, we need to estimate NDV(a), and now our formula is NDV(a) = NDV_a_from_stats * Selectivity(b=1), which might cause a large estimation error.

Below is a concrete case:

create table t (a int, b int, key(a), key(b));
insert into t values (0, 1);
insert into t values (1, 1);
...
insert into t values (99, 1); -- 100 rows with b=1, (i, b)
insert into t values (100, 2);
insert into t values (100, 2);
...
insert into t values (100, 2); -- 1000 rows with b=2, (100, 2)

Then for the query above, to estimate NDV(a), we use NDV_a_from_stats * Selectivity(b=1) = 101 * 100/1100 = 9.18, but the actual result is 100:

mysql> explain analyze select distinct(a) from t where b=1;
+------------------------------+---------+---------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+------------+---------+
| id                           | estRows | actRows | task      | access object | execution info                                                                                                                                                                                                                                                                                                                                            | operator info                                                                 | memory     | disk    |
+------------------------------+---------+---------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+------------+---------+
| HashAgg_12                   | 9.18    | 100     | root      |               | time:1.2ms, loops:6, RU:0.487909, partial_worker:{wall_time:1.182167ms, concurrency:5, task_num:1, tot_wait:5.661418ms, tot_exec:83.708µs, tot_time:5.753293ms, max:1.162125ms, p95:1.162125ms}, final_worker:{wall_time:1.188667ms, concurrency:5, task_num:5, tot_wait:20.625µs, tot_exec:250ns, tot_time:5.854959ms, max:1.175125ms, p95:1.175125ms}   | group by:htktidb_test.t.a, funcs:firstrow(htktidb_test.t.a)->htktidb_test.t.a | 25.0 KB    | 0 Bytes |
| └─TableReader_13             | 9.18    | 100     | root      |               | time:1.06ms, loops:2, cop_task: {num: 1, max: 990.1µs, proc_keys: 0, copr_cache_hit_ratio: 0.00, build_task_duration: 20.8µs, max_distsql_concurrency: 1}, rpc_info:{Cop:{num_rpc:1, total_time:955.3µs}}                                                                                                                                                 | data:HashAgg_5                                                                | 1009 Bytes | N/A     |
|   └─HashAgg_5                | 9.18    | 100     | cop[tikv] |               | tikv_task:{time:905.8µs, loops:0}                                                                                                                                                                                                                                                                                                                         | group by:htktidb_test.t.a,                                                    | N/A        | N/A     |
|     └─Selection_11           | 100.00  | 100     | cop[tikv] |               | tikv_task:{time:905.8µs, loops:0}                                                                                                                                                                                                                                                                                                                         | eq(htktidb_test.t.b, 1)                                                       | N/A        | N/A     |
|       └─TableFullScan_10     | 1100.00 | 100     | cop[tikv] | table:t       | tikv_task:{time:905.8µs, loops:0}                                                                                                                                                                                                                                                                                                                         | keep order:false                                                              | N/A        | N/A     |
+------------------------------+---------+---------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+------------+---------+

Another case is for Join Estimation: select * from t1, t2 where t1.a=t2.a and t2.b=1, for this case we might get a wrong estimation result of NDV(t2.a) and then finally get a wrong Join Plan.

@qw4990 qw4990 added type/enhancement The issue or PR belongs to an enhancement. sig/planner SIG: Planner epic/cardinality-estimation the optimizer cardinality estimation labels Jul 22, 2024
@seiya-annie
Copy link

/report customer

@ti-chi-bot ti-chi-bot bot added the report/customer Customers have encountered this bug. label Aug 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-5.4 This bug affects 5.4.x versions. affects-6.1 affects-6.5 affects-7.1 affects-7.5 affects-8.1 epic/cardinality-estimation the optimizer cardinality estimation report/customer Customers have encountered this bug. sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

No branches or pull requests

3 participants