MDEV-36761: Implement NULL-aware cardinality estimation for indexed columns #4331
base: 11.4
…lumns (Preparation for the main patch)

set_statistics_for_table() incorrectly treated indexes with all NULL values the same as indexes with no statistics, because avg_frequency is 0 in both cases. This caused the optimizer to ignore valid EITS data and fall back to engine statistics.

Additionally, KEY::actual_rec_per_key() would fall back to engine statistics even when EITS was available, and used incorrect pointer comparison (rec_per_key == 0 instead of nullptr).

Fix by adding Index_statistics::stats_were_read flag to track per-index whether statistics were actually read from persistent tables, and restructuring actual_rec_per_key() to prioritize EITS when available.
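The ambiguity this commit describes can be illustrated with a minimal sketch. The structs and the function below are simplified, hypothetical stand-ins and not the actual MariaDB definitions; only the stats_were_read flag name and the idea of preferring EITS over engine statistics are taken from the commit message.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for per-index EITS data.
struct IndexStatsSketch {
  bool stats_were_read = false;  // flag name taken from the commit message
  double avg_frequency = 0.0;    // 0 when unread AND when all values are NULL
};

// Hypothetical, simplified stand-in for a key with both statistics sources.
struct KeySketch {
  const IndexStatsSketch *eits = nullptr;      // persistent statistics, may be absent
  const unsigned long *rec_per_key = nullptr;  // engine statistics, may be absent
};

// Prefer EITS whenever it was actually read, even if avg_frequency is 0.
// Otherwise fall back to engine statistics; 0.0 means "unknown".
double actual_rec_per_key_sketch(const KeySketch &key, unsigned part) {
  if (key.eits != nullptr && key.eits->stats_were_read) {
    assert(key.eits->avg_frequency >= 0.0);
    return key.eits->avg_frequency;
  }
  if (key.rec_per_key == nullptr)  // explicit null-pointer check
    return 0.0;
  return static_cast<double>(key.rec_per_key[part]);
}
```

With the flag in place, a caller that sees avg_frequency == 0 can tell "statistics were read and the index holds only NULLs" apart from "nothing was read", which the value alone cannot express.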
…olumns

When all values in an indexed column are NULL, EITS statistics show avg_frequency == 0. This commit adds logic to distinguish between "no statistics available" and "all values are NULL" scenarios.

For NULL-rejecting conditions (e.g., t1.col = t2.col), when statistics confirm all indexed values are NULL, the optimizer can now return a very low cardinality estimate (1.0) instead of unknown (0.0), since NULL = NULL never matches.

For non-NULL-rejecting conditions (e.g., t1.col <=> t2.col), normal cardinality estimation continues to apply since matches are possible.

Changes:
- Added KEY::rec_per_key_null_aware() to check nulls_ratio from column statistics when avg_frequency is 0
- Modified best_access_path() in sql_select.cc to use the new rec_per_key_null_aware() method for ref access cost estimation
- The optimization works with single-column and composite indexes, checking each key part's NULL-rejecting status via notnull_part bitmap
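A rough sketch of the decision described in this commit message follows. The function name, its parameters, and the use of a plain 64-bit integer for the notnull_part bitmap are assumptions made for illustration; the real KEY::rec_per_key_null_aware() signature is not shown in this conversation and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not the patch's code.
//   avg_frequency: EITS estimate for the used key prefix (0 may mean all NULL)
//   nulls_ratio:   per key part, fraction of NULL values in that column (0..1)
//   notnull_part:  bit i set => the condition on key part i is NULL-rejecting
// Returns 0.0 for "unknown", otherwise an estimated number of matching rows.
double rec_per_key_null_aware_sketch(double avg_frequency,
                                     const std::vector<double> &nulls_ratio,
                                     std::uint64_t notnull_part) {
  if (avg_frequency != 0.0)
    return avg_frequency;  // ordinary estimate is available, use it

  for (std::size_t i = 0; i < nulls_ratio.size() && i < 64; ++i) {
    assert(nulls_ratio[i] >= 0.0 && nulls_ratio[i] <= 1.0);
    const bool null_rejecting = ((notnull_part >> i) & 1u) != 0;
    // An all-NULL column under a NULL-rejecting condition cannot match,
    // so report the small fixed estimate named in the commit message.
    if (null_rejecting && nulls_ratio[i] >= 1.0)
      return 1.0;
  }
  return 0.0;  // nothing conclusive: leave the estimate unknown
}
```

In this sketch an equality such as t1.col = t2.col sets the bit for that key part, while t1.col <=> t2.col leaves it clear, so only the former takes the early return.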
bb5d8c9 to 3affe5f
f6dc6ed to 42b4826
I approve but let @spetrunia look before merging.
1.0 is not "low cardinality". It assumes high cardinality, that all values are different.
I think the patch doesn't work for partially-covered columns. A testcase:
create table t1 (a varchar(10));
insert into t1 select seq from seq_1_to_10;
create table t2 (
a varchar(10),
b varchar(10),
index i1(a,b(5))
);
insert into t2 select seq, NULL from seq_1_to_1000;
analyze table t2 persistent for columns (b) indexes (i1);
explain select * from t1, t2 where t2.a=t1.a and t2.b=t1.a;
This is because
I agree, actually it is the
Description
When all values in an indexed column are NULL, EITS statistics show
avg_frequency == 0. This commit adds logic to distinguish between
"no statistics available" and "all values are NULL" scenarios.
Release Notes
TODO: What should the release notes say about this change?
Include any changed system variables, status variables or behaviour. Optionally list any https://mariadb.com/kb/ pages that need changing.
How can this PR be tested?
./mtr mdev-36761
Basing the PR against the correct MariaDB version
main branch.
PR quality check