MDEV-36761: Implement NULL-aware cardinality estimation for indexed columns #4331

Olernov · 2025-10-01T18:27:43Z

The Jira issue number for this PR is: MDEV-36761

Description

When all values in an indexed column are NULL, EITS statistics show
avg_frequency == 0. This commit adds logic to distinguish between
"no statistics available" and "all values are NULL" scenarios.

For NULL-rejecting conditions (e.g., t1.col = t2.col), when statistics
confirm all indexed values are NULL, the optimizer can now return a
very low cardinality estimate (1.0) instead of unknown (0.0), since
NULL = NULL never matches.

For non-NULL-rejecting conditions (e.g., t1.col <=> t2.col),
normal cardinality estimation continues to apply since matches are possible.

Changes:
- Added KEY::rec_per_key_null_aware() to check nulls_ratio from column
  statistics when avg_frequency is 0
- Modified best_access_path() in sql_select.cc to use the new
  rec_per_key_null_aware() method for ref access cost estimation
- The optimization works with single-column and composite indexes,
  checking each key part's NULL-rejecting status via notnull_part bitmap
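
The decision described above can be illustrated with a minimal, self-contained sketch. This is not the code from the patch: the structs `ColumnStats` and `IndexStats` are hypothetical stand-ins for the server's statistics objects, and the free function only borrows the name `rec_per_key_null_aware` from the description. The return value 1.0 is the estimate named in the description, and 0.0 keeps its meaning of "unknown".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the server's statistics structures.
struct ColumnStats {
  bool   stats_available;  // were column statistics collected for this column?
  double nulls_ratio;      // fraction of NULL values, 0.0 .. 1.0
};

struct IndexStats {
  std::vector<double>      avg_frequency;  // per key-part prefix; 0.0 if unknown or all-NULL
  std::vector<ColumnStats> columns;        // column statistics for each key part
};

// Returns the estimated number of rows per lookup for the prefix ending at
// last_part (0-based). A result of 0.0 means "unknown".
// Bit N of notnull_part is set when key part N is used in a NULL-rejecting
// condition such as t1.col = t2.col.
double rec_per_key_null_aware(const IndexStats &idx, unsigned last_part,
                              uint64_t notnull_part)
{
  assert(last_part < idx.avg_frequency.size());
  assert(last_part < idx.columns.size());

  double freq = idx.avg_frequency[last_part];
  if (freq != 0.0)
    return freq;  // regular statistics are usable

  // avg_frequency is 0: either no statistics, or every value is NULL.
  for (unsigned part = 0; part <= last_part; part++) {
    bool null_rejecting = ((notnull_part >> part) & 1) != 0;
    const ColumnStats &col = idx.columns[part];
    if (null_rejecting && col.stats_available && col.nulls_ratio == 1.0)
      return 1.0;  // NULL = NULL never matches, so report the minimal estimate
  }
  return 0.0;  // still unknown
}
```

With `<=>` the corresponding bit in `notnull_part` is not set, so the sketch falls through to 0.0 and the caller's normal estimation applies.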

Release Notes

TODO: What should the release notes say about this change?
Include any changed system variables, status variables or behaviour. Optionally list any https://mariadb.com/kb/ pages that need changing.

How can this PR be tested?

./mtr mdev-36761

Basing the PR against the correct MariaDB version

This is a new feature or a refactoring, and the PR is based against the main branch.
This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

…lumns (Preparation for the main patch)

set_statistics_for_table() incorrectly treated indexes with all NULL values the same as indexes with no statistics, because avg_frequency is 0 in both cases. This caused the optimizer to ignore valid EITS data and fall back to engine statistics.

Additionally, KEY::actual_rec_per_key() would fall back to engine statistics even when EITS was available, and used incorrect pointer comparison (rec_per_key == 0 instead of nullptr).

Fix by adding Index_statistics::stats_were_read flag to track per-index whether statistics were actually read from persistent tables, and restructuring actual_rec_per_key() to prioritize EITS when available.
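
A rough sketch of the lookup order this commit message describes, under stated assumptions: `IndexStatsSketch`, `KeySketch` and `actual_rec_per_key_sketch` are hypothetical names invented for illustration, and the real Index_statistics and KEY classes carry far more state. The point is only that an explicit flag, rather than avg_frequency == 0, decides whether EITS data is used.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for per-index EITS data.
struct IndexStatsSketch {
  bool stats_were_read = false;       // true once persistent statistics were loaded
  std::vector<double> avg_frequency;  // per key-part prefix; 0.0 is valid when all values are NULL
};

// Hypothetical stand-in for a key definition.
struct KeySketch {
  const IndexStatsSketch *read_stats = nullptr;  // EITS data, if any
  const unsigned long *rec_per_key = nullptr;    // engine statistics, if any
};

// Prefer EITS whenever it was actually read; otherwise use engine statistics.
// Returns 0.0 when nothing is known.
double actual_rec_per_key_sketch(const KeySketch &key, unsigned last_part)
{
  if (key.read_stats != nullptr && key.read_stats->stats_were_read) {
    assert(last_part < key.read_stats->avg_frequency.size());
    return key.read_stats->avg_frequency[last_part];  // may legitimately be 0.0
  }
  if (key.rec_per_key == nullptr)  // pointer check, not a comparison with 0
    return 0.0;
  return static_cast<double>(key.rec_per_key[last_part]);
}
```

In this sketch an all-NULL index with loaded statistics yields 0.0 from EITS instead of silently falling back to the engine's number.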

sql/sql_statistics.h

sql/table.cc

DaveGosselin-MariaDB

I approve but let @spetrunia look before merging.

spetrunia · 2025-10-07T15:53:48Z

double KEY::actual_rec_per_key(uint max_key_part) const
I think the name max_key_part here is misleading because the parameter can be any key part that one is interested in. It's not "max".

spetrunia · 2025-10-07T16:46:08Z

condition (indicated by bit set in notnull_part) and the statistics
confirm all values are NULL (nulls_ratio == 1.0), we can return a very
low cardinality estimate (1.0) instead of 0.0 (unknown), indicating

1.0 is not "low cardinality". It assumes high cardinality, that all values are different.
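
To make the terminology concrete, here is a small sketch of how a rows-per-key figure relates to the number of distinct values. It assumes the usual reading of avg_frequency as rows divided by distinct values with a uniform distribution; the function name `rows_per_key` is made up for this illustration.

```cpp
#include <cassert>

// Average number of rows per distinct key value, assuming values are
// uniformly distributed. Hypothetical helper, not a server function.
double rows_per_key(double table_rows, double distinct_values)
{
  assert(distinct_values > 0.0);
  return table_rows / distinct_values;
}
```

Under that reading, 1000 rows with 1000 distinct values give 1.0 row per key, while 1000 rows with 10 distinct values give 100.0. A value of 1.0 therefore corresponds to a low number of matching rows per lookup but a high number of distinct values.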

spetrunia · 2025-10-07T16:49:41Z

I think the patch doesn't work for partially-covered columns. A testcase:

create table t1 (a varchar(10));
insert into t1 select seq from seq_1_to_10;

create table t2 (
  a varchar(10), 
  b varchar(10),
  index i1(a,b(5))
);
insert into t2 select seq, NULL from seq_1_to_1000;
analyze table t2 persistent for columns (b) indexes (i1);

explain select * from t1, t2 where t2.a=t1.a and t2.b=t1.a;

This is because key_part[bit].field is a special Field object representing the "field as key part", and it has no statistics set.
We need to use table->field[key_part[bit].field->field_index] instead (is a -1 also needed here, or was that fixed?).
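
The lookup suggested here might look roughly like the following sketch. All types are hypothetical simplifications (`FieldSketch`, `TableSketch`, `KeyPartSketch`), statistics are reduced to a single nulls_ratio pointer, and field_index is assumed to be 0-based, which leaves the "-1" question above open.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified field object.
struct FieldSketch {
  unsigned field_index;       // position in the table's field array (assumed 0-based)
  const double *nulls_ratio;  // nullptr when no statistics are attached
};

struct TableSketch {
  std::vector<FieldSketch*> field;  // the table's own field objects, which carry statistics
};

struct KeyPartSketch {
  FieldSketch *field;  // separate "field as key part" object, which has no statistics
};

// Resolve statistics through the table's field array rather than through
// the key part's own field object.
const double *nulls_ratio_for_key_part(const TableSketch &table,
                                       const KeyPartSketch &key_part)
{
  assert(key_part.field->field_index < table.field.size());
  return table.field[key_part.field->field_index]->nulls_ratio;
}
```

In the test case above, the key part for b(5) would play the role of the statistics-less object, while table->field holds the column that ANALYZE populated.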

mariadb-OlegSmirnov · 2025-10-08T09:04:27Z

double KEY::actual_rec_per_key(uint max_key_part) const
I think the name max_key_part here is misleading because the parameter can be any key part that one is interested in. It's not "max".

I agree. It is actually the index of the last key part in the prefix (0-based). What do you think would be a better name? Maybe last_key_part_in_prefix?

Olernov added the MariaDB Corporation label Oct 1, 2025

DaveGosselin-MariaDB self-requested a review October 2, 2025 14:05

DaveGosselin-MariaDB assigned Olernov Oct 2, 2025

Olernov force-pushed the 11.4-MDEV-36761-all-nulls-v2 branch from bb5d8c9 to 3affe5f Compare October 2, 2025 16:15

DaveGosselin-MariaDB reviewed Oct 2, 2025

MDEV-36761 Fix code review comments

42b4826

Olernov force-pushed the 11.4-MDEV-36761-all-nulls-v2 branch from f6dc6ed to 42b4826 Compare October 6, 2025 14:27

DaveGosselin-MariaDB self-requested a review October 7, 2025 11:38

DaveGosselin-MariaDB approved these changes Oct 7, 2025

MDEV-36761 Fix code review comments 2

d6d9ed2
