Olernov
Contributor

@Olernov Olernov commented Oct 1, 2025

  • The Jira issue number for this PR is: MDEV-36761

Description

When all values in an indexed column are NULL, EITS statistics show
avg_frequency == 0. This commit adds logic to distinguish between
"no statistics available" and "all values are NULL" scenarios.

For NULL-rejecting conditions (e.g., t1.col = t2.col), when statistics
confirm all indexed values are NULL, the optimizer can now return a
very low cardinality estimate (1.0) instead of unknown (0.0), since
NULL = NULL never matches.

For non-NULL-rejecting conditions (e.g., t1.col <=> t2.col),
normal cardinality estimation continues to apply since matches are possible.
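The difference between the two operators can be sketched in a few lines of C++. This is only an illustration of SQL's three-valued comparison semantics, not code from the patch; the names `SqlInt`, `SqlBool`, `sql_eq`, `sql_null_safe_eq` and `row_matches` are made up for the example.

```cpp
#include <cassert>   // only needed for quick assert() checks of the sketch
#include <optional>

// SQL-style nullable integer: std::nullopt stands for NULL.
using SqlInt = std::optional<int>;
// Result of a SQL comparison: true, false, or NULL (unknown).
using SqlBool = std::optional<bool>;

// a = b : yields NULL if either operand is NULL, so it is never TRUE for NULLs.
SqlBool sql_eq(SqlInt a, SqlInt b) {
  if (!a || !b) return std::nullopt;
  return *a == *b;
}

// a <=> b : NULL-safe equality, always TRUE or FALSE.
bool sql_null_safe_eq(SqlInt a, SqlInt b) {
  if (!a && !b) return true;   // NULL <=> NULL is TRUE
  if (!a || !b) return false;  // exactly one side is NULL
  return *a == *b;
}

// A join condition keeps a row pair only when it evaluates to TRUE.
bool row_matches(SqlBool cond) { return cond.has_value() && *cond; }
```

With these definitions, `row_matches(sql_eq(std::nullopt, std::nullopt))` is false while `sql_null_safe_eq(std::nullopt, std::nullopt)` is true, which is why only the `=` case can be treated as producing no matches on an all-NULL column.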

Changes:
- Added KEY::rec_per_key_null_aware() to check nulls_ratio from column
  statistics when avg_frequency is 0
- Modified best_access_path() in sql_select.cc to use the new
  rec_per_key_null_aware() method for ref access cost estimation
- The optimization works with single-column and composite indexes,
  checking each key part's NULL-rejecting status via notnull_part bitmap
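
A rough sketch of the decision described in the list above, using simplified stand-in types. This is not the code from the patch: `KeyPartStats`, `IndexModel`, the free-function form and the parameter names are assumptions made for illustration. Only the ideas of checking `nulls_ratio` when `avg_frequency` is 0 and consulting a `notnull_part` bitmap come from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the server's structures.
struct KeyPartStats {
  double nulls_ratio;  // fraction of NULL values in the column, 0.0..1.0
};

struct IndexModel {
  std::vector<KeyPartStats> parts;    // one entry per key part
  std::vector<double> avg_frequency;  // avg_frequency[i] covers parts[0..i]
};

// Returns rows per key value for the prefix ending at last_part (0-based).
// 0.0 means "unknown". Bit i of notnull_part is set when key part i is used
// in a NULL-rejecting condition. Assumes fewer than 64 key parts.
double rec_per_key_null_aware(const IndexModel& idx, unsigned last_part,
                              std::uint64_t notnull_part) {
  assert(last_part < idx.parts.size());
  assert(idx.avg_frequency.size() == idx.parts.size());
  double freq = idx.avg_frequency[last_part];
  if (freq != 0.0) return freq;  // regular statistics are usable
  for (unsigned i = 0; i <= last_part; ++i) {
    bool null_rejecting = (notnull_part >> i) & 1u;
    if (null_rejecting && idx.parts[i].nulls_ratio == 1.0)
      return 1.0;  // column is all NULL and NULLs cannot satisfy the condition
  }
  return 0.0;  // still unknown; the caller uses its other estimates
}
```

For a single-column index whose column is entirely NULL, the sketch returns 1.0 when bit 0 of `notnull_part` is set and 0.0 otherwise.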

Release Notes

TODO: What should the release notes say about this change?
Include any changed system variables, status variables or behaviour. Optionally list any https://mariadb.com/kb/ pages that need changing.

How can this PR be tested?

./mtr mdev-36761

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

…lumns

(Preparation for the main patch)

set_statistics_for_table() incorrectly treated indexes with all NULL
values the same as indexes with no statistics, because avg_frequency
is 0 in both cases. This caused the optimizer to ignore valid EITS
data and fall back to engine statistics.

Additionally, KEY::actual_rec_per_key() would fall back to engine
statistics even when EITS was available, and used incorrect pointer
comparison (rec_per_key == 0 instead of nullptr).

Fix by adding Index_statistics::stats_were_read flag to track per-index
whether statistics were actually read from persistent tables, and
restructuring actual_rec_per_key() to prioritize EITS when available.
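
A minimal sketch of why a separate flag is needed, assuming a simplified model of one key prefix. `IndexStatsModel` and the free function below are hypothetical. Only the `stats_were_read` flag, the `avg_frequency` value and the `nullptr` check are taken from the commit message.

```cpp
#include <cassert>

// Hypothetical, simplified model of the statistics for one key prefix.
struct IndexStatsModel {
  bool stats_were_read;  // true once EITS data was loaded for this index
  double avg_frequency;  // 0 may mean "all NULL" when stats_were_read is true
};

// Picks the records-per-key estimate. engine_rec_per_key points at the
// engine's value for the same prefix, or is nullptr when the engine
// provides none. Note the check is on the pointer, not on the value.
double actual_rec_per_key(const IndexStatsModel& eits,
                          const unsigned long* engine_rec_per_key) {
  if (eits.stats_were_read)
    return eits.avg_frequency;  // EITS takes priority, even if it is 0
  if (engine_rec_per_key == nullptr)
    return 0.0;                 // nothing known at all
  return static_cast<double>(*engine_rec_per_key);
}
```

In this model an all-NULL index with EITS loaded yields 0 from EITS rather than silently falling back to the engine's number, so a later step can still tell the two cases apart.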
…olumns

@Olernov Olernov force-pushed the 11.4-MDEV-36761-all-nulls-v2 branch from bb5d8c9 to 3affe5f Compare October 2, 2025 16:15
@Olernov Olernov force-pushed the 11.4-MDEV-36761-all-nulls-v2 branch from f6dc6ed to 42b4826 Compare October 6, 2025 14:27
Member

@DaveGosselin-MariaDB DaveGosselin-MariaDB left a comment

I approve but let @spetrunia look before merging.

@spetrunia
Member

double KEY::actual_rec_per_key(uint max_key_part) const
I think the name max_key_part here is misleading because the parameter can be any key part that one is interested in. It's not "max".

@spetrunia
Member

   * condition (indicated by bit set in notnull_part) and the statistics
   * confirm all values are NULL (nulls_ratio == 1.0), we can return a very
   * low cardinality estimate (1.0) instead of 0.0 (unknown), indicating

1.0 is not "low cardinality". It assumes high cardinality, that all values are different.
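
To make the arithmetic concrete, here is a tiny sketch. It assumes the usual meaning of records-per-key as the average number of rows sharing one key value; the function name and signature are illustrative only.

```cpp
#include <cassert>

// Average number of rows sharing one key value:
// total rows divided by the number of distinct key values.
double rec_per_key(double rows, double distinct_values) {
  assert(distinct_values > 0.0);
  return rows / distinct_values;
}
```

For 1000 rows, a value of 1.0 corresponds to 1000 distinct values (every value different), while 10 distinct values give 100.0. So a small records-per-key value goes together with a high number of distinct values.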

@spetrunia
Member

I think the patch doesn't work for partially-covered columns. A testcase:

create table t1 (a varchar(10));
insert into t1 select seq from seq_1_to_10;

create table t2 (
  a varchar(10), 
  b varchar(10),
  index i1(a,b(5))
);
insert into t2 select seq, NULL from seq_1_to_1000;
analyze table t2 persistent for columns (b) indexes (i1);

explain select * from t1, t2 where t2.a=t1.a and t2.b=t1.a;

This is because key_part[bit].field is a special Field object representing the "field as key part", and it has no statistics set.
We need to use table->field[ key_part[bit].field->field_index ] instead (is a -1 also needed here, or was that fixed?).
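
The suggested lookup can be sketched with toy types. Everything here is a hypothetical simplification (`ColumnStats`, `FieldModel`, `TableModel`, `stats_for_key_part`), and it assumes `field_index` is 0-based, which is exactly the point left open in the comment above.

```cpp
#include <cassert>
#include <vector>

struct ColumnStats { double nulls_ratio; };

// Toy model of a field object.
struct FieldModel {
  unsigned field_index;      // assumed 0-based position in the table's fields
  const ColumnStats* stats;  // nullptr when no statistics are attached
};

struct TableModel { std::vector<FieldModel> field; };

// Looks up statistics through the table's own field object rather than
// through the key-part copy, which carries no statistics in this model.
const ColumnStats* stats_for_key_part(const TableModel& table,
                                      const FieldModel& key_part_field) {
  assert(key_part_field.field_index < table.field.size());
  return table.field[key_part_field.field_index].stats;
}
```

In this model, reading `stats` directly from the key-part copy gives `nullptr`, while going through `table.field[...]` reaches the statistics collected for the column.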

@mariadb-OlegSmirnov
Contributor

mariadb-OlegSmirnov commented Oct 8, 2025

double KEY::actual_rec_per_key(uint max_key_part) const
I think the name max_key_part here is misleading because the parameter can be any key part that one is interested in. It's not "max".

I agree. It is actually the 0-based index of the last key part in the prefix. What do you think would be a better name? Maybe last_key_part_in_prefix?
