Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Statistics after Modify Column is not updated. #44311

Open
mjonss opened this issue May 31, 2023 · 0 comments
Open

Statistics after Modify Column is not updated. #44311

mjonss opened this issue May 31, 2023 · 0 comments
Assignees
Labels
component/statistics severity/moderate sig/planner SIG: Planner type/bug The issue is confirmed as a bug.

Comments

@mjonss
Copy link
Contributor

mjonss commented May 31, 2023

Bug Report

Please answer these questions before submitting your issue. Thanks!

When modifying a column it does not update the statistics (here enum->varchar example)

1. Minimal reproduce step (Required)

drop table if exists t;
create table t (a int unsigned auto_increment primary key, b enum('a','b','c','d','e','f'));
insert into t (b) values ('a'),('b'),('c'),('d'),('e'),('f'),('a'),('d'),('f'),('f');
insert into t (b) select t.b from t,t t1,t t2,t t3;
select sleep(5);
analyze table t;
show stats_healthy;
select count(*) from t where b = 'f'; 
show stats_histograms where table_name = 't';
show stats_topn where table_name = 't';

alter table t modify column b varchar(25);
show stats_healthy;

select sleep(5);
select count(*) from t where b = 'f'; 
show stats_histograms where table_name = 't';
show stats_topn where table_name = 't';

analyze table t;
show stats_healthy;
select count(*) from t where b = 'f'; 
show stats_histograms where table_name = 't';
show stats_topn where table_name = 't';

2. What did you expect to see? (Required)

The histogram and TopN statistics reflecting the new column type.

3. What did you see instead (Required)

tidb> drop table if exists t;
Query OK, 0 rows affected, 1 warning (0.00 sec)

tidb> create table t (a int unsigned auto_increment primary key, b enum('a','b','c','d','e','f'));
Query OK, 0 rows affected (0.09 sec)

tidb> insert into t (b) values ('a'),('b'),('c'),('d'),('e'),('f'),('a'),('d'),('f'),('f');
Query OK, 10 rows affected (0.02 sec)
Records: 10  Duplicates: 0  Warnings: 0

tidb> insert into t (b) select t.b from t,t t1,t t2,t t3;
Query OK, 10000 rows affected (0.11 sec)
Records: 10000  Duplicates: 0  Warnings: 0

tidb> select sleep(5);
+----------+
| sleep(5) |
+----------+
|        0 |
+----------+
1 row in set (5.01 sec)

tidb> analyze table t;
Query OK, 0 rows affected, 1 warning (0.14 sec)

tidb> show stats_healthy;
+---------+------------+----------------+---------+
| Db_name | Table_name | Partition_name | Healthy |
+---------+------------+----------------+---------+
| test    | t          |                |     100 |
+---------+------------+----------------+---------+
1 row in set (0.00 sec)

tidb> select count(*) from t where b = 'f'; 
+----------+
| count(*) |
+----------+
|     3003 |
+----------+
1 row in set (0.00 sec)

tidb> show stats_histograms where table_name = 't';
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+---------------------+-------------+-----------------+----------------+----------------+---------------+
| Db_name | Table_name | Partition_name | Column_name | Is_index | Update_time         | Distinct_count | Null_count | Avg_col_size | Correlation         | Load_status | Total_mem_usage | Hist_mem_usage | Topn_mem_usage | Cms_mem_usage |
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+---------------------+-------------+-----------------+----------------+----------------+---------------+
| test    | t          |                | a           |        0 | 2023-05-31 14:33:12 |           9990 |          0 |            8 |                   1 | allEvicted  |               0 |              0 |              0 |             0 |
| test    | t          |                | b           |        0 | 2023-05-31 14:33:12 |              6 |          0 |            8 | 0.13510223196468119 | allLoaded   |             278 |              0 |            278 |             0 |
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+---------------------+-------------+-----------------+----------------+----------------+---------------+
2 rows in set (0.00 sec)

tidb> show stats_topn where table_name = 't';
+---------+------------+----------------+-------------+----------+-------+-------+
| Db_name | Table_name | Partition_name | Column_name | Is_index | Value | Count |
+---------+------------+----------------+-------------+----------+-------+-------+
| test    | t          |                | b           |        0 | 1     |  2002 |
| test    | t          |                | b           |        0 | 2     |  1001 |
| test    | t          |                | b           |        0 | 3     |  1001 |
| test    | t          |                | b           |        0 | 4     |  2002 |
| test    | t          |                | b           |        0 | 5     |  1001 |
| test    | t          |                | b           |        0 | 6     |  3003 |
+---------+------------+----------------+-------------+----------+-------+-------+
6 rows in set (0.00 sec)

tidb> 
tidb> alter table t modify column b varchar(25);
Query OK, 0 rows affected (1.22 sec)

--
-- Here the statistics look healthy
--
tidb> show stats_healthy;
+---------+------------+----------------+---------+
| Db_name | Table_name | Partition_name | Healthy |
+---------+------------+----------------+---------+
| test    | t          |                |     100 |
+---------+------------+----------------+---------+
1 row in set (0.00 sec)

tidb> 
tidb> select sleep(5);
+----------+
| sleep(5) |
+----------+
|        0 |
+----------+
1 row in set (5.00 sec)

tidb> select count(*) from t where b = 'f'; 
+----------+
| count(*) |
+----------+
|     3003 |
+----------+
1 row in set (0.04 sec)

tidb> show stats_histograms where table_name = 't';
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+---------------------+-------------+-----------------+----------------+----------------+---------------+
| Db_name | Table_name | Partition_name | Column_name | Is_index | Update_time         | Distinct_count | Null_count | Avg_col_size | Correlation         | Load_status | Total_mem_usage | Hist_mem_usage | Topn_mem_usage | Cms_mem_usage |
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+---------------------+-------------+-----------------+----------------+----------------+---------------+
| test    | t          |                | a           |        0 | 2023-05-31 14:33:12 |           9990 |          0 |            8 |                   1 | allEvicted  |               0 |              0 |              0 |             0 |
| test    | t          |                | b           |        0 | 2023-05-31 14:33:12 |              6 |          0 |            8 | 0.13510223196468119 | allLoaded   |             278 |              0 |            278 |             0 |
| test    | t          |                | b           |        0 | 2023-05-31 14:33:13 |              0 |      10010 |            0 |                   0 | allLoaded   |               0 |              0 |              0 |             0 |
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+---------------------+-------------+-----------------+----------------+----------------+---------------+
3 rows in set (0.00 sec)

--
-- Here it is still the old enum values instead of the 'a'-'f' varchar's!
--
tidb> show stats_topn where table_name = 't';
+---------+------------+----------------+-------------+----------+-------+-------+
| Db_name | Table_name | Partition_name | Column_name | Is_index | Value | Count |
+---------+------------+----------------+-------------+----------+-------+-------+
| test    | t          |                | b           |        0 | 1     |  2002 |
| test    | t          |                | b           |        0 | 2     |  1001 |
| test    | t          |                | b           |        0 | 3     |  1001 |
| test    | t          |                | b           |        0 | 4     |  2002 |
| test    | t          |                | b           |        0 | 5     |  1001 |
| test    | t          |                | b           |        0 | 6     |  3003 |
+---------+------------+----------------+-------------+----------+-------+-------+
6 rows in set (0.00 sec)

tidb> 
tidb> analyze table t;
Query OK, 0 rows affected, 1 warning (0.19 sec)

tidb> show stats_healthy;
+---------+------------+----------------+---------+
| Db_name | Table_name | Partition_name | Healthy |
+---------+------------+----------------+---------+
| test    | t          |                |     100 |
+---------+------------+----------------+---------+
1 row in set (0.00 sec)

tidb> select count(*) from t where b = 'f'; 
+----------+
| count(*) |
+----------+
|     3003 |
+----------+
1 row in set (0.01 sec)

tidb> show stats_histograms where table_name = 't';
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+---------------------+-------------+-----------------+----------------+----------------+---------------+
| Db_name | Table_name | Partition_name | Column_name | Is_index | Update_time         | Distinct_count | Null_count | Avg_col_size | Correlation         | Load_status | Total_mem_usage | Hist_mem_usage | Topn_mem_usage | Cms_mem_usage |
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+---------------------+-------------+-----------------+----------------+----------------+---------------+
| test    | t          |                | a           |        0 | 2023-05-31 14:33:18 |           9990 |          0 |            8 |                   1 | allEvicted  |               0 |              0 |              0 |             0 |
| test    | t          |                | b           |        0 | 2023-05-31 14:33:12 |              6 |          0 |            8 | 0.13510223196468119 | allLoaded   |             278 |              0 |            278 |             0 |
| test    | t          |                | b           |        0 | 2023-05-31 14:33:18 |              6 |          0 |            2 | 0.13510223196468119 | allLoaded   |             284 |              0 |            284 |             0 |
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+---------------------+-------------+-----------------+----------------+----------------+---------------+
3 rows in set (0.00 sec)

--
-- And here after a manual ANALYZE TABLE, the old enum values are still there!
--
tidb> show stats_topn where table_name = 't';
+---------+------------+----------------+-------------+----------+-------+-------+
| Db_name | Table_name | Partition_name | Column_name | Is_index | Value | Count |
+---------+------------+----------------+-------------+----------+-------+-------+
| test    | t          |                | b           |        0 | 1     |  2002 |
| test    | t          |                | b           |        0 | 2     |  1001 |
| test    | t          |                | b           |        0 | 3     |  1001 |
| test    | t          |                | b           |        0 | 4     |  2002 |
| test    | t          |                | b           |        0 | 5     |  1001 |
| test    | t          |                | b           |        0 | 6     |  3003 |
| test    | t          |                | b           |        0 | a     |  2002 |
| test    | t          |                | b           |        0 | b     |  1001 |
| test    | t          |                | b           |        0 | c     |  1001 |
| test    | t          |                | b           |        0 | d     |  2002 |
| test    | t          |                | b           |        0 | e     |  1001 |
| test    | t          |                | b           |        0 | f     |  3003 |
+---------+------------+----------------+-------------+----------+-------+-------+
12 rows in set (0.00 sec)

No updated statistics. And after manual analyze table t the old statistics is still there!

Also if there would be an index on 'b' it seems like the statistics would been updated after the alter table.

4. What is your TiDB version? (Required)

tidb_version(): Release Version: v7.2.0-alpha
Edition: Community
Git Commit Hash: 65d5ef2eaa34c17e73928d17be339b8dbdcddc11
Git Branch: heads/refs/tags/v7.2.0-alpha
UTC Build Time: 2023-05-31 02:32:36
GoVersion: go1.20.3
Race Enabled: false
TiKV Min Version: 6.2.0-alpha
Check Table Before Drop: false
Store: tikv
@mjonss mjonss added the type/bug The issue is confirmed as a bug. label May 31, 2023
@jebter jebter added the sig/planner SIG: Planner label Jul 27, 2023
@hawkingrei hawkingrei self-assigned this Sep 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/statistics severity/moderate sig/planner SIG: Planner type/bug The issue is confirmed as a bug.
Projects
None yet
Development

No branches or pull requests

4 participants