
ease of use: The stats_meta data table adds two fields related to statistics collection. #49594

Open
wentaojin opened this issue Dec 19, 2023 · 5 comments
Labels
component/statistics sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.

Comments

@wentaojin

Feature Request

Is your feature request related to a problem? Please describe:
Ease of use: add two statistics-collection fields to the stats_meta table. One field records the time of the last ANALYZE on the table; the other records the source of that last ANALYZE (internal automatic collection or external manual collection).

Describe the feature you'd like:
Ease of use: add two statistics-collection fields to the stats_meta table. One field records the time of the last ANALYZE on the table; the other records the source of that last ANALYZE (internal automatic collection or external manual collection).

Describe alternatives you've considered:
NO

Teachability, Documentation, Adoption, Migration Strategy:
Scenes:

  1. Distinguish and identify external statistics-collection scripts.
  2. Check a table's most recent statistics collection time and collection method.
@wentaojin wentaojin added the type/feature-request Categorizes issue or PR as related to a new feature. label Dec 19, 2023
@qw4990 qw4990 added type/enhancement The issue or PR belongs to an enhancement. sig/planner SIG: Planner component/statistics and removed type/feature-request Categorizes issue or PR as related to a new feature. labels Dec 20, 2023
@Rustin170506
Member

You can get this information from the mysql.analyze_jobs table.

For instance:

+---------+---------------------+--------------+--------------+----------------+-------------------------------------------------------------------------+----------------+---------------------+---------------------+----------+--------------------------------------------------+-------------------------------------------------------------------------+------------+
| id      | update_time         | table_schema | table_name   | partition_name | job_info                                                                | processed_rows | start_time          | end_time            | state    | fail_reason                                      | instance                                                                | process_id |
+---------+---------------------+--------------+--------------+----------------+-------------------------------------------------------------------------+----------------+---------------------+---------------------+----------+--------------------------------------------------+-------------------------------------------------------------------------+------------+
| 2670123 | 2023-12-27 10:38:34 | co01db33     | co01table86  |                | auto analyze table all columns with 256 buckets, 500 topn, 1 samplerate |         150000 | 2023-12-27 10:38:31 | 2023-12-27 10:38:34 | finished | NULL                                             | tc-tidb-0.tc-tidb-peer.partition-analyze-test-tps-3180202-1-85.svc:4000 |       NULL |
| 2670124 | 2023-12-27 10:49:00 | co01db33     | co01table125 |                | auto analyze table all columns with 256 buckets, 500 topn, 1 samplerate |              0 | 2023-12-27 10:38:34 | NULL                | failed   | TiDB Server is down when running the analyze job | tc-tidb-0.tc-tidb-peer.partition-analyze-test-tps-3180202-1-85.svc:4000 |       NULL |
+---------+---------------------+--------------+--------------+----------------+-------------------------------------------------------------------------+----------------+---------------------+---------------------+----------+--------------------------------------------------+-------------------------------------------------------------------------+------------+

@Rustin170506
Member

The table definition is:

CREATE TABLE IF NOT EXISTS mysql.analyze_jobs (
		id BIGINT(64) UNSIGNED NOT NULL AUTO_INCREMENT,
		update_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
		table_schema CHAR(64) NOT NULL DEFAULT '',
		table_name CHAR(64) NOT NULL DEFAULT '',
		partition_name CHAR(64) NOT NULL DEFAULT '',
		job_info TEXT NOT NULL,
		processed_rows BIGINT(64) UNSIGNED NOT NULL DEFAULT 0,
		start_time TIMESTAMP,
		end_time TIMESTAMP,
		state ENUM('pending', 'running', 'finished', 'failed') NOT NULL,
		fail_reason TEXT,
		instance VARCHAR(512) NOT NULL comment 'address of the TiDB instance executing the analyze job',
		process_id BIGINT(64) UNSIGNED comment 'ID of the process executing the analyze job',
		PRIMARY KEY (id),
		KEY (update_time)

If the job_info starts with auto analyze, then it is triggered by TiDB itself.
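As a minimal sketch of that approach (column names are taken from the table definition above; the query shape is my own, not part of the original comment), the most recent analyze job per table, with its trigger classified by the `job_info` prefix, could be fetched like this:

```sql
-- Latest analyze job per table in mysql.analyze_jobs, classified as
-- 'auto' (triggered by TiDB) or 'manual' based on the job_info prefix.
SELECT j.table_schema,
       j.table_name,
       j.start_time AS last_analyze_time,
       j.state,
       CASE WHEN j.job_info LIKE 'auto analyze%' THEN 'auto'
            ELSE 'manual'
       END AS source
FROM mysql.analyze_jobs AS j
JOIN (
    -- Pick out each table's most recent job by start_time.
    SELECT table_schema, table_name, MAX(start_time) AS max_start
    FROM mysql.analyze_jobs
    GROUP BY table_schema, table_name
) AS latest
  ON j.table_schema = latest.table_schema
 AND j.table_name   = latest.table_name
 AND j.start_time   = latest.max_start;
```

This runs against a live TiDB cluster; partitioned tables would additionally need `partition_name` in the grouping.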

@wentaojin
Author

@hi-rustin The ease of use of the mysql.analyze_jobs table is poor; what is needed is a final, intuitive piece of information, not process state. First, the table accumulates historical records incrementally. Second, it may require manual sorting to remove duplicates and find the maximum time, because a table gets a new record after every collection. Third, one has to inspect job_info to judge whether a job was an auto analyze.

@Rustin170506
Member

I acknowledge that it has some usability issues. However, recording only the last analysis time is also quite unreasonable. If the purpose of this requirement is to audit and analyze historical data, then analyze_jobs is the best source. stats_meta does not record the time of the last analysis because an analysis can either succeed or fail. Do we really need to store this information again in stats_meta? In that case, stats_meta would essentially become the same as the analyze_jobs table.

@wentaojin
Author

wentaojin commented Dec 29, 2023

@hi-rustin

  1. The requirement is to quickly identify the time and method of the last statistics collection, not to analyze the audit history.
  2. It does not matter whether the collection fails; even if it fails, the previous state will still be displayed. As mentioned above, what is needed is the final state, visualized. Of course, analyze_jobs can be used as in your example above, but it requires a lot of preprocessing, such as sorting to find a table's most recent collection time and then filtering by whether it was an auto collection.
  3. Whether the fields are added to the stats_meta table or the tables table, being able to see them intuitively and retrieve them immediately would be better.
  4. The analyze_jobs table's data retention time is limited. In extreme cases, the collection history of a particular table may be flushed out.
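For concreteness, the manual preprocessing described in point 2 might look like this for a single table (the schema and table names here are just the example values from earlier in the thread, and the query is an illustrative sketch, not from the original comment):

```sql
-- Most recent analyze job for one specific table: sort by start_time,
-- take the newest row, then classify the trigger from job_info.
SELECT start_time,
       state,
       CASE WHEN job_info LIKE 'auto analyze%' THEN 'auto'
            ELSE 'manual'
       END AS source
FROM mysql.analyze_jobs
WHERE table_schema = 'co01db33'
  AND table_name   = 'co01table86'
ORDER BY start_time DESC
LIMIT 1;
```

Repeating this per table (or joining against a grouped subquery) is exactly the extra work that storing the two fields directly in stats_meta would avoid.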
