Samplerate could be dynamically adjusted or make the num samples also usable. #46485

Open
jackysp opened this issue Aug 30, 2023 · 2 comments
Labels
component/statistics sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.

Comments


jackysp commented Aug 30, 2023

Enhancement

After #37193 and #35232, the ANALYZE statement uses max uint64 to retrieve data, which avoids blocking GC. However, this change has also introduced a problem. The sample rate is calculated from the table size at the start of ANALYZE. If the application keeps writing data while ANALYZE runs, the amount of data in the table may increase significantly. The initial sample rate is then too large for the grown table, which leads to excessive sampling and memory/CPU pressure in TiDB.

It would be best to have a feedback mechanism that dynamically adjusts the default sample rate as the data volume increases, or to make the "num samples" option usable. Currently, the documentation does not recommend using that option.
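
To make the effect concrete, here is a small arithmetic sketch. The target of 100,000 sampled rows, the row counts, and both helper names are illustrative assumptions, not TiDB's actual constants or code. The only thing taken from the description above is that the rate is fixed from the row count seen when ANALYZE starts.

```python
# Illustrative only: TARGET_SAMPLES and both helpers are hypothetical,
# chosen to show the arithmetic, not taken from the TiDB code base.
TARGET_SAMPLES = 100_000


def initial_sample_rate(rows_at_start: int, target: int = TARGET_SAMPLES) -> float:
    """Rate fixed once, from the row count observed when ANALYZE starts."""
    if rows_at_start <= 0:
        return 1.0
    return min(1.0, target / rows_at_start)


def expected_samples(rate: float, rows_scanned: int) -> float:
    """Expected sampled rows if each scanned row is kept with probability `rate`."""
    return rate * rows_scanned


# Made-up scenario: 1M rows when ANALYZE starts, 50M rows actually scanned.
rate = initial_sample_rate(1_000_000)
at_start = expected_samples(rate, 1_000_000)
after_growth = expected_samples(rate, 50_000_000)
growth_factor = after_growth / at_start
```

In this made-up scenario the table grows 50x during the scan, so the expected number of sampled rows grows 50x as well, because the rate never changes after it is computed.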

@jackysp jackysp added type/enhancement The issue or PR belongs to an enhancement. component/statistics labels Aug 30, 2023

jackysp commented Aug 30, 2023

PTAL @chrysan, cc @pingandb

@qw4990 qw4990 added the sig/planner SIG: Planner label Aug 30, 2023

qw4990 commented Aug 30, 2023

A simple solution is to record a limit derived from the sample rate at the beginning; once the number of sampled rows exceeds that limit, we stop the table scan on TiKV.
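
A minimal sketch of that idea, in Python rather than TiDB's Go and with no relation to the real TiKV scan code: `sample_limit`, `sample_with_limit`, and the 1.2 slack factor are all hypothetical names and values chosen for illustration. The limit is fixed up front from the initial rate and row count, and the scan stops as soon as the sample reaches it.

```python
import math
import random
from typing import Iterable, List, Tuple


def sample_limit(rate: float, rows_at_start: int, slack: float = 1.2) -> int:
    """Hypothetical cap: expected sample size at start, plus some slack."""
    return max(1, math.ceil(rate * rows_at_start * slack))


def sample_with_limit(
    rows: Iterable[int], rate: float, limit: int, seed: int = 0
) -> Tuple[List[int], int]:
    """Keep each row with probability `rate`; stop scanning once `limit` rows are kept.

    Returns the sampled rows and the number of rows actually scanned.
    """
    rng = random.Random(seed)
    sampled: List[int] = []
    scanned = 0
    for row in rows:
        if len(sampled) >= limit:
            break  # stands in for "stop the table scan on TiKV"
        scanned += 1
        if rng.random() < rate:
            sampled.append(row)
    return sampled, scanned


# Made-up scenario: 10,000 rows at start, 100,000 rows by the time of the scan.
rate = 0.1
limit = sample_limit(rate, rows_at_start=10_000)
sampled, scanned = sample_with_limit(range(100_000), rate, limit)
```

One trade-off of this sketch is that rows after the stopping point are never sampled, so the sample covers only the scanned prefix of the table.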
