Development Task
When TiDB bootstraps, it reads the stats data of all tables:
tidb/statistics/handle/bootstrap.go
Line 300 in a0c7407
When a TiDB cluster has many tables, caching all the stats data in a single TiDB server may cause high memory consumption at bootstrap, which increases the OOM risk of that TiDB server.
Here is what we need to do:
- Add a benchmark test to measure the exact memory consumption for 1K, 2K, 4K, 8K, and 16K tables.
- Consider some strategies to reduce memory consumption. For example:
  - Don't gather the CM-Sketch for a unique or primary index. With the default settings, this can save 40 KB of memory.
  - Optimize the in-memory data structures that store the CM-Sketch and histograms.
  - Not all tables are queried at the same time, so we may not need to load the stats data for all tables at bootstrap. We may further consider a stats cache replacement algorithm that drops old, unused stats data and loads newly requested stats data into the stats cache.
  - We may also introduce a two-layer cache: memory <- local disk <- TiKV cluster. This is more complicated than the first idea, so we can discuss it in the future.
- We also need a visible way to see the memory consumption of the stats data cache. This topic can be extended to other components, such as the global variable cache, the global binding cache, etc.
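To make the cache replacement idea and the "visible memory consumption" requirement concrete, here is a minimal sketch of an LRU cache bounded by total bytes rather than by entry count. It is only an illustration under assumptions: `tableStats`, `StatsCache`, `NewStatsCache`, and `MemUsage` are hypothetical names, not existing TiDB types, and `memBytes` stands in for whatever size estimate the real stats objects could report.

```go
package main

import (
	"container/list"
	"fmt"
)

// tableStats is a hypothetical stand-in for one table's cached stats.
// memBytes is an assumed estimate of its in-memory size.
type tableStats struct {
	tableID  int64
	memBytes int64
}

// StatsCache is a hypothetical LRU cache limited by a byte budget.
type StatsCache struct {
	capacity int64                   // memory budget in bytes
	used     int64                   // bytes currently held
	order    *list.List              // front = most recently used
	items    map[int64]*list.Element // table ID -> element in order
}

func NewStatsCache(capacity int64) *StatsCache {
	return &StatsCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[int64]*list.Element),
	}
}

// Get returns the cached stats and marks them as recently used.
func (c *StatsCache) Get(tableID int64) (*tableStats, bool) {
	e, ok := c.items[tableID]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(e)
	return e.Value.(*tableStats), true
}

// Put inserts or replaces stats, then evicts the least recently used
// entries until the cache fits the budget. The newest entry is always
// kept, even if it alone exceeds the budget.
func (c *StatsCache) Put(s *tableStats) {
	if e, ok := c.items[s.tableID]; ok {
		c.used -= e.Value.(*tableStats).memBytes
		e.Value = s
		c.order.MoveToFront(e)
	} else {
		c.items[s.tableID] = c.order.PushFront(s)
	}
	c.used += s.memBytes
	for c.used > c.capacity && c.order.Len() > 1 {
		old := c.order.Remove(c.order.Back()).(*tableStats)
		delete(c.items, old.tableID)
		c.used -= old.memBytes
	}
}

// MemUsage exposes the bytes currently held, so it can be reported.
func (c *StatsCache) MemUsage() int64 { return c.used }

func main() {
	c := NewStatsCache(100)
	c.Put(&tableStats{tableID: 1, memBytes: 40})
	c.Put(&tableStats{tableID: 2, memBytes: 40})
	c.Get(1)                                     // table 1 becomes most recent
	c.Put(&tableStats{tableID: 3, memBytes: 40}) // over budget: evicts table 2
	_, ok := c.Get(2)
	fmt.Println(ok, c.MemUsage()) // prints "false 80"
}
```

In a design like this, a miss in `Get` would be the point where stats are loaded on demand from storage instead of at bootstrap, and `MemUsage` is the kind of counter that could be exported as a metric. A real implementation would also need locking for concurrent access, which this sketch omits.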
See also #17200