Restore DeltaMergeStore take too much time #6395

Closed
flowbehappy opened this issue Nov 30, 2022 · 2 comments · Fixed by #6420
Assignees
Labels
affects-6.5 component/storage type/enhancement The issue or PR belongs to an enhancement.

Comments

@flowbehappy
Contributor

flowbehappy commented Nov 30, 2022

Enhancement

A TiFlash node takes over 15 minutes to become ready for service, because restoring DeltaMergeStore is very slow when there are too many ColumnFileTiny instances.

[2022/11/30 22:45:41.060 +08:00] [INFO] [DeltaMergeStore.cpp:209] ["Restore DeltaMerge Store start"] [source="table_id=11"] [thread_id=44]
...
[2022/11/30 23:03:32.419 +08:00] [INFO] [DeltaMergeStore.cpp:284] ["Restore DeltaMerge Store end, ps_run_mode=2"] [source="table_id=73"] [thread_id=44]

mysql> select SEGMENT_COUNT, AVG_PACK_COUNT_IN_DELTA, TOTAL_PACK_COUNT_IN_DELTA, TIFLASH_INSTANCE from information_schema.tiflash_tables where tidb_table = 'table_name';
+---------------+-------------------------+---------------------------+------------------+
| SEGMENT_COUNT | AVG_PACK_COUNT_IN_DELTA | TOTAL_PACK_COUNT_IN_DELTA | TIFLASH_INSTANCE |
+---------------+-------------------------+---------------------------+------------------+
|         18491 |      6.6471258752646145 |                    122460 | 172.16.5.81:7940 |
|         91006 |      5.6769096009753675 |                    479591 | 172.16.5.82:7940 |
+---------------+-------------------------+---------------------------+------------------+


@flowbehappy flowbehappy added type/enhancement The issue or PR belongs to an enhancement. component/storage labels Nov 30, 2022
@flowbehappy flowbehappy self-assigned this Nov 30, 2022
@JaySon-Huang
Contributor

JaySon-Huang commented Dec 1, 2022

Profile captured while restoring one table with about 360 GB of data:
profile_2022-12-01_16-12-00.zip


@flowbehappy flowbehappy assigned hongyunyan and hehechen and unassigned hongyunyan Dec 2, 2022
@hehechen
Contributor

hehechen commented Dec 5, 2022

DataTypePtr DataTypeFactory::get(const String & full_name) const
adds too much overhead when getting a column's data type, because it parses full_name into an AST on every call. We can introduce a new function with a data type cache, alongside
DataTypePtr DataTypeFactory::get(const String & family_name, const ASTPtr & parameters) const

We can use an unordered_map as a full_name -> DataTypePtr cache to avoid repeatedly parsing the same full_name into an AST.

DataTypePtr DataTypeFactory::getOrSet(const String & full_name)
{
    auto it = fullname_types.find(full_name);
    if (it != fullname_types.end())
    {
        return it->second;
    }
    ParserIdentifierWithOptionalParameters parser;
    ASTPtr ast = parseQuery(parser, full_name.data(), full_name.data() + full_name.size(), "data type", 0);
    DataTypePtr datatype_ptr = get(ast);
    // avoid big hashmap in rare cases.
    if (fullname_types.size() < MAX_FULLNAME_TYPES)
    {
        fullname_types.emplace(full_name, datatype_ptr);
    }
    return datatype_ptr;
}
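
For anyone who wants to try the idea outside the TiFlash tree, here is a minimal standalone sketch of the same bounded full_name -> type cache. It is not the code merged for this issue: FakeDataType, TypeNameCache, and the injected parser callback are hypothetical stand-ins for IDataType, DataTypeFactory, and the AST parsing step. The mutex is also an assumption on my side, since the snippet above does not show how concurrent callers are handled.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a parsed data type (not TiFlash's IDataType).
struct FakeDataType
{
    std::string name;
};
using FakeDataTypePtr = std::shared_ptr<const FakeDataType>;

// Bounded memoization cache keyed by the full type name.
class TypeNameCache
{
public:
    // The expensive step (parsing the name into a type) is injected,
    // so the sketch does not depend on any real parser.
    using Parser = std::function<FakeDataTypePtr(const std::string &)>;

    TypeNameCache(Parser parser, std::size_t max_entries)
        : parser_(std::move(parser))
        , max_entries_(max_entries)
    {}

    FakeDataTypePtr getOrSet(const std::string & full_name)
    {
        // Assumption: callers may be concurrent, so guard the map.
        // The lock is held while parsing to keep the sketch simple.
        std::lock_guard<std::mutex> lock(mutex_);

        auto it = cache_.find(full_name);
        if (it != cache_.end())
            return it->second; // hit: no parsing needed

        FakeDataTypePtr parsed = parser_(full_name);

        // Stop inserting once the cap is reached, so unusual workloads
        // with many distinct type names cannot grow the map unboundedly.
        if (cache_.size() < max_entries_)
            cache_.emplace(full_name, parsed);
        return parsed;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return cache_.size();
    }

private:
    Parser parser_;
    std::size_t max_entries_;
    mutable std::mutex mutex_;
    std::unordered_map<std::string, FakeDataTypePtr> cache_;
};
```

With this shape, repeated lookups of the same name return the same shared pointer and invoke the parser only once. Names that arrive after the cap is reached are still parsed correctly, they just are not cached. Since a table usually has far fewer distinct column types than ColumnFileTiny instances, most lookups during restore should be hits.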
