
PD can't redistribute the hot write regions among TiFlash nodes #1235

Closed
JaySon-Huang opened this issue Nov 17, 2020 · 7 comments
Assignees
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

@JaySon-Huang
Contributor

JaySon-Huang commented Nov 17, 2020

Version v4.0.8.

In one of our customers' production environments:

  • there are 6 TiFlash nodes
  • each TiFlash-replicated table has 2 replicas
  • each TiFlash node is deployed on 2 SSD disks

But when TiFlash gets busy writing, only 2 of the TiFlash nodes get busy (their IO util reaches 100%) while the others stay idle.

@JaySon-Huang JaySon-Huang added the type/enhancement The issue or PR belongs to an enhancement. label Nov 17, 2020
@JaySon-Huang
Contributor Author

JaySon-Huang commented Nov 20, 2020

Added a script to show the hot write regions in TiFlash stores: https://github.com/pingcap/tidb-ansible/pull/1359/files
Actually, it turns out that two TiFlash stores with a similar number of hot write regions show very different IO util: one is almost 60~90% while the other is only about 10%.

The result of running store-hot-regions.py in the customer's environment:
[image]
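
For reference, a minimal sketch (not the actual tidb-ansible script) of how such a per-store count can be obtained from PD's hot write region API; the PD address and the response field names "as_peer" and "regions_count" are assumptions about the v4.0 response shape:

# Minimal sketch: count hot write regions per store via PD's API.
# The PD address and the response field names ("as_peer", "regions_count")
# are assumptions; the real store-hot-regions.py may differ.
import requests

PD = "http://127.0.0.1:2379"  # hypothetical PD address

hot = requests.get(f"{PD}/pd/api/v1/hotspot/regions/write").json()
for store_id, stats in sorted(hot.get("as_peer", {}).items()):
    print(f"store {store_id}: {stats.get('regions_count', 0)} hot write regions")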

@solotzg
Contributor

solotzg commented Nov 24, 2020

message StoreStats {
    uint64 store_id = 1;
    // Capacity for the store.
    uint64 capacity = 2;
    // Available size for the store.
    uint64 available = 3;
    // Total region count in this store.
    uint32 region_count = 4;
    // Current sending snapshot count.
    uint32 sending_snap_count = 5;
    // Current receiving snapshot count.
    uint32 receiving_snap_count = 6;
    // When the store is started (unix timestamp in seconds).
    uint32 start_time = 7;
    // How many region is applying snapshot.
    uint32 applying_snap_count = 8;
    // If the store is busy
    bool is_busy = 9;
    // Actually used space by db
    uint64 used_size = 10;
    // Bytes written for the store during this period.
    uint64 bytes_written = 11;
    // Keys written for the store during this period.
    uint64 keys_written = 12;
    // Bytes read for the store during this period.
    uint64 bytes_read = 13;
    // Keys read for the store during this period.
    uint64 keys_read = 14;
    // Actually reported time interval
    TimeInterval interval = 15;
    // Threads' CPU usages in the store
    repeated RecordPair cpu_usages = 16;
    // Threads' read disk I/O rates in the store
    repeated RecordPair read_io_rates = 17;
    // Threads' write disk I/O rates in the store
    repeated RecordPair write_io_rates = 18;
    // Operations' latencies in the store
    repeated RecordPair op_latencies = 19;
}

For now, bytes_written, keys_written, bytes_read, and keys_read are not reported to PD.
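
To illustrate which heartbeat fields are involved, here is a hypothetical sketch assuming Python bindings generated from kvproto's pdpb.proto (the module name pdpb_pb2, the TimeInterval field names, and the sample values are assumptions for illustration only):

# Hypothetical sketch: the store heartbeat fields a TiFlash proxy would need
# to fill so that PD can see store-level write flow. Module name, TimeInterval
# field names, and values are assumptions for illustration.
import time
from pdpb_pb2 import StoreStats  # assumed protoc-generated module from pdpb.proto

now = int(time.time())
stats = StoreStats()
stats.store_id = 5                      # hypothetical TiFlash store id
stats.capacity = 2 * 1024 ** 4          # 2 TiB
stats.available = 1 * 1024 ** 4
stats.region_count = 1200
stats.start_time = now - 3600
# The fields this issue is about: without them the hot-write-region scheduler
# has no store-level write flow to balance on.
stats.bytes_written = 350 * 1024 ** 2   # bytes written during this interval
stats.keys_written = 800_000
stats.interval.start_timestamp = now - 10  # assumed TimeInterval field names
stats.interval.end_timestamp = now
print(stats)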

@JaySon-Huang JaySon-Huang self-assigned this Dec 3, 2020
@JaySon-Huang
Contributor Author

JaySon-Huang commented Dec 3, 2020

Unlike TiKV, which distinguishes the leader from the followers, every TiFlash peer is ready for reading, and TiDB chooses TiFlash peers in a round-robin way. So reporting bytes_read and keys_read for redistributing hot read regions is not very meaningful for us.

We only consider reporting correct bytes_written and keys_written to PD.

@JaySon-Huang
Contributor Author

JaySon-Huang commented Dec 9, 2020

I deployed a cluster with 1 TiDB + 1 PD + 1 TiKV + 2 TiFlash based on version v4.0.8.
The TiFlash branch and its proxy branch are https://github.com/JaySon-Huang/tics/tree/store_stats_4.0 and https://github.com/JaySon-Huang/tikv/tree/store_stats_4.0 . These two branches fix the problem that the written bytes and written keys at the store level are not reported to PD.

By running a sysbench workload on this cluster, I found that:

  • move-hot-write-region operations between TiFlash stores rarely happen.
  • While running the "oltp_update_index" workload, the write pressure on the two TiFlash nodes is imbalanced: one is about 10 times higher than the other. But PD still did not generate move-hot-write-region operations between the TiFlash stores.

Another problem, maybe related or not:
I used the PD API /pd/api/v1/hotspot/regions/write to check the stats of hot write regions. For the TiFlash node, the flow bytes obtained by summing all regions are about 4 times what the TiFlash store reported; a rough sketch of this check is shown below.
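
A rough sketch of the check, assuming the response field names ("as_peer", "statistics", "flow_bytes") used in v4.0, which I have not re-verified here:

# Sketch: sum the per-region write flow that PD tracks for one TiFlash store,
# to compare against the write flow the TiFlash store itself reports.
# The response field names ("as_peer", "statistics", "flow_bytes") are
# assumptions about the v4.0 response shape.
import requests

PD = "http://127.0.0.1:2379"   # hypothetical PD address
TIFLASH_STORE_ID = "4"         # hypothetical TiFlash store id

resp = requests.get(f"{PD}/pd/api/v1/hotspot/regions/write").json()
store = resp.get("as_peer", {}).get(TIFLASH_STORE_ID, {})
region_sum = sum(r.get("flow_bytes", 0) for r in store.get("statistics", []))

print(f"sum of hot-region write flow on store {TIFLASH_STORE_ID}: {region_sum} bytes")
# Compare this sum with the store-level write flow reported by the TiFlash
# store; in the test above the sum was roughly 4 times the reported value.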

@JaySon-Huang
Contributor Author

Will continue the discussion in tikv/pd#3261

@JaySon-Huang
Contributor Author

@JaySon-Huang
Contributor Author

Done in tikv/pd#3261

Projects
None yet
Development

No branches or pull requests

2 participants