
schedule: improve the leader distribution after region scatter #2659

Merged
merged 4 commits on Jul 28, 2020

Conversation

@nolouch (Contributor) commented Jul 16, 2020

Signed-off-by: nolouch <nolouch@gmail.com>

What problem does this PR solve?

Fix #2655

What is changed and how it works?

  • add a leader store counter
  • pick the store with the minimal leader count
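The two changes above can be sketched as a per-store counter plus a minimum-count picker. This is an illustrative standalone sketch with hypothetical names (`leaderCounter`, `pickMin`), not the actual PD implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// leaderCounter tracks how many scattered regions have placed their
// leader on each store. Hypothetical sketch, not PD's actual type.
type leaderCounter struct {
	mu     sync.Mutex
	counts map[uint64]uint64
}

func newLeaderCounter() *leaderCounter {
	return &leaderCounter{counts: make(map[uint64]uint64)}
}

// put records that a leader was placed on the given store.
func (c *leaderCounter) put(storeID uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[storeID]++
}

// pickMin returns the candidate store with the fewest leaders so far.
// Assumes at least one candidate.
func (c *leaderCounter) pickMin(candidates []uint64) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	best, bestCount := candidates[0], c.counts[candidates[0]]
	for _, id := range candidates[1:] {
		if c.counts[id] < bestCount {
			best, bestCount = id, c.counts[id]
		}
	}
	return best
}

func main() {
	c := newLeaderCounter()
	c.put(1)
	c.put(1)
	c.put(2)
	// Store 3 has received no leaders yet, so it is picked.
	fmt.Println(c.pickMin([]uint64{1, 2, 3})) // prints 3
}
```

In these terms, the scatterer would call `put` each time it selects a target leader store, so later regions are steered toward stores that have received fewer leaders so far.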

Check List

Tests

  • Unit test
  • Manual test (add detailed scripts or steps below)
    The new result with this PR:
MySQL [test]> select count(s.region_id) cnt, s.index_name, p.store_id from INFORMATION_SCHEMA.TIKV_REGION_STATUS s join INFORMATION_SCHEMA.tikv_region_peers p on s.region_id = p.region_id where s.table_name = 'ss' group by index_name, p.store_id order by index_name,cnt desc;
+-----+------------+----------+
| cnt | index_name | store_id |
+-----+------------+----------+
| 172 | NULL       |        6 |
| 172 | NULL       |       11 |
| 172 | NULL       |        1 |
| 170 | NULL       |       93 |
| 170 | NULL       |  3670905 |
| 170 | NULL       |       10 |
| 170 | NULL       |        4 |
| 170 | NULL       |        5 |
| 170 | NULL       |        8 |
|   1 | idx1       |       11 |
|   1 | idx1       |        1 |
|   1 | idx1       |        6 |
+-----+------------+----------+
12 rows in set (0.43 sec)

MySQL [test]> select count(s.region_id) cnt, s.index_name, p.store_id from INFORMATION_SCHEMA.TIKV_REGION_STATUS s join INFORMATION_SCHEMA.tikv_region_peers p on s.region_id = p.region_id where s.table_name = 'ss' and p.is_leader = 1 group by index_name, p.store_id order by index_name,cnt desc;
+-----+------------+----------+
| cnt | index_name | store_id |
+-----+------------+----------+
|  59 | NULL       |        1 |
|  58 | NULL       |        6 |
|  57 | NULL       |       93 |
|  57 | NULL       |       10 |
|  57 | NULL       |        8 |
|  57 | NULL       |        4 |
|  56 | NULL       |        5 |
|  56 | NULL       |  3670905 |
|  55 | NULL       |       11 |
|   1 | idx1       |        6 |
+-----+------------+----------+
10 rows in set (0.47 sec)

Release note

  • improve the leader distribution after region scatter

Signed-off-by: nolouch <nolouch@gmail.com>
@nolouch nolouch requested review from lhy1024 and disksing July 16, 2020 13:42
@Yisaer Yisaer self-requested a review July 17, 2020 07:22
@@ -153,6 +179,8 @@ func (r *RegionScatterer) scatterRegion(region *core.RegionInfo) *operator.Opera
}

scatterWithSameEngine(ordinaryPeers, r.ordinaryEngine)
// FIXME: target leader only consider the ordinary engine.
Contributor:

What about creating an issue to track this? (Ignore me if one already exists.)

Contributor Author (nolouch):

I added more comments.

server/schedule/region_scatterer.go: two outdated review threads (resolved)
@@ -153,6 +179,8 @@ func (r *RegionScatterer) scatterRegion(region *core.RegionInfo) *operator.Opera
}

scatterWithSameEngine(ordinaryPeers, r.ordinaryEngine)
// FIXME: target leader only consider the ordinary engine.
targetLeader := r.collectAvailableLeaderStores(targetPeers, r.ordinaryEngine)
Contributor:

Will the leader store be collected again after collectAvailableLeaderStores in the scatterWithSameEngine function?

Contributor Author (nolouch):

For now, only the ordinaryEngine is considered.

Comment on lines +38 to +42
func (s *selectedLeaderStores) put(id uint64) {
s.mu.Lock()
defer s.mu.Unlock()
s.stores[id] = s.stores[id] + 1
}
Contributor:

If the leader is transferred manually, will the old store's count be decremented and the new store's count incremented?

Contributor Author (nolouch):
Yes, you are right. This scheduler assumes that the leader and region will not change significantly after scheduling. Maybe we need to discuss optimization in another issue.
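A minimal standalone version of the counter quoted above makes the behavior under discussion concrete: counts only ever grow, so a manual leader transfer is not reflected. The surrounding type is reconstructed here as an assumption (a `get` accessor is added for illustration and is not part of the quoted diff):

```go
package main

import (
	"fmt"
	"sync"
)

// Reconstructed context for the diff hunk above; the field names are
// assumptions based on the snippet, not the exact PD source.
type selectedLeaderStores struct {
	mu     sync.Mutex
	stores map[uint64]uint64
}

// put increments the leader count for a store (as in the quoted diff).
func (s *selectedLeaderStores) put(id uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.stores[id] = s.stores[id] + 1
}

// get is a hypothetical accessor, added here only for demonstration.
func (s *selectedLeaderStores) get(id uint64) uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.stores[id]
}

func main() {
	s := &selectedLeaderStores{stores: make(map[uint64]uint64)}
	s.put(6)
	s.put(6)
	// Counts are monotone: there is no decrement path, so a manual
	// leader transfer away from store 6 would leave this count at 2.
	fmt.Println(s.get(6)) // prints 2
}
```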

@Yisaer (Contributor) commented Jul 27, 2020:

I think we could directly build a mechanism (a syncer or tracker) to record the correct leader count distribution.

Contributor Author (nolouch):

In fact, doing this would be more complicated. For example, on truncate table we would need to remove the regions from the tracker, and what about recovering a table? The current approach is at least feasible in general: there will be no more operators after the cluster is balanced, especially in a big cluster.

Contributor Author (nolouch):

But the tracker you mentioned is a good idea; I even want to use it to report why a given operator was produced.
cc @Yisaer

@lhy1024 (Contributor) left a comment:

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 20, 2020
@nolouch nolouch added the needs-cherry-pick-release-4.0 The PR needs to cherry pick to release-4.0 branch. label Jul 24, 2020
@rleungx (Member) left a comment:

LGTM

@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Jul 28, 2020
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Jul 28, 2020
@nolouch (Contributor Author) commented Jul 28, 2020:

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Jul 28, 2020
@ti-srebot (Contributor):

/run-all-tests

@ti-srebot ti-srebot merged commit b1f967b into tikv:master Jul 28, 2020
ti-srebot pushed a commit to ti-srebot/pd that referenced this pull request Jul 28, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot (Contributor):

cherry pick to release-4.0 in PR #2684

@nolouch nolouch deleted the fix-scatter-leader branch July 28, 2020 03:22
nolouch added a commit to ti-srebot/pd that referenced this pull request Aug 3, 2020
ti-srebot added a commit that referenced this pull request Aug 3, 2020
Labels

  • component/schedule: Scheduling logic.
  • needs-cherry-pick-release-4.0: The PR needs to cherry pick to release-4.0 branch.
  • status/can-merge: Indicates a PR has been approved by a committer.
  • status/LGT2: Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve the scatter leader of the region behavior in the presplit
5 participants