lightning: concurrency setting when write to tikv #51380

Open
D3Hunter opened this issue Feb 28, 2024 · 1 comment
Labels
component/ddl: This issue is related to DDL of TiDB.
component/lightning: This issue is related to Lightning of TiDB.
type/enhancement: The issue or PR belongs to an enhancement.

Comments

D3Hunter commented Feb 28, 2024

Enhancement

Before 7.0.0, range-concurrency controlled the concurrency of writes to TiKV (the actual worker count is range-concurrency * 2). Since 7.0.0, the write concurrency is table-concurrency * range-concurrency * 2, because we start range-concurrency * 2 workers for each engine. If table-concurrency is enlarged as before, the TiKV write concurrency can become too large and overload TiKV I/O. That can lead to: higher I/O latency -> stuck raftstore -> frequent leader changes -> region split failing even after retries (it keeps reporting "not leader" or "leader may None" errors).

    for i := 0; i < local.WorkerConcurrency; i++ {
        workGroup.Go(func() error {
            return local.startWorker(workerCtx, jobToWorkerCh, jobFromWorkerCh, &jobWg)
        })
    }
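
To make the arithmetic concrete, here is a rough sketch of how the effective write concurrency compares before and after 7.0.0, using only the formula stated above. The function names and the sample values (table-concurrency = 6, range-concurrency = 16) are assumptions for illustration, not Lightning code or recommended settings.

```go
package main

import "fmt"

// workersPerEngine mirrors the "range-concurrency * 2" factor described in
// this issue. Hypothetical helper, not part of Lightning.
func workersPerEngine(rangeConcurrency int) int {
	return rangeConcurrency * 2
}

// writeConcurrency estimates the number of concurrent TiKV write workers.
// perEngine=true models the behaviour since 7.0.0 as described in this issue
// (one worker pool per engine, table-concurrency engines at once);
// perEngine=false models the earlier behaviour.
func writeConcurrency(tableConcurrency, rangeConcurrency int, perEngine bool) int {
	if perEngine {
		return tableConcurrency * workersPerEngine(rangeConcurrency)
	}
	return workersPerEngine(rangeConcurrency)
}

func main() {
	// Sample values are assumptions chosen for illustration only.
	table, rng := 6, 16
	fmt.Println("before 7.0.0:", writeConcurrency(table, rng, false)) // 32
	fmt.Println("since 7.0.0: ", writeConcurrency(table, rng, true))  // 192
}
```

With these sample values the same settings produce six times as many write workers, which is the multiplication that can overload TiKV I/O.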

error="batch split regions failed: split region failed: err=message:\"peer is not leader for region 751226, leader may None\" not_leader:<region_id:751226 > : [BR:Restore:ErrRestoreSplitFailed]fail to split region; split region failed: err=message:\"peer is not leader for region 751226, leader may None\" not_leader:<region_id:751226 > : [BR:Restore:ErrRestoreSplitFailed]fail to split region; split region failed: err=message:\"peer is not leader for region 751226, leader may None\" not_leader:<region_id:751226 > : [BR:Restore:ErrRestoreSplitFailed]fail to split region; split region failed: err=message:\"peer is not leader for region 751226, leader may None\" not_leader:<region_id:751226 > : [BR:Restore:ErrRestoreSplitFailed]fail to split region"] [errorVerbose="the following errors occurred:
-  [BR:Restore:ErrRestoreSplitFailed]fail to split region
split region failed: err=message:\"peer is not leader for region 751226, leader may None\" not_leader:<region_id:751226 > 
github.com/pingcap/tidb/br/pkg/restore/split.sendSplitRegionRequest
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/restore/split/client.go:370
github.com/pingcap/tidb/br/pkg/restore/split.(*pdClient).sendSplitRegionRequest
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/restore/split/client.go:315
github.com/pingcap/tidb/br/pkg/restore/split.(*pdClient).BatchSplitRegionsWithOrigin
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/restore/split/client.go:413
github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).BatchSplitRegions
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:400
github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).SplitAndScatterRegionByRanges.func3
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:276
golang.org/x/sync/errgroup.(*Group).Go.func1
    /go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1650

D3Hunter added the type/enhancement label Feb 28, 2024
D3Hunter changed the title from "concurrency when write to tikv" to "lightning: concurrency setting when write to tikv" Feb 28, 2024
D3Hunter added the component/lightning label Feb 28, 2024
@D3Hunter

We can work around this by making range-concurrency smaller.
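
One way to pick the smaller value is to work backwards from a target number of TiKV write workers, given that the write concurrency is table-concurrency * range-concurrency * 2. This is only a sketch: the function name, the idea of a fixed "budget", and the sample numbers are assumptions, and the right budget depends on the cluster's I/O capacity.

```go
package main

import "fmt"

// maxRangeConcurrency returns the largest range-concurrency such that
// tableConcurrency * rangeConcurrency * 2 stays within budget.
// It never returns less than 1, so a very small budget can still be exceeded.
// Hypothetical helper, not part of Lightning.
func maxRangeConcurrency(budget, tableConcurrency int) int {
	if tableConcurrency < 1 {
		tableConcurrency = 1
	}
	rc := budget / (tableConcurrency * 2)
	if rc < 1 {
		return 1
	}
	return rc
}

func main() {
	// Assumed example: keep roughly the same 32 write workers that
	// range-concurrency = 16 alone would give, with table-concurrency = 6.
	budget, table := 32, 6
	rc := maxRangeConcurrency(budget, table)
	fmt.Println("range-concurrency:", rc)           // 2
	fmt.Println("write concurrency:", table*rc*2) // 24
}
```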

D3Hunter added the component/ddl label Feb 28, 2024