add index or lightning or import into failed with error “Error 1105 (HY000): request pd http api failed with status: '503 Service Unavailable” when kill pd leader #8142

Closed
Lily2025 opened this issue May 7, 2024 · 4 comments · Fixed by #8216 or pingcap/tidb#53718
Assignees
Labels
affects-8.1 severity/major type/bug The issue is confirmed as a bug.

Comments

@Lily2025
Lily2025 commented May 7, 2024

Bug Report

What did you do?

1. Run add index.
2. Kill the PD leader.

What did you expect to see?

add index should succeed.

What did you see instead?

add index failed with the error "Error 1105 (HY000): request pd http api failed with status: '503 Service Unavailable'"

What version of PD are you using (pd-server -V)?

./pd-server -V
Release Version: v8.2.0-alpha
Edition: Community
Git Commit Hash: 1679dbc
Git Branch: heads/refs/tags/v8.2.0-alpha
UTC Build Time: 2024-04-30 11:39:01

@Lily2025 Lily2025 added the type/bug The issue is confirmed as a bug. label May 7, 2024
@Lily2025 Lily2025 changed the title add add index failed with error “Error 1105 (HY000): request pd http api failed with status: '503 Service Unavailable” when kill pd leader May 7, 2024
@Lily2025
Author

Lily2025 commented May 7, 2024

/type bug
/severity major
/assign rleungx

@Lily2025
Author

Lily2025 commented May 7, 2024

/assign JmPotato

@Lily2025
Author

Lily2025 commented May 7, 2024

Lightning also failed with the error "request pd http api failed with status: '500 Internal Server Error'" when the PD leader was killed.

lightning logs:
[2024/05/06 15:42:15.072 +00:00] [ERROR] [client.go:234] ["[pd] request failed with a non-200 status"] [source=lightning] [name=GetStores] [url=http://tc-pd-2.tc-pd-peer.ha-test-lightning-tps-7575769-1-386.svc:2379/pd/api/v1/stores] [method=GET] [caller-id=pd-http-client] [status="500 Internal Server Error"] [body="[PD:apiutil:ErrRedirect]redirect failed\n"] [stack="github.com/tikv/pd/client/http.(*clientInner).doRequest\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240430080403-1679dbca25b3/http/client.go:234\ngithub.com/tikv/pd/client/http.(*clientInner).requestWithRetry.func1\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240430080403-1679dbca25b3/http/client.go:139\ngithub.com/tikv/pd/client/retry.(*Backoffer).Exec\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240430080403-1679dbca25b3/retry/backoff.go:78\ngithub.com/tikv/pd/client/http.(*clientInner).requestWithRetry\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240430080403-1679dbca25b3/http/client.go:160\ngithub.com/tikv/pd/client/http.(*client).request\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240430080403-1679dbca25b3/http/client.go:379\ngithub.com/tikv/pd/client/http.(*client).GetStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240430080403-1679dbca25b3/http/interface.go:403\ngithub.com/pingcap/tidb/pkg/lightning/tikv.ForAllStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/pkg/lightning/tikv/tikv.go:103\ngithub.com/pingcap/tidb/pkg/lightning/backend/local.(*switcher).switchTiKVMode\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/pkg/lightning/backend/local/tikv_mode.go:69\ngithub.com/pingcap/tidb/pkg/lightning/backend/local.(*switcher).ToImportMode\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/pkg/lightning/backend/local/tikv_mode.go:53\ngithub.com/pingcap/tidb/lightning/pkg/importer.(*Controller).buildRunPeriodicActionAndCancelFunc.func5\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/lightning/pkg/importer/import.go:1001"]
[2024/05/06 15:42:16.034 +00:00] [ERROR] [client.go:206] ["[pd] do http request failed"] [source=lightning] [name=SetRegionLabelRule] [url=http://tc-pd-1.tc-pd-peer.ha-test-lightning-tps-7575769-1-386.svc:2379/pd/api/v1/config/region-label/rule] [method=POST] [caller-id=pd-http-client] [error="Post \"http://tc-pd-1.tc-pd-peer.ha-test-lightning-tps-7575769-1-386.svc:2379/pd/api/v1/config/region-label/rule\": dial tcp 10.200.72.140:2379: connect: connection refused"] [stack="github.com/tikv/pd/client/http.(*clientInner).doRequest\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240430080403-1679dbca25b3/http/client.go:206\ngithub.com/tikv/pd/client/http.(*clientInner).requestWithRetry.func1\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240430080403-1679dbca25b3/http/client.go:139\ngithub.com/tikv/pd/client/retry.(*Backoffer).Exec\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240430080403-1679dbca25b3/retry/backoff.go:78\ngithub.com/tikv/pd/client/http.(*clientInner).requestWithRetry\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240430080403-1679dbca25b3/http/client.go:160\ngithub.com/tikv/pd/client/http.(*client).request\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240430080403-1679dbca25b3/http/client.go:379\ngithub.com/tikv/pd/client/http.(*client).SetRegionLabelRule\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240430080403-1679dbca25b3/http/interface.go:691\ngithub.com/pingcap/tidb/br/pkg/pdutil.pauseSchedulerByKeyRangeWithTTL\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/pdutil/pd.go:702\ngithub.com/pingcap/tidb/br/pkg/pdutil.PauseSchedulersByKeyRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/pdutil/pd.go:670\ngithub.com/pingcap/tidb/pkg/lightning/backend/local.(*Backend).ImportEngine\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/pkg/lightning/backend/local/local.go:1291\ngithub.com/pingcap/tidb/pkg/lightning/backend.(*ClosedEngine).Import\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/pkg/lightning/backend/backend.go:373\ngithub.com/pingcap/tidb/lightning/pkg/importer.(*TableImporter).importKV\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/lightning/pkg/importer/table_import.go:1346\ngithub.com/pingcap/tidb/lightning/pkg/importer.(*TableImporter).importEngine\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/lightning/pkg/importer/table_import.go:920\ngithub.com/pingcap/tidb/lightning/pkg/importer.(*TableImporter).importEngines.func3\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/lightning/pkg/importer/table_import.go:526"]
[2024/05/06 15:42:16.060 +00:00] [ERROR] [backend.go:378] ["import failed"]

@Lily2025 Lily2025 changed the title add index failed with error “Error 1105 (HY000): request pd http api failed with status: '503 Service Unavailable” when kill pd leader add index or lightning failed with error “Error 1105 (HY000): request pd http api failed with status: '503 Service Unavailable” when kill pd leader May 20, 2024
@Lily2025 Lily2025 changed the title add index or lightning failed with error “Error 1105 (HY000): request pd http api failed with status: '503 Service Unavailable” when kill pd leader add index or lightning or import into failed with error “Error 1105 (HY000): request pd http api failed with status: '503 Service Unavailable” when kill pd leader May 24, 2024
ti-chi-bot bot pushed a commit that referenced this issue May 27, 2024
…ctor (#8216)

close #8142

Add retry logic to improve PD HTTP request forwarding success rate during PD leader switch.

Signed-off-by: JmPotato <ghzpotato@gmail.com>
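
The commit message above describes retrying while the PD leader is switching. A minimal sketch of that general idea follows, written from the caller's side rather than as PD's actual forwarding code. `getWithRetry`, `flakyTransport`, `newFlakyClient`, and the `pd.invalid` host are hypothetical names for illustration; the fake transport stands in for a PD member that answers 503 until a new leader is elected.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// flakyTransport is a hypothetical in-memory transport: it answers
// 503 for the first `failures` requests and 200 afterwards, imitating
// a short window without a PD leader. No network is involved.
type flakyTransport struct {
	failures int
	calls    int
}

func (t *flakyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	t.calls++
	status := http.StatusOK
	if t.calls <= t.failures {
		status = http.StatusServiceUnavailable
	}
	return &http.Response{
		StatusCode: status,
		Header:     make(http.Header),
		Body:       http.NoBody,
		Request:    req,
	}, nil
}

// newFlakyClient builds a client backed by flakyTransport.
func newFlakyClient(failures int) *http.Client {
	return &http.Client{Transport: &flakyTransport{failures: failures}}
}

// getWithRetry (hypothetical helper) retries a GET while the response
// is a 5xx status or a transport error, waiting `interval` between
// attempts. It returns the first non-5xx status, or the last error.
func getWithRetry(client *http.Client, url string, attempts int, interval time.Duration) (int, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		resp, err := client.Get(url)
		if err == nil {
			status := resp.StatusCode
			resp.Body.Close()
			if status < 500 {
				return status, nil
			}
			lastErr = fmt.Errorf("attempt %d: status %d", i+1, status)
		} else {
			lastErr = err
		}
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return 0, lastErr
}

func main() {
	client := newFlakyClient(2)
	status, err := getWithRetry(client, "http://pd.invalid/pd/api/v1/stores", 5, 10*time.Millisecond)
	fmt.Println(status, err)
}
```

With two simulated failures and five attempts, the third attempt succeeds; with fewer attempts than failures, the caller gets the last 503 back as an error.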
@JmPotato
Member

JmPotato commented May 30, 2024

This was introduced by #7896: because the backoffer collects errors with `multierr`, it returns an error even if the request eventually succeeds after retries.
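
To illustrate the failure mode described in this comment, here is a small sketch, not the PD backoffer itself. It uses the standard library's `errors.Join` in place of `go.uber.org/multierr`, and `execJoined`, `execLast`, and `flaky` are hypothetical names. The first loop keeps every historical error and hands them back even after a successful attempt; the second returns nil as soon as one attempt succeeds.

```go
package main

import (
	"errors"
	"fmt"
)

// execJoined imitates the problematic pattern: every failed attempt is
// accumulated, and the accumulated value is returned even when a later
// attempt succeeds, so the caller sees a non-nil error.
func execJoined(maxAttempts int, fn func() error) error {
	var all error
	for i := 0; i < maxAttempts; i++ {
		err := fn()
		if err == nil {
			return all // earlier failures leak out despite the success
		}
		all = errors.Join(all, err)
	}
	return all
}

// execLast keeps only the most recent error and returns nil as soon as
// an attempt succeeds.
func execLast(maxAttempts int, fn func() error) error {
	var last error
	for i := 0; i < maxAttempts; i++ {
		if last = fn(); last == nil {
			return nil
		}
	}
	return last
}

// flaky returns a function that fails `failures` times, then succeeds.
func flaky(failures int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return fmt.Errorf("attempt %d: 503 Service Unavailable", calls)
		}
		return nil
	}
}

func main() {
	fmt.Println("joined returns error:", execJoined(5, flaky(2)) != nil)
	fmt.Println("last returns error:", execLast(5, flaky(2)) != nil)
}
```

In both calls the third attempt succeeds, yet only `execLast` reports success to its caller, which is the behavior a retrying client depends on.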

@JmPotato JmPotato reopened this May 30, 2024
ti-chi-bot bot added a commit that referenced this issue May 31, 2024
ref #8142

Returning historical errors causes the client's retry logic to fail, and we
currently do not need to collect every error during retries, so this PR
removes `multierr` from the backoffer and adds tests to ensure the correctness of the retry logic.

Signed-off-by: JmPotato <ghzpotato@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot bot pushed a commit to pingcap/tidb that referenced this issue May 31, 2024
ti-chi-bot bot pushed a commit to pingcap/tidb that referenced this issue May 31, 2024
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue Jul 31, 2024
ref tikv#8142

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot bot pushed a commit that referenced this issue Aug 1, 2024
…ctor (#8216) (#8466)

close #8142

Add retry logic to improve PD HTTP request forwarding success rate during PD leader switch.

Signed-off-by: JmPotato <ghzpotato@gmail.com>

Co-authored-by: JmPotato <ghzpotato@gmail.com>
ti-chi-bot bot pushed a commit that referenced this issue Aug 6, 2024
ref #8142, close #8499

Returning historical errors causes the client's retry logic to fail, and we
currently do not need to collect every error during retries, so this PR
removes `multierr` from the backoffer and adds tests to ensure the correctness of the retry logic.

Signed-off-by: JmPotato <ghzpotato@gmail.com>

Co-authored-by: JmPotato <ghzpotato@gmail.com>
3 participants