Skip to content
This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

relay: retry for upstream bad connection #265

Merged
merged 15 commits into from
Sep 3, 2019

Conversation

csuzhangxc
Copy link
Member

@csuzhangxc csuzhangxc commented Sep 2, 2019

What problem does this PR solve?

  • retry to read binlog events from upstream when encountering bad connection error

What is changed and how it works?

  • check return error from binlog reader
  • use pkg/backoff to trigger the retry

Check List

Tests

  • Unit test
  • Manual test (add detailed scripts or steps below)
    • run DM-worker with fetching binlog events
    • KILL connection for the relay unit
    • observe whether relay retry to fetch binlog events again

Side effects

  • Increased code complexity

@csuzhangxc csuzhangxc added priority/important Major change, requires approval from ≥2 primary reviewers status/PTAL This PR is ready for review. Add this label back after committing new changes type/bug-fix Bug fix status/WIP This PR is still work in progress and removed status/PTAL This PR is ready for review. Add this label back after committing new changes labels Sep 2, 2019
@codecov
Copy link

codecov bot commented Sep 2, 2019

Codecov Report

Merging #265 into master will decrease coverage by 0.2887%.
The diff coverage is 88.3116%.

@@               Coverage Diff                @@
##             master       #265        +/-   ##
================================================
- Coverage   60.2806%   59.9918%   -0.2888%     
================================================
  Files           134        134                
  Lines         14965      14742       -223     
================================================
- Hits           9021       8844       -177     
+ Misses         5099       5052        -47     
- Partials        845        846         +1

@csuzhangxc
Copy link
Member Author

/run-all-tests

1 similar comment
@csuzhangxc
Copy link
Member Author

/run-all-tests

@amyangfei amyangfei added the priority/release-blocker This PR blocks a release. Please review it ASAP. label Sep 3, 2019
@csuzhangxc csuzhangxc added status/PTAL This PR is ready for review. Add this label back after committing new changes and removed status/WIP This PR is still work in progress labels Sep 3, 2019
@csuzhangxc
Copy link
Member Author

@GregoryIan @amyangfei PTAL

relay/relay.go Outdated Show resolved Hide resolved
@@ -83,11 +83,7 @@ func (t *testParserSuite) TestParser(c *C) {
}

unsupportedSQLs := []string{
"alter table bar ADD FULLTEXT INDEX `fulltext` (`name`) WITH PARSER ngram",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove two supported cases handy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YES, because parser also updated when executing

GO111MODULE=on go get -u github.com/siddontang/go-mysql@f52d30c9fcb7824f4b6f38e3e4ed1037a1a05acd

return false
}

for lrt := rr.lastRetryTime; time.Since(lrt) > rr.cfg.BackoffRollback; lrt = lrt.Add(rr.cfg.BackoffRollback) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we this logic code into backoff pkg? but not now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and a function like func (b *Backoff) Adjust() to adjust/rollback cwnd?
If so, maybe too many things backoff can do?
what your opinion? @amyangfei

Copy link
Contributor

@amyangfei amyangfei Sep 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not to add this logic into backoff pkg, but maybe we can add a function like RollbackN?

}

r.tctx.L().Info("receive retryable error for binlog reader", log.ShortError(err))
err = reader2.Close() // close the previous reader
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we close reader2 twice?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YES, we can close Reader any times, but it will return a can not close error except the first one.

In this place, we only close a reader2 instance once. this L298 close the previous one, and in the defer in L260 ~ L265, it may close the first one (without retry) or the latest one (with retry).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check reader2 != nil in c6baaab.

@IANTHEREAL
Copy link
Collaborator

Rest LGTM

@IANTHEREAL
Copy link
Collaborator

LGTM

@@ -96,6 +97,13 @@ func NewRealRelayHolder(cfg *Config) RelayHolder {
},
BinLogName: clone.RelayBinLogName,
BinlogGTID: clone.RelayBinlogGTID,
ReaderRetry: rr.ReaderRetryConfig{ // we use config for TaskChecker now
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ReaderRetry: rr.ReaderRetryConfig{ // we use config for TaskChecker now
ReaderRetry: rr.ReaderRetryConfig{ // we use config from TaskChecker now

@@ -320,6 +328,13 @@ func (h *realRelayHolder) Update(ctx context.Context, cfg *Config) error {
User: cfg.From.User,
Password: cfg.From.Password,
},
ReaderRetry: rr.ReaderRetryConfig{ // we use config for TaskChecker now
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor

@amyangfei amyangfei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@amyangfei amyangfei added status/LGT2 Two reviewers already commented LGTM, ready for merge and removed status/PTAL This PR is ready for review. Add this label back after committing new changes labels Sep 3, 2019
@csuzhangxc csuzhangxc merged commit 14eae68 into pingcap:master Sep 3, 2019
@csuzhangxc csuzhangxc deleted the relay-retry branch September 3, 2019 07:45
lichunzhu pushed a commit to lichunzhu/dm that referenced this pull request Apr 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
priority/important Major change, requires approval from ≥2 primary reviewers priority/release-blocker This PR blocks a release. Please review it ASAP. status/LGT2 Two reviewers already commented LGTM, ready for merge type/bug-fix Bug fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants