Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ddl: fix panic tidb-server doesn't release table lock #19586

Merged
merged 24 commits into from
Sep 15, 2020

Conversation

crazycs520
Copy link
Contributor

@crazycs520 crazycs520 commented Aug 28, 2020

What problem does this PR solve?

Close #19240

When tidb-server been killed -9(such as oom that killed by the system), the tidb-server won't release the held table locks, and this dead table lock will never be released unless user use admin cleanup table lock statement.

What is changed and how it works?

Add a background goroutine to periodic inspection the dead lock.

If the lock is held by the server that doesn't exist meta key (such as DDLAllSchemaVersions meta key) in ETCD, we think the server is dead, so we can clean the locks that hold by the dead tidb-server.

Why use DDLAllSchemaVersions meta key instead of ServerInformationPath meta key?

  1. Table lock implementation with DDL, and DDLAllSchemaVersions meta key is a related DDL meta key.
  2. DDLAllSchemaVersions TTL is 90s, but ServerInformationPath TTL is 10 min. The TTL90s is much better for usage.

Related changes

  • Need to cherry-pick to the release branch

Check List

Tests

  • Manual test

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM

Release note

  • fix panic tidb-server doesn't release table lock

Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
@crazycs520 crazycs520 added the sig/sql-infra SIG: SQL Infra label Aug 28, 2020
@wjhuang2016
Copy link
Member

/run-all-tests

Copy link
Contributor

@zimulala zimulala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add some tests?

ddl/ddl.go Show resolved Hide resolved
ddl/ddl.go Outdated Show resolved Hide resolved
ddl/util/dead_table_lock_checker.go Outdated Show resolved Hide resolved
ddl/ddl.go Outdated Show resolved Hide resolved
Signed-off-by: crazycs520 <crazycs520@gmail.com>
@crazycs520
Copy link
Contributor Author

Could we add some tests?

I try to add some unit tests, but it is too much trouble to start a mock store with a real ETCD cluster.

Any suggestion? or just add some failpoint to mock?

ddl/util/dead_table_lock_checker.go Outdated Show resolved Hide resolved
ddl/ddl.go Outdated Show resolved Hide resolved
ddl/ddl.go Show resolved Hide resolved
crazycs520 and others added 5 commits September 9, 2020 19:52
Co-authored-by: Lynn <zimu_xia@126.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Copy link
Contributor

@tangenta tangenta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Sep 14, 2020
Copy link
Contributor

@zimulala zimulala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zimulala zimulala added status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Sep 15, 2020
@crazycs520
Copy link
Contributor Author

/run-all-tests

1 similar comment
@crazycs520
Copy link
Contributor Author

/run-all-tests

@crazycs520
Copy link
Contributor Author

/run-all-tests

@crazycs520
Copy link
Contributor Author

/run-unit-test

@crazycs520
Copy link
Contributor Author

/run-all-tests

@crazycs520
Copy link
Contributor Author

/run-all-tests

@crazycs520
Copy link
Contributor Author

/run-integration-common-test

@crazycs520 crazycs520 merged commit 8ae5b1c into pingcap:master Sep 15, 2020
ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Sep 16, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Copy link
Contributor

cherry pick to release-4.0 in PR #20020

ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Sep 16, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Copy link
Contributor

cherry pick to release-3.0 in PR #20021

ti-srebot added a commit that referenced this pull request Sep 21, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/sql-infra SIG: SQL Infra status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TiDB does not clean table lock if restart on OOM
5 participants