Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: fix bug that the kill command doesn't work when the killed session is waiting for the pessimistic lock #12852

Merged
merged 7 commits into from
Oct 24, 2019

Conversation

tiancaiamao
Copy link
Contributor

What problem does this PR solve?

mysql> create table t (id int primary key, c int);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t values (1, 1);
Query OK, 1 row affected (0.00 sec)

mysql> begin pessimistic;
Query OK, 0 rows affected (0.00 sec)


# In another session:

mysql> begin pessimistic;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t where id = 1 for update;
+----+------+
| id | c    |
+----+------+
|  1 |    1 |
+----+------+
1 row in set (0.00 sec)

# Change to the previous session
mysql> update t set c = c + 1 where id = 1;
// This query seems blocks forever, even I call `kill tidb xxx`

What is changed and how it works?

Fix bug that the kill command doesn't work when the killed session is waiting for the pessimistic lock.

The killed flag is checked when the code runs to executor.Next(), but when the transaction is acquiring the pessimistic lock, it blocks and never runs to executor.Next(), so the kill command can't work.

Check List

Tests

  • Unit test

Code changes

  • Has interface methods change

Related changes

  • Need to cherry-pick to the release branch

Release note

  • Write release note for bug-fix or new feature.

@tiancaiamao
Copy link
Contributor Author

PTAL @coocood @jackysp @lysu @cfzjywxk

store/tikv/2pc.go Outdated Show resolved Hide resolved
ddl/index.go Outdated Show resolved Hide resolved
store/tikv/2pc.go Outdated Show resolved Hide resolved
@tiancaiamao
Copy link
Contributor Author

/run-all-tests

@codecov
Copy link

codecov bot commented Oct 21, 2019

Codecov Report

Merging #12852 into master will decrease coverage by 0.0383%.
The diff coverage is 85.7142%.

@@               Coverage Diff                @@
##             master     #12852        +/-   ##
================================================
- Coverage   80.0759%   80.0376%   -0.0384%     
================================================
  Files           465        465                
  Lines        107649     106721       -928     
================================================
- Hits          86201      85417       -784     
+ Misses        14967      14908        -59     
+ Partials       6481       6396        -85

tk.MustExec("insert test_kill values (1, 1)")
tk.MustExec("begin pessimistic")
tk2.MustExec("begin pessimistic")
tk.MustQuery("select * from test_kill where id = 1 for update")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a test case for MAX_EXECUTION_TIME?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I test it manually and currently MAX_EXECUTION_TIME works when the session is waiting for a lock.
Does MySQL have different behavior? @jackysp

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MySQL has the same behavior.

@lysu lysu requested review from lysu and cfzjywxk October 22, 2019 03:28
Comment on lines 721 to 723
if err1 := bo.BackoffWithMaxSleep(boTxnLockFast, int(expire), errors.New(locks[0].String())); err1 != nil {
return err1
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to add this backoff here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emm.... maybe I know: do you just want to check bo.ctx.Done()?

first batch is canceled by action.kill, then other batchs will use bo.ctx.Done(), batchExecutor will return until all batch be cancelled.. 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... it seems doesn't work @tiancaiamao

if _, ok := batchExe.action.(actionPrewrite); ok {

this line, we just can cancel actionPrewrite, actionPessimisticLock also need cancel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can call time.Sleep or bo.Backoff, I choose backoff because it may generate better error. @lysu

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normally, we think about TxnLockFast as read-conflict in metric

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to BoTxnLock @lysu

Copy link
Contributor

@lysu lysu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lysu
Copy link
Contributor

lysu commented Oct 22, 2019

efddcf6#diff-4ef55fe1237c7e6bb2217ccca37b9f5fR439

@tiancaiamao do we need to make keyMayBeLocked aware ErrQueryInterrupted? for example we lock 1 key and wait for another, then query be killed, so first locked one better be rollback and didn't wait to lock ttl expired?

@coocood
Copy link
Member

coocood commented Oct 23, 2019

LGTM

@tiancaiamao
Copy link
Contributor Author

efddcf6#diff-4ef55fe1237c7e6bb2217ccca37b9f5fR439

@tiancaiamao do we need to make keyMayBeLocked aware ErrQueryInterrupted? for example we lock 1 key and wait for another, then query be killed, so first locked one better be rollback and didn't wait to lock ttl expired?

The problem out of the scope of this PR means to solve. @lysu

Copy link
Contributor

@lysu lysu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lysu
Copy link
Contributor

lysu commented Oct 24, 2019

/run-all-tests

@tiancaiamao
Copy link
Contributor Author

/rebuild

[2019-10-24T09:14:00.292Z] WARNING: DATA RACE
[2019-10-24T09:14:00.292Z] Read at 0x00c0010862d0 by goroutine 100:
[2019-10-24T09:14:00.292Z]   github.com/pingcap/tidb/distsql.(*selectResult).Close()
[2019-10-24T09:14:00.292Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/distsql/select_result.go:290 +0x266
[2019-10-24T09:14:00.292Z]   github.com/pingcap/tidb/executor.closeAll()
[2019-10-24T09:14:00.292Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/executor/distsql.go:118 +0x118
[2019-10-24T09:14:00.292Z]   github.com/pingcap/tidb/executor.(*TableReaderExecutor).Close()
[2019-10-24T09:14:00.292Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/executor/table_reader.go:259 +0x3fd
[2019-10-24T09:14:00.292Z]   github.com/pingcap/tidb/executor.(*baseExecutor).Close()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/executor/executor.go:113 +0xad
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/executor.(*HashAggExec).Close()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/executor/aggregate.go:218 +0x1f6
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/executor.(*InsertExec).Close()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/executor/insert.go:262 +0x151
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelayExecutor.func1()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/executor/adapter.go:524 +0x3e
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelayExecutor()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/executor/adapter.go:530 +0x3b9
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelay()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/executor/adapter.go:401 +0x1a2
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/executor.(*ExecStmt).Exec()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/executor/adapter.go:369 +0x34e
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/session.runStmt()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/session/tidb.go:240 +0x235
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/session.(*session).executeStatement()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/session/session.go:1016 +0x203
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/session.(*session).execute()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/session/session.go:1121 +0xae3
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/session.(*session).Execute()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/session/session.go:1056 +0xee
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/util/testkit.(*TestKit).Exec()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/util/testkit/testkit.go:144 +0x103
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/expression_test.(*testIntegrationSuite).TestAggregationBuiltinGroupConcat()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/expression/integration_test.go:3497 +0x4bb
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/expression_test.(*testIntegrationSuite).TestAggregationBuiltinGroupConcat()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/expression/integration_test.go:3487 +0x1ab
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/expression_test.(*testIntegrationSuite).TestAggregationBuiltinGroupConcat()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/expression/integration_test.go:3486 +0x174
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/expression_test.(*testIntegrationSuite2).TestAggregationBuiltinGroupConcat()
[2019-10-24T09:14:00.293Z]       <autogenerated>:1 +0x4d
[2019-10-24T09:14:00.293Z]   runtime.call32()
[2019-10-24T09:14:00.293Z]       /usr/local/go/src/runtime/asm_amd64.s:539 +0x3a
[2019-10-24T09:14:00.293Z]   reflect.Value.Call()
[2019-10-24T09:14:00.293Z]       /usr/local/go/src/reflect/value.go:321 +0xd3

[2019-10-24T09:14:00.293Z] 
[2019-10-24T09:14:00.293Z] Previous write at 0x00c0010862d0 by goroutine 240:
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/distsql.(*selectResult).getSelectResp()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/distsql/select_result.go:204 +0xada
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/distsql.(*selectResult).Next()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/distsql/select_result.go:151 +0x292
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/executor.(*tableResultHandler).nextChunk()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/executor/table_reader.go:237 +0xaa
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/executor.(*TableReaderExecutor).Next()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/executor/table_reader.go:156 +0x20c
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/executor.Next()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/executor/executor.go:203 +0x167
[2019-10-24T09:14:00.293Z]   github.com/pingcap/tidb/executor.(*HashAggExec).fetchChildData()
[2019-10-24T09:14:00.293Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/executor/aggregate.go:549 +0x2ad

@tiancaiamao tiancaiamao merged commit 4907685 into pingcap:master Oct 24, 2019
@tiancaiamao tiancaiamao deleted the fix-kill-lock branch October 24, 2019 09:26
@sre-bot
Copy link
Contributor

sre-bot commented Oct 24, 2019

cherry pick to release-3.0 failed

@sre-bot
Copy link
Contributor

sre-bot commented Oct 24, 2019

cherry pick to release-3.1 failed

lfkdsk added a commit to JustProject/tidb that referenced this pull request Oct 26, 2019
…ect/tidb into feature-add-udf-support

* 'feature-add-udf-support' of https://github.com/JustProject/tidb: (26 commits)
  *: fix bug that the kill command doesn't work when the killed session is waiting for the pessimistic lock (pingcap#12852)
  executor: fix the projection upon the indexLookUp in indexLookUpJoin can't get result. (pingcap#12889)
  planner, executor: support create view on union (pingcap#12595)
  planner/cascades: introduce TransformationID in cascades planner (pingcap#12879)
  executor: fix data race in test (pingcap#12910)
  executor: reuse chunk row for insert on duplicate update (pingcap#12847)
  ddl: speed up tests (pingcap#12888)
  executor: speed up test (pingcap#12896)
  expression: implement vectorized evaluation for `builtinSecondSig` (pingcap#12886)
  expression: implement vectorized evaluation for `builtinJSONObjectSig` (pingcap#12663)
  expression: speed up builtinRepeatSig by using MergeNulls (pingcap#12674)
  expression: speed up unit tests under the expression package (pingcap#12887)
  store,kv: snapshot doesn't cache the non-exists kv entries lead to poor 'insert ignore' performance (pingcap#12872)
  executor: fix data race in `GetDirtyTable()` (pingcap#12767)
  domain: increase TTL to reduce the occurrence of reporting min startTS errors (pingcap#12578)
  executor: split test for speed up (pingcap#12881)
  executor: fix inconsistent of grants privileges with MySQL when executing `grant all on ...` (pingcap#12330)
  expression: implement vectorized evaluation for `builtinJSONUnquoteSig` (pingcap#12841)
  tune grpc connection count between tidb and tikv (pingcap#12884)
  Makefile: change test parallel to 8 (pingcap#12885)
  ...
ngaut pushed a commit that referenced this pull request Oct 30, 2019
@crazycs520
Copy link
Contributor

@tiancaiamao need cherry pick to 3.1?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bugfix This PR fixes a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants