Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: lazy constraint check in pessimistic txn #36889

Merged

Conversation

sticnarf
Copy link
Contributor

@sticnarf sticnarf commented Aug 4, 2022

What problem does this PR solve?

Issue Number: ref #36579

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Aug 4, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • cfzjywxk
  • ekexium

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 4, 2022
@sticnarf sticnarf force-pushed the pessimistic-lazy-constraint-check-doc branch from 5fd5435 to 40e0d74 Compare August 4, 2022 06:54
@sticnarf
Copy link
Contributor Author

sticnarf commented Aug 4, 2022

cc @ekexium @TonsnakeLin @cfzjywxk

@sre-bot
Copy link
Contributor

sre-bot commented Aug 4, 2022

docs/design/2022-08-04-lazy-constraint-check.md Outdated Show resolved Hide resolved
docs/design/2022-08-04-lazy-constraint-check.md Outdated Show resolved Hide resolved
docs/design/2022-08-04-lazy-constraint-check.md Outdated Show resolved Hide resolved
docs/design/2022-08-04-lazy-constraint-check.md Outdated Show resolved Hide resolved
docs/design/2022-08-04-lazy-constraint-check.md Outdated Show resolved Hide resolved
docs/design/2022-08-04-lazy-constraint-check.md Outdated Show resolved Hide resolved
@sticnarf sticnarf force-pushed the pessimistic-lazy-constraint-check-doc branch from 8401fe6 to 36027d1 Compare August 4, 2022 07:45
@ekexium
Copy link
Contributor

ekexium commented Aug 4, 2022

Shall we add a section to demonstrate the safety of the change since the constraint check mechanism is subtle and vulnerable?


Note that the feature is only needed if part of your transaction still needs to acquire pessimistic locks. Otherwise, use the optimistic transaction mode instead.

* `INSERT` rows to a table without any unique key. TiDB does not generate duplicated implicit row ID, so normally there cannot be unique key conflicts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Limiting the use cases to simple INSERT statements is needed.

/* s1 */ INSERT INTO t1 VALUES (1);
/* s1 */ COMMIT;
```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For such cases,

/* init */ CREATE TABLE t1 (id INT NOT NULL PRIMARY KEY, value int);
/* init */ INSERT INTO t1 VALUES (1, 1);

/* s1 */ BEGIN PESSIMISTIC;
/* s1 */ INSERT INTO t1 VALUES (1, 2) LAZY CHECK; -- Skip acquiring the lock. So the statement would succeed.
/* s1 */ SELECT * FROM t1 FOR UPDATE; -- Here the pessimistic lock on rowkey '1' would be acquired

Here it's a bit difficult to decide the execution result of the for update statement, should the memory buffer content be returned or the row value(1, 1), or a duplicated entry error detected acquiring the pessimistic lock?

Actually, the snapshot read semantic becomes weird in some cases where the pessimistic current read is used together such as #24195. A side effect may be that the current read semantic would become weird too in some special cases, especially some historical technical debt like the behaviors of the unionScan executor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right... This makes the case of #24195 more complex.

I don't think the complete solution to this problem should be part of this proposal. So, for this case, personally I think we can treat it as if the pessimistic lock of the INSERT gets lost (due to failed pipelined or in-memory locking). This can happen in a normal pessimistic transaction now. In this case, the value in the table can be different from those in the buffer.

Copy link
Contributor

@cfzjywxk cfzjywxk Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

treat it as if the pessimistic lock of the INSERT gets lost

Make sense, so we could just ignore the lazily written result if it's conflicted with some returned current read result.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "Acquisition of lock always resolves constraint check" is simpler in semantics, which requires 1 additional state of keys in prewrite: no lock + check.
While treating as if the lock is lost requires 2 additional states: lock + check and no lock + check in prewrite. It also introduces complexity in the read/write procedures, which increases the risk of doing it right.

It seems the cost is throwing "Duplicate key" errors in SELECT FOR UPDATE or other DMLs that acquire locks. I assume users should be ready for handling such cases since the feature is not for everyone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm, yes. As long as we can accept throwing "duplicated entry" errors in a read-only statement, "Acquisition of lock always resolves constraint check" is the easiest to implement and brings least problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I follow @ekexium's idea and add a section to define the behavior in this case.

Copy link
Contributor

@TonsnakeLin TonsnakeLin Aug 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it's a bit difficult to decide the execution result of the for update statement, should the memory buffer content be returned or the row value(1, 1), or a duplicated entry error detected acquiring the pessimistic lock?

If 'deferred insert' mixed with 'lock if exists', it will make pessimistic lock invalid in some cases. The feature 'lock if exists' may not coexist with 'deferred insert'.
create table t(a int primary key, b int)
begin;
insert into t values(1,1) /* lazy checked*/ ; -- cached in buffer and don't acquire pessimistic lock or unique check
select * from t where a = 1 for update /lock if exists/; -- tikv can't find the record and doesn't make a pessimistic lock
commit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

insert into t values(1,1) /* lazy checked*/ ; -- cached in buffer and don't acquire pessimistic lock or unique check
select * from t where a = 1 for update /lock if exists/; -- tikv can't find the record and doesn't make a pessimistic lock
commit

In this case, the late acquisition of lock should override lock_if_exists. Maybe we have to get from the mem buffer first to decide how we lock the keys.

@sticnarf sticnarf force-pushed the pessimistic-lazy-constraint-check-doc branch 2 times, most recently from 22d2d4d to b9b8a26 Compare August 9, 2022 03:49
@ekexium
Copy link
Contributor

ekexium commented Aug 9, 2022

I am somewhat convinced of the safety of write: writes won't break the integrity of data in the storage.
While the safety of read, namely read statements won't return results that break the constraint, may require further consideration.

A case similar to that in #24195, if we enable the feature:

create table t2 (id int primary key, uk int, unique key i1(uk))
insert into t2 values (1, 1)
set @@tidb_txn_assertion_level=off

begin pessimistic
insert into t2 values (2, 1)
select * from t2 use index(primary) for update -- same result whether for update or not
delete from t2 where id = 1
commit

admin check table t2

The select result is 2 rows (1, 1) and (2, 1), which breaks the uniqueness constraint. While the final commit can succeed. And the admin check will report data-index inconsistency. BTW, assertion could prevent the txn from committing.

"Acquisition of lock always resolves constraint check" doesn't help. The feature sort of enlarges the anomaly

@ekexium
Copy link
Contributor

ekexium commented Aug 10, 2022

For reasoning its safety, I'm considering what if we allow an arbitrary mix of optimistic and pessimistic mutations in one transaction? Will it break any safety property?

@sticnarf
Copy link
Contributor Author

For reasoning its safety, I'm considering what if we allow an arbitrary mix of optimistic and pessimistic mutations in one transaction? Will it break any safety property?

Pessimistic mutations are inheritantly no different from optimistic mutations after they are turned into 2pc locks in prewrite, so I think it is very unlikely that this will break any properties.

Instead, I worry more about the handling about union scan and generating the mutations. It's more complicated when the result set sometimes break constraints but the transaction can commit in the end.

@cfzjywxk
Copy link
Contributor

cfzjywxk commented Aug 10, 2022

the handling about union scan and generating the mutations

There're several cases we may need to design and handle carefully, for example:

  • delete-your-write from which we've encountered data inconsistency issues before. The inserted content may not need to be committed but constraint checks are still required.
  • Report duplicate error in time and avoid further weird behaviors considering the unionScan executor.
  • The transactional key flags and statement level key flags processing which are easy to make mistakes.

@ekexium
Copy link
Contributor

ekexium commented Aug 10, 2022

When a key that has its constraint check deferred in a previous statement gets modified, we have 2 options:

  1. Lock it as we did before, resolve the constraint check and unset the defer flag
    Then we need some kind of a "once flag" mechanism, considering the possibility that a statement may involve multiple operations on a key. Any modifications to the key (even in the same statement) will unset the flag.

  2. Skip the lock. The reason is the same as the one that let us skip the constraint check: if the user believes the key is less likely to have conflicts, why don't we skip the lock as well? It doesn't affect the success rate after all. It requires PresumeNotExist flags on more than PUT mutations. An extra Op like DeleteMustExist needs to be added, at least.

@sticnarf
Copy link
Contributor Author

2. Skip the lock. The reason is the same as the one that let us skip the constraint check: if the user believes the key is less likely to have conflicts, why don't we skip the lock as well? It doesn't affect the success rate after all. It requires PresumeNotExist flags on more than PUT mutations. An extra Op like DeleteMustExist needs to be added, at least.

What I'm not sure about the second option is the risk of how we handle a result set with broken constraints. And you have found an example above that there can be inconsistency unless we add new special ops. There could be other possible issues that we haven't found.

@ekexium
Copy link
Contributor

ekexium commented Aug 10, 2022

There could be other possible issues that we haven't found.

Yes, I'm pursuing proof of the safety of this change, whichever options we take.

@cfzjywxk
Copy link
Contributor

cfzjywxk commented Aug 11, 2022

select * from t2 use index(primary) for update -- same result whether for upda

@ekexium
Does it happen because the select for update statement does not acquire the pessimistic lock on the unique key but just the row key? Actually, the unique key should also be locked though it's not changed as described in the issue.
It seems that if the row key and unique key are always locked Acquisition of lock always resolves constraint check should be able to work.

@ekexium
Copy link
Contributor

ekexium commented Aug 11, 2022

select * from t2 use index(primary) for update -- same result whether for upda

@ekexium Does it happen because the select for update statement does not acquire the pessimistic lock on the unique key but just the row key? Actually, the unique key should also be locked though it's not changed as described in the issue. It seems that if the row key and unique key are always locked Acquisition of lock always resolves constraint check should be able to work.

@cfzjywxk The problem exists in the implementation of option 2 mentioned above (so neither row key nor unique key is locked in the select for update and delete). Option 1 is free from this issue.

@cfzjywxk
Copy link
Contributor

cfzjywxk commented Aug 11, 2022

Actually, the unique key should also be locked though it's not changed as described in the #36438.

For option 1, would this still be a problem if the lock on a unique key is deferred and then the select for update does not lock it as expected because of the issue 36438 so the constraint resolving does not work on this unique key?

@ekexium
Copy link
Contributor

ekexium commented Aug 11, 2022

Actually, the unique key should also be locked though it's not changed as described in the #36438.

For option 1, would this still be a problem if the lock on a unique key is deferred and then the select for update does not lock it as expected because of the issue 36438 so the constraint resolving does not work on this unique key?

I think so. I mean it's free from data-index inconsistency. The seemingly invalid select result is an old issue. It's better to fix it, while our bottom line is that we guarantee the constraint satisfaction of read results only if the transaction successfully commits.

Signed-off-by: Yilin Chen <sticnarf@gmail.com>
Signed-off-by: Yilin Chen <sticnarf@gmail.com>
Signed-off-by: Yilin Chen <sticnarf@gmail.com>
Signed-off-by: Yilin Chen <sticnarf@gmail.com>
Signed-off-by: Yilin Chen <sticnarf@gmail.com>
Signed-off-by: Yilin Chen <sticnarf@gmail.com>
…read from the membuf

Signed-off-by: Yilin Chen <sticnarf@gmail.com>
Signed-off-by: Yilin Chen <sticnarf@gmail.com>
@sticnarf sticnarf force-pushed the pessimistic-lazy-constraint-check-doc branch from c5accb5 to 9a6f4df Compare August 15, 2022 08:29

Because `enum` is compatible with `bool` on the wire, we can seamlessly extend the `is_pessimistic_lock` field. In a TiDB cluster, TiKV instances are upgraded before TiDB. So, we can make sure TiKV can handle it correctly when TiDB starts using the new protocol.

When `tidb_constraint_check_in_place_pessimistic` is off, all the keys that have `PresumeKeyNotExists` flags don't need to be locked when executing the DML. They will be marked with a new special flag `NeedConstraintCheckInPrewrite`. When the transaction commits, TiDB will set the actions of these keys to `DO_CONSTRAINT_CHECK`. So, TiKV will not check the existence of pessimistic locks of these keys. Instead, constraint and write conflict checks should be done for these keys.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could clarify the detailed affected DML types by this variable, like:

  • The simple Insert and Update statements are considered.
  • Extended grammar like Insert on duplicate and Insert ignore is not considered.

Signed-off-by: Yilin Chen <sticnarf@gmail.com>
Copy link
Contributor

@cfzjywxk cfzjywxk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Aug 17, 2022
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Aug 17, 2022
@cfzjywxk
Copy link
Contributor

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: cf11337

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Aug 17, 2022
@ti-chi-bot ti-chi-bot merged commit 9709249 into pingcap:master Aug 17, 2022
@sre-bot
Copy link
Contributor

sre-bot commented Aug 17, 2022

TiDB MergeCI notify

CI Name Result Duration Compare with Parent commit
idc-jenkins-ci/integration-cdc-test 🟢 all 36 tests passed 54 min Existing passed
idc-jenkins-ci-tidb/tics-test 🟢 all 1 tests passed 18 min Existing passed
idc-jenkins-ci-tidb/integration-common-test 🟢 all 17 tests passed 12 min Existing passed
idc-jenkins-ci-tidb/common-test 🟢 all 11 tests passed 12 min Existing passed
idc-jenkins-ci-tidb/integration-ddl-test 🟢 all 6 tests passed 5 min 51 sec Existing passed
idc-jenkins-ci-tidb/sqllogic-test-2 🟢 all 28 tests passed 5 min 31 sec Existing passed
idc-jenkins-ci-tidb/sqllogic-test-1 🟢 all 26 tests passed 4 min 18 sec Existing passed
idc-jenkins-ci-tidb/mybatis-test 🟢 all 1 tests passed 3 min 43 sec Existing passed
idc-jenkins-ci-tidb/integration-compatibility-test 🟢 all 1 tests passed 3 min 12 sec Existing passed
idc-jenkins-ci-tidb/plugin-test 🟢 build success, plugin test success 4min Existing passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants