
Replayed pessimistic lock requests using WakeUpModeForceLock may lead to correctness issue #14311

Closed
MyonKeminta opened this issue Mar 1, 2023 · 8 comments · Fixed by pingcap/tidb#43132
Labels
affects-7.0 severity/major sig/transaction SIG: Transaction type/bug The issue is confirmed as a bug.

Comments

@MyonKeminta
Contributor

Bug Report

When WakeUpModeForceLock is used, if a replayed pessimistic lock request (possibly caused by a network issue) arrives after the lock has been lost (possibly due to pipelined locking or in-memory pessimistic locks), it may cause a correctness issue. The complete procedure that causes the problem is:

  1. A pessimistic lock request of transaction T1 succeeds with WakeUpModeForceLock enabled, the result returns to TiDB, and TiDB continues its execution.
  2. The lock is lost for some reason, such as pipelined locking or in-memory pessimistic locks.
  3. Another transaction T2 writes the key and commits.
  4. The key then receives a stale pessimistic lock request of T1, a replay of the request handled in step 1 (e.g. a retry caused by the network issue). Since locking with conflict is allowed, the request can still acquire the lock even though there is a version newer than the request's for_update_ts. However, nobody checks the result of this request.
  5. Transaction T1 commits. During prewrite it checks whether each key is pessimistic-locked as expected. It notices nothing wrong, since the key does carry a pessimistic lock of the same transaction, so T1 commits successfully. However, one of its keys was locked at a different version, and the conflict between T1 and T2 is missed.

A possible fix would be: record the for_update_ts of each locked key in the membuffer in client-go, then carry them in prewrite requests. When TiKV handles a pessimistic prewrite request, it checks whether the for_update_ts of the lock in TiKV matches the one in the prewrite request. This requires changing the membuffer to store a for_update_ts per key, which consumes an additional 8 bytes of memory for each key, and adding a new field to prewrite requests.
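A minimal sketch of the client-side bookkeeping this fix implies, assuming hypothetical names (txnLockState, recordLockedKey, and prewriteConstraints are illustrative only, not the real client-go API): keep one 8-byte timestamp per locked key and turn it into per-key expectations when building the prewrite request.

package main

import "fmt"

// txnLockState is a hypothetical sketch of the extra bookkeeping in client-go:
// one uint64 (8 bytes) per pessimistically locked key.
type txnLockState struct {
	expectedForUpdateTS map[string]uint64 // key -> for_update_ts it was locked with
}

// recordLockedKey is called after a pessimistic lock RPC succeeds for a key.
func (s *txnLockState) recordLockedKey(key string, forUpdateTS uint64) {
	s.expectedForUpdateTS[key] = forUpdateTS
}

// prewriteConstraints returns, for the keys of a prewrite request, the
// for_update_ts each key is expected to still be locked with in TiKV,
// keyed by the key's index in the request.
func (s *txnLockState) prewriteConstraints(keys []string) map[int]uint64 {
	constraints := make(map[int]uint64)
	for i, k := range keys {
		if ts, ok := s.expectedForUpdateTS[k]; ok {
			constraints[i] = ts
		}
	}
	return constraints
}

func main() {
	s := &txnLockState{expectedForUpdateTS: make(map[string]uint64)}
	s.recordLockedKey("k1", 10)
	s.recordLockedKey("k2", 15) // e.g. the key was locked with conflict at ts 15
	fmt.Println(s.prewriteConstraints([]string{"k1", "k2", "k3"}))
}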

@MyonKeminta
Contributor Author

MyonKeminta commented Mar 29, 2023

A possible solution:

We assume that for a transaction from TiDB, the number of keys locked in fair locking mode won't be too large (so that the extra memory usage is acceptable). Then:

client-go:

  • (After a pessimistic lock RPC finishes) For keys that are locked in ForceLock mode, store them during the LockKeys invocation in a map from key to a timestamp named expectedForUpdateTS.
    • Here, expectedForUpdateTS is lockedWithConflictTS if any, or the forUpdateTS of the current LockKeys invocation otherwise. It therefore equals the for_update_ts field written to the pessimistic lock in TiKV.
  • When prewriting, carry this information in the prewrite requests sent to TiKV.
  • When TiKV handles a prewrite request, if a key has a corresponding expected_for_update_ts, check whether the pessimistic lock on it has for_update_ts <= expected_for_update_ts. If this is violated, we conclude that the pessimistic lock on the key was lost, another transaction wrote the key, and then a stale pessimistic lock request (in ForceLock mode) locked it again. Return a PessimisticLockNotFound error in this case (see the sketch after the message definition below).
  • In kvproto, add such a field to PrewriteRequest:
message ForUpdateTSConstraint {
    uint32 index = 1;
    uint64 expected_for_update_ts = 2;
}

message PrewriteRequest {
    // ...
    repeated ForUpdateTSConstraint for_update_ts_constraints = ...;
}

For each item in for_update_ts_constraints, the item.index-th key in the request is expected to be pessimistic-locked with lock.for_update_ts <= item.expected_for_update_ts.
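A conceptual sketch of how that constraint would be enforced, again with hypothetical names (TiKV itself is written in Rust; this Go version only illustrates the intended semantics of for_update_ts_constraints, not the actual implementation):

package main

import "fmt"

// ForUpdateTSConstraint mirrors the proposed kvproto message.
type ForUpdateTSConstraint struct {
	Index               uint32
	ExpectedForUpdateTS uint64
}

// Lock is a hypothetical view of the pessimistic lock currently stored on a key.
type Lock struct {
	ForUpdateTS uint64
}

// checkConstraints applies the proposed rule to a prewrite request:
// locks[i] is the pessimistic lock found on the i-th key, or nil if missing.
func checkConstraints(locks []*Lock, constraints []ForUpdateTSConstraint) error {
	for _, c := range constraints {
		lock := locks[c.Index]
		if lock == nil {
			return fmt.Errorf("PessimisticLockNotFound on key #%d: pessimistic lock is missing", c.Index)
		}
		if lock.ForUpdateTS > c.ExpectedForUpdateTS {
			// The original lock was lost, another transaction wrote the key,
			// and a stale ForceLock request re-locked it at a newer version.
			return fmt.Errorf("PessimisticLockNotFound on key #%d: lock for_update_ts=%d > expected %d",
				c.Index, lock.ForUpdateTS, c.ExpectedForUpdateTS)
		}
	}
	return nil
}

func main() {
	// Key 0 is still locked as expected; key 1 was re-locked by a stale request
	// after a conflicting commit, so its for_update_ts exceeds the expected one.
	locks := []*Lock{{ForUpdateTS: 10}, {ForUpdateTS: 30}}
	constraints := []ForUpdateTSConstraint{
		{Index: 0, ExpectedForUpdateTS: 10},
		{Index: 1, ExpectedForUpdateTS: 20},
	}
	fmt.Println(checkConstraints(locks, constraints))
}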


Update:

The constraint will be defined as for_update_ts == expected_for_update_ts instead of <= for simplicity.

@MyonKeminta
Contributor Author

cc @cfzjywxk @ekexium PTAL if the solution is fine.

@cfzjywxk
Collaborator

"check whether the pessimistic lock on it have for_update_ts <= expected_for_update_ts"

When could lock.for_update_ts < expected_for_update_ts happen? Should an error be reported, as it looks like an unexpected situation?

@MyonKeminta
Contributor Author

MyonKeminta commented Mar 30, 2023

I think this can be allowed, since we can only say there is an unexpected write conflict (and thus a possible data consistency issue) when lock.for_update_ts > expected_for_update_ts.
Though the < case is very unlikely to happen and we may never encounter it, I think the possibility exists theoretically, for example:

  1. A transaction locks key k with for_update_ts = 10.
  2. The transaction performs a pessimistic rollback on it.
  3. The transaction updates its for_update_ts and performs a pessimistic lock again on key k with for_update_ts = 20, in ForceLock mode. There's no conflict.
  4. The pessimistic lock on k is lost.
  5. A stale pessimistic lock request produced in step 1 arrives, and the key is locked with for_update_ts = 10.
  6. Prewrite. The expected_for_update_ts is 20 while the lock has for_update_ts = 10.

@cfzjywxk
Collaborator

@MyonKeminta
In the above case, the meaning of the two pessimistic lock requests could be different. For example, the first request does not require an existence check while the second one does, yet the stale request derived from the first one succeeds.

@MyonKeminta
Contributor Author

Sorry, I didn't get it. It seems to me that if the second request performs an existence check or returns the value, and then the lock is lost and a stale request from step 1 succeeds, the value returned or checked by the second pessimistic lock request is still valid, since no write conflict has occurred.

@cfzjywxk
Collaborator

Consider this case:

  1. A transaction txn1 locks key k with for_update_ts = 10.
  2. The transaction updates for_update_ts and performs pessimistic lock again on key k with for_update_ts = 20, in ForceLock mode. There's no conflict. need_check_existence is set to true.
  3. The lock with for_update_ts = 20 is lost.
  4. Transaction txn2 adds a put to k with commit_ts = 10.
  5. The stale request from step 1 could still succeed, say commit_ts == for_update_ts == 10.

The pessimistic lock request in step 5 should fail in the normal path, as need_check_existence is true and there does exist a PUT record with commit_ts = 10.

@MyonKeminta
Contributor Author

If a pessimistic lock request has read the value or checked existence with for_update_ts = 20, there should never be any later commit with commit_ts <= 20.

ti-chi-bot pushed a commit to pingcap/tidb that referenced this issue Apr 11, 2023
ti-chi-bot added a commit that referenced this issue Apr 12, 2023
ref #14311

Supports checking for_update_ts for specific keys during prewrite to avoid potential lost update that might be caused by allowing locking with conflict.

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>