Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle orphan write cf key from a raftstore v2 tablet snapshot #7657

Merged
merged 28 commits into from
Jun 20, 2023

Conversation

CalvinNeo
Copy link
Member

@CalvinNeo CalvinNeo commented Jun 14, 2023

What problem does this PR solve?

Issue Number: ref #7568

Problem Summary:

Raftstore v2 snapshot may contain extra write key which is from later raft log index than the snapshot has claimed. We decide to suppress this error rather than throwing an exception.

We will keep track of these orphan key, and throw an exception later if we are sure these keys are orphan because of data loss rather than an eager-replicated data.

What is changed and how it works?

The suppress part:

  1. Deduce TiKV cluster's raftstore version by the first /engine_type response from TiKV.
    1. Note that TiKV later supports coexistence of v1 and v2 node in a cluster. After that, they can provide us a mechanism to fix this.
  2. Store all orphan write key in OrphanKeyInfo struct when prehandling snapshot.
  3. Remove a key from OrphanKeyInfo when insert into write cf.
  4. When meet orphan key in a row2col job which is not triggered by prehandling snapshot, check OrphanKeyInfo first. If the key is registered in OrphanKeyInfo, consider it as a soft error.

The async checking part:

  1. Log a deadline_index sent from TiKV's snapshot which must GE than snapshot_index in in OrphanKeyInfo.
  2. If the applied_index is proceeded exceed the deadline_index, check if the OrphanKeyInfo is cleared. Otherwise raise a hard error.

What is not included:

  1. PreHandleSnapshot requires deadline_index being sent from TiKV. So we can have async check only after TiKV support this.
  2. Although, we have this PR merged first to avoid blocking test.

NOTE
Orphan keys will not be iterated by region read scanner, so it will not be presented in data_list_read, so it will not be removed as a committed entry. We don't eagerly do that, since it brings extra cost. These keys will either be removed when its default counterpart arrives in the following raft log, or being reported as an error when exceeds deadline_index.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
    • Tested the cluster version part
    [2023/06/12 12:41:25.170 +08:00] [INFO] [run.rs:147] ["start probing cluster's raftstore version"]
    [2023/06/12 12:41:25.267 +08:00] [INFO] [run.rs:150] ["cluster's raftstore version is V2"]
    [2023/06/12 14:14:17.974 +08:00] [INFO] [run.rs:147] ["start probing cluster's raftstore version"]
    [2023/06/12 14:14:18.072 +08:00] [INFO] [run.rs:150] ["cluster's raftstore version is V1"]
    
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Handle orphan write cf key from a raftstore v2 tablet snapshot

z
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 14, 2023
@CalvinNeo CalvinNeo changed the title Suppress panic when meet orphan write cf key in v2 Distinguish orphan write cf key in v2 Jun 14, 2023
@CalvinNeo CalvinNeo changed the title Distinguish orphan write cf key in v2 Handle orphan write cf key from a raftstore v2 tablet snapshot Jun 14, 2023
@CalvinNeo CalvinNeo changed the title Handle orphan write cf key from a raftstore v2 tablet snapshot WIP Handle orphan write cf key from a raftstore v2 tablet snapshot Jun 14, 2023
@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 14, 2023
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 14, 2023
@Lloyd-Pottiger
Copy link
Contributor

How about converting to draft before ready to avoid seeds messages to reviewers?

Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@CalvinNeo CalvinNeo marked this pull request as draft June 15, 2023 03:56
f
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
return std::nullopt;
}
// Otherwise, this is still a hard error.
// TODO We still need to check if there are remained orphan keys after we have applied after peer's flushed_index.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be done when tikv support a deadline_index.

@CalvinNeo
Copy link
Member Author

/run-all-tests

Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@CalvinNeo
Copy link
Member Author

/run-all-tests

f
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
CalvinNeo and others added 7 commits June 16, 2023 18:23
Co-authored-by: JaySon <tshent@qq.com>
Co-authored-by: JaySon <tshent@qq.com>
Co-authored-by: JaySon <tshent@qq.com>
Co-authored-by: JaySon <tshent@qq.com>
Co-authored-by: JaySon <tshent@qq.com>
Co-authored-by: JaySon <tshent@qq.com>
Co-authored-by: JaySon <tshent@qq.com>
Co-authored-by: JaySon <tshent@qq.com>
@CalvinNeo
Copy link
Member Author

/run-all-tests

Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
Copy link
Contributor

@JaySon-Huang JaySon-Huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jun 20, 2023
@ti-chi-bot ti-chi-bot bot added the lgtm label Jun 20, 2023
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Jun 20, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JaySon-Huang, lidezhu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JaySon-Huang,lidezhu]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jun 20, 2023
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Jun 20, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-06-20 03:00:34.336236373 +0000 UTC m=+71799.737486822: ☑️ agreed by JaySon-Huang.
  • 2023-06-20 06:53:45.049481776 +0000 UTC m=+85790.450732224: ☑️ agreed by lidezhu.

CalvinNeo and others added 2 commits June 20, 2023 14:56
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@JaySon-Huang
Copy link
Contributor

@CalvinNeo can we unhold this PR and get it merged?

@CalvinNeo
Copy link
Member Author

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 20, 2023
@ti-chi-bot ti-chi-bot bot merged commit 5a36917 into pingcap:master Jun 20, 2023
CalvinNeo added a commit to CalvinNeo/tiflash that referenced this pull request Jul 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants