Add tracing and fatal collect mechanism in TiCDC #765

Open
5 tasks
amyangfei opened this issue Jul 20, 2020 · 0 comments
Labels
difficulty/hard Hard task. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.

Comments


amyangfei commented Jul 20, 2020

Feature Request

Is your feature request related to a problem? Please describe:

TiCDC defines some fatal errors so that it can fail fast in scenarios where data may be inconsistent. In most of these cases, replication can be recovered by resuming the task. But it is difficult to tell the root cause of a fatal error, or whether data inconsistency would occur downstream.

Describe the feature you'd like:

TiCDC should provide a flexible way to collect and trace fatal errors. We should classify errors and apply different strategies to different kinds of fatal error.

Task List

  • Refine error usage

    • Define conventions for, and unify, the usage of return err and error chan
  • Basic error tracking framework

    • Design and implement a general module to track, record, and persist error information, for use in debugging and backtracing fatal errors.
  • Deal with the fatal error "The CRTs must be greater than the resolvedTs"

    • puller: Design and implement a tsTracker tracking module in puller that records the ts forward history and saves the necessary information when a fatal error happens. The saved information can help us backtrace the forward history and find the potential bug.
    • Other modules, including processor and KV client: First, we should log enough context information when a fatal error happens. Second, we should evaluate whether more tracing information can be saved.
  • Other fatal errors

    • Classify other fatal errors, and investigate whether tracing information can be saved in these scenarios.
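
The tsTracker item in the task list could be sketched as a fixed-size ring buffer of recent ts advances that is dumped when the fatal check fails. This is only one possible shape under stated assumptions: the type and method names (`tsEvent`, `tsTracker`, `Record`, `Snapshot`, `CheckCommit`) are hypothetical, the sketch is not safe for concurrent use, and it attaches the history to the returned error instead of persisting it.

```go
package main

import "fmt"

// tsEvent is one recorded ts advance. All field names are hypothetical.
type tsEvent struct {
	Kind string // e.g. "resolved" or "commit"
	Ts   uint64
}

// tsTracker keeps the most recent events in a fixed-size ring buffer.
type tsTracker struct {
	buf  []tsEvent
	next int  // index of the slot that will be written next
	full bool // true once the buffer has wrapped at least once
}

func newTsTracker(capacity int) *tsTracker {
	if capacity < 1 {
		capacity = 1
	}
	return &tsTracker{buf: make([]tsEvent, capacity)}
}

// Record stores an event, overwriting the oldest one when the buffer is full.
func (t *tsTracker) Record(e tsEvent) {
	t.buf[t.next] = e
	t.next = (t.next + 1) % len(t.buf)
	if t.next == 0 {
		t.full = true
	}
}

// Snapshot returns a copy of the retained events, oldest first.
func (t *tsTracker) Snapshot() []tsEvent {
	if !t.full {
		return append([]tsEvent(nil), t.buf[:t.next]...)
	}
	out := make([]tsEvent, 0, len(t.buf))
	out = append(out, t.buf[t.next:]...)
	return append(out, t.buf[:t.next]...)
}

// CheckCommit records a commit ts and, if it is not greater than resolvedTs,
// returns an error that carries the retained forward history.
func (t *tsTracker) CheckCommit(crts, resolvedTs uint64) error {
	t.Record(tsEvent{Kind: "commit", Ts: crts})
	if crts <= resolvedTs {
		return fmt.Errorf("the CRTs must be greater than the resolvedTs: CRTs=%d resolvedTs=%d history=%+v",
			crts, resolvedTs, t.Snapshot())
	}
	return nil
}

func main() {
	tr := newTsTracker(4)
	tr.Record(tsEvent{Kind: "resolved", Ts: 100})
	tr.Record(tsEvent{Kind: "resolved", Ts: 110})
	if err := tr.CheckCommit(105, 110); err != nil {
		fmt.Println(err)
	}
}
```

A bounded buffer keeps the memory cost constant on the hot path; how many events to retain, and where the snapshot should be persisted, are open design questions that the sketch does not answer.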

Value

Value description

This feature will help with debugging and data consistency checks in extreme error scenarios.

Value score

  • 5

Workload estimation

  • (TBD) person-day

Time

GanttStart:
GanttDue:

@amyangfei amyangfei added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jul 20, 2020
@amyangfei amyangfei modified the milestone: v4.0.4 Jul 20, 2020
@amyangfei amyangfei added the difficulty/hard Hard task. label Jul 29, 2020