Add tracing and fatal collect mechanism in TiCDC #765
Labels
difficulty/hard
help wanted
Feature Request
Is your feature request related to a problem? Please describe:
TiCDC defines some fatal errors so that it can fail fast in scenarios where data may be inconsistent. In most of these cases, replication can be recovered by resuming the task. But it is difficult to tell the root cause of a fatal error and whether data inconsistency would happen downstream.
Describe the feature you'd like:
TiCDC should provide a flexible way to collect and trace fatal errors. We should classify errors and apply different strategies to different kinds of fatal error.
Task List
- Refine error usage: `return err` and `error chan`
- Basic error tracking framework
- Deal with the fatal error of `The CRTs must be greater than the resolvedTs`
  - puller: Design and implement a `tsTracker` tracking module in the puller that records the ts forward history and saves the necessary information when a fatal error happens. The saved information can help us backtrace the forward history and find the potential bug.
  - processor and KV client: First, we should log enough context information when a fatal error happens. Second, we should evaluate whether more tracing information can be saved.
- Other fatal errors
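The `tsTracker` idea in the task list could be sketched roughly as follows. This is only an illustration under assumptions: the issue names `tsTracker` but does not specify its design, so the ring-buffer layout, the `tsEvent` fields, and the `Record`, `Snapshot`, and `checkCommitTs` helpers are all hypothetical. The tracker keeps the last N timestamp advances and attaches them to the error when the check fails.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tsEvent is one recorded timestamp advance. Fields are hypothetical.
type tsEvent struct {
	Kind string // e.g. "resolved" or "commit"
	Ts   uint64
	At   time.Time
}

// tsTracker keeps the most recent events in a fixed-size ring buffer.
type tsTracker struct {
	mu    sync.Mutex
	buf   []tsEvent
	next  int // index where the next event is written
	count int // number of valid events, at most len(buf)
}

func newTsTracker(capacity int) *tsTracker {
	if capacity < 1 {
		capacity = 1
	}
	return &tsTracker{buf: make([]tsEvent, capacity)}
}

// Record appends an event, overwriting the oldest one when the buffer is full.
func (t *tsTracker) Record(kind string, ts uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.buf[t.next] = tsEvent{Kind: kind, Ts: ts, At: time.Now()}
	t.next = (t.next + 1) % len(t.buf)
	if t.count < len(t.buf) {
		t.count++
	}
}

// Snapshot returns a copy of the recorded history, oldest event first.
func (t *tsTracker) Snapshot() []tsEvent {
	t.mu.Lock()
	defer t.mu.Unlock()
	out := make([]tsEvent, 0, t.count)
	start := (t.next - t.count + len(t.buf)) % len(t.buf)
	for i := 0; i < t.count; i++ {
		out = append(out, t.buf[(start+i)%len(t.buf)])
	}
	return out
}

// checkCommitTs (hypothetical) records the commit ts and returns an error
// carrying the saved history when the commit ts is not greater than resolvedTs.
func checkCommitTs(t *tsTracker, commitTs, resolvedTs uint64) error {
	t.Record("commit", commitTs)
	if commitTs <= resolvedTs {
		return fmt.Errorf("the CRTs must be greater than the resolvedTs: commitTs=%d resolvedTs=%d history=%v",
			commitTs, resolvedTs, t.Snapshot())
	}
	return nil
}

func main() {
	tracker := newTsTracker(4)
	for _, ts := range []uint64{100, 105, 110} {
		tracker.Record("resolved", ts)
	}
	if err := checkCommitTs(tracker, 108, 110); err != nil {
		fmt.Println(err)
	}
}
```

A bounded buffer keeps the tracking cost constant on the hot path. How much history is enough to backtrace a real bug is an open question that this sketch does not answer.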
Value
Value description
This feature will help with debugging and data consistency checks in extreme error scenarios.
Value score
Workload estimation
Time
GanttStart:
GanttDue: