
Use disk to buffer data before writing to sink #1096

Open
@liuzix

Description

Feature Request

Is your feature request related to a problem? Please describe:
When the downstream cannot keep up with the writes, usually because of high throughput from the upstream, the CDC process can run out of memory (OOM), because the sink buffers data only in memory.

Describe the feature you'd like:

  • Use an on-disk buffer in the sink (basic solution).
  • Make this on-disk buffer persistent, together with the relevant checkpointTs and resolvedTs, so that the buffered data can be restored if the CDC process crashes or restarts (optimization).

Describe alternatives you've considered:

  • Alternatively, we could buffer data only in the sorter. However, because the sorter caches unsorted data, it is poorly suited to recording progress and resuming after a crash.

Metadata

Assignees

No one assigned

Labels

  • component/sink: Sink component.
  • status/need-discussion: Issue that needs to be discussed to confirm priority, milestone, plan and task breakdown.
  • subject/new-feature: Denotes an issue or pull request adding a new feature.

