Feature Request
Is your feature request related to a problem? Please describe:
When the downstream cannot keep up with writes, usually because of high upstream throughput, CDC runs out of memory (OOM), because the sink buffers data only in memory.
Describe the feature you'd like:
- Use an on-disk buffer in the sink (basic solution).
- Persist this on-disk buffer together with the relevant checkpointTs and resolvedTs, so that if the cdc process crashes or restarts, the buffered data can be restored (optimization).
Describe alternatives you've considered:
- Alternatively, we could buffer data only in the sorter. However, because the sorter caches unsorted data, it is poorly suited to recording progress and resuming after a crash.