Description
Search before asking
- I searched in the issues and found nothing similar.
Motivation
When using Aggregation Merge Engine with multiple concurrent Flink jobs writing to the same primary key table, the following issues occur:
- Concurrent Conflicts: Multiple writers updating the same aggregation columns simultaneously cause data inconsistency
- No Exactly-Once Guarantee: Aggregation accuracy cannot be guaranteed across job failover
- Duplicate Calculations: Replaying data after a job restart aggregates the same records more than once
- State Loss: No state management exists to track which data has already been committed
These issues make Aggregation Merge Engine unsuitable for production scenarios requiring strict data accuracy.
Solution
The complete solution has been proposed in the FIP document FIP-21: Aggregation Merge Engine.
1. State Management
- WriterState: Tracks maximum committed offset per TableBucket
- BucketOffsetTracker: Client-side offset tracking
- Integration: Uses Flink's Operator State for persistence across checkpoints
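A minimal sketch of the client-side offset tracking described above. Class and method names here (`BucketOffsetTracker`, `commit`, `isCommitted`) and the `(tableId, bucketId)` keying are illustrative assumptions, not the actual Fluss API; in the real design this map would be snapshotted into Flink's Operator State at each checkpoint.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tracks the maximum committed offset per TableBucket.
// Names are illustrative, not actual Fluss classes.
public class BucketOffsetTracker {
    // Maximum committed offset per bucket; would be persisted via
    // Flink operator state across checkpoints.
    private final Map<String, Long> committedOffsets = new HashMap<>();

    private static String key(long tableId, int bucketId) {
        return tableId + "-" + bucketId;
    }

    // Record a successfully committed offset, keeping only the maximum.
    public void commit(long tableId, int bucketId, long offset) {
        committedOffsets.merge(key(tableId, bucketId), offset, Math::max);
    }

    // Offset up to which data is known committed; -1 if nothing committed yet.
    public long committedOffset(long tableId, int bucketId) {
        return committedOffsets.getOrDefault(key(tableId, bucketId), -1L);
    }

    // True if a replayed record at this offset was already committed
    // and must be skipped to avoid duplicate aggregation.
    public boolean isCommitted(long tableId, int bucketId, long offset) {
        return offset <= committedOffset(tableId, bucketId);
    }
}
```

On replay after restart, the writer would consult `isCommitted` before applying a record, which addresses the duplicate-calculation issue listed in the Motivation.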
2. Undo Recovery
- Mechanism: On failover, undoes uncommitted data written after last checkpoint
- Implementation: Reads the old values recorded at the committed offset and overwrites the uncommitted new values with them
- Guarantee: Ensures data consistency after job restart
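The undo mechanism can be sketched as follows. This is a simplified in-memory model under assumed names (`AggState`, `checkpoint`, `undoToCheckpoint`); the actual FIP-21 design reads old values from the table at the committed offset rather than from an in-memory snapshot.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of undo recovery for aggregated values:
// uncommitted writes made after the last checkpoint are rolled back
// by overwriting them with the values known at the committed offset.
public class AggState {
    private final Map<String, Long> current = new HashMap<>();      // live aggregated values
    private final Map<String, Long> checkpointed = new HashMap<>(); // values at last committed offset

    // Apply an aggregation update (SUM semantics for illustration).
    public void add(String key, long delta) {
        current.merge(key, delta, Long::sum);
    }

    // Snapshot the values that correspond to the committed offset.
    public void checkpoint() {
        checkpointed.clear();
        checkpointed.putAll(current);
    }

    // On failover: undo uncommitted updates by restoring the old values.
    public void undoToCheckpoint() {
        current.clear();
        current.putAll(checkpointed);
    }

    public long get(String key) {
        return current.getOrDefault(key, 0L);
    }
}
```

After `undoToCheckpoint`, replaying the source from the committed offset re-applies exactly the undone updates once, which is what yields consistency after restart.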
Anything else?
No response
Willingness to contribute
- I'm willing to submit a PR!