Description
Motivation
- Without Change Data Capture (CDC), database extraction is cumbersome: the entire contents of tables are moved into flat files, which are then loaded into the data warehouse. This ad hoc approach is expensive in several ways.
- Without CDC, staging requires moving the entire contents of tables into flat files, and the interfaces become error-prone and labor-intensive to administer.
- Without CDC, extraction becomes expensive because you must write and maintain the capture software yourself or purchase it from a third-party vendor.
- We therefore need an efficient, distributed, row-level CDC feed into a configurable sink for downstream processing such as reporting, full-text indexes, analytics engines, or big data pipelines.
- Applications can use change streams to subscribe to all data changes on a single table, a database, or an entire deployment, and immediately react to them.
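To make the idea of a row-level change feed into a sink concrete, here is a minimal sketch of a consumer that polls for changes from a checkpoint and applies them to a sink. Everything in it is hypothetical and for illustration only: `ChangeEvent`, `InMemoryChangeLog`, `get_changes`, `apply_to_sink`, and `poll_once` are invented names, the change log is an in-memory list, and none of this reflects the actual CDC API or CDCEvent structure tracked in this issue.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical row-level change record (not the real CDCEvent)."""
    table: str
    op: str                 # "INSERT", "UPDATE", or "DELETE"
    key: str                # primary key of the changed row
    after: Optional[dict]   # row contents after the change; None for DELETE
    offset: int             # position of this event in the change log


class InMemoryChangeLog:
    """Stand-in for a change stream; a real one would live in the database."""

    def __init__(self) -> None:
        self._events: List[ChangeEvent] = []

    def append(self, table: str, op: str, key: str, after: Optional[dict]) -> None:
        self._events.append(ChangeEvent(table, op, key, after, len(self._events)))

    def get_changes(self, from_offset: int, limit: int = 100) -> Tuple[List[ChangeEvent], int]:
        """Return up to `limit` events starting at `from_offset`, plus the next checkpoint."""
        batch = self._events[from_offset:from_offset + limit]
        return batch, from_offset + len(batch)


def apply_to_sink(sink: Dict[str, Dict[str, dict]], events: List[ChangeEvent]) -> None:
    """Apply events in order to a sink modeled as {table: {key: row}}."""
    for ev in events:
        rows = sink.setdefault(ev.table, {})
        if ev.op == "DELETE":
            rows.pop(ev.key, None)
        else:  # INSERT and UPDATE both upsert the latest row image
            rows[ev.key] = ev.after


def poll_once(log: InMemoryChangeLog, sink: Dict[str, Dict[str, dict]],
              checkpoint: int, limit: int = 100) -> int:
    """Fetch one batch after `checkpoint`, apply it, and return the new checkpoint."""
    events, next_checkpoint = log.get_changes(checkpoint, limit)
    apply_to_sink(sink, events)
    return next_checkpoint


# Usage: four changes consumed in two batches of two.
log = InMemoryChangeLog()
log.append("users", "INSERT", "1", {"name": "Ada"})
log.append("users", "UPDATE", "1", {"name": "Ada L."})
log.append("users", "INSERT", "2", {"name": "Bob"})
log.append("users", "DELETE", "2", None)

sink: Dict[str, Dict[str, dict]] = {}
checkpoint = poll_once(log, sink, 0, limit=2)
checkpoint = poll_once(log, sink, checkpoint, limit=2)
print(sink)        # {'users': {'1': {'name': 'Ada L.'}}}
print(checkpoint)  # 4
```

The consumer only moves the changed rows rather than re-exporting whole tables, and the returned checkpoint is what lets it resume where it left off. A real deployment would also need to persist the checkpoint and handle retries, which this sketch leaves out.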
Phase 1
Status | Subtask | GitHub Issue | Estimated Time |
---|---|---|---|
✅ | Implement the CDC Lifecycle API | ||
✅ | Implement the GetChanges method of CDC API | #9022 | |
✅ | Define the CDCEvent Structure | #9020 | |
✅ | Develop Simple Console Client | #9021 | |
✅ | Support Snapshot of the table before the start of the CDC | ||
✅ | Allow DDL changes to be propagated | ||
🛠 | Build a Kafka Source Connector (Debezium) | #11855 | |
⬜️ | Support reading the 'before image' of a change | | |
Phase 2
Status | Subtask | GitHub Issue |
---|---|---|
⬜️ | Remove dependency on 'Kafka' | |
⬜️ | Support UDT datatype for CDC | |
⬜️ | Support Row Level Security | |
⬜️ | Support Metrics for tracking CDC state | |
The following issues are also being tracked and are planned for future releases:
- Native CDC support without Debezium - [CDCSDK] Enable native CDC support without Debezium #11856
- CDC push to Kafka - [CDCSDK] CDC support for pushing to Kafka #11857
- Push to webhooks - [CDCSDK] CDC support for pushing to webhooks #11858
- OLAP integration of CDC (Snowflake, BigQuery, etc) - [CDCSDK] CDC to downstream Data warehouses (Redshift, Snowflake, BigQuery) #11859
- Object store integration (S3, Minio, etc) - [CDCSDK] Object store integration of CDC (S3, Minio, etc) #11860
- Message bus integration (PubSub, Kinesis, etc) - [CDCSDK] Message bus integration of CDC (PubSub, Kinesis, etc) #11861