[Master] Add Change Data Capture (CDC) APIs to stream data out of YugabyteDB #9019

Open
@suranjan

Description

Motivation

  • Without Change Data Capture (CDC), extracting data from a database is cumbersome: the entire contents of tables are moved into flat files, which are then loaded into the data warehouse. This ad hoc approach is expensive in several ways.
  • Without CDC, staging requires moving entire tables into flat files, and the resulting interfaces become error-prone and labor-intensive to administer.
  • Without CDC, extraction also becomes expensive because you must write and maintain the capture software yourself or purchase it from a third-party vendor.
  • We therefore need an efficient, distributed, row-level change data capture (CDC) feed into a configurable sink for downstream processing such as reporting, full-text indexes, analytics engines, or big data pipelines.
  • Applications can use change streams to subscribe to all data changes on a single table, a database, or an entire deployment, and immediately react to them.
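
To make the subscription model in the bullets above concrete, here is a minimal sketch of a consumer that polls a change feed from a checkpoint and applies row-level changes to a sink. Everything in it is an assumption for illustration: `ChangeEvent`, `InMemoryChangeLog`, `get_changes`, `apply_to_sink`, and `drain` are hypothetical names, the log is an in-memory stand-in, and none of it reflects the actual YugabyteDB CDC API or the CDCEvent structure tracked in this issue.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical row-level change record (not the real CDCEvent structure)."""
    table: str
    op: str                               # "INSERT", "UPDATE", or "DELETE"
    key: int
    after: Optional[Dict[str, object]]    # new row image; None for DELETE


class InMemoryChangeLog:
    """In-memory stand-in for a server-side change stream (illustration only)."""

    def __init__(self) -> None:
        self._events: List[ChangeEvent] = []

    def append(self, event: ChangeEvent) -> None:
        self._events.append(event)

    def get_changes(self, checkpoint: int, limit: int = 100) -> Tuple[List[ChangeEvent], int]:
        """Return up to `limit` events after `checkpoint`, plus the checkpoint to resume from."""
        batch = self._events[checkpoint:checkpoint + limit]
        return batch, checkpoint + len(batch)


def apply_to_sink(sink: Dict[int, Dict[str, object]], event: ChangeEvent) -> None:
    """Apply one change to a dict that plays the role of the downstream sink."""
    if event.op == "DELETE":
        sink.pop(event.key, None)
    else:
        # INSERT and UPDATE both replace the stored row with the new image.
        sink[event.key] = dict(event.after or {})


def drain(log: InMemoryChangeLog, sink: Dict[int, Dict[str, object]],
          checkpoint: int = 0, limit: int = 100) -> int:
    """Poll until no changes remain; return the checkpoint to resume from later."""
    while True:
        batch, next_checkpoint = log.get_changes(checkpoint, limit)
        if not batch:
            return checkpoint
        for event in batch:
            apply_to_sink(sink, event)
        # Advance only after the whole batch has been applied.
        checkpoint = next_checkpoint
```

Because the consumer keeps its own checkpoint, it can stop and later resume without re-reading the whole table, which is the main contrast with the flat-file extraction described above.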

Phase 1

| Status | Subtask | GitHub Issue | Estimated Time |
|--------|---------|--------------|----------------|
| | Implement the CDC Lifecycle API | | |
| | Implement the GetChanges method of CDC API | #9022 | |
| | Define the CDCEvent Structure | #9020 | |
| | Develop Simple Console Client | #9021 | |
| | Support Snapshot of the table before the start of the CDC | | |
| | Allow DDL changes to be propagated | | |
| 🛠 | Build a Kafka Source Connector (Debezium) | #11855 | |
| ⬜️ | Support reading the 'before image' of a change | | |

Phase 2

| Status | Subtask | GitHub Issue |
|--------|---------|--------------|
| ⬜️ | Remove dependency on 'Kafka' | |
| ⬜️ | Support UDT datatype for CDC | |
| ⬜️ | Support Row Level Security | |
| ⬜️ | Support Metrics for tracking CDC state | |

The following issues are also being tracked and are under our plan for future releases:
