Skip to content

saach11/change-data-capture

Repository files navigation

Project Title

Change data capture (CDC) based on Apache Kafka and Debezium.

Description

This serves as an illustrative example of a Change Data Capture implementation using Debezium and Kafka. It functions as a playground for those keen on exploring CDC but necessitates fine-tuning for production readiness. We will test table and column filter functionality of Debezium.

Architecture

This is a standard change data capture archtiecture based on Kafka, Kafka Connect and Debezium pluging for CDC. This architecture can be easily scalabale, made higha avaiable and fault tolerant.

This setup mimics a SQL Server Always On Availability Group, thus Debezium will be connected to read replica. Context Diagram

These are the main architectural components: replica* Microsoft SQL Server Availability Group with two nodes: MSSQL Primary and MSSQL Secondary. CDC must be enabled on both, even if Debezium is connected to read replica.

  • Kafka Connect with Debezium plugin
  • Kafka Connect with S3 Sink plugin (Can be a single Kafka Connect image with two source and sink connectors)
  • Apach Kafka Raft - uses Raft consensus protocol for metadata management instead of ZooKeeper.
  • Amazon S3 bucket - used as data sink destiantion

Getting Started

Dependencies

  • Docker
  • Docker-compose
  • AWS account

Start containers

Start containers using docker-compose

docker compose up -d

Configure source - SQL Servers

Insert some dummy data into Primary SQL Server

cat sqlserver-init/01-insert-data.sql | docker exec -i mssql-primary bash -c '/opt/mssql-tools/bin/sqlcmd -U sa -P $SA_PASSWORD'

Configure SQL Server primary node

cat sqlserver-init/02-setup-mssql-primary.sql | docker exec -i mssql-primary bash -c '/opt/mssql-tools/bin/sqlcmd -U sa -P $SA_PASSWORD'

Configure SQL Server secondary node

cat sqlserver-init/03-setup-mssql-secondary.sql | docker exec -i mssql-secondary bash -c '/opt/mssql-tools/bin/sqlcmd -U sa -P $SA_PASSWORD'

Create a source connector in Debezium plugin of Kafka Connect

curl -X POST -H  "Content-Type:application/json" http://localhost:8083/connectors/ -d  @kafka-connect-source/register-source-sqlserver.json

Configure destination - AWS S3 bucket

Create a S3 bukcet and an Access Key with priveleges to read/write into S3 buckets.

Create a S3 sink connector in Kafka Connect for Customers table

curl -X POST -H  "Content-Type:application/json" http://localhost:8084/connectors/ -d  @kafka-connect-sink/s3-sink-connector-customers.json

Create a S3 sink connector in Kafka Connect for Products table

curl -X POST -H  "Content-Type:application/json" http://localhost:8084/connectors/ -d  @kafka-connect-sink/s3-sink-connector-products.json

Test the setup

Trigger CDC by inserting data in CDC enabled table - Customers.

cat sqlserver-init/04-insert-customers-table.sql | docker exec -i mssql-primary bash -c '/opt/mssql-tools/bin/sqlcmd -U sa -P $SA_PASSWORD'

New messages can be noticed in the table topic in Kafka UI Context Diagram

Notice new objects created in S3 bucket. The path - topics/<topic.prefix>.<database.names>.<table.include.list>[*]/partition=0

Context Diagram

Trigger CDC by updating data in a CDC enabled table - Customers, and column that is no in exluded list.

cat sqlserver-init/05-update-customers-table-cdc-trigger.sql | docker exec -i mssql-primary bash -c '/opt/mssql-tools/bin/sqlcmd -U sa -P $SA_PASSWORD'

Check new messages in the table topic.

Update data in a CDC enabled table - Customers, but a column that is in excluded list - last_name. Notice Debezium has not been triggered.

cat sqlserver-init/06-update-customers-table-no-cdc-trigger.sql | docker exec -i mssql-primary bash -c '/opt/mssql-tools/bin/sqlcmd -U sa -P $SA_PASSWORD'

No new message ins Kafka topic nor in S3 bucket.

Clean up

docker compose down

License

This project is licensed under the Apache License - see the LICENSE.md file for details.

Acknowledgments

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published