Change data capture (CDC) based on Apache Kafka and Debezium.
This is an illustrative Change Data Capture implementation using Debezium and Kafka. It serves as a playground for exploring CDC, but it requires fine-tuning for production readiness. We will test the table and column filter functionality of Debezium.
This is a standard change data capture architecture based on Kafka, Kafka Connect and the Debezium plugin for CDC. The architecture can easily be scaled and made highly available and fault tolerant.
This setup mimics a SQL Server Always On Availability Group, so Debezium is connected to the read replica.
These are the main architectural components:
- Microsoft SQL Server Availability Group with two nodes: MSSQL Primary and MSSQL Secondary. CDC must be enabled on both, even if Debezium is connected to the read replica.
- Kafka Connect with Debezium plugin
- Kafka Connect with S3 Sink plugin (can be a single Kafka Connect image running both the source and sink connectors)
- Apache Kafka in KRaft mode - uses the Raft consensus protocol for metadata management instead of ZooKeeper
- Amazon S3 bucket - used as the data sink destination
- Docker
- Docker-compose
- AWS account
Start the containers using Docker Compose
docker compose up -d
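You can verify that all services came up before proceeding (docker compose ps is standard Docker Compose; the service names themselves come from this repo's docker-compose.yml):
docker compose ps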
Insert some dummy data into the primary SQL Server
cat sqlserver-init/01-insert-data.sql | docker exec -i mssql-primary bash -c '/opt/mssql-tools/bin/sqlcmd -U sa -P $SA_PASSWORD'
Configure SQL Server primary node
cat sqlserver-init/02-setup-mssql-primary.sql | docker exec -i mssql-primary bash -c '/opt/mssql-tools/bin/sqlcmd -U sa -P $SA_PASSWORD'
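For reference, the core of enabling CDC in SQL Server looks like the sketch below, using the standard sys.sp_cdc_enable_db and sys.sp_cdc_enable_table procedures. The actual statements live in sqlserver-init/02-setup-mssql-primary.sql; the database and table names here are illustrative:

-- Enable CDC at the database level (run inside the target database)
USE testDB;
EXEC sys.sp_cdc_enable_db;
GO

-- Enable CDC for a specific table; SQL Server creates a capture instance and a change table
-- (optionally, @captured_column_list can restrict which columns are captured)
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Customers',
    @role_name     = NULL;  -- NULL: no gating role is required to query the changes
GO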
Configure SQL Server secondary node
cat sqlserver-init/03-setup-mssql-secondary.sql | docker exec -i mssql-secondary bash -c '/opt/mssql-tools/bin/sqlcmd -U sa -P $SA_PASSWORD'
Create a source connector using the Debezium plugin in Kafka Connect
curl -X POST -H "Content-Type:application/json" http://localhost:8083/connectors/ -d @kafka-connect-source/register-source-sqlserver.json
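The register-source-sqlserver.json payload is where Debezium's table and column filters are defined. A minimal sketch of such a configuration, assuming Debezium 2.x property names and illustrative hostname, credentials, database and filter values (the real values are in the repo's JSON file):

{
  "name": "sqlserver-source-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "mssql-secondary",
    "database.port": "1433",
    "database.user": "sa",
    "database.password": "<SA_PASSWORD>",
    "database.names": "testDB",
    "topic.prefix": "server1",
    "table.include.list": "dbo.Customers,dbo.Products",
    "column.exclude.list": "dbo.Customers.last_name",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes.testDB"
  }
}

table.include.list captures only the listed tables, and column.exclude.list filters the named columns out of the emitted change events.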
Create an S3 bucket and an access key with privileges to read/write S3 buckets.
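For example, the bucket can be created with the AWS CLI (the bucket name and region are placeholders; the access key is created separately, e.g. in the IAM console):
aws s3 mb s3://my-cdc-bucket --region us-east-1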
Create an S3 sink connector in Kafka Connect for the Customers table
curl -X POST -H "Content-Type:application/json" http://localhost:8084/connectors/ -d @kafka-connect-sink/s3-sink-connector-customers.json
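A sketch of what s3-sink-connector-customers.json could look like, assuming the Confluent S3 sink connector and illustrative topic, bucket and region values:

{
  "name": "s3-sink-connector-customers",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "server1.testDB.dbo.Customers",
    "s3.bucket.name": "my-cdc-bucket",
    "s3.region": "us-east-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "1"
  }
}

A flush.size of 1 writes one S3 object per record, which is convenient for observing CDC events in a playground but too chatty for production. The Products connector below is analogous, pointing at the Products topic.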
Create an S3 sink connector in Kafka Connect for the Products table
curl -X POST -H "Content-Type:application/json" http://localhost:8084/connectors/ -d @kafka-connect-sink/s3-sink-connector-products.json
Trigger CDC by inserting data into the CDC-enabled Customers table.
cat sqlserver-init/04-insert-customers-table.sql | docker exec -i mssql-primary bash -c '/opt/mssql-tools/bin/sqlcmd -U sa -P $SA_PASSWORD'
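The script inserts rows along these lines (a sketch; the actual column names in sqlserver-init/04-insert-customers-table.sql may differ and are illustrative here):

USE testDB;
INSERT INTO dbo.Customers (first_name, last_name, email)
VALUES (N'Jane', N'Doe', N'jane.doe@example.com');
GO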
New messages can be seen in the table's topic in Kafka UI.
Notice the new objects created in the S3 bucket under the path topics/<topic.prefix>.<database.names>.<table.include.list>[*]/partition=0 (for example, topics/server1.testDB.dbo.Customers/partition=0 with the illustrative values above).
Trigger CDC by updating data in the CDC-enabled Customers table, in a column that is not in the excluded list.
cat sqlserver-init/05-update-customers-table-cdc-trigger.sql | docker exec -i mssql-primary bash -c '/opt/mssql-tools/bin/sqlcmd -U sa -P $SA_PASSWORD'
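A sketch of such an update, reusing the illustrative schema from above; email is not in the excluded list, so a change event is produced:

USE testDB;
UPDATE dbo.Customers
SET email = N'jane.new@example.com'
WHERE first_name = N'Jane';
GO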
Check for new messages in the table's topic.
Update data in the CDC-enabled Customers table, but in a column that is in the excluded list - last_name. Notice that Debezium is not triggered.
cat sqlserver-init/06-update-customers-table-no-cdc-trigger.sql | docker exec -i mssql-primary bash -c '/opt/mssql-tools/bin/sqlcmd -U sa -P $SA_PASSWORD'
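Again as a sketch with the illustrative schema; last_name is filtered out, so no new message is expected:

USE testDB;
UPDATE dbo.Customers
SET last_name = N'Smith'
WHERE first_name = N'Jane';
GO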
No new messages appear in the Kafka topic or in the S3 bucket.
Stop and remove the containers
docker compose down
This project is licensed under the Apache License - see the LICENSE.md file for details.