Generate Redpanda Connect pipeline configurations for Change Data Capture (CDC) workflows.
A CLI-first tool for managing CDC pipelines with automatic Docker dev container setup, supporting both db-per-tenant (one database per customer) and db-shared (single database, multi-tenant) patterns.
- Zero-dependency setup: only Docker required
- Docker-first: run from the Docker Hub image, no local installation needed
- Multi-tenant patterns: support for db-per-tenant and db-shared architectures
- Template-based generation: Jinja2 templates for flexible pipeline configuration
- CLI-first philosophy: all operations via `cdc` commands, no manual YAML editing
- Database integration: auto-updates docker-compose.yml with database services
- Automated releases: semantic versioning with conventional commits
Only Docker required - zero dependencies!
Supports Intel (x86_64) and Apple Silicon (ARM64) platforms.
```bash
# Pull the latest version
docker pull asmacarma/cdc-pipeline-generator:latest

# Verify platform support
docker image inspect asmacarma/cdc-pipeline-generator:latest | grep Architecture
```
⚠️ CLI-First Philosophy: All configuration is managed through `cdc` commands. Never edit YAML files manually. The CLI is the sole interface for configuration management.
Create a docker-compose.yml in your project directory:
```yaml
services:
  dev:
    image: asmacarma/cdc-pipeline-generator:latest
    volumes:
      - .:/workspace
    working_dir: /workspace
    stdin_open: true
    tty: true
    entrypoint: ["/bin/bash", "-c"]
    command: ["fish"]

  # When you run 'cdc scaffold', database services (mssql/postgres) will be
  # automatically inserted below, while this dev service remains unchanged.

# Version pinning options:
# - :latest - Always pulls the newest version (auto-updates on docker compose pull)
# - :0      - Pins to major version 0.x.x (stable, gets minor/patch updates)
# - :0.2    - Pins to minor version 0.2.x (only patch updates)
# - :0.2.4  - Pins to an exact version (no updates)
```

Version strategy:
- Development: use `:latest` for the newest features
- Production: use `:0` to auto-update within the major version
- Critical systems: use an exact version like `:0.2.4`

It is safe to re-run `cdc scaffold`: new database services will be inserted while the `dev` service is preserved.
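The merge rule described above can be sketched in a few lines. This is an illustrative approximation with hypothetical data structures, not the generator's actual code:

```python
# Sketch of scaffold's compose merge: new database services are added to the
# services mapping, and any service that already exists (notably 'dev') is
# left untouched.
compose = {
    "services": {
        "dev": {"image": "asmacarma/cdc-pipeline-generator:latest"},
    }
}
scaffolded = {
    "mssql": {"image": "mcr.microsoft.com/mssql/server:2022-latest"},
    "postgres-target": {"image": "postgres:16-alpine"},
}
for name, service in scaffolded.items():
    compose["services"].setdefault(name, service)  # never overwrite existing keys

print(sorted(compose["services"]))  # ['dev', 'mssql', 'postgres-target']
```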
```bash
# Create the project directory
mkdir my-cdc-project
cd my-cdc-project

# Copy the docker-compose.yml from above, then initialize:
docker compose run --rm dev init
# Creates the project structure, Dockerfile.dev, pipeline templates, and directories

# Start the dev container
docker compose up -d

# Enter the dev container shell
docker compose exec dev fish
# You are now inside the container with the full cdc CLI and Fish completions
```

Inside the dev container, you'll see a Fish shell prompt with:
- the `cdc` command available with tab completion
- all dependencies pre-installed
- your project directory mounted at `/workspace`
Now, working inside the container shell, run the scaffold command:

```bash
# For the db-per-tenant pattern (one database per customer)
cdc scaffold my-group \
  --pattern db-per-tenant \
  --source-type mssql \
  --extraction-pattern "^myapp_(?P<customer>[^_]+)$"

# For the db-shared pattern (multi-tenant, single database)
cdc scaffold my-group \
  --pattern db-shared \
  --source-type postgres \
  --extraction-pattern "^myapp_(?P<service>[^_]+)_(?P<env>(dev|stage|prod))$" \
  --environment-aware
```

Required flags explained:
| Flag | Values | Description |
|---|---|---|
| `--pattern` | `db-per-tenant` or `db-shared` | Choose your multi-tenancy model |
| `--source-type` | `postgres` or `mssql` | Source database type |
| `--extraction-pattern` | Regex string | Pattern to extract identifiers from DB names |
| `--environment-aware` | (flag, no value) | Required for db-shared only; enables env grouping |
Pattern-specific requirements:

For `--pattern db-per-tenant`:
- The regex must have the named group `(?P<customer>...)`
- Example: `"^myapp_(?P<customer>[^_]+)$"` matches `myapp_customer1`

For `--pattern db-shared`:
- The regex must have the named groups `(?P<service>...)` and `(?P<env>...)`
- You must include the `--environment-aware` flag
- Example: `"^myapp_(?P<service>users)_(?P<env>dev|stage|prod)$"`
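These extraction patterns are standard Python named-group regexes, so you can sanity-check one before scaffolding. A quick illustrative check:

```python
import re

# db-per-tenant: a 'customer' named group is required
per_tenant = re.compile(r"^myapp_(?P<customer>[^_]+)$")
m = per_tenant.match("myapp_customer1")
print(m.group("customer"))  # customer1

# db-shared: 'service' and 'env' named groups are required
shared = re.compile(r"^myapp_(?P<service>[^_]+)_(?P<env>dev|stage|prod)$")
m = shared.match("myapp_users_prod")
print(m.group("service"), m.group("env"))  # users prod
```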
Fish shell autocomplete (inside the dev container):
- Type `cdc scaffold my-group --pattern` + TAB → shows `db-per-tenant` and `db-shared`
- Type `cdc scaffold my-group --source-type` + TAB → shows `postgres` and `mssql`
What gets created:
- `source-groups.yaml` with your configuration
- An updated `docker-compose.yml`: database services (mssql/postgres) are inserted after the `dev` service
- The directory structure: `services/`, `generated/`, `pipeline-templates/`
- Connection credentials that use env vars: `${POSTGRES_SOURCE_HOST}`, etc.
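The `${...}` placeholders are resolved from environment variables at runtime. A minimal sketch of that kind of substitution, using only the Python standard library (illustrative; not the generator's actual code):

```python
import os
import os.path

# Hypothetical values; in a real project these come from your .env file.
os.environ["POSTGRES_SOURCE_HOST"] = "postgres-source"
os.environ["POSTGRES_SOURCE_PORT"] = "5432"

template = "host=${POSTGRES_SOURCE_HOST} port=${POSTGRES_SOURCE_PORT}"
print(os.path.expandvars(template))  # host=postgres-source port=5432
```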
Docker Compose update example: after scaffold, your docker-compose.yml will have new services added:

```yaml
services:
  dev:                  # Your original service (preserved)
    image: asmacarma/cdc-pipeline-generator:latest
    # ... unchanged ...

  mssql:                # Added by scaffold
    image: mcr.microsoft.com/mssql/server:2022-latest
    environment:
      ACCEPT_EULA: "Y"
      MSSQL_SA_PASSWORD: ${MSSQL_PASSWORD}

  postgres-target:      # Added by scaffold
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_TARGET_PASSWORD}
```

```bash
# Copy the example and edit it with your credentials
cp .env.example .env
nano .env  # or use your preferred editor
```

Example `.env`:
```bash
# Source Database (MSSQL)
MSSQL_HOST=mssql
MSSQL_PORT=1433
MSSQL_USER=sa
MSSQL_PASSWORD=YourPassword123!

# Target Database (PostgreSQL)
POSTGRES_TARGET_HOST=postgres-target
POSTGRES_TARGET_PORT=5432
POSTGRES_TARGET_USER=postgres
POSTGRES_TARGET_PASSWORD=postgres
POSTGRES_TARGET_DB=cdc_target
```

```bash
# Exit the container temporarily
exit

# Start the databases and dev container
docker compose up -d

# Re-enter the dev container
docker compose exec dev fish
```

```bash
# Create a service
cdc manage-service --create my-service

# Add tables to track
cdc manage-service --service my-service --add-table Users --primary-key id
cdc manage-service --service my-service --add-table Orders --primary-key order_id

# Inspect available tables (optional)
cdc manage-service --service my-service --inspect --schema dbo
```

```bash
# Generate pipelines for the development environment
cdc generate-pipelines --service my-service --environment dev

# Check the generated files
ls generated/pipelines/
ls generated/schemas/
```

The generated pipeline files in `generated/pipelines/` are ready to deploy to your Redpanda Connect infrastructure.
```bash
docker run --rm -v $PWD:/workspace -w /workspace asmacarma/cdc-pipeline-generator:latest init
```

```bash
docker run --rm -v $PWD:/workspace -w /workspace asmacarma/cdc-pipeline-generator:latest scaffold <name> \
  --pattern <db-per-tenant|db-shared> \
  --source-type <postgres|mssql> \
  --extraction-pattern "<regex>" \
  [--environment-aware]

# Required for db-per-tenant:
#   --pattern db-per-tenant
#   --source-type postgres|mssql
#   --extraction-pattern with a 'customer' named group

# Required for db-shared:
#   --pattern db-shared
#   --source-type postgres|mssql
#   --extraction-pattern with 'service' and 'env' named groups
#   --environment-aware (mandatory flag)

# Optional connection overrides:
#   --host <host>          # Default: ${POSTGRES_SOURCE_HOST} or ${MSSQL_SOURCE_HOST}
#   --port <port>          # Default: ${POSTGRES_SOURCE_PORT} or ${MSSQL_SOURCE_PORT}
#   --user <user>          # Default: ${POSTGRES_SOURCE_USER} or ${MSSQL_SOURCE_USER}
#   --password <password>  # Default: ${POSTGRES_SOURCE_PASSWORD} or ${MSSQL_SOURCE_PASSWORD}

# Example patterns:
#   - db-per-tenant: "^adopus_(?P<customer>[^_]+)$"
#   - db-shared:     "^asma_(?P<service>[^_]+)_(?P<env>(dev|stage|prod))$"
#   - Empty pattern "" for simple fallback matching
```

```bash
# Create a service
cdc manage-service --create <name>

# Add tables
cdc manage-service --service <name> --add-table <TableName> --primary-key <column>

# Remove tables
cdc manage-service --service <name> --remove-table <TableName>

# Inspect the database schema
cdc manage-service --service <name> --inspect --schema <schema-name>
```

```bash
# Generate all pipelines
cdc generate-pipelines --service <name> --environment <dev|stage|prod>

# Generate with a snapshot
cdc generate-pipelines --service <name> --environment dev --snapshot
```

```bash
cdc manage-source-groups --info
cdc manage-source-groups --list
```
### Pipeline Generation
```bash
# Generate for specific service
cdc generate --service <name> --environment <dev|stage|prod>
# Generate for all services
cdc generate --all --environment <env>
# Validate all configurations
cdc validate
```

Use case: Each customer has a dedicated source database.
Example: AdOpus system with 26 customer databases.
Pipeline generation: Creates one source + sink pipeline per customer.
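Conceptually, the generator matches each discovered database name against the extraction pattern and emits one pipeline per match. A rough illustration with hypothetical database names, not the tool's internals:

```python
import re

# Hypothetical database names as they might appear on the source server.
databases = ["adopus_acme", "adopus_globex", "master", "tempdb"]
pattern = re.compile(r"^adopus_(?P<customer>[^_]+)$")

# One pipeline per database that matches the customer pattern;
# system databases fall through.
pipelines = [
    f"pipeline-{m.group('customer')}"
    for name in databases
    if (m := pattern.match(name))
]
print(pipelines)  # ['pipeline-acme', 'pipeline-globex']
```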
Use case: All customers share one database, differentiated by customer_id.
Example: ASMA directory service with customer isolation via schema/column.
Pipeline generation: Creates one source + sink pipeline for all customers.
See: examples/db-shared/
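In the db-shared model, a single change stream serves every tenant, and records are separated downstream by the tenant column. A toy illustration of that routing idea (plain Python, not Redpanda Connect configuration):

```python
# One shared stream of change events; each record carries its tenant id.
events = [
    {"customer_id": "acme", "table": "Users", "op": "insert"},
    {"customer_id": "globex", "table": "Users", "op": "update"},
    {"customer_id": "acme", "table": "Orders", "op": "insert"},
]

# Route records per tenant using the customer_id column.
by_tenant: dict[str, list[dict]] = {}
for event in events:
    by_tenant.setdefault(event["customer_id"], []).append(event)

print({tenant: len(evts) for tenant, evts in by_tenant.items()})
# {'acme': 2, 'globex': 1}
```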
Use case: Each customer has a dedicated source database.
Example: SaaS application with isolated customer databases (customer_a_prod, customer_b_prod, etc.)
Pipeline generation: Creates one source + sink pipeline per customer database.
Setup:
```bash
cdc manage-source-groups --create my-group \
  --pattern db-per-tenant \
  --source-type mssql \
  --extraction-pattern '(?P<customer_id>\w+)_(?P<env>\w+)'
```

Use case: All customers share one database, differentiated by a customer_id column or schema.
Example: Multi-tenant application with customer isolation via tenant_id field
Pipeline generation: Creates one source + sink pipeline for all customers, with customer filtering.
Setup:
```bash
cdc manage-source-groups --create my-group \
  --pattern db-shared \
  --source-type postgres \
  --extraction-pattern '(?P<customer_id>\w+)' \
  --environment-aware
```

```
cdc-pipeline-generator/
├── cdc_generator/       # Core library
│   ├── core/            # Pipeline generation logic
│   ├── helpers/         # Utility functions
│   ├── validators/      # Configuration validation
│   └── cli/             # Command-line interface
└── examples/            # Reference implementations
    ├── db-per-tenant/   # Multi-database pattern
    └── db-shared/       # Single-database pattern
```
The recommended way to use this tool is inside the auto-generated dev container:

- Isolated environment: no conflicts with host Python/packages
- All dependencies pre-installed: Python 3.11, Fish shell, database clients
- Database services included: MSSQL/PostgreSQL auto-configured
- Consistent across the team: the same environment for everyone
```bash
# Start all services (databases + dev container)
docker compose up -d

# Enter the dev container
docker compose exec dev fish

# Stop all services
docker compose down

# Rebuild the container (after updating the generator version)
docker compose up -d --build

# View logs
docker compose logs -f dev
docker compose logs -f mssql
docker compose logs -f postgres-target
```

Once inside (`docker compose exec dev fish`), you have:
- the `cdc` command available
- access to the source and target databases
- a Fish shell with auto-completions
- Git configured (via volume mount)
- SSH keys available (via volume mount)

All your project files are mounted at `/workspace`, so changes are reflected immediately.
After running `cdc init`, your project will have:

```
my-cdc-project/
├── docker-compose.yml      # Dev container + database services
├── Dockerfile.dev          # Container image definition
├── .env.example            # Environment variables template
├── .env                    # Your credentials (git-ignored)
├── .gitignore              # Git ignore rules
├── source-groups.yaml      # Server group config (generated by cdc)
├── README.md               # Quick start guide
├── 2-services/             # Service definitions (generated by cdc)
│   └── my-service.yaml
├── 2-customers/            # Customer configs (for db-per-tenant)
├── 3-pipeline-templates/   # Custom pipeline templates (optional)
└── generated/              # Generated output (git-ignored)
    ├── pipelines/          # Redpanda Connect pipeline YAML
    ├── schemas/            # PostgreSQL schemas
    └── table-definitions/  # Table metadata
```
```python
from cdc_generator.core.pipeline_generator import generate_pipelines

# Generate pipelines programmatically
generate_pipelines(
    service='my-service',
    environment='dev',
    output_dir='./generated/pipelines',
)
```

Place custom Jinja2 templates in `3-pipeline-templates/`:
```yaml
# 3-pipeline-templates/source-pipeline.yaml
input:
  mssql_cdc:
    dsn: "{{ dsn }}"
    tables: {{ tables | tojson }}
    # Your custom configuration
```

Use environment variables in source-groups.yaml:
```yaml
server:
  host: ${MSSQL_HOST}   # Replaced at runtime
  port: ${MSSQL_PORT}
  user: ${MSSQL_USER}
  password: ${MSSQL_PASSWORD}
```

If you want to contribute to the cdc-pipeline-generator library itself:
```bash
# Clone the repository
git clone https://github.com/Relaxe111/cdc-pipeline-generator.git
cd cdc-pipeline-generator

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format and lint code
black .
ruff check .
```

If you're using the library in your project, just install it from PyPI as shown in Installation.