 
GlassFlow is an open-source ETL tool that enables real-time data processing from Kafka to ClickHouse. GlassFlow pipelines can perform the following operations:
- Deduplicate: Remove duplicate records based on configurable keys and time windows - use when you need to ensure data uniqueness
- Join: Perform temporal joins between multiple Kafka topics - use when combining related data streams with time-based matching
- Deduplicate & Join: Combine both deduplication and joining in a single pipeline
- Ingest only: Direct data transfer from Kafka to ClickHouse without transformations
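To make the first two operations concrete, here is a minimal in-memory sketch in Python. It is not GlassFlow's implementation: the function names, the `ts` timestamp field (in seconds), and the choice to measure the deduplication window from the first kept event are all assumptions made for illustration.

```python
def deduplicate(events, key, window_seconds):
    """Yield events whose `key` value was not already kept within the window.

    Assumption: the window is measured from the first kept event for a key.
    """
    kept_at = {}  # key value -> timestamp of the last event we kept
    for event in events:
        previous = kept_at.get(event[key])
        if previous is not None and event["ts"] - previous <= window_seconds:
            continue  # duplicate inside the window: drop it
        kept_at[event[key]] = event["ts"]
        yield event


def temporal_join(left, right, key, window_seconds):
    """Pair left and right events that share `key` and are close in time."""
    right_by_key = {}
    for r in right:
        right_by_key.setdefault(r[key], []).append(r)

    pairs = []
    for l in left:
        for r in right_by_key.get(l[key], []):
            if abs(l["ts"] - r["ts"]) <= window_seconds:
                pairs.append({"left": l, "right": r})
    return pairs


# Hypothetical sample data, standing in for messages read from Kafka topics.
clicks = [
    {"id": 1, "ts": 0},
    {"id": 1, "ts": 5},   # duplicate of id 1 within 10 seconds
    {"id": 2, "ts": 6},
    {"id": 1, "ts": 20},  # same id, but outside the 10-second window
]
unique_clicks = list(deduplicate(clicks, key="id", window_seconds=10))

orders = [{"user": "a", "ts": 10}, {"user": "b", "ts": 50}]
logins = [{"user": "a", "ts": 12}, {"user": "b", "ts": 0}]
matched = temporal_join(orders, logins, key="user", window_seconds=5)
```

In this sketch, `unique_clicks` keeps three of the four events, and `matched` contains only the pair for user `a`, because the events for user `b` are 50 seconds apart.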
This guide walks you through a local installation using Docker Compose — perfect for development, testing, or trying out GlassFlow on your machine.
Explore more demos and learn how to build pipelines via the UI in our docs. To start creating your own pipelines, follow the Usage Guide.
- Clone the repository:
git clone https://github.com/glassflow/clickhouse-etl.git
cd clickhouse-etl
- Go to the demo folder and start the services:
cd demos
docker compose up -d
This starts GlassFlow, Kafka, and ClickHouse in Docker.
- Once the services are up, run the demo script, which creates a topic in Kafka, a table in ClickHouse, and a pipeline in GlassFlow. Because the script is written in Python, you need Python installed along with the required dependencies.
python3 -m venv venv
# Activate the virtual environment (Linux/macOS shells)
source venv/bin/activate
pip install -r requirements.txt
python demo_deduplication.py --num-records 10000 --duplication-rate 0.1
This sends 10,000 records to the Kafka topic (with 10% duplicates).
- Access the web interface at http://localhost:8080 to view the demo pipeline.
- View the logs:
# Follow logs in real-time for all containers
docker compose logs -f
# logs for the backend api
docker compose logs api -f
# logs for the UI
docker compose logs ui -f
GlassFlow is open source and can be self-hosted on Kubernetes. It works with any managed Kubernetes service, such as AWS EKS, GKE, and AKS. For local testing or a small proof of concept, you can also run GlassFlow on your local machine with Docker and Docker Compose.
| Method | Use Case | Docs Link | 
|---|---|---|
| ☸️ Kubernetes with Helm | Kubernetes deployment | Kubernetes Helm Guide | 
| 🐳 Local with Docker Compose | Quick evaluation and local testing | Local Docker Guide | 
| ☁️ AWS EC2 with Docker Compose | Lightweight cloud deployment for testing | AWS EC2 Guide | 
Log in at demo.glassflow.dev to see a working demo of GlassFlow running on a GCP cluster. The demo includes a Grafana dashboard and the setup that we used.
GlassFlow Pipeline showing real-time streaming from Kafka through GlassFlow to ClickHouse
For detailed documentation, visit docs.glassflow.dev.
Check out our public roadmap to see what's coming next in GlassFlow. We're actively working on new features and improvements based on community feedback.
Want to suggest a feature? We'd love to hear from you! Please use our GitHub Discussions to share your ideas and help shape the future of GlassFlow.
- Streaming deduplication and joins over windows of up to 7 days, backed by a built-in state store
- ClickHouse sink that uses the native protocol for high performance
- Built-in Kafka connector with support for SASL, SSL, and other options, compatible with nearly all Kafka providers
- Dead-Letter Queue for handling failed events
- Field mapping from your Kafka topic to your ClickHouse table
- Prometheus metrics and OpenTelemetry logs for comprehensive observability
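
As a rough illustration of how field mapping and a dead-letter queue can work together, the Python sketch below converts events into rows and sets aside any event that fails to convert. It is not GlassFlow's API or configuration format: `map_fields`, `process`, the mapping layout, and the sample field names are hypothetical.

```python
def map_fields(event, mapping):
    """Build a row from an event.

    `mapping` is a hypothetical layout:
    source field -> (target column, converter function).
    """
    row = {}
    for source, (column, convert) in mapping.items():
        row[column] = convert(event[source])
    return row


def process(events, mapping):
    """Return (rows, dead_letters); failed events are kept, not dropped."""
    rows, dead_letters = [], []
    for event in events:
        try:
            rows.append(map_fields(event, mapping))
        except (KeyError, TypeError, ValueError) as exc:
            # Keep the original event and the reason so it can be inspected later.
            dead_letters.append({"event": event, "error": repr(exc)})
    return rows, dead_letters


# Hypothetical mapping and events for illustration only.
mapping = {
    "user_id": ("user_id", str),
    "amount": ("amount_usd", float),
}
events = [
    {"user_id": 1, "amount": "9.5"},
    {"user_id": 2},                   # missing field
    {"user_id": 3, "amount": "n/a"},  # value that cannot be converted
]
rows, dead_letters = process(events, mapping)
```

Here the first event becomes a row, while the other two end up in `dead_letters` together with the error that caused the failure.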
This project is licensed under the Apache License 2.0.

