Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.
Updated Jun 1, 2026 - Java
Change data capture for a variety of databases. Please log issues at https://github.com/debezium/dbz/issues.
Flink CDC is a streaming data integration tool.
BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data items every day.
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Data Pipeline Automation Framework to build MCP servers, data APIs, and data lakes with SQL.
By Smart Shaped s.r.l. (https://www.smartshaped.com/)
Kafka Streams made easy with a YAML file
cron replacement to schedule complex data workflows
Data pipeline using Apache Kafka, Apache Spark and HDFS
Library for describing data transformation pipelines by composing simple, reusable components.
A composable, stage-based streaming ETL framework with built-in violation detection, delta analysis, and data lake layer processing. Designed for compliance monitoring, data quality enforcement, and supply chain traceability.
⚡ Data integration | DataLink is a lightweight data integration framework built on top of DataX, Spark and Flink.
A real-time data pipeline using Kafka, Spark, and Cassandra for processing and storing credit card expenses. Includes a Spring Boot application for retrieving personnel data from MySQL, storing images in S3, and displaying employee details with expense reports on a web interface.
An end-to-end data pipeline with Kafka and Spark Streaming integration.
Data-processing and common libraries used in the main project, all available under the Apache 2.0 license.
A source-available JVM pipeline kernel for policy-aware, benchmarkable operational data movement.
This is the graduation project for the DEPI internship.
A real-time cryptocurrency data streaming pipeline.