Open-source data movement for ELT pipelines and AI agents — from APIs, databases & files to warehouses, lakes, and AI applications. Both self-hosted and Cloud.
Updated May 31, 2026 - Python
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
CLI task management & automation tool
Superlinked Inference Engine is an open-source inference server and production cluster for embeddings, reranking, and extraction.
Example end-to-end data engineering project.
Smarter data pipelines for audio.
Pythonic tool for orchestrating machine-learning, high-performance, and quantum-computing workflows in heterogeneous compute environments.
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
Executable memory system for tabular data work
Code review for data in dbt
Streaming reactive and dataflow graphs in Python
Code for "Efficient Data Processing in Spark" Course
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
A Data Engineering project. Repository for backend infrastructure and Streamlit app files for a Premier League Dashboard.
This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.
Fluent data pipelines for Python and your shell
Tools for ASR Corpus Generation from Online Video
Build and deploy a serverless data pipeline on AWS with no effort.