Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Turns Data and AI algorithms into production-ready web applications in no time.
The Data Engineering Cookbook
An orchestration platform for the development, production, and observation of data assets.
Always know what to expect from your data.
Python-powered shell. Full-featured and cross-platform.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
The Open Source Feature Store for AI/ML
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
Compare tables within or across databases
DeepAnalyze is the first agentic LLM for autonomous data science. 🎈 Your AI data analyst: automatically analyzes large volumes of data and generates professional analysis reports with one click!
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Implementing best practices for PySpark ETL jobs and applications.
Python Stream Processing
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.