Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Updated Feb 23, 2025 - Python
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
Turns data and AI algorithms into production-ready web applications in no time.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
The Data Engineering Cookbook
An orchestration platform for the development, production, and observation of data assets.
Always know what to expect from your data.
🐚 Python-powered shell. Full-featured and cross-platform.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
The Open Source Feature Store for AI/ML
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Compare tables within or across databases
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Implementing best practices for PySpark ETL jobs and applications.
Python Stream Processing
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.