Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Updated Feb 24, 2025 - Python
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
An orchestration platform for the development, production, and observation of data assets.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
Efficient data transformation and modeling framework that is backwards compatible with dbt.
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Implementing best practices for PySpark ETL jobs and applications.
A Python stream processing engine modeled after Yahoo! Pipes
Postgres to Elasticsearch/OpenSearch sync
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Enterprise-grade and API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, prompt playground, and more!
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Data ETL & Analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.
Extract, Transform, Load: Any SQL Database in 4 lines of Code.