apache-iceberg

Here are 29 public repositories matching this topic...

abeltavares / real-time-data-pipeline

📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.

docker open-source aws big-data etl s3 data-visualization data-engineering minio apache-flink apache-kafka real-time-data data-pipeline trino streaming-analytics apache-superset apache-iceberg lakehouse sql-analytics

Updated Jan 18, 2025
Python

aws-samples / transactional-datalake-using-apache-iceberg-on-aws-glue

Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS

apache-spark aws-athena aws-glue aws-dms apache-iceberg

Updated Feb 15, 2025
Python
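Several entries on this page, including this one and the Firehose and MSK variants further down, stream CDC (change data capture) records into Iceberg tables. The sketch below is a rough illustration of the merge step only. It replays change events onto a keyed snapshot so that the latest event per primary key wins. It assumes DMS-style `I`/`U`/`D` op codes and a per-event sequence number. `ChangeRecord` and `apply_cdc` are hypothetical names, not code from these repositories. In the actual samples the engine performs this merge (for example, a Spark SQL `MERGE INTO` against the Iceberg table), not Python dictionaries.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ChangeRecord:
    """One CDC event (hypothetical type). The op codes follow the I/U/D
    convention AWS DMS uses for its Op column; that mapping is an assumption."""
    op: str                 # "I" = insert, "U" = update, "D" = delete
    key: Any                # primary-key value of the source row
    row: Optional[dict]     # full row image for I/U; None for D
    seq: int                # source commit order, e.g. a log sequence number


def apply_cdc(table: dict, changes: list) -> dict:
    """Apply CDC events to a keyed snapshot, like a MERGE keyed on the
    primary key. Returns a new dict and leaves the input untouched."""
    result = dict(table)
    # Replay in source commit order so a later change overrides an earlier one,
    # even if the events arrived out of order.
    for change in sorted(changes, key=lambda c: c.seq):
        if change.op in ("I", "U"):
            result[change.key] = change.row   # upsert the latest row image
        elif change.op == "D":
            result.pop(change.key, None)      # delete; ignore unknown keys
        else:
            raise ValueError(f"unknown op {change.op!r}")
    return result


current = {1: {"id": 1, "name": "alice"}}
events = [
    ChangeRecord("U", 1, {"id": 1, "name": "alicia"}, seq=2),
    ChangeRecord("I", 2, {"id": 2, "name": "bob"}, seq=1),
    ChangeRecord("D", 2, None, seq=3),
]
print(apply_cdc(current, events))  # {1: {'id': 1, 'name': 'alicia'}}
```

Sorting by `seq` is the important design choice. Streaming sources can deliver events late or out of order, and applying them in arrival order could resurrect a deleted row.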

guidok91 / spark-movies-etl

Spark data pipeline that processes movie ratings data.

spark etl pyspark data-engineering elt data-pipeline apache-airflow uv apache-iceberg

Updated Aug 1, 2025
Python

aws-samples / monitoring-apache-iceberg-table-metadata-layer

Sample code to collect Apache Iceberg metrics for table monitoring

aws apache-spark monitoring aws-lambda aws-cloudwatch data-quality aws-glue sam-cli apache-iceberg pyiceberg

Updated Aug 18, 2024
Python

aws-samples / aws-glue-streaming-etl-with-apache-iceberg

Streaming ETL job examples in AWS Glue that integrate Iceberg to create an in-place updatable data lake on Amazon S3

apache-spark aws-athena aws-glue apache-iceberg aws-glue-streaming

Updated Sep 10, 2024
Python

aws-samples / aws-glue-streaming-ingestion-from-kafka-to-apache-iceberg

A collection of AWS CDK projects showing how to ingest streaming data directly from Amazon Managed Streaming for Apache Kafka (MSK) and MSK Serverless into Apache Iceberg tables in S3 with AWS Glue Streaming.

aws-s3 pyspark apache-kafka apache-iceberg aws-msk aws-glue-streaming aws-msk-serverless

Updated Sep 10, 2024
Python

guidok91 / spark-structured-streaming-kafka

Spark Structured Streaming data pipeline that processes movie ratings data in real-time.

streaming real-time kafka spark apache-spark etl pyspark data-engineering apache-kafka spark-structured-streaming apache-iceberg

Updated Aug 1, 2025
Python

BauplanLabs / wap-with-bauplan-and-dbos

Write-Audit-Publish on the lakehouse in pure Python with bauplan and DBOS

python apache-iceberg lakehouse durable-execution dbos write-audit-publish bauplan

Updated Jan 8, 2025
Python
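Write-Audit-Publish (WAP) writes new data to an isolated branch, runs quality checks against it there, and makes it visible on the main branch only if every check passes. The sketch below shows that control flow under stated assumptions. `BranchedTable` is a hypothetical in-memory stand-in for a lakehouse table that supports branches, and `write_audit_publish` is an illustrative helper. Neither is the bauplan or DBOS API, and neither reflects how the repository above implements the pattern.

```python
from copy import deepcopy
from typing import Callable


class BranchedTable:
    """Hypothetical in-memory table with named branches (not a real library)."""

    def __init__(self, rows: list):
        self.branches = {"main": rows}

    def create_branch(self, name: str, source: str = "main") -> None:
        # Copy the source so writes to the new branch never touch it.
        self.branches[name] = deepcopy(self.branches[source])

    def merge(self, name: str, into: str = "main") -> None:
        # Publish: the target branch now points at the audited data.
        self.branches[into] = self.branches.pop(name)

    def drop_branch(self, name: str) -> None:
        self.branches.pop(name, None)


def write_audit_publish(table: BranchedTable, new_rows: list,
                        checks: list) -> bool:
    """Return True if the new rows were published to main, False otherwise."""
    staging = "wap_staging"                      # hypothetical branch name
    table.create_branch(staging)
    table.branches[staging].extend(new_rows)     # Write: only the branch changes
    candidate = table.branches[staging]
    if all(check(candidate) for check in checks):  # Audit: every check must pass
        table.merge(staging)                     # Publish
        return True
    table.drop_branch(staging)                   # Failed audit: main is untouched
    return False


no_null_scores: Callable[[list], bool] = lambda rows: all(
    r["score"] is not None for r in rows
)
t = BranchedTable([{"id": 1, "score": 5}])
print(write_audit_publish(t, [{"id": 2, "score": 7}], [no_null_scores]))     # True
print(write_audit_publish(t, [{"id": 3, "score": None}], [no_null_scores]))  # False
print([r["id"] for r in t.branches["main"]])                                 # [1, 2]
```

Readers of `main` never observe the failed batch. This isolation is the point of auditing on a branch rather than validating data after it lands.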

aws-samples / transactional-datalake-using-amazon-datafirehose-iceberg

Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with Amazon Data Firehose and DMS

aws-athena aws-dms apache-iceberg aws-data-firehose

Updated Feb 15, 2025
Python

fraibacas / lakehouse-poc

Run an open-source data lakehouse locally using Docker Compose

docker-compose prefect apache-superset apache-iceberg lakehouse

Updated May 31, 2024
Python

BauplanLabs / data-agents-on-the-lakehouse

Playground for running agentic workflows over a programmable warehouse

etl agents apache-iceberg lakehouse togetherai litellm write-audit-publish bauplan

Updated Jul 8, 2025
Python

aws-samples / transactional-datalake-using-amazon-msk-serverless-and-apache-iceberg-on-aws-glue

Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK Serverless and MSK Connect (Debezium)

kafka debezium apache-iceberg aws-msk msk-connect aws-glue-streaming aws-msk-serverless

Updated Feb 15, 2025
Python

JesuFemi-O / iceberg-integration-framework

A proof-of-concept open framework for managing data ingestion into Apache Iceberg tables

apache-iceberg lakehouse-platform pyiceberg

Updated Sep 11, 2024
Python

aws-samples / transactional-datalake-using-amazon-msk-and-apache-iceberg-on-aws-glue

Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK and MSK Connect (Debezium)

mysql apache-spark kafka-connect debezium aws-athena apache-iceberg aws-glue-streaming

Updated Feb 15, 2025
Python

aws-samples / automation-of-building-a-transactional-data-lake

apache-iceberg delta-lake apache-hudi transactional-data-lake

Updated Aug 28, 2024
Python

BauplanLabs / playlist-recomendations-with-bauplan-and-mongodb

Reference implementation of embedding-based, sequential recommendations, using Bauplan (with Apache Iceberg + Apache Arrow) for data preparation and training, and MongoDB for serving real-time suggestions.

python mongodb serverless embeddings recsys apache-arrow vector-search apache-iceberg

Updated Dec 22, 2024
Python

datalpia / laketower

Oversee your lakehouse

data sql apache-iceberg deltalake lakehouse

Updated Aug 19, 2025
Python

ev2900 / Iceberg_update_metadata_script

Python script that updates S3 file paths in Iceberg metadata files (metadata.json and Avro files)

python aws glue iceberg aws-glue apache-iceberg

Updated Aug 20, 2025
Python
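Iceberg metadata stores absolute file locations, so a table copied to another bucket keeps pointing at the old one. This is why a path-rewriting script like the one above is needed. The sketch below covers only the JSON half of the job. It recursively replaces a path prefix in every string value of a `metadata.json` document, such as the table `location`, each snapshot's `manifest-list`, and the `metadata-file` entries in `metadata-log`. It does not touch the Avro manifest lists and manifests, which also embed absolute paths and would need an Avro library to rewrite. The function names are hypothetical, and this is not the repository's implementation.

```python
import json
from typing import Any


def rewrite_paths(node: Any, old_prefix: str, new_prefix: str) -> Any:
    """Return a copy of a parsed JSON value with old_prefix replaced by
    new_prefix at the start of every string value (keys are left alone)."""
    if isinstance(node, dict):
        return {k: rewrite_paths(v, old_prefix, new_prefix) for k, v in node.items()}
    if isinstance(node, list):
        return [rewrite_paths(v, old_prefix, new_prefix) for v in node]
    if isinstance(node, str) and node.startswith(old_prefix):
        return new_prefix + node[len(old_prefix):]
    return node  # numbers, booleans, null and non-matching strings pass through


def rewrite_metadata_json(text: str, old_prefix: str, new_prefix: str) -> str:
    """Rewrite every location in a metadata.json document held as a string."""
    metadata = json.loads(text)
    return json.dumps(rewrite_paths(metadata, old_prefix, new_prefix), indent=2)


# Trimmed-down example document; real metadata.json files carry many more fields.
example = json.dumps({
    "format-version": 2,
    "location": "s3://old-bucket/db/tbl",
    "snapshots": [{"snapshot-id": 1,
                   "manifest-list": "s3://old-bucket/db/tbl/metadata/snap-1.avro"}],
    "metadata-log": [{"timestamp-ms": 0,
                      "metadata-file": "s3://old-bucket/db/tbl/metadata/v1.metadata.json"}],
})
print(rewrite_metadata_json(example, "s3://old-bucket/", "s3://new-bucket/"))
```

Matching only at the start of a string, with a prefix that includes the scheme and bucket, keeps the rewrite from altering unrelated values that merely contain the old bucket name.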

mouadja02 / end2end-datawarehouse-project

End-to-end data engineering pipeline with real-time streaming, cloud processing, and analytics. Built with Apache Kafka, Spark, AWS Glue, and Snowflake using Apache Iceberg tables.

aws apache-spark apache snowflake data-warehouse data-engineering apache-kafka data-pipelines data-streaming etl-pipeline apache-iceberg data-warehouse-architecture end-to-end-pipeline medallion-architecture

Updated Jun 26, 2025
Python

Elkoumy / real_time_data_lake

🚀 Scalable near-real-time data pipeline using Apache Iceberg, Spark, Kafka, and Trino, with ACID-compliant JSON ingestion, processing, and analytics. Dockerized for easy deployment.

docker kafka data-engineering data-lake real-time-analytics apache-iceberg data-lakehouse

Updated Apr 16, 2025
Python