📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.
Updated Jan 18, 2025 - Python
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS
Spark data pipeline that processes movie ratings data.
Sample code to collect Apache Iceberg metrics for table monitoring
Streaming ETL job cases in AWS Glue that integrate Iceberg and create an in-place updatable data lake on Amazon S3
This is a collection of AWS CDK projects that show how to directly ingest streaming data from Amazon Managed Streaming for Apache Kafka (MSK) and MSK Serverless into Apache Iceberg tables in S3 with AWS Glue Streaming.
Spark Structured Streaming data pipeline that processes movie ratings data in real-time.
Write-Audit-Publish on the lakehouse in pure Python with bauplan and DBOS
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with Amazon Data Firehose and DMS
Run an open-source data lakehouse locally using Docker Compose
Playground for running agentic workflows over a programmable warehouse
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK Serverless and MSK Connect (Debezium)
A proof-of-concept open framework for managing data ingestion into Apache Iceberg tables
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK and MSK Connect (Debezium)
Reference implementation of embedding-based, sequential recommendations, using Bauplan (with Apache Iceberg + Apache Arrow) for data preparation and training, and MongoDB for serving real-time suggestions.
End-to-end data engineering pipeline with real-time streaming, cloud processing, and analytics. Built with Apache Kafka, Spark, AWS Glue, and Snowflake using Apache Iceberg tables.
🚀 Scalable near-real-time data pipeline using Apache Iceberg, Spark, Kafka, and Trino. ACID-compliant JSON ingestion, processing, and analytics. Dockerized for easy deployment. #DataEngineering #DataLake