lakehouse

Here are 96 public repositories matching this topic...

adidas / lakehouse-engine

The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.

framework big-data spark data-engineering databricks data-quality delta-lake great-expectations lakehouse configuration-driven

Updated Oct 7, 2025
Python

data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (such as business users, analysts, and engineers) easier, increasing efficiency and agility in data projects on AWS.

aws data-science data aws-s3 redshift etl-framework aws-glue aws-lake-formation lakehouse lakeformation

Updated Dec 23, 2025
Python

apache / doris-mcp-server

Apache Doris MCP Server

real-time ai mcp olap query-engine lakehouse

Updated Dec 24, 2025
Python

google / space

Unified storage framework for the entire machine learning lifecycle

machine-learning tensorflow dml data-warehouse dataset dataops olap ray apache-parquet apache-arrow multimodal multimodal-data tensorflow-dataset mlops lakehouse

Updated Mar 3, 2024
Python

mattiasthalen / adventure-works

Modern serverless lakehouse implementing HOOK methodology, Unified Star Schema (USS), and Analytical Data Storage System (ADSS) principles on Adventure Works. Features programmatic model generation, event-enhanced Puppini bridges, and temporal resolution across DAS/DAB/DAR layers.

serverless data-warehouse data-engineering data-modeling iceberg data-architecture dimensional-modeling lakehouse duckdb unified-star-schema sqlmesh hook-methodology analytical-data-storage-system

Updated Mar 31, 2025
Python

abeltavares / batch-data-pipeline

🦆 Batch data pipeline with Airflow, DuckDB, Delta Lake, Trino, MinIO, and Metabase. Full observability and data quality.

Updated Nov 5, 2025
Python

abeltavares / real-time-data-pipeline

📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.

docker open-source aws big-data etl s3 data-visualization data-engineering minio apache-flink apache-kafka real-time-data data-pipeline trino streaming-analytics apache-superset apache-iceberg lakehouse sql-analytics

Updated Jan 18, 2025
Python

ysfesr / Building-Data-LakeHouse

Creates a data lakehouse and an ELT pipeline to enable efficient analysis and use of data.

docker spark presto hive minio s3-storage delta-lake lakehouse

Updated Dec 2, 2023
Python

lakevision-project / lakevision

Lakevision is a tool that provides insights into your Apache Iceberg-based data lakehouse.

aws-s3 apache svelte python3 daft iceberg carbon-design-system fast-api carbon-components-svelte lakehouse sveltekit datalakehouse pyiceberg mcp-server

Updated Dec 16, 2025
Python

factorhouse / examples

Feature demos, integration guides & hands-on labs/projects using Kpow, Flex, Kafka, Flink, Iceberg & more

docker kubernetes demo tutorial flex kafka examples project quickstart flink iceberg datastreaming lakehouse kpow factorhouse

Updated Jan 5, 2026
Python

harrydevforlife / building-lakehouse

Builds a data lakehouse with open-source technology. Supports an end-to-end data pipeline, from source data on AWS S3 to the lakehouse, plus visualization and a recommendation app.

python airflow spark s3 metabase minio dbt flask-api hive-metastore delta-lake lakehouse

Updated Dec 15, 2025
Python

Mmodarre / Lakehouse_Plumber

A metadata-driven framework for Databricks Lakeflow Declarative Pipelines (formerly Delta Live Tables) that generates production-ready PySpark code for Lakeflow Declarative Pipelines.

python databricks dlt etl-framework metadata-driven pypi-package lakehouse delta-live-tables frameworke lakeflow-declarative-pipelines

Updated Jan 7, 2026
Python

mwc360 / LakeBench

A multi-modal Python library for benchmarking lakehouse engines and ELT scenarios, supporting both industry-standard and novel benchmarks.

benchmark spark benchmark-framework daft lakehouse polars

Updated Dec 10, 2025
Python

jrlasak / databricks_apparel_streaming

Databricks DLT Apparel Pipeline Project: Learn medallion architecture, streaming, and data engineering with Delta Live Tables. Includes synthetic data, step-by-step guide, and certification prep.

etl pyspark data-engineering learning-by-doing data-pipelines databricks dlt azure-databricks lakehouse delta-live-tables medallion-architecture

Updated Nov 4, 2025
Python

leehuwuj / olh

Lakehouse built on an open-source stack

kubernetes spark bigdata dataplatform deltalake lakehouse

Updated Mar 2, 2024
Python

databricks-industry-solutions / omop-cdm

Unlocking the Power of Health Data With a Modern Data Lakehouse

hls rwe lakehouse omop-cdm databricks-industry-solutions

Updated Jan 13, 2024
Python

databricks-industry-solutions / interop

From FHIR ingestion to patient outcomes analysis

hls fhir lakehouse databricks-industry-solutions

Updated Dec 2, 2024
Python

xikitoptr / ELT_e-commerce

This project implements a lakehouse medallion architecture using modern data stack tools such as Fivetran, Snowflake, and dbt. The fictitious organization is an e-commerce company.

python sql snowflake dbt elt dataengineering fivetran lakehouse medallion-architecture

Updated Sep 30, 2024
Python

BauplanLabs / wap-with-bauplan-and-dbos

Write-Audit-Publish on the lakehouse in pure Python with bauplan and DBOS

python apache-iceberg lakehouse durable-execution dbos write-audit-publish bauplan

Updated Jan 8, 2025
Python

vinitg96 / elt-data-lakehouse

Modern data lakehouse with MinIO, DuckDB, dbt, Metabase, and Airflow

python airflow metabase data-engineering minio dbt lakehouse duckdb

Updated Nov 3, 2025
Python

Add this topic to your repo

To associate your repository with the lakehouse topic, visit your repo's landing page and select "manage topics."