Time-series data warehouse built for speed. 2.42M records/sec on local NVMe. DuckDB + Parquet + Arrow + flexible storage (local/MinIO/S3). AGPL-3.0
Time-series anomaly detection and root cause analysis on data in SQL data warehouses and databases
OpenChatBI is an intelligent chat-based BI tool powered by large language models, designed to help users query, analyze, and visualize data through natural language conversations. It uses LangGraph and LangChain to build chat agents and workflows that support natural-language-to-SQL conversion and data analysis.
Implements an end-to-end tweets ETL/analysis pipeline.
End-to-end data engineering project
Python package for managing OHDSI clinical data models. Includes support for LLM-based plain-text queries, an MCP server, and FHIR import.
A library to accelerate ML and ETL pipelines by connecting all data sources
A DuckDB-powered command line interface for Snowflake security, governance, operations, and cost optimization.
Backend study notes. This project holds my notes on technical books I have read and on source-code reading. It covers computer-science fundamentals for backend development, basics of high-level languages, source-code reading notes, database knowledge, and data-mining knowledge, along with practical problems encountered in real production scenarios. :-D
This repository contains instructions and code to deploy a customer 360 profile solution on Azure stack using the Cortana Intelligence Suite.
An end-to-end ETL data pipeline that uses PySpark parallel processing to handle about 25 million rows of data from a SaaS application, with Apache Airflow as the orchestration tool and various data warehouse technologies, and finally Apache Superset connected to the DWH to generate BI dashboards for weekly reports
This project provides an automated end-to-end ML pipeline that trains a biLSTM network for sentiment analysis, with experiment tracking, benchmarking via model testing and evaluation, and model transition to production followed by deployment to a cloud instance via CI/CD
🌠 Hephaestus - ETL and ML tools for OHDSI - OMOP CDM
Open-source ETL pipeline for HEX cryptocurrency data
Data warehousing date dimension and time dimension builders written in Python.
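A date dimension builder of the kind described above can be sketched in plain Python; the column names and surrogate-key format here are illustrative assumptions, not the project's actual schema:

```python
from datetime import date, timedelta

def build_date_dimension(start: date, end: date) -> list[dict]:
    """Generate one row per calendar day with common warehouse attributes."""
    rows = []
    d = start
    while d <= end:
        rows.append({
            "date_key": int(d.strftime("%Y%m%d")),  # surrogate key, e.g. 20250101
            "full_date": d.isoformat(),
            "year": d.year,
            "quarter": (d.month - 1) // 3 + 1,
            "month": d.month,
            "day_of_week": d.isoweekday(),          # 1 = Monday ... 7 = Sunday
            "is_weekend": d.isoweekday() >= 6,
        })
        d += timedelta(days=1)
    return rows

dim = build_date_dimension(date(2025, 1, 1), date(2025, 1, 7))
print(dim[0]["date_key"])  # 20250101
```

In a real warehouse the resulting rows would be bulk-loaded once and joined to fact tables via `date_key`, rather than computing date parts at query time.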
An ETL data pipeline project that uses Airflow DAGs to extract employee data from PostgreSQL schemas, load it into an AWS data lake, transform it with a Python script, and finally load it into a Snowflake data warehouse using SCD Type 2.
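The SCD Type 2 load mentioned above can be sketched in plain Python: when a tracked attribute changes, the current dimension row is closed out and a new version is inserted. The table layout, column names, and the single tracked attribute are illustrative assumptions:

```python
from datetime import date

def scd2_upsert(dim_rows: list[dict], incoming: dict, today: date) -> None:
    """Apply one incoming record using SCD Type 2: expire the current
    row if a tracked attribute changed, then insert a new version."""
    current = next(
        (r for r in dim_rows
         if r["employee_id"] == incoming["employee_id"] and r["is_current"]),
        None,
    )
    if current is not None:
        if current["department"] == incoming["department"]:
            return  # no change: keep the existing current row
        current["end_date"] = today.isoformat()  # close out the old version
        current["is_current"] = False
    dim_rows.append({
        "employee_id": incoming["employee_id"],
        "department": incoming["department"],
        "start_date": today.isoformat(),
        "end_date": None,  # open-ended: this is the active version
        "is_current": True,
    })

dim = []
scd2_upsert(dim, {"employee_id": 1, "department": "Sales"}, date(2025, 1, 1))
scd2_upsert(dim, {"employee_id": 1, "department": "Marketing"}, date(2025, 6, 1))
print(len(dim))  # 2: one expired row and one current row
```

In Snowflake this same logic is typically expressed as a `MERGE` statement; the in-memory version above just makes the expire-then-insert pattern explicit.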
Demonstrating USS data modeling with dbt and Rill.