sbdk-dev

SBDK: Local-First Data & AI Development Tools

Build and test complete data pipelines in 30 seconds. Zero cloud setup, zero Docker, zero cost.

Five production-ready reference implementations demonstrating how to build local-first data and AI tools—from pipeline sandboxes to ML-in-SQL to conversational analytics.


The Problem We Solve

Traditional data pipeline development is slow and expensive:

  • Setting up a dev environment takes days (Docker, cloud accounts, configuration)
  • Testing requires deploying to cloud infrastructure ($$$)
  • Iteration cycles are painfully slow (push → wait → test → repeat)
  • Breaking production is expensive and stressful

SBDK tools run everything locally:

  • Full dev environment in 30 seconds (1 command)
  • Test everything safely on your laptop (zero cost)
  • Instant iteration cycles (30-second feedback loops)
  • Production patterns validated before deployment

Who Should Use These?

🛠️ Data Engineers

Testing dbt models and data pipelines without cloud infrastructure

Use SBDK.dev to get an instant local DuckDB + dbt + DLT environment, test transformations, and iterate quickly

🏗️ Platform Engineers

Building data tools and evaluating infrastructure patterns

Study the codebases to see professional CLI architecture, MCP server patterns, exception handling, and testing frameworks

📚 Data Engineering Students

Learning the modern data stack without deployment complexity

Run working examples of dbt transformations, DuckDB queries, Rust extensions, AI integrations—all on your laptop


The 5 Projects

Core Foundation

1. 🏗️ SBDK.dev - Local Pipeline Sandbox

Get a complete data pipeline running in 30 seconds | Python | Active

A local development sandbox giving you DuckDB + dbt + DLT in 1 command. No Docker, no cloud, no configuration.

pip install sbdk-dev
sbdk init my_project && cd my_project
sbdk run  # Data generation → ingestion → transformation
sbdk query "SELECT * FROM orders_daily LIMIT 10"

Solves: Days of environment setup → 30 seconds. Cloud testing costs → zero. Slow iteration → instant feedback.
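
The three stages that `sbdk run` walks through (data generation → ingestion → transformation) can be sketched in plain Python. This is an illustration of the flow only, not SBDK's implementation: it uses the standard-library `sqlite3` module as a stand-in for DuckDB, hand-written SQL in place of dbt and DLT, and the `raw_orders` table and the columns of `orders_daily` are assumptions made for the example.

```python
import random
import sqlite3
from datetime import date, timedelta


def generate_orders(n, seed=0):
    """Stage 1: generate synthetic rows of (order_id, order_date, amount)."""
    rng = random.Random(seed)
    start = date(2025, 1, 1)
    return [
        (i, (start + timedelta(days=rng.randrange(7))).isoformat(), round(rng.uniform(5, 200), 2))
        for i in range(n)
    ]


def ingest(conn, rows):
    """Stage 2: load the raw rows into a staging table (hypothetical name)."""
    conn.execute("CREATE TABLE raw_orders (order_id INTEGER, order_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(conn):
    """Stage 3: build a daily aggregate, the kind of model a dbt project would define."""
    conn.execute(
        """
        CREATE TABLE orders_daily AS
        SELECT order_date, COUNT(*) AS order_count, ROUND(SUM(amount), 2) AS revenue
        FROM raw_orders
        GROUP BY order_date
        ORDER BY order_date
        """
    )


def run_pipeline(n=100):
    """Run all three stages against an in-memory database and return the connection."""
    conn = sqlite3.connect(":memory:")
    ingest(conn, generate_orders(n))
    transform(conn)
    return conn


if __name__ == "__main__":
    conn = run_pipeline()
    for row in conn.execute("SELECT * FROM orders_daily LIMIT 10"):
        print(row)
```

Because everything runs against an in-memory database, each run starts from a clean state, which is what makes a short local feedback loop possible.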

Try SBDK.dev

Extensions & Enhancements

2. 🧠 Mallard (local-inference) - ML in SQL

Run ML models directly in your database—no separate infrastructure | Rust | Archived

DuckDB extension for zero-shot predictions, embeddings, and feature importance. Write SQL, get ML.

-- Run zero-shot classification in SQL
SELECT predict_category(description) as category FROM products;

-- Generate embeddings
SELECT embed_text(content) as vector FROM documents;

Solves: Separate ML infrastructure → All in SQL. Model training complexity → Zero-shot inference. Python overhead → Rust performance.
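
The idea of calling a prediction function from SQL can be shown with a small Python sketch. It is not Mallard's code: Mallard is described above as a DuckDB extension, whereas this example registers a Python function with the standard-library `sqlite3` module, and the keyword table is a toy stand-in for a real zero-shot model. Only the function name `predict_category` is taken from the SQL above; everything else is an assumption.

```python
import sqlite3

# Hypothetical keyword lists standing in for a real zero-shot model.
CATEGORY_KEYWORDS = {
    "electronics": ("laptop", "phone", "headphones"),
    "kitchen": ("pan", "kettle", "knife"),
}


def predict_category(description):
    """Toy classifier: return the first category whose keyword appears as a substring."""
    text = (description or "").lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(word in text for word in keywords):
            return category
    return "other"


def connect_with_predictions():
    """Open an in-memory database with predict_category callable from SQL."""
    conn = sqlite3.connect(":memory:")
    # Register the one-argument Python function under the name SQL will use.
    conn.create_function("predict_category", 1, predict_category)
    return conn


if __name__ == "__main__":
    conn = connect_with_predictions()
    conn.execute("CREATE TABLE products (description TEXT)")
    conn.executemany(
        "INSERT INTO products VALUES (?)",
        [("Wireless headphones",), ("Cast iron pan",), ("Garden hose",)],
    )
    query = "SELECT description, predict_category(description) AS category FROM products"
    for row in conn.execute(query):
        print(row)
```

The point of the pattern is that the caller writes only SQL; the model lives behind the function name.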

Explore Mallard

3. 🔍 Semantic Tracer - Lineage Visualization

Understand complex dbt projects with interactive graphs | Rust + TypeScript | Archived

Desktop app visualizing dbt semantic layers. See how your metrics, dimensions, and entities connect.

  • Interactive lineage graphs (React Flow)
  • Direct semantic_models.yml integration
  • Tauri desktop app (fast Rust backend)

Solves: Complex dbt projects → Visual understanding. Scattered docs → Interactive exploration. Cloud tools → Local desktop app.
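
A lineage graph of this kind is, at its core, a set of directed edges between semantic models, their fields, and the metrics built on them. The sketch below shows one way to derive such edges and walk them. The dictionary is a hypothetical, heavily simplified stand-in for parsed YAML, not the real dbt schema, and none of the names come from Semantic Tracer.

```python
from collections import defaultdict

# Hypothetical, simplified stand-in for a parsed semantic-layer file.
SEMANTIC_LAYER = {
    "semantic_models": [
        {"name": "orders", "measures": ["order_total", "order_count"], "dimensions": ["order_date"]},
        {"name": "customers", "measures": ["customer_count"], "dimensions": ["region"]},
    ],
    "metrics": [
        {"name": "revenue", "measure": "order_total"},
        {"name": "avg_order_value", "measure": "order_total"},
        {"name": "active_customers", "measure": "customer_count"},
    ],
}


def build_lineage(layer):
    """Return directed edges: model -> measure/dimension, and measure -> metric."""
    edges = defaultdict(set)
    for model in layer["semantic_models"]:
        for field in model.get("measures", []) + model.get("dimensions", []):
            edges[model["name"]].add(field)
    for metric in layer["metrics"]:
        edges[metric["measure"]].add(metric["name"])
    return edges


def downstream(edges, node):
    """Everything reachable from `node`, found with an iterative depth-first walk."""
    seen, stack = set(), [node]
    while stack:
        for child in edges.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


if __name__ == "__main__":
    graph = build_lineage(SEMANTIC_LAYER)
    print(sorted(downstream(graph, "orders")))
```

An edge list like this is the kind of input a graph renderer needs; the visual layer is a separate concern.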

Explore Semantic Tracer

4. 💬 Local AI Analyst - Conversational Analytics

Ask data questions in natural language—with statistical rigor | Python | Archived

AI analyst that runs real queries first, then explains results. No hallucination—just actual data with confidence intervals.

  • Natural language → SQL → Results → Statistical analysis
  • Execution-first (prevents the AI from making up answers)
  • Automatic significance testing, confidence intervals

Solves: AI hallucination → Execution-first validation. Unreliable insights → Statistical rigor. SQL expertise needed → Natural language queries.
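
The execution-first idea can be sketched as follows: run the query, then build the explanation only from the rows that came back. This is a minimal illustration, not the Local AI Analyst implementation. It assumes the natural-language-to-SQL step has already produced a query, uses `sqlite3` as a stand-in database, and computes the interval with a normal approximation (small samples would normally call for a t-distribution). The function names are hypothetical.

```python
import sqlite3
import statistics


def mean_with_ci(values, z=1.96):
    """Mean and an approximate 95% confidence interval (normal approximation)."""
    mean = statistics.fmean(values)
    if len(values) < 2:
        return mean, (mean, mean)  # no spread can be estimated from one value
    margin = z * statistics.stdev(values) / len(values) ** 0.5
    return mean, (mean - margin, mean + margin)


def answer(conn, sql):
    """Execute the SQL first, then report only what the returned rows support."""
    values = [row[0] for row in conn.execute(sql) if row[0] is not None]
    if not values:
        return "The query returned no rows, so there is nothing to report."
    mean, (low, high) = mean_with_ci(values)
    return f"n={len(values)}, mean={mean:.2f}, 95% CI [{low:.2f}, {high:.2f}]"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(10,), (20,), (30,), (40,)])
    print(answer(conn, "SELECT amount FROM orders"))
```

Every number in the returned text is computed from query results, so a language model placed on top of `answer` would have measured values to explain rather than figures to invent.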

Explore Local AI Analyst

5. 🔌 knowDB - AI Assistant Integration

Query your data through Claude Desktop or ChatGPT | Python | Archived

MCP server connecting local data to AI assistants. Ask questions in Claude Desktop, get real query results.

  • MCP (Model Context Protocol) server implementation
  • Works with Claude Desktop, ChatGPT Desktop, any MCP client
  • Auto-sync dbt semantic layer

Solves: Separate tools for data/AI → Unified interface. Complex queries → Natural language. Context switching → Query from chat.
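
MCP messages are JSON-RPC 2.0 requests, and a client discovers and invokes tools through the `tools/list` and `tools/call` methods. The sketch below handles just those two methods to show the shape of the exchange. It is not knowDB's code and does not use an MCP SDK: the `run_query` tool, its schema, and the `sqlite3` backend are assumptions, and a real server would also need the initialization handshake and a transport.

```python
import json
import sqlite3

# Hypothetical tool definition; the name and schema are illustrative only.
QUERY_TOOL = {
    "name": "run_query",
    "description": "Run a SELECT query against the local database.",
    "inputSchema": {
        "type": "object",
        "properties": {"sql": {"type": "string"}},
        "required": ["sql"],
    },
}


def text_result(text, is_error=False):
    """Wrap text in the content/isError shape used for tool call results."""
    return {"content": [{"type": "text", "text": text}], "isError": is_error}


def call_tool(conn, params):
    """Execute the query tool and return the rows as JSON text."""
    sql = params.get("arguments", {}).get("sql", "")
    # Crude guard for the sketch only; it is not real read-only enforcement.
    if params.get("name") != QUERY_TOOL["name"] or not sql.lstrip().lower().startswith("select"):
        return text_result("Unsupported tool or non-SELECT query.", is_error=True)
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return text_result(f"Query failed: {exc}", is_error=True)
    return text_result(json.dumps(rows))


def handle_request(conn, request):
    """Handle one JSON-RPC 2.0 request dict and return the response dict."""
    method, req_id = request.get("method"), request.get("id")
    if method == "tools/list":
        result = {"tools": [QUERY_TOOL]}
    elif method == "tools/call":
        result = call_tool(conn, request.get("params", {}))
    else:
        error = {"code": -32601, "message": f"Method not found: {method}"}
        return {"jsonrpc": "2.0", "id": req_id, "error": error}
    return {"jsonrpc": "2.0", "id": req_id, "result": result}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": "run_query", "arguments": {"sql": "SELECT 1 + 1"}},
    }
    print(json.dumps(handle_request(conn, request), indent=2))
```

Failures are reported inside the tool result with `isError` set, so the assistant can see what went wrong and retry, while unknown methods get a protocol-level JSON-RPC error.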

Explore knowDB

Documentation Hub

6. 🌐 sbdk.dev - This Website

Central hub with architecture guides and getting started | Next.js | Active

Visit sbdk.dev | View Source


What You Get From These Projects

Complete working code (not tutorials):

  • ✅ Run everything locally—no Docker, no cloud accounts
  • ✅ See how DLT, dbt, DuckDB, Rust, and MCP actually fit together
  • ✅ Production patterns you can adapt (CLI architecture, exception handling, testing)
  • ✅ MIT licensed—fork and use however you want

Technologies & patterns demonstrated:

  • Local-first data pipelines: DuckDB + dbt + DLT running on your laptop
  • Professional CLI design: Typer + Rich + Pydantic with exception hierarchies
  • Rust database extensions: High-performance DuckDB extensions
  • MCP server patterns: Connect data tools to AI assistants
  • Desktop apps with Tauri: Rust backend + React frontend
  • Statistical rigor: Execution-first AI to prevent hallucination
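
As an illustration of the exception-hierarchy pattern named in the list above, the sketch below maps a small tree of error classes to process exit codes. It uses only the standard library rather than Typer, Rich, or Pydantic, and every class and function name is hypothetical rather than taken from the SBDK codebases.

```python
import sys


class ToolError(Exception):
    """Base class for expected, user-facing failures (hypothetical name)."""

    exit_code = 1


class ConfigError(ToolError):
    """The user supplied a missing or invalid configuration."""

    exit_code = 2


class PipelineError(ToolError):
    """A pipeline stage failed while running."""

    exit_code = 3


def load_config(path):
    """Pretend to load a config file, rejecting anything that is not JSON."""
    if not path.endswith(".json"):
        raise ConfigError(f"Unsupported config file: {path}")
    return {"path": path}


def main(argv):
    """Run the command and translate known errors into exit codes."""
    try:
        if len(argv) != 1:
            raise ConfigError("usage: tool CONFIG.json")
        load_config(argv[0])
    except ToolError as exc:
        # Expected failures get a short message; unexpected ones still raise.
        print(f"error: {exc}", file=sys.stderr)
        return exc.exit_code
    return 0


# A real entry point would end with: sys.exit(main(sys.argv[1:]))
```

Catching only the base class at the top level keeps expected failures readable for the user while leaving genuine bugs to surface with a full traceback.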

🚀 Getting Started

Quick Start with SBDK.dev

git clone https://github.com/sbdk-dev/sbdk-dev
cd sbdk-dev
pip install -e .
sbdk init my-project

Learn the Patterns

All projects include complete documentation, real-world examples, and comprehensive test coverage—perfect for learning modern data engineering and local-first development.


Why Archived?

These are complete, stable reference implementations—not active products. They're archived because they're done: production-quality code demonstrating proven patterns.

Perfect for:

  • Forking and adapting for your own projects
  • Learning from real, working code (not tutorials)
  • Understanding how modern data tools fit together

📚 Learn More

→ Visit sbdk.dev for architecture diagrams, use cases, and getting started guides

→ Browse all repositories to explore individual projects


MIT Licensed • Open Source • Archived Nov 2025 as reference implementations


Popular repositories

  1. sbdk-dev (Public archive)

    SBDK is the first local-first data pipeline toolkit that gives you enterprise-grade data processing with zero cloud dependencies. Built on modern Python foundations with DLT, DuckDB, and dbt.

    Python

  2. sbdk.dev (Public)

    A complete reference implementation of a local-first ecosystem for AI-powered analytics. This repository contains the source code for the SBDK.dev website, the central hub for the SBDK suite of ope…

    TypeScript

  3. local-inference (Public archive)

    A reference implementation of a local-first, AI-powered semantic layer built on DuckDB and Rust. Run zero-shot tabular predictions directly in SQL, providing an open-source blueprint for in-databas…

    C++

  4. knowDB (Public archive)

    KnowDB is an AI semantic layer that extends sbdk-dev to enable natural language queries against your data through AI assistants like Claude Desktop and ChatGPT Desktop via the Model Context Protoco…

    Python

  5. .github (Public archive)

    Website and central hub for the SBDK ecosystem - five open-source reference implementations demonstrating how to build local-first data and AI tools (data pipelines, ML inference, semantic layers, …

  6. local-ai-analyst (Public archive)

    A reference implementation of a local-first AI-powered analytics assistant, demonstrating best practices for MCP integration, dbt semantic bridging, and extending data pipeline infrastructure using…

    Python
