matt-strautmann/agentic-builder

This repository was archived by the owner on Nov 30, 2025. It is now read-only.
AI-Driven Analytics Engineering Platform (Archived)

This project is an archived reference implementation of a local-first data pipeline development platform with AI assistance and automated production promotion.


Repository Overview

This repository contains the source code and documentation for the AI-Driven Analytics Engineering Platform. The project's goal was to enable data engineers to develop, test, and deploy data transformation pipelines locally using DuckDB, and then promote validated changes to production environments like BigQuery or Snowflake.

The core workflow of the platform was:

Natural Language → AI-Generated dbt Models → Local Testing → Human Validation → Production Deployment
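The pipeline above can be sketched as a small driver that pushes a run through each gate in order and stops at the first failure. This is an illustrative sketch only; none of these function or class names correspond to actual modules in this repository:

```python
# Hypothetical sketch of the five-stage workflow above. Stage names
# mirror the pipeline; the callables are injected so the sketch stays
# independent of any real AI or dbt integration.
from dataclasses import dataclass, field

@dataclass
class PipelineRun:
    description: str                 # natural-language request
    model_sql: str = ""              # AI-generated dbt model
    tests_passed: bool = False       # local DuckDB test result
    approved: bool = False           # human validation checkpoint
    stages: list = field(default_factory=list)

def run_pipeline(run: PipelineRun, generate, test, review, deploy) -> PipelineRun:
    """Drive a run through the four gates in order, stopping at the
    first gate that fails."""
    run.model_sql = generate(run.description)
    run.stages.append("generated")

    run.tests_passed = test(run.model_sql)
    run.stages.append("tested")
    if not run.tests_passed:
        return run                   # stop before human review

    run.approved = review(run.model_sql)
    run.stages.append("reviewed")
    if not run.approved:
        return run                   # human rejected: no deploy

    deploy(run.model_sql)
    run.stages.append("deployed")
    return run
```

The key property of the design is that a failed test or a human rejection halts the run before anything reaches production.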

All the original documentation, specifications, and source code have been moved to the /archive directory for reference.

Original Project Description

The AI-Driven Analytics Engineering Platform enabled data engineers to develop, test, and deploy data transformation pipelines locally using DuckDB, then promote validated changes to production BigQuery/Snowflake with full infrastructure as code.

Key Features

  • Local-First Development: Develop and test on DuckDB for fast, cost-effective iteration.
  • AI-Assisted Code Generation: Convert natural language descriptions into dbt models, tests, and documentation.
  • User-in-the-Loop Validation: Human review checkpoints at every critical stage.
  • Infrastructure as Code: Automated Terraform configuration generation.
  • Automated Production Promotion: One-command promotion from local to production.

Repository Structure

  • /archive: Contains all the original source code, documentation, scripts, and specifications.
  • README.md: This file.
  • .gitignore: Git ignore file.

Suggested Tags

data-engineering, ai, dbt, duckdb, bigquery, snowflake, iac, local-first, analytics-engineering, archived

Project Structure

services/analytics-engineering/
├── ai-engines/              # Python AI processing
│   ├── clients/             # Claude API integration
│   ├── dbt_generation/      # AI model generation
│   ├── validation/          # Code validation
│   ├── lineage/             # Data lineage analysis
│   └── deployment/          # IaC generation
│
├── orchestrator/            # TypeScript coordination
│   ├── src/
│   │   ├── agents/          # Multi-agent coordination
│   │   ├── dbt-interface/   # dbt Core integration
│   │   ├── duckdb-manager/  # Local database
│   │   └── promotion/       # Production pipeline
│   └── tests/
│
├── local-environment/       # Development environment
│   ├── duckdb/              # Local databases
│   ├── dbt-project/         # dbt project
│   └── sample-data/         # Test datasets
│
├── infrastructure/          # Infrastructure as Code
│   ├── terraform/           # Cloud infrastructure
│   ├── dbt-profiles/        # Environment configs
│   └── ci-cd/               # GitHub Actions
│
└── tools/                   # Utilities
    ├── sample-data/         # Data generation
    └── cli/                 # Command-line tools

Technology Stack

Core

  • DuckDB: Local development database
  • dbt Core: Data transformation framework
  • Claude API: AI model generation
  • TypeScript 5.x: Orchestration layer
  • Python 3.11+: AI engines

Production

  • BigQuery/Snowflake: Production data warehouses
  • Terraform: Infrastructure as code
  • Airflow: Orchestration (optional)
  • dbt Cloud: Managed dbt (optional)

Development Workflow

1. Describe Transformation

npm run generate:dbt -- "Calculate monthly active users by cohort"

2. Review AI-Generated Code

-- AI generates dbt model
{{ config(materialized='table') }}

WITH user_activity AS (
  SELECT
    user_id,
    DATE_TRUNC('month', activity_date) AS activity_month,
    DATE_TRUNC('month', first_seen_date) AS cohort_month
  FROM {{ ref('user_events') }}
  WHERE event_type = 'active'
)

SELECT
  cohort_month,
  activity_month,
  COUNT(DISTINCT user_id) AS active_users
FROM user_activity
GROUP BY 1, 2
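The generator was also meant to emit tests and documentation alongside each model. A plausible companion schema.yml for the model above might look like the following; the column names come from the SQL, but the model name and the specific test choices are illustrative, not actual generator output:

```yaml
# Illustrative schema.yml for the generated model above.
# Model name and tests are assumptions, not real repository output.
version: 2

models:
  - name: monthly_active_users
    description: "Monthly active users broken down by signup cohort."
    columns:
      - name: cohort_month
        description: "Month the user was first seen."
        tests:
          - not_null
      - name: activity_month
        description: "Month of the activity being counted."
        tests:
          - not_null
      - name: active_users
        description: "Distinct active users in the cohort/month cell."
        tests:
          - not_null
```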

3. Test Locally

# Run on local DuckDB (note the "--" so npm forwards the flag to the script)
npm run dbt:run -- --target local

# Validate results
npm run dbt:test -- --target local

4. Provide Feedback (if needed)

npm run review:feedback -- "Add cohort retention rate calculation"
# AI regenerates with improvements

5. Promote to Production

# Generate infrastructure configs
npm run generate:iac

# Deploy to production
npm run promote:prod

# Monitor deployment
npm run status

Configuration

Environment Variables

# AI Configuration
ANTHROPIC_API_KEY=sk-ant-...
AI_MODEL=claude-3-5-sonnet-20241022

# Local Development
DUCKDB_PATH=./local-environment/duckdb/analytics.db
DBT_PROFILES_DIR=./local-environment/dbt-project

# Production (BigQuery)
BIGQUERY_PROJECT=your-project-id
BIGQUERY_DATASET=analytics
GOOGLE_APPLICATION_CREDENTIALS=./credentials.json

# Production (Snowflake)
SNOWFLAKE_ACCOUNT=your-account
SNOWFLAKE_DATABASE=ANALYTICS
SNOWFLAKE_WAREHOUSE=COMPUTE_WH
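A small loader can validate these variables up front and fail fast when a required key is missing. The variable names and defaults below match the list above; the loader itself is an illustrative sketch, not code from this repository:

```python
# Sketch: collect the environment variables listed above into one
# config dict, failing fast when a required key is absent.
import os

REQUIRED = ["ANTHROPIC_API_KEY", "DUCKDB_PATH"]
OPTIONAL_DEFAULTS = {
    "AI_MODEL": "claude-3-5-sonnet-20241022",
    "DBT_PROFILES_DIR": "./local-environment/dbt-project",
}

def load_config(env=os.environ) -> dict:
    missing = [k for k in REQUIRED if k not in env]
    if missing:
        raise RuntimeError(f"missing env vars: {', '.join(missing)}")
    cfg = {k: env[k] for k in REQUIRED}
    for key, default in OPTIONAL_DEFAULTS.items():
        cfg[key] = env.get(key, default)
    return cfg
```

The warehouse-specific variables (BigQuery vs. Snowflake) would be validated the same way once a production target is chosen.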

dbt Profiles

analytics_platform:
  target: local

  outputs:
    local:
      type: duckdb
      path: ./local-environment/duckdb/analytics.db

    production_bq:
      type: bigquery
      project: "{{ env_var('BIGQUERY_PROJECT') }}"
      dataset: analytics
      method: service-account
      keyfile: "{{ env_var('GOOGLE_APPLICATION_CREDENTIALS') }}"

    production_sf:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      database: ANALYTICS
      warehouse: COMPUTE_WH
      schema: PUBLIC

Documentation

All detailed documentation has been moved to the /archive directory.

Success Criteria

Metric                  Target            Status
Model generation time   <5 minutes        🚧 In Progress
Local-to-prod fidelity  95%+              ⏳ Planned
Human review time       <10 minutes       ⏳ Planned
Production promotion    <15 minutes       ⏳ Planned
AI code pass rate       90%+              ⏳ Planned
IaC generation          100% automated    ⏳ Planned
Cycle time reduction    60%+              ⏳ Planned

Contributing

This project follows SPEC-KIT methodology:

  1. Create specification (spec.md)
  2. Design implementation (plan.md)
  3. Break down tasks (tasks.md)
  4. Implement with tests
  5. Validate against success criteria

Architecture Highlights

Why DuckDB?

  • Embedded analytics database (no server required)
  • Handles 1GB-100GB datasets efficiently
  • SQL dialect similar to BigQuery/Snowflake
  • Perfect for local development and testing

Why AI-Assisted?

  • Dramatically reduces time from idea to working code
  • Generates tests and documentation automatically
  • Learns from feedback to improve over time
  • Handles repetitive boilerplate work

Why User-in-the-Loop?

  • Ensures business logic accuracy
  • Builds trust in AI-generated code
  • Enables gradual adoption
  • Provides safety net before production

Roadmap

Phase 1: Local Development (Current)

  • ✅ Project structure and architecture
  • 🚧 AI-powered dbt model generation
  • 🚧 Local DuckDB management
  • ⏳ Validation gates

Phase 2: Production Promotion

  • ⏳ Infrastructure as code generation
  • ⏳ Deployment pipeline
  • ⏳ Rollback capabilities

Phase 3: Optimization

  • ⏳ Performance recommendations
  • ⏳ Cost optimization
  • ⏳ Data quality improvements

Phase 4: Modern Stack Integration

  • ⏳ Airflow integration
  • ⏳ dlt pipelines
  • ⏳ dbt Cloud compatibility

License

MIT

Support

For questions, see the documentation in the /archive directory. The repository is archived and read-only, so new issues cannot be opened.
