A GenAI-powered data lakehouse for NBA/WNBA stats. Ingests, processes, and provides insights for predictive analytics and semantic search. Built with Python, robust backend infra, and deployed via GH Actions. Your go-to for advanced hoops data!

efischer19/hoopstat-haus

Hoopstat Haus 🏀

Status: WIP

A GenAI-powered data lakehouse for NBA/WNBA stats. Your go-to for advanced hoops data!


Note: This project is currently under active development and is not yet functional. The infrastructure and core components are being built. Please check back for updates!

🚀 Quick Start: Access Basketball Analytics (Stateless JSON)

Per ADR-027, initial public access is provided via small, precomputed JSON artifacts served directly from S3. No auth required.

What's available

  • player_daily: per-player daily metrics
  • team_daily: per-team daily metrics
  • top_lists: curated top metrics (e.g., top_ts, top_per, top_efg, top_net)
  • index/latest.json: pointer to the most recent available dates

All artifacts are versioned (v1) and capped at ~100 KB for fast, low-cost access.

📊 Data Availability

  • Coverage: 2023-24 NBA season onwards
  • Updates: Daily, 2–4 hours after games complete
  • Format: JSON artifacts under gold/served/
  • Access: Public S3 with CORS (CDN optional)
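As a rough illustration of the stateless access described above, the following Python sketch downloads one JSON artifact over plain HTTPS and parses it. The bucket URL is a placeholder, since this README does not state the public endpoint, and the helper names `artifact_url` and `fetch_artifact` are hypothetical. Only the `gold/served/` prefix and the `index/latest.json` path come from the text above; the layout of the other artifacts is not assumed.

```python
import json
import urllib.request

# Placeholder (assumption): the README does not give the public bucket URL.
BASE_URL = "https://example-bucket.s3.amazonaws.com/gold/served"
MAX_BYTES = 100 * 1024  # artifacts are described as capped at ~100 KB


def artifact_url(relative_path, base_url=BASE_URL):
    """Join the served prefix and an artifact path such as 'index/latest.json'."""
    return f"{base_url.rstrip('/')}/{relative_path.lstrip('/')}"


def fetch_artifact(relative_path, base_url=BASE_URL, opener=urllib.request.urlopen):
    """Download one JSON artifact and parse it; no auth headers are sent."""
    with opener(artifact_url(relative_path, base_url), timeout=10) as response:
        body = response.read()
    if len(body) > MAX_BYTES:
        # Sanity check only; the documented cap is approximate.
        print(f"warning: {relative_path} is {len(body)} bytes, above the ~100 KB cap")
    return json.loads(body)
```

Once the real endpoint is known, calling `fetch_artifact("index/latest.json")` would be a natural first step, since that file points to the most recent available dates. The `opener` parameter exists only so the function can be exercised without network access.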

Note: An MCP adapter may be added later as an optional layer. See meta/plans/v2-architecture-diagram.md.

About The Project

Hoopstat Haus is an open-source project aimed at creating a comprehensive data lakehouse for basketball analytics. It ingests and processes NBA/WNBA statistics to provide deep insights for predictive modeling and powerful semantic search.

The core mission is to leverage modern data infrastructure and Generative AI to make advanced basketball analysis accessible and powerful.

Tech Stack

This project is being built with a focus on robust, modern backend infrastructure:

  • Language: Python
  • Core Functionality: Data Ingestion, Processing, and Predictive Analytics
  • Deployment: Fully automated via GitHub Actions

Current Status

The repository has been seeded with foundational documents and architectural principles. The next phase of development will focus on building the core data ingestion pipelines.

The project is not operational at this time.

Repository Structure

apps/           # Individual applications
libs/           # Shared Python libraries  
infrastructure/ # Terraform AWS infrastructure (includes ECR)
docs-src/       # Documentation source (MkDocs with Material theme)
scripts/        # Utility scripts (ECR helper, etc.)
meta/           # Project metadata and ADRs
templates/      # Project templates

Key infrastructure components:

  • AWS ECR: Container registry with automated CI/CD integration
  • GitHub Actions: Automated testing, building, and deployment
  • Terraform: Infrastructure as code for AWS resources

Contributing

While the core infrastructure is being established, contributions are welcome in the form of ideas, feature requests, and bug reports. Please see our Contributing Guidelines for more details on how you can help shape the future of Hoopstat Haus.

Quality Assurance for Contributors

To maintain code quality and reduce review cycles, please run local quality checks before submitting pull requests:

# For Python projects (apps and libs)
./scripts/local-ci-check.sh apps/your-app
./scripts/local-ci-check.sh libs/your-lib
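
If you touch several projects at once, a small wrapper can run the same check for each of them. The sketch below is only an illustration: it assumes every immediate subdirectory of apps/ and libs/ is a Python project that `local-ci-check.sh` accepts, and the helper names `ci_commands` and `run_all` are hypothetical, not part of this repository.

```python
import subprocess
from pathlib import Path

CHECK_SCRIPT = "./scripts/local-ci-check.sh"


def ci_commands(repo_root="."):
    """Build one local-ci-check.sh command per directory under apps/ and libs/."""
    root = Path(repo_root)
    commands = []
    for group in ("apps", "libs"):
        group_dir = root / group
        if not group_dir.is_dir():
            continue
        # Assumption: each subdirectory is a project the script can check.
        for project in sorted(p for p in group_dir.iterdir() if p.is_dir()):
            commands.append([CHECK_SCRIPT, f"{group}/{project.name}"])
    return commands


def run_all(repo_root="."):
    """Run every check from the repository root; return the projects that failed."""
    failed = []
    for command in ci_commands(repo_root):
        result = subprocess.run(command, cwd=repo_root)
        if result.returncode != 0:
            failed.append(command[1])
    return failed
```

Calling `run_all()` from the repository root returns the list of project paths whose check exited with a non-zero status, so an empty list means everything passed.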

Optional: Set up pre-commit hooks to automatically run quality checks:

pip install pre-commit
pre-commit install

This ensures your code passes the same checks that CI runs, catching formatting and linting issues early.

Documentation

This project uses MkDocs with the Material theme for documentation. All documentation is authored in docs-src/ and automatically published to GitHub Pages.

Local Documentation Development:

# Install documentation dependencies
pip install -r docs-requirements.txt

# Build documentation (includes API docs generation)
./scripts/build-docs.sh

# Serve documentation locally
mkdocs serve

The documentation site will be available at http://localhost:8000 for local preview.

Documentation Structure:

