A GenAI-powered data lakehouse for NBA/WNBA stats. Your go-to for advanced hoops data!
Note: This project is currently under active development and is not yet functional. The infrastructure and core components are being built. Please check back for updates!
Per ADR-027, initial public access is provided via small, precomputed JSON artifacts served directly from S3. No auth required.
- player_daily: per-player daily metrics
- team_daily: per-team daily metrics
- top_lists: curated top metrics (e.g., top_ts, top_per, top_efg, top_net)
- index/latest.json: pointer to the most recent available dates
All artifacts are versioned (v1) and capped at ~100 KB for fast, low-cost access.
- Coverage: 2023-24 NBA season onwards
- Updates: Daily, 2–4 hours after games complete
- Format: JSON artifacts under gold/served/
- Access: Public S3 with CORS (CDN optional)
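As a rough illustration of the access pattern described above, the following Python sketch builds a public URL for an artifact and downloads it without credentials. The base URL is a placeholder, since this README does not state the bucket or CDN hostname. The helper names `artifact_url` and `fetch_artifact` are hypothetical, as is the assumption that `index/latest.json` sits directly under `gold/served/`. The size check is a defensive reading of the ~100 KB cap, not a documented guarantee.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Placeholder base URL: the real bucket or CDN hostname is not given in this README.
BASE_URL = "https://example.com/"
SERVED_PREFIX = "gold/served/"  # prefix named in this README
MAX_BYTES = 100 * 1024  # artifacts are described as capped at roughly 100 KB


def artifact_url(relative_path: str, base_url: str = BASE_URL) -> str:
    """Build the public URL for a path relative to gold/served/ (layout assumed)."""
    return urljoin(base_url, SERVED_PREFIX + relative_path.lstrip("/"))


def fetch_artifact(relative_path: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """Download one artifact anonymously and parse it as JSON."""
    with urlopen(artifact_url(relative_path, base_url), timeout=timeout) as response:
        # Read one byte past the cap so an oversized body can be detected.
        body = response.read(MAX_BYTES + 1)
    if len(body) > MAX_BYTES:
        raise ValueError(f"{relative_path} exceeds the expected ~100 KB cap")
    return json.loads(body)
```

A client would typically call `fetch_artifact("index/latest.json")` first and use the result to decide which dated artifacts to request next. The fields inside that index file are not documented here, so the sketch returns the parsed JSON as-is rather than assuming a schema.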
Note: An MCP adapter may be added later as an optional layer. See meta/plans/v2-architecture-diagram.md.
Hoopstat Haus is an open-source project aimed at creating a comprehensive data lakehouse for basketball analytics. It ingests and processes NBA/WNBA statistics to provide deep insights for predictive modeling and powerful semantic search.
The core mission is to leverage modern data infrastructure and Generative AI to make advanced basketball analysis accessible and powerful.
This project is being built with a focus on robust, modern backend infrastructure:
- Language: Python
- Core Functionality: Data Ingestion, Processing, and Predictive Analytics
- Deployment: Fully automated via GitHub Actions
The repository has been seeded with foundational documents and architectural principles. The next phase of development will focus on building the core data ingestion pipelines.
The project is not operational at this time.
apps/ # Individual applications
libs/ # Shared Python libraries
infrastructure/ # Terraform AWS infrastructure (includes ECR)
docs-src/ # Documentation source (MkDocs with Material theme)
scripts/ # Utility scripts (ECR helper, etc.)
meta/ # Project metadata and ADRs
templates/ # Project templates
Key infrastructure components:
- AWS ECR: Container registry with automated CI/CD integration
- GitHub Actions: Automated testing, building, and deployment
- Terraform: Infrastructure as code for AWS resources
While the core infrastructure is being established, contributions are welcome in the form of ideas, feature requests, and bug reports. Please see our Contributing Guidelines for more details on how you can help shape the future of Hoopstat Haus.
To maintain code quality and reduce review cycles, please run local quality checks before submitting pull requests:
# For Python projects (apps and libs)
./scripts/local-ci-check.sh apps/your-app
./scripts/local-ci-check.sh libs/your-lib
Optional: Set up pre-commit hooks to automatically run quality checks:
pip install pre-commit
pre-commit install
This ensures your code passes the same checks that CI runs, catching formatting and linting issues early.
This project uses MkDocs with the Material theme for documentation. All documentation is authored in docs-src/ and automatically published to GitHub Pages.
Local Documentation Development:
# Install documentation dependencies
pip install -r docs-requirements.txt
# Build documentation (includes API docs generation)
./scripts/build-docs.sh
# Serve documentation locally
mkdocs serve
The documentation site will be available at http://localhost:8000 for local preview.
Documentation Structure:
- Library API documentation is automatically generated from docstrings
- Development guides and ADRs are manually authored in docs-src/
- Documentation is published to: https://efischer19.github.io/hoopstat-haus/