
bhamm-lab


Welcome to my personal lab for exploring AI/ML, DevOps, and security. I've built a resilient, open-source platform by combining bare-metal servers, virtualization, and container orchestration. It's a place for learning, tinkering, and maybe over-engineering a solution or two.

Guiding Principles

This project is, first and foremost, a platform for learning and exploration. The core philosophy is to maintain a resilient and reproducible test environment where experimentation is encouraged. While this approach can sometimes lead to over-engineering (here's the counter-argument), the primary goal is to guarantee that any component can be rebuilt from code.

This philosophy is supported by several key principles:

  • Everything as Code: All infrastructure, from bare-metal provisioning to application deployment, is defined declaratively and managed through version control. This ensures consistency and enables rapid disaster recovery.
  • Monorepo Simplicity: The entire homelab is managed within a single repository, providing a unified view of all services, configurations, and documentation.
  • Open Source First: I prioritize the use of open-source software to maintain flexibility and support the community.
  • Accelerated AI/ML: The environment is specifically tailored for AI/ML workloads, with a focus on leveraging AMD and Intel GPU acceleration for inference.
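As a concrete illustration of the "Everything as Code" principle, a GitOps controller can reconcile each service from a path in the monorepo. The sketch below assumes an Argo CD-style setup; the repo path, application name, and namespace are illustrative, and the actual tooling in this lab may differ:

```yaml
# Hypothetical Argo CD Application: reconciles one service from the monorepo.
# The path and namespace are placeholders, not this lab's real layout.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: immich
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/blake-hamm/bhamm-lab.git
    targetRevision: main
    path: kubernetes/apps/immich   # illustrative path
  destination:
    server: https://kubernetes.default.svc
    namespace: immich
  syncPolicy:
    automated:
      prune: true      # delete resources removed from git
      selfHeal: true   # revert out-of-band changes
```

With automated sync, prune, and self-heal enabled, the cluster state converges back to whatever is committed, which is what makes rapid disaster recovery from code possible.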

Core Infrastructure

Hardware:

  • Servers: 5 servers – ‘Method’ (SuperMicro H12SSL‑i), ‘Indy’ (SuperMicro D‑2146NT), ‘Stale’ (X10SDV‑4C‑TLN4F), ‘Nose’ & ‘Tail’ (Framework Mainboard)
  • Networking: TP‑Link Omada switches & Protectli Opnsense firewall
  • Accelerated compute: Intel Arc A310, AMD Radeon AI Pro R9700, AMD Ryzen AI MAX+ 395 “Strix Halo”
  • Management: UPS, PiKVM

Software Stack:

Key Features

AI/ML Capabilities:

  • 🤖 GPU device management through the Intel GPU plugin and the AMD ROCm operator
  • 🖼️ Immich machine learning & Jellyfin transcoding with Intel Arc A310
  • 📦 llm-models Helm chart: KubeElasti scale‑to‑zero Llama.cpp inference routed through LiteLLM
  • 🧠 Embedding model inference with AMD Radeon AI Pro R9700
  • ⚡ Dense & MoE inference on two AMD Ryzen AI MAX+ 395
  • ☁️ GCP Vertex AI for larger ML inference

Automation:

Storage & Backups:

Security:

Disaster Recovery:

  • Infrastructure-as-Code for rapid rebuilding
  • Automated backup restoration workflows driven by GitOps
  • Regular disaster recovery testing with blue/green cluster
  • 3-2-1 backup strategy
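The automated, offsite leg of the 3-2-1 strategy (three copies, two media types, one offsite) could look like the Velero-style schedule below. Velero is an assumption here, as are the cron expression, storage location name, and retention window:

```yaml
# Hypothetical Velero Schedule: nightly cluster backup shipped to an
# offsite object store (the "1" in 3-2-1). Names and cron are placeholders.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nightly-offsite
  namespace: velero
spec:
  schedule: "0 3 * * *"          # 03:00 daily
  template:
    includedNamespaces:
      - "*"
    storageLocation: offsite-s3  # illustrative BackupStorageLocation name
    ttl: 720h                    # keep 30 days of restore points
```

Because the schedule itself lives in git, restoring backup tooling after a total loss is part of the same rebuild-from-code workflow as everything else.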

Documentation

Comprehensive documentation is available in the docs site directory, covering architecture, deployments, operations, security, and AI/ML implementations.

Roadmap

The GitHub issues are more up to date than this roadmap.