Skip to content

ualberta-rcg/aleph

Repository files navigation

Aleph

License: MIT Kubernetes GPU Scheduling Serving

RKE2 + Warewulf + HAMi + KServe/Knative inference platform with OpenAI/Anthropic-compatible gateway.

Deployed on the University of Alberta / AMII Vulcan environment for multi-model GPU inference.

Maintained by: Rahim Khoja (khoja1@ualberta.ca) and Karim Ali (kali2@ualberta.ca)


📖 Description

Aleph is an inference platform repository for operating a Kubernetes-based model serving stack with fractional GPU scheduling.

It combines:

  • RKE2 for cluster runtime,
  • Warewulf for stateless node provisioning,
  • HAMi for vGPU slicing and scheduling,
  • KServe + Knative for model services and scale-to-zero,
  • Tyk OSS for key management and API gateway controls,
  • FastAPI gateway with OpenAI and Anthropic compatible endpoints.

The repo includes deployment manifests, model cards, gateway logic, storage configs, install scripts, and validation harnesses used to run the stack reproducibly.

✨ Features

  • OpenAI + Anthropic API compatibility in a single gateway
  • Fractional GPU scheduling with HAMi (nvidia.com/gpumem + nvidia.com/gpu)
  • Mixed model catalog: chat/reasoning, multimodal, embeddings, rerank, TTS, science
  • Scale-to-zero + warmup-aware testing for Knative-backed services
  • Tyk key management and model catalog API endpoints
  • NFS-backed model persistence with OneFS-safe mount options
  • Per-model manifests with cards, PVCs, and runtime-specific notes

🚀 Quickstart

1) Clone and set secrets

git clone git@github.com:ualberta-rcg/aleph.git
cd aleph
cp .env.example .env
# fill HF_TOKEN and TYK_SECRET/TYK_API_SECRET
set -a; source .env; set +a

2) Create HuggingFace token Secret

kubectl create secret generic hf-token -n models \
  --from-literal=token="$HF_TOKEN" \
  --dry-run=client -o yaml | kubectl apply -f -

3) Deploy gateway + models

# from a login node with sudo SSH access to control plane
./deploy.sh

# or pin a CI-built image tag
GATEWAY_IMAGE=rkhoja/aleph:gateway-<sha> ./deploy.sh

Gateway image is published to Docker Hub as rkhoja/aleph (see .github/workflows/deploy-gateway.yml).

4) Run full compatibility tests

python3 scratch/full_test.py

📚 Repository Layout

Path Description
gateway/ FastAPI gateway app, translation logic, k8s deployment, Tyk API defs
models/ Per-model InferenceService, PVC, and details.yaml cards
install-kubeflow/ Cluster/platform install scripts and configs
storage/ StorageClass manifests (nfs-client default, tuning options)
scratch/ End-to-end tests and targeted validation scripts
RUNBOOK.md Operational commands and troubleshooting runbook
CHANGELOG.md Timeline of model/platform updates
CLAUDE.md Operator and agent context for this repository

🧭 Operations Notes

  • Main working cluster documented here is control-plane 172.26.92.230 with HAMi-enabled GPU workers.
  • Node image build/publish source-of-truth is ualberta-rcg/warewulf-rke2-hami; this repo consumes that image line.
  • Keep secrets in .env only (gitignored); do not inline tokens in manifests.
  • Model-specific deployment guidance belongs in models/CLAUDE.md and optional models/<model>/CLAUDE.md.
  • Gateway-specific behavior notes belong in gateway/CLAUDE.md.

🔗 References


🤝 Support

Many Bothans died to bring us this information. This project is provided as-is, but reasonable questions may be answered based on my coffee intake or mood. ;)

Feel free to open an issue or email khoja1@ualberta.ca or kali2@ualberta.ca for U of A related deployments.

📜 License

This project is released under the MIT License - one of the most permissive open-source licenses available.

What this means:

  • ✅ Use it for anything (personal, commercial, whatever)
  • ✅ Modify it however you want
  • ✅ Distribute it freely
  • ✅ Include it in proprietary software

The only requirement: Keep the copyright notice somewhere in your project.

That's it! No other strings attached. The MIT License is trusted by major projects worldwide and removes virtually all legal barriers to using this code.

Full license text: MIT License

🧠 About University of Alberta Research Computing

The Research Computing Group supports high-performance computing, data-intensive research, and advanced infrastructure for researchers at the University of Alberta and across Canada.

We help design and operate compute environments that power innovation — from AI training clusters to national research infrastructure.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors