RKE2 + Warewulf + HAMi + KServe/Knative inference platform with OpenAI/Anthropic-compatible gateway.
Deployed on the University of Alberta / AMII Vulcan environment for multi-model GPU inference.
Maintained by: Rahim Khoja (khoja1@ualberta.ca) and Karim Ali (kali2@ualberta.ca)
Aleph is an inference platform repository for operating a Kubernetes-based model serving stack with fractional GPU scheduling.
It combines:
- RKE2 for cluster runtime,
- Warewulf for stateless node provisioning,
- HAMi for vGPU slicing and scheduling,
- KServe + Knative for model services and scale-to-zero,
- Tyk OSS for key management and API gateway controls,
- FastAPI gateway with OpenAI and Anthropic compatible endpoints.
The repo includes deployment manifests, model cards, gateway logic, storage configs, install scripts, and validation harnesses used to run the stack reproducibly.
- OpenAI + Anthropic API compatibility in a single gateway
- Fractional GPU scheduling with HAMi (`nvidia.com/gpumem` + `nvidia.com/gpu`)
- Mixed model catalog: chat/reasoning, multimodal, embeddings, rerank, TTS, science
- Scale-to-zero + warmup-aware testing for Knative-backed services
- Tyk key management and model catalog API endpoints
- NFS-backed model persistence with OneFS-safe mount options
- Per-model manifests with cards, PVCs, and runtime-specific notes
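A fractional GPU request and scale-to-zero can appear together in one manifest. The sketch below is a minimal example that assumes a KServe `InferenceService` with a custom predictor container. It uses the HAMi resource names listed above, with `nvidia.com/gpumem` given in MiB. The service name, image, `models` namespace, and memory sizes are hypothetical placeholders, not values from this repo's `models/` manifests.

```python
"""Sketch of a KServe InferenceService requesting a HAMi GPU slice.

Hypothetical values: service name, image, namespace and MiB sizes are
placeholders, not taken from this repo's models/ manifests.
"""
import json


def hami_limits(gpu_mem_mib: int, vgpus: int = 1) -> dict:
    """Container limits for a HAMi vGPU slice.

    nvidia.com/gpu    -> number of vGPUs requested
    nvidia.com/gpumem -> device memory per vGPU, in MiB
    """
    if vgpus < 1 or gpu_mem_mib < 1:
        raise ValueError("vgpus and gpu_mem_mib must be positive")
    return {"nvidia.com/gpu": str(vgpus), "nvidia.com/gpumem": str(gpu_mem_mib)}


def inference_service(name: str, image: str, gpu_mem_mib: int,
                      namespace: str = "models") -> dict:
    """Build an InferenceService manifest as a plain dict."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                # minReplicas: 0 lets Knative scale the predictor to zero when idle
                "minReplicas": 0,
                "containers": [{
                    "name": "kserve-container",
                    "image": image,
                    "resources": {"limits": hami_limits(gpu_mem_mib)},
                }],
            }
        },
    }


def fits_on_card(slices_mib: list, card_mib: int) -> bool:
    """Naive memory-only check that several slices could share one card."""
    return sum(slices_mib) <= card_mib


if __name__ == "__main__":
    isvc = inference_service("example-llm", "registry.example/runtime:tag", 20000)
    # kubectl accepts JSON as well as YAML, so this output can be piped to `kubectl apply -f -`
    print(json.dumps(isvc, indent=2))
```

For example, on a hypothetical 46068 MiB card, two 20000 MiB slices pass `fits_on_card`. HAMi's scheduler still makes the real placement decision. The check only illustrates the memory arithmetic behind fractional sharing.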
```bash
git clone git@github.com:ualberta-rcg/aleph.git
cd aleph
cp .env.example .env
# fill HF_TOKEN and TYK_SECRET/TYK_API_SECRET
set -a; source .env; set +a
```

Create the Hugging Face token secret in the `models` namespace:

```bash
kubectl create secret generic hf-token -n models \
  --from-literal=token="$HF_TOKEN" \
  --dry-run=client -o yaml | kubectl apply -f -
```

Deploy:

```bash
# from a login node with sudo SSH access to control plane
./deploy.sh
# or pin a CI-built image tag
GATEWAY_IMAGE=rkhoja/aleph:gateway-<sha> ./deploy.sh
```

The gateway image is published to Docker Hub as `rkhoja/aleph` (see `.github/workflows/deploy-gateway.yml`).

Run the end-to-end tests:

```bash
python3 scratch/full_test.py
```

| Path | Description |
|---|---|
| `gateway/` | FastAPI gateway app, translation logic, k8s deployment, Tyk API defs |
| `models/` | Per-model InferenceService, PVC, and `details.yaml` cards |
| `install-kubeflow/` | Cluster/platform install scripts and configs |
| `storage/` | StorageClass manifests (`nfs-client` default, tuning options) |
| `scratch/` | End-to-end tests and targeted validation scripts |
| `RUNBOOK.md` | Operational commands and troubleshooting runbook |
| `CHANGELOG.md` | Timeline of model/platform updates |
| `CLAUDE.md` | Operator and agent context for this repository |
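Knative services can scale to zero. After an idle period, the first request may fail or stall while a pod is scheduled and the model loads, so end-to-end checks need to tolerate that warm-up. The following minimal sketch shows warmup-aware retrying with exponential backoff. The function name, default timeout, and backoff schedule are assumptions and are not taken from `scratch/full_test.py`. By default it retries on `OSError`, which covers connection errors and `urllib.error.URLError`/`HTTPError`.

```python
"""Sketch: warmup-aware retry for scale-to-zero (Knative) services.

Hypothetical helper; function name, timeout and backoff schedule are
assumptions, not taken from scratch/full_test.py.
"""
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_warmup(
    probe: Callable[[], T],
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 15.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call probe() until it succeeds, retrying retry_on errors with exponential backoff.

    A cold service can refuse or time out while its pod is scheduled and the
    model loads. Raises TimeoutError once timeout_s has elapsed.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        try:
            return probe()
        except retry_on as exc:
            remaining = deadline - clock()
            if remaining <= 0:
                raise TimeoutError(f"service not ready after {timeout_s}s") from exc
            sleep(min(delay, remaining))  # never sleep past the deadline
            delay = min(delay * 2, max_delay_s)
```

As an example, `call_with_warmup(lambda: urllib.request.urlopen(url, timeout=60).read())` keeps polling a hypothetical `url` until the service answers. Any HTTP error status raises `urllib.error.HTTPError`, which is an `OSError` subclass, so 4xx responses are retried as well. Narrow `retry_on` if that is not what you want. The injectable `sleep` and `clock` let the helper be tested without real waiting.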
- The main working cluster documented here has its control plane at `172.26.92.230`, with HAMi-enabled GPU workers.
- The node image build/publish source of truth is `ualberta-rcg/warewulf-rke2-hami`; this repo consumes that image line.
- Keep secrets in `.env` only (gitignored); do not inline tokens in manifests.
- Model-specific deployment guidance belongs in `models/CLAUDE.md` and, optionally, `models/<model>/CLAUDE.md`.
- Gateway-specific behavior notes belong in `gateway/CLAUDE.md`.
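The `.env`-only rule can be checked before running `./deploy.sh`. The sketch below is hypothetical and is not shipped with this repo. It assumes a simple `KEY=VALUE` `.env` format and treats `HF_TOKEN`, `TYK_SECRET`, and `TYK_API_SECRET` as all required (adjust `REQUIRED` if `.env.example` says otherwise). It reports missing keys and flags any YAML/JSON file under `models/` or `gateway/` that contains a secret value verbatim.

```python
"""Sketch: pre-deploy check for the `.env`-only secrets convention.

Hypothetical helper (not part of this repo). Assumes simple KEY=VALUE lines.
"""
from pathlib import Path

# Assumption: all three keys named in the quick start are required
REQUIRED = ("HF_TOKEN", "TYK_SECRET", "TYK_API_SECRET")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines; skips blanks/comments, strips `export` and matching quotes."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def missing_keys(env: dict, required=REQUIRED) -> list:
    """Required keys that are absent or empty."""
    return [k for k in required if not env.get(k)]


def secret_values(env: dict, required=REQUIRED) -> list:
    """Values of the required keys that are set."""
    return [env[k] for k in required if env.get(k)]


def find_leaks(secrets, documents: dict) -> list:
    """Names of documents whose text contains any secret value verbatim."""
    # Very short values would match too often to be meaningful
    candidates = [s for s in secrets if len(s) >= 8]
    return sorted(name for name, text in documents.items()
                  if any(s in text for s in candidates))


def load_manifests(root: Path) -> dict:
    """Read YAML/JSON files under root into {path: text}."""
    docs = {}
    for pattern in ("*.yaml", "*.yml", "*.json"):
        for path in root.rglob(pattern):
            docs[str(path)] = path.read_text(errors="ignore")
    return docs


if __name__ == "__main__":
    env_file = Path(".env")
    if not env_file.is_file():
        print("no .env found; copy .env.example first")
    else:
        env = parse_env(env_file.read_text())
        print("missing keys:", missing_keys(env) or "none")
        for folder in ("models", "gateway"):
            if Path(folder).is_dir():
                for name in find_leaks(secret_values(env), load_manifests(Path(folder))):
                    print("secret value found in", name)
```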
- University of Alberta Research Computing
- Alberta Machine Intelligence Institute (AMII)
- Digital Research Alliance of Canada
- HAMi
- Warewulf RKE2 HAMi Image Repo
- KServe
- Knative
- RKE2
Many Bothans died to bring us this information. This project is provided as-is, but reasonable questions may be answered based on my coffee intake or mood. ;)
Feel free to open an issue or email khoja1@ualberta.ca or kali2@ualberta.ca for U of A related deployments.
This project is released under the MIT License - one of the most permissive open-source licenses available.
What this means:
- ✅ Use it for anything (personal, commercial, whatever)
- ✅ Modify it however you want
- ✅ Distribute it freely
- ✅ Include it in proprietary software
The only requirement: keep the copyright notice and the license text in all copies or substantial portions of the software.
That's it! No other strings attached. The MIT License is trusted by major projects worldwide and removes virtually all legal barriers to using this code.
Full license text: MIT License
The Research Computing Group supports high-performance computing, data-intensive research, and advanced infrastructure for researchers at the University of Alberta and across Canada.
We help design and operate compute environments that power innovation — from AI training clusters to national research infrastructure.