Aleph

RKE2 + Warewulf + HAMi + KServe/Knative inference platform with OpenAI/Anthropic-compatible gateway.

Deployed on the University of Alberta / AMII Vulcan environment for multi-model GPU inference.

Maintained by: Rahim Khoja (khoja1@ualberta.ca) and Karim Ali (kali2@ualberta.ca)

📖 Description

Aleph is an inference platform repository for operating a Kubernetes-based model serving stack with fractional GPU scheduling.

It combines:

RKE2 for cluster runtime,
Warewulf for stateless node provisioning,
HAMi for vGPU slicing and scheduling,
KServe + Knative for model services and scale-to-zero,
Tyk OSS for key management and API gateway controls,
FastAPI gateway with OpenAI and Anthropic compatible endpoints.

The repo includes deployment manifests, model cards, gateway logic, storage configs, install scripts, and validation harnesses used to run the stack reproducibly.

✨ Features

OpenAI + Anthropic API compatibility in a single gateway
Fractional GPU scheduling with HAMi (nvidia.com/gpumem + nvidia.com/gpu)
Mixed model catalog: chat/reasoning, multimodal, embeddings, rerank, TTS, science
Scale-to-zero + warmup-aware testing for Knative-backed services
Tyk key management and model catalog API endpoints
NFS-backed model persistence with OneFS-safe mount options
Per-model manifests with cards, PVCs, and runtime-specific notes

🚀 Quickstart

1) Clone and set secrets

git clone git@github.com:ualberta-rcg/aleph.git
cd aleph
cp .env.example .env
# fill HF_TOKEN and TYK_SECRET/TYK_API_SECRET
set -a; source .env; set +a

2) Create HuggingFace token Secret

kubectl create secret generic hf-token -n models \
  --from-literal=token="$HF_TOKEN" \
  --dry-run=client -o yaml | kubectl apply -f -

3) Deploy gateway + models

# from a login node with sudo SSH access to control plane
./deploy.sh

# or pin a CI-built image tag
GATEWAY_IMAGE=rkhoja/aleph:gateway-<sha> ./deploy.sh

Gateway image is published to Docker Hub as rkhoja/aleph (see .github/workflows/deploy-gateway.yml).

4) Run full compatibility tests

python3 scratch/full_test.py

📚 Repository Layout

Path	Description
`gateway/`	FastAPI gateway app, translation logic, k8s deployment, Tyk API defs
`models/`	Per-model `InferenceService`, `PVC`, and `details.yaml` cards
`install-kubeflow/`	Cluster/platform install scripts and configs
`storage/`	StorageClass manifests (`nfs-client` default, tuning options)
`scratch/`	End-to-end tests and targeted validation scripts
`RUNBOOK.md`	Operational commands and troubleshooting runbook
`CHANGELOG.md`	Timeline of model/platform updates
`CLAUDE.md`	Operator and agent context for this repository

🧭 Operations Notes

Main working cluster documented here is control-plane 172.26.92.230 with HAMi-enabled GPU workers.
Node image build/publish source-of-truth is ualberta-rcg/warewulf-rke2-hami; this repo consumes that image line.
Keep secrets in .env only (gitignored); do not inline tokens in manifests.
Model-specific deployment guidance belongs in models/CLAUDE.md and optional models/<model>/CLAUDE.md.
Gateway-specific behavior notes belong in gateway/CLAUDE.md.

🔗 References

🤝 Support

Many Bothans died to bring us this information. This project is provided as-is, but reasonable questions may be answered based on my coffee intake or mood. ;)

Feel free to open an issue or email khoja1@ualberta.ca or kali2@ualberta.ca for U of A related deployments.

📜 License

This project is released under the MIT License - one of the most permissive open-source licenses available.

What this means:

✅ Use it for anything (personal, commercial, whatever)
✅ Modify it however you want
✅ Distribute it freely
✅ Include it in proprietary software

The only requirement: Keep the copyright notice somewhere in your project.

That's it! No other strings attached. The MIT License is trusted by major projects worldwide and removes virtually all legal barriers to using this code.

Full license text: MIT License

🧠 About University of Alberta Research Computing

The Research Computing Group supports high-performance computing, data-intensive research, and advanced infrastructure for researchers at the University of Alberta and across Canada.

We help design and operate compute environments that power innovation — from AI training clusters to national research infrastructure.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Aleph

📖 Description

✨ Features

🚀 Quickstart

1) Clone and set secrets

2) Create HuggingFace token Secret

3) Deploy gateway + models

4) Run full compatibility tests

📚 Repository Layout

🧭 Operations Notes

🔗 References

🤝 Support

📜 License

🧠 About University of Alberta Research Computing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 142 Commits
.github/workflows		.github/workflows
demo		demo
gateway		gateway
install-kubeflow		install-kubeflow
models		models
scratch		scratch
scripts		scripts
storage		storage
.env.example		.env.example
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CLUSTER-230-PLAN.md		CLUSTER-230-PLAN.md
GATEWAY-ARCHITECTURE.md		GATEWAY-ARCHITECTURE.md
GATEWAY-DESIGN.md		GATEWAY-DESIGN.md
LICENSE		LICENSE
README.md		README.md
RUNBOOK.md		RUNBOOK.md
deploy.sh		deploy.sh
model-usage.md		model-usage.md
models.md		models.md

Folders and files

Latest commit

History

Repository files navigation

Aleph

📖 Description

✨ Features

🚀 Quickstart

1) Clone and set secrets

2) Create HuggingFace token Secret

3) Deploy gateway + models

4) Run full compatibility tests

📚 Repository Layout

🧭 Operations Notes

🔗 References

🤝 Support

📜 License

🧠 About University of Alberta Research Computing

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages