MetalLLM is an open-source Python library that brings large-context LLM inference to Apple Silicon Macs (M1–M4), built on PyTorch MPS today with custom Metal kernels planned.
- Run Llama, GPT-OSS, and Qwen models with up to 100k-token context on macOS
- No quantization required (fp16/bf16/float32)
- Memory-aware planner that splits the KV cache across GPU/CPU/disk
- Stream-safe attention kernels (Metal) for long contexts
- HuggingFace-like API and CoreML/Swift export for apps
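Everything above runs on PyTorch's MPS backend, so a quick sanity check that your machine exposes it (plain PyTorch, nothing MetalLLM-specific) can save debugging later:

```python
import torch

# MPS requires Apple Silicon and a recent macOS; this check is plain PyTorch.
assert torch.backends.mps.is_available(), "MPS backend not available"

# fp16 tensors run natively on MPS, matching the no-quantization dtypes above.
x = torch.randn(2, 2, dtype=torch.float16, device="mps")
print(x @ x)
```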
Create a virtual environment and install the dependencies:

```bash
python -m venv .venv && source .venv/bin/activate
pip install -U pip setuptools
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install transformers sentencepiece
```

Then load a model and generate:

```python
from metal_llm import load
handle = load("meta-llama/Llama-2-7b-chat-hf", device="mps", dtype="float16")
out = handle.generate("Hello, summarize Metal for GPUs in 3 bullets.", max_new_tokens=64)
print(out)
```

Status:

- MVP works on MPS with a minimal streaming generate path (sketched below)
- KV paging and Metal kernels are in progress
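The streaming path follows the same pattern Hugging Face transformers exposes; here is a minimal sketch of that pattern on MPS using plain transformers. It is illustrative only, not MetalLLM's streaming API, and the model name is simply the one from the quickstart:

```python
from threading import Thread

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # same model as the quickstart
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("mps")

inputs = tok("Hello, summarize Metal for GPUs in 3 bullets.", return_tensors="pt").to("mps")
streamer = TextIteratorStreamer(tok, skip_prompt=True, skip_special_tokens=True)

# generate() blocks, so it runs in a background thread while tokens are consumed here.
thread = Thread(target=model.generate, kwargs=dict(**inputs, streamer=streamer, max_new_tokens=64))
thread.start()
for piece in streamer:
    print(piece, end="", flush=True)
thread.join()
```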
Roadmap:

- Paged KV cache with disk offload (100k+ tokens); see the sketch after this list
- Flash-attention-like kernels in Metal (MSL)
- Memory-aware execution planner and modes (tiny/balanced/high_throughput)
- CoreML exporter + Swift package
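To make the paged-KV idea concrete, here is a toy sketch of the technique in plain PyTorch: fixed-size pages of K/V tensors that demote from MPS to CPU, and then to disk, once a residency budget is exceeded. `KVPage`, `enforce_budget`, and the page size are hypothetical names for illustration, not MetalLLM's planner:

```python
import os
import tempfile

import torch

PAGE_TOKENS = 256  # tokens per KV page (illustrative choice)

class KVPage:
    """One fixed-size slice of the KV cache; lives on MPS, CPU, or disk."""

    def __init__(self, k: torch.Tensor, v: torch.Tensor):
        self.k, self.v = k, v   # resident tensors, or None once spilled to disk
        self.path = None        # backing file when spilled

    def to_cpu(self):
        self.k, self.v = self.k.to("cpu"), self.v.to("cpu")

    def to_disk(self, directory: str):
        fd, self.path = tempfile.mkstemp(suffix=".pt", dir=directory)
        os.close(fd)
        torch.save({"k": self.k.cpu(), "v": self.v.cpu()}, self.path)
        self.k = self.v = None  # free RAM; the page is now disk-only

    def load(self, device: str):
        if self.path is not None:
            blob = torch.load(self.path)
            self.k, self.v = blob["k"], blob["v"]
        return self.k.to(device), self.v.to(device)

def enforce_budget(pages: list[KVPage], gpu_budget: int):
    """Demote the oldest pages off the GPU once more than gpu_budget are resident."""
    excess = max(len(pages) - gpu_budget, 0)
    for page in pages[:excess]:
        if page.k is not None and page.k.device.type == "mps":
            page.to_cpu()       # first hop: MPS -> CPU
        # A real planner would track a CPU budget too and call to_disk() on overflow.
```

A real implementation would also batch page moves and overlap them with compute; this sketch only shows the tiering.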
License: Apache-2.0