YiRage (Yield Revolutionary AGile Engine) extends Mirage with multi-backend support. It enables LLM inference optimization across diverse hardware platforms, from NVIDIA GPUs and x86/ARM CPUs to Apple Silicon and AWS Neuron.
- Original Mirage (CMU): Superoptimizer framework for tensor programs
- YiRage Extensions (Chen Xingqiang, 2025): Multi-backend support with hardware-aware optimizations
Layer 1: Python API
- Backend query and selection
- Kernel graph creation
- Hardware-specific optimizers
- Search strategy access
Layer 2: Backend Manager (C++)
- BackendRegistry (singleton, thread-safe)
- Factory patterns for backends and strategies
- Automatic initialization on import
Layer 3: Backend Implementations
- 7 complete backends with hardware-specific optimizations
- Each backend includes optimizer and search strategy
- Direct hardware mapping for maximum performance
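Layer 2 combines a thread-safe singleton registry with factory-based backend creation. The pattern can be pictured with a small Python model.

This is a sketch under stated assumptions, not YiRage's C++ code. The names `BackendRegistry.instance`, `register`, `available` and `create` are hypothetical. The fallback walk mirrors the behaviour that the `fallback_backends` parameter of `yr.PersistentKernel` describes.

```python
import threading
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


class BackendRegistry:
    """Hypothetical Python model of a thread-safe singleton backend registry."""

    _instance: Optional["BackendRegistry"] = None
    _instance_lock = threading.Lock()

    def __init__(self) -> None:
        # name -> (factory that builds the backend, probe that checks the hardware)
        self._entries: Dict[str, Tuple[Callable[[], Any], Callable[[], bool]]] = {}
        self._lock = threading.Lock()

    @classmethod
    def instance(cls) -> "BackendRegistry":
        # Double-checked locking: the singleton is created at most once, even under concurrency.
        if cls._instance is None:
            with cls._instance_lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance

    def register(self, name: str, factory: Callable[[], Any],
                 is_available: Callable[[], bool]) -> None:
        with self._lock:
            self._entries[name] = (factory, is_available)

    def available(self) -> List[str]:
        with self._lock:
            entries = list(self._entries.items())
        # Probe outside the lock so a slow device query does not block registration.
        return [name for name, (_, probe) in entries if probe()]

    def create(self, backend: str, fallbacks: Sequence[str] = ()) -> Tuple[str, Any]:
        # Try the requested backend first, then each fallback; build the first available one.
        available = set(self.available())
        for name in [backend, *fallbacks]:
            if name in available:
                with self._lock:
                    factory, _ = self._entries[name]
                return name, factory()
        raise RuntimeError(f"no available backend among {[backend, *fallbacks]}")


# Pretend only the CPU is present on this machine (placeholder factories return strings).
registry = BackendRegistry.instance()
registry.register("cuda", factory=lambda: "CUDABackend", is_available=lambda: False)
registry.register("mps", factory=lambda: "MPSBackend", is_available=lambda: False)
registry.register("cpu", factory=lambda: "CPUBackend", is_available=lambda: True)

chosen, backend_obj = registry.create("cuda", fallbacks=["mps", "cpu"])
print(chosen, backend_obj)  # cpu CPUBackend
```

Keeping availability probes separate from factories means a backend is only constructed once it is known to be usable.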
| Backend | Hardware | Key Features | Status |
|---|---|---|---|
| CUDA | NVIDIA GPU | Tensor Core, Warp, Bank Conflict Avoidance | ✅ |
| CPU | x86/ARM | SIMD (AVX512), Cache Blocking, OpenMP | ✅ |
| MPS | Apple Silicon | Metal, Threadgroup, Unified Memory | ✅ |
| Triton | Compiler | Auto-tuning, Pipelining, Split-K | ✅ |
| NKI | AWS Neuron | SBUF, DMA, BF16 Native | ✅ |
| cuDNN | CUDA Accel | Algorithm Selection, Tensor Op | ✅ |
| MKL | Intel Accel | Threading, BLAS, Fast MM | ✅ |
- 42+ Optimization Methods across all backends
- Automatic Configuration based on hardware capabilities
- Performance Modeling for each backend
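Part of "automatic configuration" for a tiled matrix multiply is plain arithmetic. It covers how many thread blocks tile the output, how much on-chip memory each block stages per K step, and how many K steps each block runs.

The sketch below assumes 128 x 128 output tiles, a K tile of 32, 256 threads and 2-byte (fp16) elements. These are hypothetical choices, not the values `CUDAOptimizer.optimize_grid_block_dims` actually picks.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LaunchConfig:
    grid: Tuple[int, int, int]   # thread blocks in x, y, z
    block: Tuple[int, int, int]  # threads per block
    smem_bytes: int              # shared memory per block for one A tile and one B tile
    k_steps: int                 # iterations of the K loop per block


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def tiled_gemm_launch(m: int, n: int, k: int, tile_m: int = 128, tile_n: int = 128,
                      tile_k: int = 32, threads: int = 256,
                      elem_bytes: int = 2) -> LaunchConfig:
    # One thread block computes one tile_m x tile_n tile of C; x indexes columns, y rows.
    grid = (ceil_div(n, tile_n), ceil_div(m, tile_m), 1)
    # Each K step stages a tile_m x tile_k slice of A and a tile_k x tile_n slice of B.
    smem = (tile_m * tile_k + tile_k * tile_n) * elem_bytes
    return LaunchConfig(grid, (threads, 1, 1), smem, ceil_div(k, tile_k))


cfg = tiled_gemm_launch(1024, 1024, 1024)
print(cfg)
# LaunchConfig(grid=(8, 8, 1), block=(256, 1, 1), smem_bytes=16384, k_steps=32)

# 48 KB is the default per-block shared-memory limit when a CUDA kernel does not opt in to more.
assert cfg.smem_bytes <= 48 * 1024
```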
```python
from yirage.kernel.cuda import CUDAOptimizer, CUDAKernelConfig

config = CUDAKernelConfig()
CUDAOptimizer.optimize_grid_block_dims(1024, 1024, 1024,
                                       compute_capability=80,
                                       config=config)
# Auto-configured: Tensor Core, Warps, Shared Memory, Occupancy
```

```python
from yirage.kernel.mps import MPSOptimizer, MPSKernelConfig

config = MPSKernelConfig()
MPSOptimizer.optimize_for_apple_silicon(1024, 1024, 1024, config)
# Auto-detects: M1/M2/M3, GPU cores, Threadgroup size
```

- 5 Independent Search Strategies with hardware-specific optimization
- 15 Candidate Generation Dimensions
- 13 Performance Evaluation Metrics
- Auto-tuning and performance modeling
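The auto-tuning loop has three parts: generate candidates, evaluate each with a performance model, and keep the best. It can be illustrated with a toy analytic model.

This is not one of YiRage's five search strategies. The cost function is a made-up stand-in: global-memory traffic divided by utilisation. `num_units=108` assumes an A100-class GPU (108 SMs, compute capability 8.0, matching the `compute_capability=80` example).

```python
import itertools
from typing import Dict, Optional, Tuple


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def estimated_cost(m: int, n: int, k: int, tile_m: int, tile_n: int,
                   tile_k: int = 32, elem_bytes: int = 2,
                   num_units: int = 108,
                   smem_budget: int = 48 * 1024) -> Optional[float]:
    """Toy analytic cost: global-memory traffic divided by hardware utilisation."""
    smem = (tile_m * tile_k + tile_k * tile_n) * elem_bytes
    if smem > smem_budget:
        return None  # candidate does not fit in on-chip memory
    # A is streamed once per column of output tiles, B once per row of output tiles.
    traffic = (m * k * ceil_div(n, tile_n) + k * n * ceil_div(m, tile_m)) * elem_bytes
    blocks = ceil_div(m, tile_m) * ceil_div(n, tile_n)
    # Too few blocks leaves compute units idle, so inflate the cost accordingly.
    utilisation = min(1.0, blocks / num_units)
    return traffic / utilisation


def autotune(m: int, n: int, k: int, tile_sizes: Tuple[int, ...] = (32, 64, 128, 256)
             ) -> Tuple[Tuple[int, int], Dict[Tuple[int, int], float]]:
    # Candidate generation: every (tile_m, tile_n) pair from the size menu.
    scored: Dict[Tuple[int, int], float] = {}
    for tile_m, tile_n in itertools.product(tile_sizes, repeat=2):
        cost = estimated_cost(m, n, k, tile_m, tile_n)
        if cost is not None:
            scored[(tile_m, tile_n)] = cost
    # Evaluation: keep the cheapest feasible candidate.
    best = min(scored, key=scored.get)
    return best, scored


best, scored = autotune(1024, 1024, 1024)
print("best tile:", best, "cost:", scored[best])
```

For a 1024 x 1024 x 1024 problem this toy model favours a 64 x 128 tile (128 x 64 ties). Larger tiles reduce traffic but launch fewer blocks than there are SMs. Production auto-tuners usually confirm the top analytic candidates by timing them on the device.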
```bash
# From GitHub
git clone https://github.com/chenxingqiang/YiRage.git
cd YiRage
pip install -e .
export YIRAGE_HOME=$(pwd)
```

```python
import yirage as yr

# Query available backends
backends = yr.get_available_backends()
print(f"Available backends: {backends}")
# Output: ['cuda', 'cpu', 'mps']  # depends on your hardware

# Check specific backend
if yr.is_backend_available('mps'):
    print("Apple Silicon GPU ready!")

# Create kernel with backend selection
mpk = yr.PersistentKernel(
    mode="decode",
    backend="mps",              # Specify backend
    fallback_backends=["cpu"],  # Auto fallback
    world_size=1,
    mpi_rank=0,
    # ... other parameters
)
```

```python
# CUDA optimization
from yirage.kernel.cuda import CUDAOptimizer, CUDAKernelConfig

cuda_config = CUDAKernelConfig()
CUDAOptimizer.optimize_grid_block_dims(m=1024, n=1024, k=1024,
                                       compute_capability=80,
                                       config=cuda_config)

# CPU optimization
from yirage.kernel.cpu import CPUOptimizer, CPUKernelConfig

cpu_config = CPUKernelConfig()
CPUOptimizer.optimize_for_cpu(m=1024, n=1024, k=1024, config=cpu_config)
# Auto-detects: SIMD type, CPU cores, cache sizes

# MPS optimization (Apple Silicon)
from yirage.kernel.mps import MPSOptimizer, MPSKernelConfig

mps_config = MPSKernelConfig()
MPSOptimizer.optimize_for_apple_silicon(m=1024, n=1024, k=1024, config=mps_config)
# Auto-detects: GPU family (M1/M2/M3), cores, memory
```

| Benchmark | MPS (ms) | CPU (ms) |
|---|---|---|
| gated_mlp | 0.677 | 1.268 |
| rms_norm | 0.463 | 0.115 |
| lora | 0.637 | 0.590 |
| gqa | 0.554 | - |
| norm_transformer | 1.195 | - |
All benchmarks support the CUDA, MPS, and CPU backends; a dash (-) means no CPU timing is reported in this table.
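The table shows that MPS is not uniformly faster than CPU. A quick way to read it is to compute the CPU/MPS latency ratio per benchmark, using only the numbers reported there.

The table does not state the hardware used, so these ratios describe that setup only.

```python
# Latencies in milliseconds, copied from the benchmark table (None = not reported).
results = {
    "gated_mlp": (0.677, 1.268),
    "rms_norm": (0.463, 0.115),
    "lora": (0.637, 0.590),
    "gqa": (0.554, None),
    "norm_transformer": (1.195, None),
}

# CPU time / MPS time: a value > 1 means the MPS backend was faster for that benchmark.
ratios = {
    name: cpu_ms / mps_ms
    for name, (mps_ms, cpu_ms) in results.items()
    if cpu_ms is not None
}

for name, ratio in ratios.items():
    winner = "MPS" if ratio > 1 else "CPU"
    print(f"{name:12s} CPU/MPS = {ratio:.2f}x ({winner} faster)")
```

This gives about 1.87x in favour of MPS on `gated_mlp`. The CPU is roughly 4x faster on `rms_norm` (ratio 0.25) and slightly faster on `lora` (0.93).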
- Quick Start - Get started in 5 minutes
- API Reference - Complete API documentation
- Backend Guide - Backend usage and configuration
- Architecture Design - System design
- Contributing - Contribution guidelines
```bash
# MPS backend (Apple Silicon)
python benchmark/baselines/pytorch/gated_mlp.py -b 8 --backend mps

# CUDA backend (NVIDIA GPU)
python benchmark/baselines/pytorch/gated_mlp.py -b 8 --backend cuda

# CPU backend
python benchmark/baselines/pytorch/gated_mlp.py -b 8 --backend cpu
```

```python
import yirage as yr

# Method 1: Direct specification
mpk = yr.PersistentKernel(backend="mps", ...)

# Method 2: With fallback
mpk = yr.PersistentKernel(
    backend="cuda",
    fallback_backends=["mps", "cpu"],  # Auto fallback
    ...
)

# Method 3: Query and select
backends = yr.get_available_backends()
best_backend = backends[0]  # Use first available
```

We welcome contributions! Please see CONTRIBUTING.md for guidelines.
To add a new backend:
- Implement `BackendInterface`
- Create `{Backend}KernelConfig`
- Implement `{Backend}Optimizer`
- Create `{Backend}SearchStrategy` (optional)
- Update CMake configuration
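The shape of those steps can be sketched as a Python analogue. The real `BackendInterface` lives in C++ and is wired up through CMake.

In this sketch `Foo` is a placeholder backend name. The methods `is_available`, `compile` and `optimize`, and the config fields, are all hypothetical. The optional search strategy is omitted.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class BackendInterface(ABC):
    """Hypothetical Python stand-in for YiRage's C++ BackendInterface."""

    name = "abstract"

    @abstractmethod
    def is_available(self) -> bool:
        """Return True if the hardware and runtime for this backend are present."""

    @abstractmethod
    def compile(self, graph: object) -> object:
        """Lower a kernel graph to a backend-specific executable."""


@dataclass
class FooKernelConfig:
    # Per-backend tuning knobs (hypothetical fields).
    tile_m: int = 64
    tile_n: int = 64
    num_threads: int = 128


class FooOptimizer:
    @staticmethod
    def optimize(m: int, n: int, k: int, config: FooKernelConfig) -> FooKernelConfig:
        # Clamp tiles to the matrix size so small problems do not get oversized tiles.
        config.tile_m = min(config.tile_m, m)
        config.tile_n = min(config.tile_n, n)
        return config


class FooBackend(BackendInterface):
    name = "foo"

    def is_available(self) -> bool:
        return True  # a real backend would query its driver or device here

    def compile(self, graph: object) -> object:
        return ("foo-kernel", graph)  # placeholder for generated code


backend = FooBackend()
cfg = FooOptimizer.optimize(32, 1024, 1024, FooKernelConfig())
print(backend.name, backend.is_available(), cfg)
```

Declaring the interface as an abstract base class makes an incomplete backend fail at construction time rather than at first use.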
YiRage is licensed under the Apache License 2.0.
Copyright:
- YiRage Multi-Backend Extensions: Copyright 2025 Chen Xingqiang
- Original Mirage: Copyright 2023-2024 Carnegie Mellon University
See LICENSE, NOTICE, and ATTRIBUTION for details.
```bibtex
@software{yirage2025,
  title={YiRage: Yield Revolutionary AGile Engine for Multi-Backend LLM Inference},
  author={Chen, Xingqiang},
  year={2025},
  note={A derivative work based on Mirage},
  url={https://github.com/chenxingqiang/YiRage}
}

@inproceedings{wu2024mirage,
  title={Mirage: A Multi-Level Superoptimizer for Tensor Programs},
  author={Mengdi Wu and Xinhao Cheng and Shengyu Liu and others},
  booktitle={OSDI 2025},
  year={2025}
}
```

YiRage builds upon the excellent work of the Mirage team at Carnegie Mellon University.
- Issues: GitHub Issues
- Author: Chen Xingqiang
- Email: joy6677@outlook.com
YiRage - Yielding Maximum Performance Across All Hardware 🚀
Copyright 2025 Chen Xingqiang | Based on Mirage (CMU) | Apache License 2.0