# 🚀 RuvLLM v2.3 - High-Performance LLM Inference for Apple Silicon

*Run Large Language Models locally on your Mac with maximum performance*

## 🎯 What's New in v2.3
### 🧠 RuvLTRA-Medium 3B Model

A purpose-built 3B model optimized for Claude Flow agent orchestration (a speculative-decoding sketch follows the table):
| Spec | Value |
|---|---|
| Parameters | 3.0B |
| Hidden Size | 2560 |
| Layers | 42 |
| Context | 256K tokens |
| Features | Flash Attention 2, Speculative Decoding, SONA Hooks |
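The Features row lists speculative decoding, and the two RuvLTRA models are a natural draft/verifier pair for it. Below is a minimal sketch of that pairing; `Model`, `SpeculativeDecoder`, and the method names are assumptions for exposition, not the confirmed ruvllm API.

```rust
use ruvllm::{Model, SpeculativeDecoder};

fn generate(prompt: &str) -> anyhow::Result<String> {
    // Target: the 3B verifier; draft: the 0.5B proposer.
    let target = Model::load("ruvltra-medium-q4km.gguf")?;
    let draft = Model::load("ruvltra-small-q4km.gguf")?;
    // The draft proposes a few tokens per step; the target verifies them
    // in one batched forward pass and keeps the longest accepted prefix.
    let decoder = SpeculativeDecoder::new(target, draft).draft_tokens(4);
    decoder.generate(prompt, 256)
}
```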
### 🔌 HuggingFace Hub Integration
Full Hub integration for model distribution:
```rust
use ruvllm::hub::{DownloadConfig, ModelDownloader, ModelUploader, RuvLtraRegistry};

// Download from the Hub
let downloader = ModelDownloader::new(DownloadConfig::default());
let path = downloader.download("ruvector/ruvltra-small-q4km", None)?;

// Upload to the Hub
let uploader = ModelUploader::new("hf_token");
uploader.upload("./model.gguf", "username/my-model", metadata)?;
```

### 🎯 Task-Specific LoRA Adapters
Five pre-trained adapters, each optimized for a Claude Flow agent type (a configuration sketch follows the table):
| Adapter | Rank | Alpha | Targets | Use Case |
|---|---|---|---|---|
| Coder | 16 | 32.0 | Q,K,V,O | Code generation, refactoring |
| Researcher | 8 | 16.0 | Q,K,V | Information analysis |
| Security | 16 | 32.0 | Attention + MLP | Vulnerability detection |
| Architect | 12 | 24.0 | Q,V + Gate,Up | System design |
| Reviewer | 8 | 16.0 | Q,V | Code review |
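To make the table concrete, here is a minimal sketch of the Coder row as an adapter configuration. `LoraConfig` and `TargetModule` are illustrative names rather than the confirmed ruvllm types; the scaling comment is the standard LoRA formulation.

```rust
use ruvllm::lora::{LoraConfig, TargetModule};

// Coder adapter from the table: rank 16, alpha 32.0, Q/K/V/O projections.
let coder = LoraConfig {
    rank: 16,
    alpha: 32.0,
    targets: vec![
        TargetModule::QProj,
        TargetModule::KProj,
        TargetModule::VProj,
        TargetModule::OProj,
    ],
    ..Default::default()
};
// Standard LoRA update per target weight W: W' = W + (alpha / rank) * B * A,
// where A is (rank x d_in) and B is (d_out x rank).
```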
### 🔄 Adapter Merging & Hot-Swap

Advanced adapter composition strategies (a SLERP sketch follows the table):
| Strategy | Description |
|---|---|
| TIES | Trim, Elect, Merge for robust composition |
| DARE | Drop And REscale for sparse merging |
| SLERP | Spherical interpolation for smooth transitions |
| TaskArithmetic | Add/subtract task vectors |
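As a self-contained illustration of the SLERP row, the sketch below interpolates two flattened weight tensors along the great circle between them. This is the standard spherical-interpolation formula, not necessarily how ruvllm implements its merge internally.

```rust
/// Spherical linear interpolation between two flattened weight tensors.
fn slerp(a: &[f32], b: &[f32], t: f32) -> Vec<f32> {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    let cos = (dot / (na * nb)).clamp(-1.0, 1.0);
    let theta = cos.acos();
    if theta.abs() < 1e-6 {
        // Nearly parallel vectors: fall back to linear interpolation.
        return a.iter().zip(b).map(|(x, y)| x * (1.0 - t) + y * t).collect();
    }
    // Weights from the spherical-interpolation formula.
    let wa = ((1.0 - t) * theta).sin() / theta.sin();
    let wb = (t * theta).sin() / theta.sin();
    a.iter().zip(b).map(|(x, y)| wa * x + wb * y).collect()
}
```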
```rust
// Hot-swap adapters at runtime
let mut manager = HotSwapManager::new();
manager.set_active(coder_adapter);
manager.prepare_standby(security_adapter);
manager.swap()?; // Zero-downtime switch
```

### 📊 Claude Dataset Training
Over 2,700 training examples for Claude Flow optimization (a hypothetical record layout follows the list):
- Code generation (900 examples)
- Research & analysis (450 examples)
- Security review (450 examples)
- Architecture design (450 examples)
- Code review (450 examples)
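The split adds up: 900 + 4 × 450 = 2,700. For illustration only, a record in such a dataset might deserialize as below; the actual schema is not documented in this release note.

```rust
use serde::Deserialize;

// Hypothetical training-example schema, shown only to make the
// category split concrete.
#[derive(Deserialize)]
struct TrainingExample {
    category: String,   // e.g. "code_generation", "security_review"
    prompt: String,
    completion: String,
}
```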
## 📈 v2.0-2.2 Features

### 🧠 Apple Neural Engine (ANE) Backend - 261-989x Faster Matmul

Native Core ML integration with Apple's Neural Engine (a dispatch sketch follows the table):
| Component | Technology | Benefit |
|---|---|---|
| Matrix Multiply | Core ML → ANE | 261-989x faster vs NEON |
| Attention | Metal GPU | Optimized for M4 Pro |
| Activations | ARM NEON SIMD | 2.2x faster than ANE |
| Auto-Dispatch | Hybrid Pipeline | Best of all worlds |
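The Auto-Dispatch row is the key idea: each operator is routed to the backend that benchmarked fastest for it. A minimal sketch of that routing, with illustrative types rather than the real ruvllm internals:

```rust
// Illustrative operator and backend types; not the real ruvllm internals.
enum Op {
    MatMul { m: usize, n: usize, k: usize },
    Attention { seq_len: usize },
    Activation { len: usize },
}

enum Backend {
    Ane,      // Core ML -> Apple Neural Engine
    MetalGpu, // fused attention kernels
    Neon,     // ARM SIMD for elementwise ops
}

// Route each op to the backend that benchmarked fastest for it.
fn dispatch(op: &Op) -> Backend {
    match op {
        Op::MatMul { .. } => Backend::Ane,
        Op::Attention { .. } => Backend::MetalGpu,
        Op::Activation { .. } => Backend::Neon,
    }
}
```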
### 🔄 SONA Self-Learning System

Three-tier learning loops for continuous optimization (sketched in code below):

- Instant Loop → <1 ms per request (MicroLoRA)
- Background Loop → ~10 s hourly (BaseLoRA + EWC++)
- Deep Loop → ~10 min weekly (pattern consolidation)
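Expressed as code, the three tiers differ only in cadence and scope; the type and figures below restate the list above, and the names are for exposition rather than the confirmed ruvllm API.

```rust
use std::time::Duration;

enum SonaLoop {
    Instant,    // per request: apply a MicroLoRA delta inline
    Background, // hourly: update BaseLoRA, with EWC++ guarding old skills
    Deep,       // weekly: consolidate recurring patterns
}

impl SonaLoop {
    /// Rough time budget per run, per the figures above.
    fn budget(&self) -> Duration {
        match self {
            SonaLoop::Instant => Duration::from_millis(1),
            SonaLoop::Background => Duration::from_secs(10),
            SonaLoop::Deep => Duration::from_secs(600),
        }
    }
}
```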
### 🤖 RuvLTRA-Small: Qwen 0.5B Optimized for Claude Flow
| Spec | Value |
|---|---|
| Base Model | Qwen2.5-0.5B-Instruct |
| Parameters | 494M |
| Hidden Size | 896 |
| Layers | 24 |
| Context | 32K tokens |
## 🏎️ Performance Benchmarks (M4 Pro)

### Inference Speed

| Model | Quant | Prefill (tok/s) | Decode (tok/s) | Memory |
|---|---|---|---|---|
| RuvLTRA-Small | Q4K | 3,500 | 135 | 491 MB |
| RuvLTRA-Medium | Q4K | 2,200 | 85 | 1.8 GB |
| Qwen2.5-7B | Q4K | 2,800 | 95 | 4.2 GB |
| Llama3-8B | Q4K | 2,600 | 88 | 4.8 GB |
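A rough sanity check on the decode column (our arithmetic, not part of the published benchmarks): if every generated token requires one full pass over the weights, required memory bandwidth ≈ model size × decode rate. For RuvLTRA-Small that is about 0.49 GB × 135 tok/s ≈ 66 GB/s, comfortably within Apple-silicon unified-memory bandwidth; speculative decoding relaxes the one-pass-per-token assumption for the larger models by verifying several draft tokens per weight pass.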
### Kernel Performance
| Kernel | Single-thread | Multi-thread (10-core) |
|---|---|---|
| GEMM 4096×4096 | 1.2 GFLOPS | 12.7 GFLOPS |
| Flash Attention (2048) | 850μs | 320μs |
| HNSW Search (k=10) | 24.0μs | - |
| SONA Adapt | <1ms | - |
## 📦 Installation

### Rust
```toml
[dependencies]
ruvllm = { version = "2.3", features = ["inference-metal", "coreml", "parallel"] }
```

### npm

```bash
npm install @ruvector/ruvllm
```
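With the crate added, a minimal generation loop might look like the sketch below; `Model`, `GenerateOptions`, and `generate` are assumed names for exposition, not the confirmed public API.

```rust
use ruvllm::{GenerateOptions, Model};

fn main() -> anyhow::Result<()> {
    // Load a local GGUF checkpoint (e.g. downloaded via the Hub API above).
    let model = Model::load("ruvltra-small-q4km.gguf")?;
    let out = model.generate(
        "Write a Rust function that reverses a string.",
        GenerateOptions { max_tokens: 256, ..Default::default() },
    )?;
    println!("{out}");
    Ok(())
}
```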
## ✅ Implementation Status
- RuvLTRA-Small (0.5B) model
- RuvLTRA-Medium (3B) model
- Apple Neural Engine backend
- SONA self-learning system
- Flash Attention 2
- Paged KV Cache
- Speculative Decoding
- HuggingFace Hub integration
- Task-specific LoRA adapters
- Adapter merging (TIES, DARE, SLERP)
- Hot-swap adapter management
- Claude dataset training system
- HNSW semantic routing (150x faster)