Skip to content

🚀 RuvLLM v2.3 - RuvLTRA-Medium 3B + Task-Specific LoRA + HuggingFace Hub #118

@ruvnet

Description

@ruvnet

🚀 RuvLLM v2.3 - High-Performance LLM Inference for Apple Silicon

Crates.io
npm
License

Run Large Language Models locally on your Mac with maximum performance


🎯 What's New in v2.3

🧠 RuvLTRA-Medium 3B Model

Purpose-built 3B model optimized for Claude Flow agent orchestration:

Spec Value
Parameters 3.0B
Hidden Size 2560
Layers 42
Context 256K tokens
Features Flash Attention 2, Speculative Decoding, SONA Hooks

🔌 HuggingFace Hub Integration

Full Hub integration for model distribution:

use ruvllm::hub::{ModelDownloader, ModelUploader, RuvLtraRegistry};

// Download from Hub
let downloader = ModelDownloader::new(DownloadConfig::default());
let path = downloader.download("ruvector/ruvltra-small-q4km", None)?;

// Upload to Hub
let uploader = ModelUploader::new("hf_token");
uploader.upload("./model.gguf", "username/my-model", metadata)?;

🎯 Task-Specific LoRA Adapters

5 pre-trained adapters optimized for Claude Flow agent types:

Adapter Rank Alpha Targets Use Case
Coder 16 32.0 Q,K,V,O Code generation, refactoring
Researcher 8 16.0 Q,K,V Information analysis
Security 16 32.0 Attention + MLP Vulnerability detection
Architect 12 24.0 Q,V + Gate,Up System design
Reviewer 8 16.0 Q,V Code review

🔄 Adapter Merging & Hot-Swap

Advanced adapter composition strategies:

Strategy Description
TIES Trim, Elect, Merge for robust composition
DARE Drop And REscale for sparse merging
SLERP Spherical interpolation for smooth transitions
TaskArithmetic Add/subtract task vectors
// Hot-swap adapters at runtime
let mut manager = HotSwapManager::new();
manager.set_active(coder_adapter);
manager.prepare_standby(security_adapter);
manager.swap()?; // Zero-downtime switch

📊 Claude Dataset Training

2,700+ training examples for Claude Flow optimization:

  • Code generation (900 examples)
  • Research & analysis (450 examples)
  • Security review (450 examples)
  • Architecture design (450 examples)
  • Code review (450 examples)

📈 v2.0-2.2 Features

🧠 Apple Neural Engine (ANE) Backend - 261-989x Faster Matmul

Native Core ML integration with Apple's Neural Engine:

Component Technology Benefit
Matrix Multiply Core ML → ANE 261-989x faster vs NEON
Attention Metal GPU Optimized for M4 Pro
Activations ARM NEON SIMD 2.2x faster than ANE
Auto-Dispatch Hybrid Pipeline Best of all worlds

🔄 SONA Self-Learning System

Three-tier learning loops for continuous optimization:

Instant Loop   → <1ms per request (MicroLoRA)
Background Loop → ~10s hourly (BaseLoRA + EWC++)
Deep Loop      → ~10min weekly (Pattern consolidation)

🤖 RuvLTRA-Small: Qwen 0.5B Optimized for Claude Flow

Spec Value
Base Model Qwen2.5-0.5B-Instruct
Parameters 494M
Hidden Size 896
Layers 24
Context 32K tokens

🏎️ Performance Benchmarks (M4 Pro)

Inference Speed

Model Quant Prefill Decode Memory
RuvLTRA-Small Q4K 3,500 135 491 MB
RuvLTRA-Medium Q4K 2,200 85 1.8 GB
Qwen2.5-7B Q4K 2,800 95 4.2 GB
Llama3-8B Q4K 2,600 88 4.8 GB

Kernel Performance

Kernel Single-thread Multi-thread (10-core)
GEMM 4096×4096 1.2 GFLOPS 12.7 GFLOPS
Flash Attention (2048) 850μs 320μs
HNSW Search (k=10) 24.0μs -
SONA Adapt <1ms -

📦 Installation

Rust

[dependencies]
ruvllm = { version = "2.3", features = ["inference-metal", "coreml", "parallel"] }

npm

npm install @ruvector/ruvllm

🔗 Links


✅ Implementation Status

  • RuvLTRA-Small (0.5B) model
  • RuvLTRA-Medium (3B) model
  • Apple Neural Engine backend
  • SONA self-learning system
  • Flash Attention 2
  • Paged KV Cache
  • Speculative Decoding
  • HuggingFace Hub integration
  • Task-specific LoRA adapters
  • Adapter merging (TIES, DARE, SLERP)
  • Hot-swap adapter management
  • Claude dataset training system
  • HNSW semantic routing (150x faster)

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions