Overview
Implement a Skills Router that matches user queries to relevant skills in stages: fast heuristics handle most queries, and LLM reranking is applied only when the heuristic result is ambiguous or low-confidence.
Note
This is a cross-repository Epic. Core routing logic lives in Neona, cloud services live in Neona-Cloud.
Problem Statement
Currently, Neona's skill system (Phase 4) has no efficient way to match user queries to relevant skills. A naive LLM-for-every-query approach would be slow and expensive. We need a hybrid approach that balances speed, accuracy, and cost.
Goals
- 🚀 Fast heuristic-based matching for most queries (no LLM needed)
- 🧠 LLM reranking only when needed (ambiguous or low-confidence results)
- 💾 Intelligent caching (L1/L2 local, L3 distributed)
- 🛡️ Guardrails for safety, rate limiting, and policy enforcement
- ⚡ Priority-based retrieval for efficient skill discovery
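The L1/L2/L3 caching goal could be sketched roughly as follows. This is a minimal illustration, not the planned implementation: the `TieredCache` class, the dict-backed L1 and L2 tiers, and the `l3_get` callback standing in for the distributed cache client are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple


class TieredCache:
    """Checks L1, then L2, then L3, promoting hits into the faster tiers."""

    def __init__(self, l3_get: Optional[Callable[[str], Optional[List[str]]]] = None):
        self.l1: dict = {}    # hypothetical in-process cache
        self.l2: dict = {}    # hypothetical stand-in for a local persistent cache
        self.l3_get = l3_get  # hypothetical lookup against the distributed cache

    def get(self, key: str) -> Tuple[Optional[List[str]], Optional[str]]:
        """Return (value, tier_name), or (None, None) when every tier misses."""
        if key in self.l1:
            return self.l1[key], "L1"
        if key in self.l2:
            self.l1[key] = self.l2[key]  # promote to L1
            return self.l2[key], "L2"
        if self.l3_get is not None:
            value = self.l3_get(key)
            if value is not None:
                # Promote a remote hit into both local tiers.
                self.l2[key] = value
                self.l1[key] = value
                return value, "L3"
        return None, None

    def put(self, key: str, value: List[str]) -> None:
        """Store a fresh routing result in the local tiers."""
        self.l1[key] = value
        self.l2[key] = value
```

With this shape, a query that first hits L3 is served from L1 on the next lookup, which is the behaviour the L1 latency target depends on. Eviction and expiry are left out of the sketch.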
Architecture
User Query
│
▼
[Hybrid Pre-Router] → Exact match, keywords, capabilities, heuristics
│
▼
[Priority Retrieval] → Cache check (L1→L2→L3), filtering, confidence check
│
├─ High confidence ──→ [Return Results]
│
└─ Low confidence ──→ [LLM Reranking (Cloud)] → [Guardrails] → [Cache] → [Return Results]
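The flow above might look something like this in code. It is a sketch under stated assumptions: the `Skill` dataclass, the keyword-overlap scoring in `heuristic_score`, the `threshold` default, and the `rerank` callback (standing in for the cloud reranking service) are hypothetical and not taken from the Neona codebase. Cache lookups and guardrails are omitted to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class Skill:
    name: str
    keywords: Set[str] = field(default_factory=set)


def heuristic_score(query: str, skill: Skill) -> float:
    """Score in [0, 1]: an exact name match wins, otherwise keyword overlap."""
    q = query.lower().strip()
    if q == skill.name.lower():
        return 1.0
    tokens = set(q.split())
    if not tokens or not skill.keywords:
        return 0.0
    return len(tokens & skill.keywords) / len(tokens)


def route(
    query: str,
    skills: List[Skill],
    rerank: Callable[[str, List[str]], List[str]],
    threshold: float = 0.5,  # assumed confidence cutoff, to be tuned
) -> Tuple[List[str], str]:
    """Return (skill names, path taken), where path is 'heuristic' or 'llm'."""
    ranked = sorted(
        ((heuristic_score(query, s), s.name) for s in skills), reverse=True
    )
    if ranked and ranked[0][0] >= threshold:
        # High confidence: return the heuristic matches without calling the LLM.
        return [name for score, name in ranked if score > 0], "heuristic"
    # Low confidence: hand every candidate to the reranker.
    return rerank(query, [name for _, name in ranked]), "llm"
```

For example, with a skill named `deploy`, the query `"deploy"` is an exact match and returns on the heuristic path, while a query that shares no keywords with any skill falls through to `rerank`.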
Child Issues
Neona (Core)
- #83 - Hybrid Pre-Router with Heuristic Scoring
- #84 - Priority Retrieval with Caching
- #86 - Local Cache + Guardrails
Neona-Cloud (Backend)
- Neona-AI/Neona-Cloud#17 - Epic: Skills Router Cloud Services
- Neona-AI/Neona-Cloud#18 - Cloud Skills API
- Neona-AI/Neona-Cloud#19 - LLM Reranking Service
- Neona-AI/Neona-Cloud#20 - Distributed Cache (L3)
- Neona-AI/Neona-Cloud#21 - Cloud Guardrails
Performance Targets
| Metric | Target |
|---|---|
| Cache hit latency (L1) | < 5ms |
| Heuristic-only latency | 10-50ms |
| Heuristic + LLM rerank | 200-1000ms |
| Cache hit rate | > 70% |
| LLM rerank rate | < 30% of queries |
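One way to make these targets checkable in a benchmark or CI job is to encode them as data. The sketch below is illustrative only: the metric names and the `failed_targets` helper are assumptions, and the latency ranges in the table are treated as upper bounds.

```python
import operator

# Targets from the table above; metric names are hypothetical.
TARGETS = {
    "l1_cache_hit_latency_ms": (operator.lt, 5),
    "heuristic_latency_ms": (operator.le, 50),   # upper end of the 10-50ms range
    "rerank_latency_ms": (operator.le, 1000),    # upper end of the 200-1000ms range
    "cache_hit_rate": (operator.gt, 0.70),
    "llm_rerank_rate": (operator.lt, 0.30),
}


def failed_targets(measured: dict) -> list:
    """Return the names of targets that are missing or not met."""
    failures = []
    for name, (op, limit) in TARGETS.items():
        if name not in measured or not op(measured[name], limit):
            failures.append(name)
    return failures
```

A benchmark run would pass its measured values as a dict and fail the job if the returned list is non-empty.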
Dependencies
- Phase 4.1: Skill Definition System
- Phase 4.2: Skill Registry
Timeline
Target Completion: 5 weeks
Success Criteria
- Router matches skills using heuristics (no LLM)
- LLM reranking triggers only when needed (via Cloud)
- Multi-level caching reduces redundant computations
- Guardrails prevent inappropriate skill usage
- Performance meets targets
- Test coverage > 80%