🧭 Epic: Skills Router (Cross-Repo) #82

@fentz26

Overview

Implement an intelligent Skills Router system that efficiently matches user queries to relevant skills using a multi-stage approach combining fast heuristics with optional LLM reranking.

Note

This is a cross-repository Epic. Core routing logic lives in Neona, cloud services live in Neona-Cloud.

Problem Statement

Currently, Neona's skill system (Phase 4) has no efficient way to match user queries to relevant skills. A naive LLM-for-every-query approach would be slow and expensive. We need a hybrid approach that balances speed, accuracy, and cost.

Goals

  1. 🚀 Fast heuristic-based matching for most queries (no LLM needed)
  2. 🧠 LLM reranking only when needed (ambiguous or low-confidence results)
  3. 💾 Intelligent caching (L1/L2 local, L3 distributed)
  4. 🛡️ Guardrails for safety, rate limiting, and policy enforcement
  5. Priority-based retrieval for efficient skill discovery
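The tiered caching in goal 3 could be sketched roughly as follows. This is a minimal illustration, not the Neona implementation: the `TieredCache` class is hypothetical, plain dictionaries stand in for the real L1/L2/L3 stores, and promoting a hit into the faster tiers is an assumed policy that the issue does not specify.

```python
from typing import Dict, List, Optional


class TieredCache:
    """Hypothetical L1 -> L2 -> L3 lookup; dicts stand in for real stores."""

    def __init__(self, tiers: List[Dict[str, List[str]]]) -> None:
        # Tiers are ordered from fastest (L1) to slowest (L3).
        self.tiers = tiers

    def get(self, key: str) -> Optional[List[str]]:
        for level, tier in enumerate(self.tiers):
            if key in tier:
                value = tier[key]
                # Assumed policy: copy the hit into every faster tier
                # so the next lookup is served from L1.
                for faster in self.tiers[:level]:
                    faster[key] = value
                return value
        return None  # miss in all tiers

    def put(self, key: str, value: List[str]) -> None:
        # Write-through: store the result in every tier.
        for tier in self.tiers:
            tier[key] = value
```

With this shape, a query found only in L3 pays the slow lookup once; later lookups for the same key are served from L1.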

Architecture

User Query
    │
    ▼
[Hybrid Pre-Router] → Exact match, keywords, capabilities, heuristics
    │
    ▼
[Priority Retrieval] → Cache check (L1→L2→L3), filtering, confidence check
    │
    ├─ High confidence ──→ [Return Results]
    │
    └─ Low confidence ──→ [LLM Reranking (Cloud)] → [Guardrails] → [Cache] → [Return Results]
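
The flow in the diagram can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `Skill` type, the keyword-overlap scoring, the `CONFIDENCE_THRESHOLD` value, and the `rerank` and `is_allowed` callables (stand-ins for the Cloud reranking service and guardrails) are hypothetical and not taken from the Neona codebase.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

CONFIDENCE_THRESHOLD = 0.6  # assumed cutoff; the issue does not specify one


@dataclass(frozen=True)
class Skill:
    name: str
    keywords: FrozenSet[str]


def heuristic_rank(query: str, skills: List[Skill]) -> List[Tuple[str, float]]:
    """Score each skill: 1.0 for an exact name match, else keyword overlap."""
    normalized = query.strip().lower()
    tokens = set(normalized.split())
    scored = []
    for skill in skills:
        if normalized == skill.name.lower():
            score = 1.0
        elif skill.keywords:
            score = len(tokens & skill.keywords) / len(skill.keywords)
        else:
            score = 0.0
        scored.append((skill.name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def route(
    query: str,
    skills: List[Skill],
    cache: Dict[str, List[str]],
    rerank: Callable[[str, List[str]], List[str]],
    is_allowed: Callable[[str], bool],
) -> List[str]:
    key = query.strip().lower()
    if key in cache:  # cache check comes before any scoring
        return cache[key]
    ranked = heuristic_rank(query, skills)
    if ranked and ranked[0][1] >= CONFIDENCE_THRESHOLD:
        # High confidence: return heuristic results, no LLM call.
        return [name for name, score in ranked if score >= CONFIDENCE_THRESHOLD]
    # Low confidence: rerank, apply guardrails, cache, then return.
    reranked = rerank(query, [name for name, _ in ranked])
    result = [name for name in reranked if is_allowed(name)]
    cache[key] = result
    return result
```

Passing the reranker and guardrail check as callables keeps the core routing logic free of any Cloud dependency, which fits the Neona / Neona-Cloud split described above.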

Child Issues

Neona (Core)

Neona-Cloud (Backend)

  • Neona-AI/Neona-Cloud#17 - Epic: Skills Router Cloud Services
  • Neona-AI/Neona-Cloud#18 - Cloud Skills API
  • Neona-AI/Neona-Cloud#19 - LLM Reranking Service
  • Neona-AI/Neona-Cloud#20 - Distributed Cache (L3)
  • Neona-AI/Neona-Cloud#21 - Cloud Guardrails

Performance Targets

Metric                     Target
Cache hit latency (L1)     < 5ms
Heuristic-only latency     10-50ms
Heuristic + LLM rerank     200-1000ms
Cache hit rate             > 70%
LLM rerank rate            < 30% of queries
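
One way to verify the two rate targets is to derive them from simple counters. The sketch below is an assumption about how such a check might look; `RouterStats` and `check_rate_targets` are hypothetical names, and only the 70% and 30% thresholds come from the targets above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RouterStats:
    """Hypothetical counters collected by the router."""
    total_queries: int
    cache_hits: int
    llm_reranks: int


def check_rate_targets(stats: RouterStats) -> Dict[str, bool]:
    """Return whether each rate target from the issue is currently met."""
    if stats.total_queries == 0:
        # No traffic yet, so neither target can be considered met.
        return {"cache_hit_rate": False, "llm_rerank_rate": False}
    hit_rate = stats.cache_hits / stats.total_queries
    rerank_rate = stats.llm_reranks / stats.total_queries
    return {
        "cache_hit_rate": hit_rate > 0.70,      # target: > 70%
        "llm_rerank_rate": rerank_rate < 0.30,  # target: < 30% of queries
    }
```

Both comparisons are strict, matching the `>` and `<` in the targets, so a hit rate of exactly 70% does not pass.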

Dependencies

  • Phase 4.1: Skill Definition System
  • Phase 4.2: Skill Registry

Timeline

Target Completion: 5 weeks

Success Criteria

  • Router matches skills using heuristics (no LLM)
  • LLM reranking triggers only when needed (via Cloud)
  • Multi-level caching reduces redundant computations
  • Guardrails prevent inappropriate skill usage
  • Performance meets targets
  • Test coverage > 80%

Metadata

Labels: epic (Epic grouping issue), priority:high (Address within 30 days)