Merged
65 changes: 57 additions & 8 deletions README.md
@@ -24,13 +24,22 @@

---

## 🗺️ Roadmap

- [x] **Skill architecture** — pluggable `SKILL.md` interface for all capabilities
- [x] **Skill Store UI** — browse, install, and configure skills from Aegis
- [x] **AI/LLM-assisted skill installation** — community-contributed skills installed and configured via AI agent
- [x] **GPU / NPU / CPU (AIPC) aware installation** — auto-detect hardware, install matching frameworks, convert models to optimal format
- [x] **Hardware environment layer** — shared [`env_config.py`](skills/lib/env_config.py) for auto-detection + model optimization across NVIDIA, AMD, Apple Silicon, Intel, and CPU
- [ ] **Skill development** — 18 skills across 9 categories, actively expanding with community contributions

## 🧩 Skill Catalog

Each skill is a self-contained module with its own model, parameters, and [communication protocol](docs/skill-development.md). See the [Skill Development Guide](docs/skill-development.md) and [Platform Parameters](docs/skill-params.md) to build your own.

| Category | Skill | What It Does | Status |
|----------|-------|--------------|:------:|
| **Detection** | [`yolo-detection-2026`](skills/detection/yolo-detection-2026/) | Real-time 80+ class detection — auto-accelerated via TensorRT / CoreML / OpenVINO / ONNX | ✅ |
| | [`dinov3-grounding`](skills/detection/dinov3-grounding/) | Open-vocabulary detection — describe what to find | 📐 |
| | [`person-recognition`](skills/detection/person-recognition/) | Re-identify individuals across cameras | 📐 |
| **Analysis** | [`home-security-benchmark`](skills/analysis/home-security-benchmark/) | [131-test evaluation suite](#-homesec-bench--how-secure-is-your-local-ai) for LLM & VLM security performance | ✅ |
@@ -48,13 +48,6 @@ Each skill is a self-contained module with its own model, parameters, and [commu

> **Registry:** All skills are indexed in [`skills.json`](skills.json) for programmatic discovery.

### 🗺️ Roadmap

- [x] **Skill architecture** — pluggable `SKILL.md` interface for all capabilities
- [x] **Full skill catalog** — 18 skills across 9 categories with working scripts
- [ ] **Skill Store UI** — browse, install, and configure skills from Aegis
- [ ] **Custom skill packaging** — community-contributed skills via GitHub
- [ ] **GPU-optimized containers** — one-click Docker deployment per skill

## 🚀 Getting Started with [SharpAI Aegis](https://www.sharpai.org)

@@ -89,6 +91,53 @@ The easiest way to run DeepCamera's AI skills. Aegis connects everything — cam
</table>


## 🎯 YOLO 2026 — Real-Time Object Detection

State-of-the-art detection running locally on **any hardware**, fully integrated as a [DeepCamera skill](skills/detection/yolo-detection-2026/).

### YOLO26 Models

YOLO26 (Jan 2026) eliminates NMS and DFL for cleaner exports and lower latency. Pick the size that fits your hardware:

| Model | Params | Latency (optimized) | Use Case |
|-------|--------|:-------------------:|----------|
| **yolo26n** (nano) | 2.6M | ~2ms | Edge devices, real-time on CPU |
| **yolo26s** (small) | 11.2M | ~5ms | Balanced speed & accuracy |
| **yolo26m** (medium) | 25.4M | ~12ms | Accuracy-focused |
| **yolo26l** (large) | 52.3M | ~25ms | Maximum detection quality |

All models detect **80+ COCO classes**: people, vehicles, animals, everyday objects.
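Picking a variant programmatically amounts to a latency-budget check against the table above. A sketch; the latency figures are the table's approximate numbers, not measurements on your hardware:

```python
def choose_model(latency_budget_ms: float) -> str:
    """Pick the largest YOLO26 variant whose approximate optimized
    latency (from the table above) fits the given budget."""
    variants = [  # (name, ~optimized latency in ms), largest first
        ("yolo26l", 25),
        ("yolo26m", 12),
        ("yolo26s", 5),
        ("yolo26n", 2),
    ]
    for name, latency_ms in variants:
        if latency_ms <= latency_budget_ms:
            return name
    return "yolo26n"  # nothing fits the budget; use the smallest

print(choose_model(10))  # → yolo26s
```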

### Hardware Acceleration

The shared [`env_config.py`](skills/lib/env_config.py) **auto-detects your GPU** and converts the model to the fastest native format — zero manual setup:

| Your Hardware | Optimized Format | Runtime | Speedup vs PyTorch |
|---------------|-----------------|---------|:------------------:|
| **NVIDIA GPU** (RTX, Jetson) | TensorRT `.engine` | CUDA | **3-5x** |
| **Apple Silicon** (M1–M4) | CoreML `.mlpackage` | ANE + GPU | **~2x** |
| **Intel** (CPU, iGPU, NPU) | OpenVINO IR `.xml` | OpenVINO | **2-3x** |
| **AMD GPU** (RX, MI) | ONNX Runtime | ROCm | **1.5-2x** |
| **Any CPU** | ONNX Runtime | CPU | **~1.5x** |
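The table above boils down to a backend-to-format lookup. The sketch below illustrates that mapping; backend names follow the skill's `requirements_{backend}.txt` convention, and this is not `env_config`'s actual API:

```python
def pick_export_format(backend: str) -> str:
    """Map a detected backend to its optimized export format, per the
    table above. Illustrative only; not env_config's real interface."""
    formats = {
        "cuda": "engine",     # NVIDIA: TensorRT .engine
        "mps": "coreml",      # Apple Silicon: CoreML .mlpackage
        "intel": "openvino",  # Intel: OpenVINO IR .xml
        "rocm": "onnx",       # AMD: ONNX Runtime on ROCm
        "cpu": "onnx",        # any CPU: ONNX Runtime
    }
    return formats.get(backend, "onnx")  # unknown hardware: portable ONNX

print(pick_export_format("mps"))  # → coreml
```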

### Aegis Skill Integration

Detection runs as a **parallel pipeline** alongside VLM analysis — never blocks your AI agent:

```
Camera → Frame Governor → detect.py (JSONL) → Aegis IPC → Live Overlay
             5 FPS              ↓
                          perf_stats (p50/p95/p99 latency)
```

- 🖱️ **Click to set up** — one button in Aegis installs everything, no terminal needed
- 🤖 **AI-driven environment config** — autonomous agent detects your GPU, installs the right framework (CUDA/ROCm/CoreML/OpenVINO), converts models, and verifies the setup
- 📺 **Live bounding boxes** — detection results rendered as overlays on RTSP camera streams
- 📊 **Built-in performance profiling** — aggregate latency stats (p50/p95/p99) emitted every 50 frames
- ⚡ **Auto start** — set `auto_start: true` to begin detecting when Aegis launches
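Consumers of the JSONL stream can dispatch on the `event` field. A minimal sketch; event shapes follow the protocol section of the skill's SKILL.md:

```python
import json

def summarize_event(line: str) -> str:
    """Summarize one JSONL line emitted by detect.py. Handles the
    documented "detections" and "perf_stats" events; anything else is
    passed through by name."""
    event = json.loads(line)
    kind = event.get("event")
    if kind == "detections":
        return f"frame {event['frame_id']}: {len(event['objects'])} object(s)"
    if kind == "perf_stats":
        p95 = event["timings_ms"]["inference"]["p95"]
        return f"inference p95: {p95} ms over {event['total_frames']} frames"
    return str(kind)

line = ('{"event": "detections", "frame_id": 42, "camera_id": "front_door",'
        ' "objects": [{"class": "person", "confidence": 0.92,'
        ' "bbox": [100, 50, 300, 400]}]}')
print(summarize_event(line))  # → frame 42: 1 object(s)
```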

📖 [Full Skill Documentation →](skills/detection/yolo-detection-2026/SKILL.md)

## 📊 HomeSec-Bench — How Secure Is Your Local AI?

**HomeSec-Bench** is a 131-test security benchmark that measures how well your local AI performs as a security guard. It tests what matters: Can it detect a person in fog? Classify a break-in vs. a delivery? Resist prompt injection? Route alerts correctly at 3 AM?
86 changes: 73 additions & 13 deletions skills/detection/yolo-detection-2026/SKILL.md
@@ -1,11 +1,19 @@
---
name: yolo-detection-2026
description: "YOLO 2026 — state-of-the-art real-time object detection"
version: 1.0.0
version: 2.0.0
icon: assets/icon.png
entry: scripts/detect.py
deploy: deploy.sh

parameters:
- name: auto_start
label: "Auto Start"
type: boolean
default: false
description: "Start this skill automatically when Aegis launches"
group: Lifecycle

- name: model_size
label: "Model Size"
type: select
@@ -45,6 +53,13 @@ parameters:
description: "auto = best available GPU, else CPU"
group: Performance

- name: use_optimized
label: "Hardware Acceleration"
type: boolean
default: true
description: "Auto-convert model to optimized format for faster inference"
group: Performance

capabilities:
live_detection:
script: scripts/detect.py
@@ -64,6 +79,50 @@ Real-time object detection using the latest YOLO 2026 models. Detects 80+ COCO o
| medium | Moderate | High | Accuracy-focused deployments |
| large | Slower | Highest | Maximum detection quality |

## Hardware Acceleration

The skill uses [`env_config.py`](../../lib/env_config.py) to **automatically detect hardware** and convert the model to the fastest format for your platform. Conversion happens once during deployment and is cached.

| Platform | Backend | Optimized Format | Expected Speedup |
|----------|---------|------------------|:----------------:|
| NVIDIA GPU | CUDA | TensorRT `.engine` | ~3-5x |
| Apple Silicon (M1+) | MPS | CoreML `.mlpackage` | ~2x |
| Intel CPU/GPU/NPU | OpenVINO | OpenVINO IR `.xml` | ~2-3x |
| AMD GPU | ROCm | ONNX Runtime | ~1.5-2x |
| CPU (any) | CPU | ONNX Runtime | ~1.5x |

### How It Works

1. `deploy.sh` detects your hardware via `env_config.HardwareEnv.detect()`
2. Installs the matching `requirements_{backend}.txt` (e.g. CUDA → includes `tensorrt`)
3. Pre-converts the default model to the optimal format
4. At runtime, `detect.py` loads the cached optimized model automatically
5. Falls back to PyTorch if optimization fails

Set `use_optimized: false` to disable auto-conversion and use raw PyTorch.
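The fallback in step 5 is ordinary exception handling around the load order described above. In this sketch, `loaders` is a hypothetical stand-in for the skill's real loading code, not a function that exists in this repository:

```python
def load_model(model_path, loaders, use_optimized=True):
    """Try the cached optimized artifact first, then fall back to raw
    PyTorch weights (steps 4-5 above). `loaders` maps loader names to
    callables and is a placeholder for the skill's real code."""
    if use_optimized:
        try:
            return loaders["optimized"](model_path)
        except Exception:
            pass  # optimization unavailable or broken: fall back
    return loaders["pytorch"](model_path)

def failing_loader(path):
    # Simulate a host where the optimized backend is missing.
    raise RuntimeError("TensorRT not installed")

loaders = {"optimized": failing_loader, "pytorch": lambda p: f"pytorch:{p}"}
print(load_model("yolo26n.pt", loaders))  # → pytorch:yolo26n.pt
```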

## Auto Start

Set `auto_start: true` in the skill config to start detection automatically when Aegis launches. The skill will begin processing frames from the selected camera immediately.

```yaml
auto_start: true
model_size: nano
fps: 5
```

## Performance Monitoring

The skill emits `perf_stats` events every 50 frames with aggregate timing:

```jsonl
{"event": "perf_stats", "total_frames": 50, "timings_ms": {
"inference": {"avg": 3.4, "p50": 3.2, "p95": 5.1},
"postprocess": {"avg": 0.15, "p50": 0.12, "p95": 0.31},
"total": {"avg": 3.6, "p50": 3.4, "p95": 5.5}
}}
```
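One plausible way to produce those aggregates is a nearest-rank percentile over a window of per-frame timings. The skill's exact method is not documented, so treat this as a sketch:

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile over a window of per-frame timings."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]

# 10 inference timings in ms, with one slow outlier:
window = [3.2, 3.1, 3.4, 3.3, 9.8, 3.2, 3.5, 3.0, 3.3, 3.6]
print(percentile(window, 50), percentile(window, 95))
```

The p95 figure is what surfaces outliers like the 9.8 ms frame above, which an average alone would hide.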

## Protocol

Communicates via **JSON lines** over stdin/stdout.
@@ -75,10 +134,11 @@ Communicates via **JSON lines** over stdin/stdout.

### Skill → Aegis (stdout)
```jsonl
{"event": "ready", "model": "yolo2026n", "device": "mps", "backend": "mps", "format": "coreml", "gpu": "Apple M3", "classes": 80, "fps": 5}
{"event": "detections", "frame_id": 42, "camera_id": "front_door", "timestamp": "...", "objects": [
{"class": "person", "confidence": 0.92, "bbox": [100, 50, 300, 400]}
]}
{"event": "perf_stats", "total_frames": 50, "timings_ms": {"inference": {"avg": 3.4}}}
{"event": "error", "message": "...", "retriable": true}
```

@@ -90,20 +150,20 @@ Communicates via **JSON lines** over stdin/stdout.
{"command": "stop"}
```
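From the Aegis side, commands are just newline-delimited JSON written to the skill's stdin. A sketch; the `configure` parameter names shown are assumptions based on the skill's parameter list, not a verbatim transcript:

```python
import json

def command(name: str, **params) -> str:
    """Serialize one Aegis-to-skill command as a JSON line."""
    return json.dumps({"command": name, **params})

# A typical session written to detect.py's stdin (model_size / fps are
# assumed parameter names, taken from the skill's config):
session = [
    command("configure", model_size="nano", fps=5),
    command("start"),
    command("stop"),
]
print("\n".join(session))
```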

## Hardware Support

| Platform | Backend | Performance |
|----------|---------|-------------|
| Apple Silicon (M1+) | MPS | 20-30 FPS |
| NVIDIA GPU | CUDA | 25-60 FPS |
| AMD GPU | ROCm | 15-40 FPS |
| CPU (modern x86) | CPU | 5-15 FPS |
| Raspberry Pi 5 | CPU | 2-5 FPS |

## Installation

The `deploy.sh` bootstrapper handles everything — Python environment, GPU backend detection, dependency installation, and model optimization. No manual setup required.

```bash
./deploy.sh
```

### Requirements Files

| File | Backend | Key Deps |
|------|---------|----------|
| `requirements_cuda.txt` | NVIDIA | `torch` (cu124), `tensorrt` |
| `requirements_mps.txt` | Apple | `torch`, `coremltools` |
| `requirements_intel.txt` | Intel | `torch`, `openvino` |
| `requirements_rocm.txt` | AMD | `torch` (rocm6.2), `onnxruntime-rocm` |
| `requirements_cpu.txt` | CPU | `torch` (cpu), `onnxruntime` |
8 changes: 8 additions & 0 deletions skills/detection/yolo-detection-2026/config.yaml
@@ -56,3 +56,11 @@ params:
- { value: cuda, label: "NVIDIA CUDA" }
- { value: mps, label: "Apple Silicon (MPS)" }
- { value: rocm, label: "AMD ROCm" }

- key: use_optimized
label: Hardware Acceleration
type: boolean
default: true
description: "Auto-convert model to optimized format (TensorRT/CoreML/OpenVINO/ONNX) for faster inference"

