29 changes: 15 additions & 14 deletions README.md

Each skill is a self-contained module with its own model, parameters, and [communication protocol](docs/skill-development.md). See the [Skill Development Guide](docs/skill-development.md) and [Platform Parameters](docs/skill-params.md) to build your own.

| Category | Skill | What It Does | Status |
|----------|-------|--------------|--------|
| **Detection** | [`yolo-detection-2026`](skills/detection/yolo-detection-2026/) | Real-time 80+ class object detection | 🧪 Testing |
| | [`dinov3-grounding`](skills/detection/dinov3-grounding/) | Open-vocabulary detection — describe what to find | 📐 Planned |
| | [`person-recognition`](skills/detection/person-recognition/) | Re-identify individuals across cameras | 📐 Planned |
| **Analysis** | [`home-security-benchmark`](skills/analysis/home-security-benchmark/) | [131-test evaluation suite](#-homesec-bench--how-secure-is-your-local-ai) for LLM & VLM security performance | ✅ Ready |
| | [`vlm-scene-analysis`](skills/analysis/vlm-scene-analysis/) | Describe what happened in recorded clips | 📐 Planned |
| | [`sam2-segmentation`](skills/analysis/sam2-segmentation/) | Click-to-segment with pixel-perfect masks | 📐 Planned |
| **Transformation** | [`depth-estimation`](skills/transformation/depth-estimation/) | Monocular depth maps with Depth Anything v2 | 📐 Planned |
| **Annotation** | [`dataset-annotation`](skills/annotation/dataset-annotation/) | AI-assisted labeling → COCO export | 📐 Planned |
| **Camera Providers** | [`eufy`](skills/camera-providers/eufy/) · [`reolink`](skills/camera-providers/reolink/) · [`tapo`](skills/camera-providers/tapo/) | Direct camera integrations via RTSP | 📐 Planned |
| **Streaming** | [`go2rtc-cameras`](skills/streaming/go2rtc-cameras/) | RTSP → WebRTC live view | 📐 Planned |
| **Channels** | [`matrix`](skills/channels/matrix/) · [`line`](skills/channels/line/) · [`signal`](skills/channels/signal/) | Messaging channels for Clawdbot agent | 📐 Planned |
| **Automation** | [`mqtt`](skills/automation/mqtt/) · [`webhook`](skills/automation/webhook/) · [`ha-trigger`](skills/automation/ha-trigger/) | Event-driven automation triggers | 📐 Planned |
| **Integrations** | [`homeassistant-bridge`](skills/integrations/homeassistant-bridge/) | HA cameras in ↔ detection results out | 📐 Planned |

> **Registry:** All skills are indexed in [`skills.json`](skills.json) for programmatic discovery.

94 changes: 94 additions & 0 deletions docs/detection-protocol.md
# Detection Skill Protocol

Communication protocol for DeepCamera detection skills integrated with SharpAI Aegis.

## Transport

- **stdin** (Aegis → Skill): frame events and commands
- **stdout** (Skill → Aegis): detection results, ready/error events
- **stderr**: logging only — ignored by Aegis data parser

Format: **JSON Lines** (one JSON object per line, newline-delimited).

## Events

### Ready (Skill → Aegis)

Emitted once the model has loaded successfully. `fps` reflects the skill's configured processing rate, and `available_sizes` lists the model variants the skill supports.

```jsonl
{"event": "ready", "model": "yolo2026n", "device": "mps", "classes": 80, "fps": 5, "available_sizes": ["nano", "small", "medium", "large"]}
```

### Frame (Aegis → Skill)

Instruction to analyze a specific frame. `frame_id` is an incrementing integer used to correlate each request with its response.

```jsonl
{"event": "frame", "frame_id": 42, "camera_id": "front_door", "timestamp": "2026-03-01T14:30:00Z", "frame_path": "/tmp/aegis_detection/frame_front_door.jpg", "width": 1920, "height": 1080}
```

### Detections (Skill → Aegis)

Results of frame analysis. Must echo the same `frame_id` received in the frame event.

```jsonl
{"event": "detections", "frame_id": 42, "camera_id": "front_door", "timestamp": "2026-03-01T14:30:00Z", "objects": [
{"class": "person", "confidence": 0.92, "bbox": [100, 50, 300, 400]},
{"class": "car", "confidence": 0.87, "bbox": [500, 200, 900, 500]}
]}
```

### Error (Skill → Aegis)

Indicates a processing error. `retriable: true` means Aegis can send the next frame.

```jsonl
{"event": "error", "frame_id": 42, "message": "Inference error: ...", "retriable": true}
```

### Stop (Aegis → Skill)

Graceful shutdown command.

```jsonl
{"command": "stop"}
```
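Putting the events together, a skill's stdin/stdout loop can be sketched as follows. This is a minimal Python sketch, not a reference implementation; `run_inference` is a hypothetical stand-in for the skill's actual model call, and the `ready` values are illustrative.

```python
import json
import sys

def run_inference(frame_path):
    # Hypothetical placeholder: a real skill would load the JPEG at
    # frame_path and run its model, returning a list of object dicts.
    return []

def handle_event(line):
    """Handle one JSON Lines message from Aegis; return the reply, or None to stop."""
    msg = json.loads(line)
    if msg.get("command") == "stop":
        return None  # graceful shutdown
    if msg.get("event") == "frame":
        return {
            "event": "detections",
            "frame_id": msg["frame_id"],      # echo the frame_id we received
            "camera_id": msg["camera_id"],
            "timestamp": msg["timestamp"],
            "objects": run_inference(msg["frame_path"]),
        }
    return {"event": "error", "frame_id": msg.get("frame_id"),
            "message": f"unexpected message: {msg}", "retriable": True}

def main():
    # Announce readiness once the model is loaded (values are illustrative).
    print(json.dumps({"event": "ready", "model": "example", "device": "cpu",
                      "classes": 80, "fps": 1, "available_sizes": ["nano"]}),
          flush=True)
    for line in sys.stdin:
        reply = handle_event(line)
        if reply is None:
            break
        print(json.dumps(reply), flush=True)
```

Note the `flush=True`: stdout must be flushed per line (or run unbuffered) for Aegis to see each event as it is produced.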

## Data Formats

### Bounding Boxes

**Format**: `[x_min, y_min, x_max, y_max]` — pixel coordinates (xyxy).

| Field | Type | Description |
|-------|------|-------------|
| `x_min` | int | Left edge (pixels) |
| `y_min` | int | Top edge (pixels) |
| `x_max` | int | Right edge (pixels) |
| `y_max` | int | Bottom edge (pixels) |

Coordinates are in the original image space (not normalized).
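For illustration, two small helpers that operate on this xyxy pixel format (hypothetical utilities, not part of the protocol):

```python
def bbox_to_normalized(bbox, width, height):
    """Convert an xyxy pixel bbox to normalized [0, 1] coordinates."""
    x_min, y_min, x_max, y_max = bbox
    return [x_min / width, y_min / height, x_max / width, y_max / height]

def bbox_area(bbox):
    """Area of an xyxy bbox in square pixels (0 for degenerate boxes)."""
    x_min, y_min, x_max, y_max = bbox
    return max(0, x_max - x_min) * max(0, y_max - y_min)
```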

### Timestamps

ISO 8601 format: `2026-03-01T14:30:00Z`

### Frame Transfer

Frames are written to `/tmp/aegis_detection/frame_{camera_id}.jpg` as JPEG files with recycled per-camera filenames (overwritten each cycle). The `frame_path` in the frame event is the absolute path to the JPEG file.
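The path convention can be expressed as a one-line helper (a sketch; the `/tmp/aegis_detection` root and filename pattern come from the paragraph above):

```python
from pathlib import Path

def frame_path(camera_id, root="/tmp/aegis_detection"):
    """Recycled per-camera frame file; Aegis overwrites it each cycle."""
    return str(Path(root) / f"frame_{camera_id}.jpg")
```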

## FPS Presets

| Preset | FPS | Use Case |
|--------|-----|----------|
| Ultra Low | 0.2 | Battery saver |
| Low | 0.5 | Passive surveillance |
| Normal | 1 | Standard monitoring |
| Active | 3 | Active area monitoring |
| High | 5 | Security-critical zones |
| Real-time | 15 | Live tracking |
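Each preset translates directly into a delay between analyzed frames. A sketch (the preset keys are illustrative names; the FPS values come from the table above):

```python
# Preset names are illustrative; FPS values are from the table above.
FPS_PRESETS = {
    "ultra_low": 0.2,
    "low": 0.5,
    "normal": 1,
    "active": 3,
    "high": 5,
    "real_time": 15,
}

def frame_interval_seconds(preset):
    """Seconds between analyzed frames for a given preset."""
    return 1.0 / FPS_PRESETS[preset]
```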

## Backpressure

The protocol is **request-response**: Aegis sends one frame, waits for the detection result, then sends the next. This provides natural backpressure — if the skill is slow, Aegis automatically drops frames (always uses the latest available frame).
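The keep-only-the-latest policy can be sketched as a tiny helper (hypothetical; Aegis's actual implementation may differ):

```python
def pick_latest(pending_frames):
    """Backpressure policy: keep only the newest pending frame, report drops."""
    if not pending_frames:
        return None, 0
    return pending_frames[-1], len(pending_frames) - 1
```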
2 changes: 1 addition & 1 deletion docs/legacy-applications.md

## Application 1: Self-supervised Person Recognition (REID) for Intruder Detection

SharpAI yolov7_reid is an open-source Python application that uses AI to detect intruders with traditional surveillance cameras. [Source code](https://github.com/SharpAI/DeepCamera/blob/master/src/yolov7_reid/src/detector.py)

It uses YOLOv7 as the person detector, FastReID for person feature extraction, Milvus as a local vector database for self-supervised learning to identify unseen persons, and Label Studio to host images locally for further use such as labeling data and training your own classifier. It also integrates with Home Assistant to bring AI capabilities to the smart home.

102 changes: 101 additions & 1 deletion docs/skill-development.md
A skill is a self-contained folder that provides an AI capability to SharpAI Aegis.
```
skills/<category>/<skill-name>/
├── SKILL.md # Manifest + setup instructions
├── config.yaml # Configuration schema for Aegis UI
├── deploy.sh # Zero-assumption installer
├── requirements.txt # Default Python dependencies
├── requirements_cuda.txt # NVIDIA GPU dependencies
├── requirements_rocm.txt # AMD GPU dependencies
├── requirements_mps.txt # Apple Silicon dependencies
├── requirements_cpu.txt # CPU-only dependencies
├── scripts/
│ └── main.py # Entry point
├── assets/
| `url` | URL input with validation | Server address |
| `camera_select` | Camera picker | Target cameras |

## config.yaml — Configuration Schema

Defines user-configurable options shown in the Aegis Skills UI. Parsed by `parseConfigYaml()`.

```yaml
params:
- key: auto_start
label: Auto Start
type: boolean
default: false
description: "Start automatically on Aegis launch"

- key: model_size
label: Model Size
type: select
default: nano
description: "Choose model variant"
options:
- { value: nano, label: "Nano (fastest)" }
- { value: small, label: "Small (balanced)" }

- key: confidence
label: Confidence
type: number
default: 0.5
description: "Min confidence (0.1–1.0)"
```

### Reserved Keys

| Key | Type | Behavior |
|-----|------|----------|
| `auto_start` | boolean | Aegis auto-starts the skill on boot when `true` |

## deploy.sh — Zero-Assumption Installer

Bootstraps the environment from scratch. Must handle:

1. **Find Python** — check system → conda → pyenv
2. **Create venv** — isolated `.venv/` inside skill directory
3. **Detect GPU** — CUDA → ROCm → MPS → CPU fallback
4. **Install deps** — from matching `requirements_<backend>.txt`
5. **Verify** — import test

Emit JSONL progress for Aegis UI:
```bash
echo '{"event": "progress", "stage": "gpu", "backend": "mps"}'
echo '{"event": "complete", "backend": "mps", "message": "Installed!"}'
```
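The GPU-detection step (CUDA → ROCm → MPS → CPU) can be sketched in Python. This is an assumption-laden illustration: it probes via `torch` for brevity, whereas a real `deploy.sh` would inspect `nvidia-smi`, `rocminfo`, or the platform before any Python packages are installed.

```python
import json

def detect_backend():
    """Pick the dependency set to install: cuda -> rocm -> mps -> cpu fallback."""
    try:
        import torch  # probe only; a real installer checks system tools instead
        if torch.cuda.is_available():
            # torch.version.hip is set on ROCm builds, None on CUDA builds
            return "rocm" if torch.version.hip else "cuda"
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "mps"
    except ImportError:
        pass
    return "cpu"

backend = detect_backend()
# Emit JSONL progress in the same shape deploy.sh uses above.
print(json.dumps({"event": "progress", "stage": "gpu", "backend": backend}))
```

The matching `requirements_<backend>.txt` file would then be passed to `pip install -r`.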

## Environment Variables

Aegis injects these into every skill process:

| Variable | Description |
|----------|-------------|
| `AEGIS_SKILL_ID` | Skill identifier |
| `AEGIS_SKILL_PARAMS` | JSON string of user config values |
| `AEGIS_GATEWAY_URL` | LLM gateway URL |
| `AEGIS_VLM_URL` | VLM server URL |
| `AEGIS_LLM_MODEL` | Active LLM model name |
| `AEGIS_VLM_MODEL` | Active VLM model name |
| `PYTHONUNBUFFERED` | Set to `1` for real-time output |
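A skill can gather these into one place at startup. A minimal sketch (the helper name and empty-string defaults are assumptions, not an Aegis API):

```python
import json
import os

def load_skill_context():
    """Collect the Aegis-injected environment into a plain dict."""
    return {
        "skill_id": os.environ.get("AEGIS_SKILL_ID", ""),
        # AEGIS_SKILL_PARAMS is a JSON string of the user's config values
        "params": json.loads(os.environ.get("AEGIS_SKILL_PARAMS", "{}")),
        "gateway_url": os.environ.get("AEGIS_GATEWAY_URL", ""),
        "llm_model": os.environ.get("AEGIS_LLM_MODEL", ""),
        "vlm_model": os.environ.get("AEGIS_VLM_MODEL", ""),
    }
```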

## JSON Lines Protocol

Scripts communicate with Aegis via stdin/stdout. Each line is a JSON object.
echo '{"event": "frame", "camera_id": "test", "frame_path": "/tmp/test.jpg"}' | python scripts/main.py
```

## skills.json — Catalog Registration

Register skills in the repo root `skills.json`:

```json
{
"skills": [
{
"id": "my-skill",
"name": "My Skill",
"description": "What it does",
"category": "detection",
"tags": ["tag1"],
"path": "skills/detection/my-skill",
"status": "testing",
"platforms": ["darwin-arm64", "linux-x64"]
}
]
}
```
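A sketch of validating an entry programmatically (the required-key set is inferred from the example above, not an official schema):

```python
import json

VALID_STATUSES = {"ready", "testing", "experimental", "planned"}
# Assumed minimum fields, inferred from the example entry above.
REQUIRED_KEYS = {"id", "name", "description", "category", "path", "status"}

def validate_entry(entry):
    """Raise ValueError if a skills.json entry is missing fields or malformed."""
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        raise ValueError(f"{entry.get('id', '?')}: missing {sorted(missing)}")
    if entry["status"] not in VALID_STATUSES:
        raise ValueError(f"{entry['id']}: invalid status {entry['status']!r}")
    return True

entry = json.loads("""
{
  "id": "my-skill", "name": "My Skill", "description": "What it does",
  "category": "detection", "tags": ["tag1"],
  "path": "skills/detection/my-skill", "status": "testing",
  "platforms": ["darwin-arm64", "linux-x64"]
}
""")
```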

### Status Values

| Status | Emoji | Meaning |
|--------|-------|---------|
| `ready` | ✅ | Production-quality, tested |
| `testing` | 🧪 | Functional, needs validation |
| `experimental` | ⚗️ | Proof of concept |
| `planned` | 📐 | Not yet implemented |

## Reference

See [`skills/detection/yolo-detection-2026/`](../skills/detection/yolo-detection-2026/) for a complete working example.
48 changes: 48 additions & 0 deletions skills.json
"ui_unlocks": [
"benchmark_report"
]
},
{
"id": "yolo-detection-2026",
"name": "YOLO 2026 Object Detection",
"description": "State-of-the-art real-time object detection — 80+ COCO classes, bounding box overlays, multi-size model selection.",
"version": "1.0.0",
"category": "detection",
"path": "skills/detection/yolo-detection-2026",
"tags": [
"detection",
"yolo",
"object-detection",
"real-time",
"coco"
],
"platforms": [
"linux-x64",
"linux-arm64",
"darwin-arm64",
"darwin-x64",
"win-x64"
],
"requirements": {
"python": ">=3.9",
"ram_gb": 2
},
"capabilities": [
"live_detection",
"bbox_overlay"
],
"ui_unlocks": [
"detection_overlay",
"detection_results"
],
"fps_presets": [
0.2,
0.5,
1,
3,
5,
15
],
"model_sizes": [
"nano",
"small",
"medium",
"large"
]
}
]
}
13 changes: 10 additions & 3 deletions skills/analysis/home-security-benchmark/SKILL.md
version: 2.0.0
category: analysis
runtime: node
entry: scripts/run-benchmark.cjs
install: npm
---

# Home Security AI Benchmark
Comprehensive benchmark suite evaluating LLM and VLM models on **131 tests** across multiple security categories.

## Setup

**Requires `npm install`.** This skill has a `package.json` with dependencies (e.g. `openai` SDK). The deployment agent should check for `package.json` and run `npm install` in the skill directory before launching.

Entry script: `scripts/run-benchmark.cjs`

| Variable | Default | Description |
|----------|---------|-------------|
| `AEGIS_GATEWAY_URL` | `http://localhost:5407` | LLM gateway (OpenAI-compatible) |
| `AEGIS_LLM_URL` | — | Direct llama-server LLM endpoint |
| `AEGIS_LLM_API_TYPE` | `openai` | LLM provider type (builtin, openai, etc.) |
| `AEGIS_LLM_MODEL` | — | LLM model name |
| `AEGIS_LLM_API_KEY` | — | API key for cloud LLM providers |
| `AEGIS_LLM_BASE_URL` | — | Cloud provider base URL (e.g. `https://api.openai.com/v1`) |
| `AEGIS_VLM_URL` | *(disabled)* | VLM server base URL |
| `AEGIS_VLM_MODEL` | — | Loaded VLM model ID |
| `AEGIS_SKILL_ID` | — | Skill identifier (enables skill mode) |
| `AEGIS_SKILL_PARAMS` | `{}` | JSON params from skill config |

Results are saved to `~/.aegis-ai/benchmarks/` as JSON. An HTML report with cross-model comparisons is also generated.
## Requirements

- Node.js ≥ 18
- `npm install` (for `openai` SDK dependency)
- Running LLM server (llama-server, OpenAI API, or any OpenAI-compatible endpoint)
- Optional: Running VLM server for scene analysis tests (35 tests)
37 changes: 37 additions & 0 deletions skills/analysis/home-security-benchmark/package-lock.json
