Hardware-accelerated inference on IP cameras using the MXUv3 SIMD unit and ORAM.
| Metric | Before (Scalar) | After (Mars) | Speedup |
|---|---|---|---|
| Inference Time | 35 seconds | 1.75 seconds | 20x |
| Memory Read | 41 MB/s (DDR) | 314 MB/s (ORAM) | 7.6x |
| Memory Write | 77 MB/s (DDR) | 1578 MB/s (ORAM) | 20.6x |
Mars is an open-source neural network runtime for the Ingenic T41 SoC, reverse-engineered from Ingenic's proprietary Venus SDK. It enables hardware-accelerated inference on IP cameras running Thingino firmware.
Why Mars?
- 🔓 Open Source: No proprietary SDKs or closed toolchains
- 🐧 musl Compatible: Works with Thingino's musl libc (Venus requires glibc)
- ⚡ Hardware Accelerated: Uses MXUv3 SIMD (512-bit) and on-chip ORAM
- 🎯 Purpose-Built: Custom TinyDet model for security camera use cases

Features:
- ✅ MXUv3 SIMD acceleration (16 floats per instruction)
- ✅ ORAM weight staging (640KB on-chip, 7.6x faster than DDR)
- ✅ Conv2D, ReLU, MaxPool, Add, Concat operations
- ✅ NHWC tensor format (optimized for T41 memory access)
- ✅ Custom `.mars` model format
- ✅ ONNX → Mars compiler (Python + Rust)
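
The NHWC layout and the 16-float vector width listed above can be illustrated in plain C. This is a minimal sketch, not Mars code: `nhwc_offset` and `relu_chunked` are hypothetical names, the batch size is assumed to be 1, and the inner 16-lane loop is a scalar stand-in for the work a single 512-bit MXUv3 instruction would cover.

```c
#include <assert.h>
#include <stddef.h>

/* float32 lanes in one 512-bit vector: 512 / 32 = 16. */
#define SKETCH_LANES 16

/* Flat offset of element (h, w, c) in an NHWC tensor with batch size 1.
 * Channels are innermost, so all channels of one pixel are contiguous. */
static size_t nhwc_offset(size_t h, size_t w, size_t c,
                          size_t width, size_t channels)
{
    assert(w < width && c < channels);
    return (h * width + w) * channels + c;
}

/* ReLU over n floats, walking the buffer in 16-float chunks.
 * Each full chunk corresponds to one vector-wide step on real hardware;
 * here it is an ordinary scalar loop (assumption for illustration). */
static void relu_chunked(float *data, size_t n)
{
    assert(data != NULL);
    size_t i = 0;
    for (; i + SKETCH_LANES <= n; i += SKETCH_LANES) {
        for (size_t lane = 0; lane < SKETCH_LANES; lane++) {
            if (data[i + lane] < 0.0f)
                data[i + lane] = 0.0f;
        }
    }
    /* Tail: fewer than 16 elements left over. */
    for (; i < n; i++) {
        if (data[i] < 0.0f)
            data[i] = 0.0f;
    }
}
```

Because channels are the fastest-varying index in NHWC, a layer whose channel count is a multiple of 16 lets every chunk line up with whole pixels, which is one plausible reason the layout suits a 16-lane unit.
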
# Set cross-compiler (adjust path to your toolchain)
export CROSS_COMPILE=/path/to/mipsel-linux-
# Build runtime library and tools
make
# Output:
# build/lib/libmars.so - Runtime library
# build/bin/mars_detect - Detection CLI tool

# Copy to camera
scp build/bin/mars_detect build/lib/libmars.so root@camera:/opt/
# Run detection
ssh root@camera
cd /opt
LD_LIBRARY_PATH=/opt ./mars_detect model.mars input.jpg output.jpg

cd mars-compiler
# Stage 1: ONNX → JSON + weights
python3 onnx2mars.py model.onnx -o model
# Stage 2: JSON → .mars binary
cargo run -- -i model.json -o model.mars --float32

thingino-accel/
├── src/mars/ # Mars runtime (C)
│ ├── mars_runtime.c # Model loader and executor
│ ├── mxu_conv.c # MXUv3 convolution kernels
│ └── mars_nn_hw.c # Hardware initialization (ORAM, MXU)
├── mars-compiler/ # ONNX → Mars compiler
│ ├── onnx2mars.py # Python: ONNX → JSON extraction
│ └── src/ # Rust: JSON → .mars binary
├── training/ # TinyDet model training
│ ├── tinydet.py # Model architecture
│ └── train_*.py # Training scripts
├── include/ # Public headers
└── docs/ # Documentation
└── MARS_PROJECT_WRITEUP.md # Full research paper
We trained a purpose-built 4-class detector optimized for security cameras:
| Class | Description |
|---|---|
| Person | Human detection |
| Vehicle | Cars, trucks |
| Cat | Feline pets |
| Dog | Canine pets |
Model specs:
- Input: 320×192 RGB (NHWC)
- Parameters: ~202K
- Architecture: Anchor-free, single-stage
- Training: COCO + Oxford Pets datasets

T41 hardware:
- CPU: Dual XBurst2 @ 1.5GHz (MIPS)
- MXUv3: 512-bit SIMD, 32 VPR registers
- ORAM: 640KB @ 0x12640000 (on-chip SRAM)
- NNA: Neural Network Accelerator with NNDMA
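
To put the TinyDet figures next to the ORAM size, the footprint arithmetic can be worked through in C. This is a rough sketch under stated assumptions: values are float32 (as the `--float32` compiler flag suggests), 640KB is read as 640 × 1024 bytes, the batch size is 1, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ORAM capacity, assuming 640KB means 640 * 1024 bytes. */
#define ORAM_BYTES ((size_t)640 * 1024)

/* Bytes needed to hold `count` float32 values. */
static size_t float32_bytes(size_t count)
{
    assert(count <= SIZE_MAX / sizeof(float));
    return count * sizeof(float);
}

/* Returns 1 if a buffer of `bytes` fits in ORAM in one piece, else 0. */
static int fits_in_oram(size_t bytes)
{
    return bytes <= ORAM_BYTES;
}

/* Element count of one 320x192 RGB input tensor (batch size 1). */
static size_t tinydet_input_elems(void)
{
    return (size_t)320 * 192 * 3;   /* 184,320 values */
}
```

Under these assumptions the input tensor is 737,280 bytes and roughly 202K float32 parameters come to about 808,000 bytes, both larger than the 655,360-byte ORAM. That is consistent with moving data into ORAM in smaller pieces rather than all at once, although how Mars actually partitions it is not described here.
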
| Region | Read Bandwidth | Latency |
|---|---|---|
| DDR | 41 MB/s | High |
| ORAM | 314 MB/s | Low |
Weights are staged to ORAM before convolution for maximum throughput.
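
The staging idea can be sketched as follows. This is an illustration only, not the Mars implementation: an ordinary static array stands in for the on-chip ORAM (the real region sits at a physical address and would need to be mapped first), the `[out_ch][in_ch]` weight layout is an assumption, and `stage_weights` and `conv1x1_nhwc` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ORAM_SKETCH_BYTES ((size_t)640 * 1024)

/* Plain array standing in for ORAM (assumption for illustration). */
static float oram_sketch[ORAM_SKETCH_BYTES / sizeof(float)];

/* Copy weights into the stand-in ORAM if they fit and return the staged
 * copy; otherwise fall back to reading them from their original buffer. */
static const float *stage_weights(const float *ddr_weights, size_t count)
{
    assert(ddr_weights != NULL);
    if (count > ORAM_SKETCH_BYTES / sizeof(float))
        return ddr_weights;
    memcpy(oram_sketch, ddr_weights, count * sizeof(float));
    return oram_sketch;
}

/* 1x1 convolution over an NHWC tensor with batch size 1:
 * out[p][o] = sum over i of in[p][i] * w[o][i]. */
static void conv1x1_nhwc(const float *in, float *out, const float *weights,
                         size_t pixels, size_t in_ch, size_t out_ch)
{
    /* Stage once, then reuse the staged copy for every pixel. */
    const float *w = stage_weights(weights, in_ch * out_ch);
    for (size_t p = 0; p < pixels; p++) {
        for (size_t o = 0; o < out_ch; o++) {
            float acc = 0.0f;
            for (size_t i = 0; i < in_ch; i++)
                acc += in[p * in_ch + i] * w[o * in_ch + i];
            out[p * out_ch + o] = acc;
        }
    }
}
```

The point of the pattern is that weights are read once from slow memory and then many times from fast memory: every output pixel reuses the same staged weights, so the one-off copy is amortized across the whole feature map.
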
- 📄 Full Project Writeup - Research paper covering reverse engineering, MXUv3 discovery, and model design
- 📘 Mars Runtime README - Runtime architecture and API
- 🎓 Training README - Model training guide
| Feature | Venus (OEM) | Mars |
|---|---|---|
| License | Proprietary | GPLv3 |
| C Library | glibc only | musl/glibc |
| Model Format | `.mgk` (closed) | `.mars` (open) |
| Source Code | No | Yes |
| Compiler | Closed | Python + Rust |
Contributions welcome! See the project writeup for technical background.
GPLv3 - See LICENSE for details.
- Thingino - Open-source IP camera firmware
- OpenSensor - Project home
