Tabletop Perception for Beginner Robot Kits

Live demo: point a webcam at a desk and watch a specialist and a generalist reason side by side. Runs in the browser; no server, no API keys.

A fine-tuned MobileNetV3-small (2.5M frozen parameters + 6K-parameter head) reaches 97% top-1 accuracy at ~15 ms/frame on a 6-class tabletop task (cell_phone, cup, headphone, laptop, scissors, stapler). On the same input, a 450M-parameter open-vocabulary VLM (LFM2.5-VL-450M) is ~85× slower at detection (1.3 s/query vs 15 ms/frame) and collapses to 0% recall@IoU≥0.3 on stylized, out-of-distribution synthetic scenes. On natural photos, however, it correctly refuses queries about absent objects, a property the closed-set specialist cannot offer. The production shape is to run both at once: the specialist on the camera stream and the generalist on typed queries, with the open-vocab tier extending coverage where the closed-set head cannot reach.
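
The recall@IoU≥0.3 figure counts a ground-truth object as found only when some predicted box overlaps it with an intersection-over-union of at least 0.3. A minimal sketch of that metric follows, assuming axis-aligned boxes given as (x1, y1, x2, y2) tuples. The function names are illustrative and are not taken from the repository's evaluation code, which may match boxes differently.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(truths: Sequence[Box], preds: Sequence[Box], thresh: float = 0.3) -> float:
    """Fraction of ground-truth boxes overlapped by at least one prediction."""
    if not truths:
        return 0.0
    hits = sum(1 for t in truths if any(iou(t, p) >= thresh for p in preds))
    return hits / len(truths)
```

This simple version lets one prediction match several ground-truth boxes; a stricter evaluation would assign each prediction to at most one object.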

Choosing between them is a deployment decision, not a benchmark one.
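
One way to picture the two-tier shape described above is a small dispatcher: camera frames always go to the closed-set specialist, while typed queries go to the open-vocab generalist. The sketch below uses stub callables in place of the real models. `TwoTier`, `on_frame`, and `on_query` are hypothetical names, not part of this repository, and the callable signatures are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

CLASSES = ["cell_phone", "cup", "headphone", "laptop", "scissors", "stapler"]


@dataclass
class TwoTier:
    # Both callables are stand-ins for the real models (assumed interfaces).
    specialist: Callable[[bytes], str]                 # frame -> one of CLASSES
    generalist: Callable[[bytes, str], Optional[str]]  # frame, query -> answer or None

    def on_frame(self, frame: bytes) -> str:
        """Fast path: every camera frame gets a closed-set label."""
        return self.specialist(frame)

    def on_query(self, frame: bytes, query: str) -> Optional[str]:
        """Slow path: typed queries go to the open-vocab model, which may refuse."""
        return self.generalist(frame, query)


# Stub models for illustration only: the specialist always answers "cup",
# and the generalist refuses (returns None) when asked about a banana.
tiers = TwoTier(
    specialist=lambda frame: CLASSES[1],
    generalist=lambda frame, query: None if "banana" in query else "found",
)
```

The specialist always returns some label from `CLASSES`, even when none applies, which is why the refusal path exists only on the generalist side.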

Results

Tier	Model	Params	Latency	Top-1
Naive	HSV threshold	0	~1 ms	24%
Classical	Color-hist + HOG + GBM	~2.4K trees	~40 ms	76%
DL	MobileNetV3-small fine-tune	2.5M frozen + 6K head	~15 ms	97%
VLM	LFM2.5-VL-450M zero-shot	450M	~1300 ms	open-vocab
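
To put the latency column in deployment terms, the per-call latencies can be converted to calls per second and to a slowdown ratio. The values below are the approximate figures from the table; the helper names are illustrative.

```python
# Approximate per-call latencies from the Results table, in milliseconds.
LATENCY_MS = {
    "HSV threshold": 1,
    "Color-hist + HOG + GBM": 40,
    "MobileNetV3-small fine-tune": 15,
    "LFM2.5-VL-450M zero-shot": 1300,
}


def calls_per_second(latency_ms: float) -> float:
    """Sequential throughput implied by a per-call latency."""
    return 1000.0 / latency_ms


def slowdown(slow_ms: float, fast_ms: float) -> float:
    """How many times slower one tier is than another."""
    return slow_ms / fast_ms


for name, ms in LATENCY_MS.items():
    print(f"{name}: {calls_per_second(ms):.1f} calls/s")

ratio = slowdown(
    LATENCY_MS["LFM2.5-VL-450M zero-shot"],
    LATENCY_MS["MobileNetV3-small fine-tune"],
)
print(f"VLM vs specialist: {ratio:.1f}x slower")
```

With these rounded inputs the specialist sustains roughly 67 frames per second and the VLM under one query per second. The ratio comes out near 87×, close to the ~85× quoted in the summary; the table latencies are themselves approximate.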

Per-class F1, a latency breakdown, and a five-case error analysis are in report/report.md.
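
For readers unfamiliar with the metric, per-class F1 can be computed directly from true and predicted labels as 2·TP / (2·TP + FP + FN) for each class. The sketch below is a plain-Python illustration, not the code that produced the report; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 per class: 2*TP / (2*TP + FP + FN), or 0.0 when the class never appears."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores
```

Called on toy labels such as `per_class_f1(["cup", "cup", "laptop"], ["cup", "laptop", "laptop"])`, it returns one score per class. Those toy labels are made up for illustration and are not project results.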

Reproduce

git clone https://github.com/jonasneves/aipi540-tabletop-perception
cd aipi540-tabletop-perception
pip install -r requirements.txt
make dataset   # downloads + stages Caltech-101, ~2 min
make eval      # runs the naive, classical, and DL models + exports ONNX
make serve     # local demo on :8088
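
The demo is a static site, so any static file server is enough to try it locally. As an illustration only (the actual `make serve` recipe may differ), the Python standard library can serve the `public/` directory on port 8088. `make_server` is a hypothetical helper name.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory: str = "public", port: int = 8088, bind: bool = True) -> ThreadingHTTPServer:
    """Build a static file server rooted at `directory`; it does not start serving.

    Pass bind=False to construct the server without opening the port.
    """
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler, bind_and_activate=bind)


# To serve until interrupted with Ctrl+C:
#   make_server().serve_forever()
```

Binding to 127.0.0.1 keeps the demo reachable only from the local machine.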

Structure

.
├── README.md
├── SCOPE.md
├── requirements.txt
├── Makefile               # dataset | eval | sync | serve | deploy
├── scripts/
│   ├── make_dataset.py    # Caltech-101 download + 6-class filter
│   ├── naive.py           # HSV dominant-hue baseline
│   ├── classical.py       # color-hist + HOG + GBM
│   └── train_dl.py        # MobileNetV3-small fine-tune + ONNX export
├── models/                # ONNX + pickle artifacts
├── data/
│   ├── raw/
│   └── processed/
├── results/               # scores.json, plots
├── report/                # written report + figures
├── public/                # static site: ONNX Runtime Web + WebGPU
└── docs -> public         # GH Pages serves main/docs → public
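
For a sense of how little the naive tier does, here is a sketch of a dominant-hue classifier in the spirit of `scripts/naive.py`. It is an illustration under assumptions, not the repository's code: pixels are plain (r, g, b) tuples in 0-255, the hue circle is split into 12 bins, the saturation and brightness cutoffs are arbitrary, and `dominant_hue_bin`, `fit`, and `predict` are hypothetical helper names.

```python
import colorsys
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Pixel = Tuple[int, int, int]
N_BINS = 12  # assumed hue resolution


def dominant_hue_bin(pixels: Iterable[Pixel], min_sat: float = 0.2, min_val: float = 0.2) -> int:
    """Most common hue bin among sufficiently saturated, bright pixels (-1 if none)."""
    counts: Counter = Counter()
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if s >= min_sat and v >= min_val:
            counts[min(int(h * N_BINS), N_BINS - 1)] += 1
    return counts.most_common(1)[0][0] if counts else -1


def fit(images: List[List[Pixel]], labels: List[str]) -> Dict[int, str]:
    """Map each hue bin to the label seen most often for it in training."""
    votes: Dict[int, Counter] = {}
    for img, label in zip(images, labels):
        votes.setdefault(dominant_hue_bin(img), Counter())[label] += 1
    return {bin_: c.most_common(1)[0][0] for bin_, c in votes.items()}


def predict(model: Dict[int, str], image: List[Pixel], fallback: str) -> str:
    """Look up the label for the image's dominant hue bin, or return the fallback."""
    return model.get(dominant_hue_bin(image), fallback)
```

A classifier like this has no trainable parameters beyond the bin-to-label lookup, and hue alone says little about object category, so a low top-1 score for this tier is expected.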

Team

Jonas Neves · Duke University · AIPI 540 · Spring 2026
