
27 methods × 8 windows variant-first benchmark (248 ready problems) #1

Merged
rsasaki0109 merged 15 commits into main from wip/profile-expansion-refresh on Apr 14, 2026

Conversation

@rsasaki0109 (Owner) commented on Apr 9, 2026

Summary

27 LiDAR localization methods compared across 4 public dataset families with variant-first benchmarking.

  • 248 ready + 1 blocked + 1 skipped benchmark problems
  • 38 from-scratch paper implementations, 27 of them integrated into a unified CLI
  • 4 dataset families: Istanbul, HDL-400, MCD (Multi-Campus), KITTI Raw
  • GT-seeded vs pure odometry ablation (--no-gt-seed)
  • Full sequence evaluation (up to 703 frames)
  • Docker reproducible build verified
  • 38/38 unit tests pass

Key Results

Pareto Front: Accuracy vs Throughput (248 defaults)

[Figure: Pareto front of ATE vs throughput (248 ready defaults)]
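The Pareto front above keeps only methods that are not dominated on both axes (lower ATE, higher throughput). A minimal sketch of that filter, with illustrative numbers that are not taken from the benchmark:

```python
def pareto_front(results):
    """Return the (name, ate_m, fps) tuples not dominated on
    (lower ATE, higher FPS).

    A point is dominated if some other point is at least as good
    on both axes and strictly better on at least one.
    """
    front = []
    for name, ate, fps in results:
        dominated = any(
            (a <= ate and f >= fps) and (a < ate or f > fps)
            for _, a, f in results
        )
        if not dominated:
            front.append((name, ate, fps))
    # Sort by ATE so the front reads left-to-right on the plot.
    return sorted(front, key=lambda p: p[1])

# Illustrative data only: "C" is dominated by "B" (worse ATE, lower FPS).
demo = [("A", 0.3, 5.0), ("B", 1.5, 20.0), ("C", 2.0, 10.0)]
print(pareto_front(demo))
```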

Default Variant Instability Across Datasets

[Figure: default variant instability across datasets]

No method has a universal "best" variant — the optimal profile changes depending on dataset characteristics.

Variant Fronts by Method Family

[Figure: variant fronts by method family]

GT-Seed Ablation Finding

| Method   | GT-seeded ATE | No-GT ATE | Comment                    |
|----------|---------------|-----------|----------------------------|
| LiTAMIN2 | 1.05 m        | 122.28 m  | Diverges without GT        |
| GICP     | 1.46 m        | 2.21 m    | Robust                     |
| NDT      | 0.28 m        | 122.66 m  | Diverges without GT        |
| KISS-ICP | 2.41 m        | 2.41 m    | Pure odometry (unaffected) |
| CT-ICP   | 1.66 m        | 1.66 m    | Pure odometry (unaffected) |
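The ATE numbers in the table are trajectory errors after rigid alignment of the estimate onto ground truth. A minimal sketch of that metric using the standard Horn/Umeyama closed-form SE(3) alignment (this illustrates the metric, not the benchmark's actual evaluation code; NumPy is assumed):

```python
import numpy as np

def ate_rmse(gt, est):
    """ATE RMSE after rigid (SE(3)) alignment of est onto gt.

    gt, est: (N, 3) arrays of corresponding trajectory positions.
    """
    gt_c = gt - gt.mean(axis=0)
    est_c = est - est.mean(axis=0)
    # Optimal rotation from the SVD of the cross-covariance matrix.
    U, _, Vt = np.linalg.svd(est_c.T @ gt_c)
    S = np.eye(3)
    S[2, 2] = np.sign(np.linalg.det(U @ Vt))  # guard against reflection
    R = Vt.T @ S @ U.T
    t = gt.mean(axis=0) - R @ est.mean(axis=0)
    aligned = (R @ est.T).T + t
    err = np.linalg.norm(aligned - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```

A trajectory that differs from ground truth only by a rigid transform scores (numerically) zero under this metric, which is why GT seeding cannot hide a divergence like LiTAMIN2's: the no-GT run produces a genuinely different shape, not just a shifted one.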

What's Included

  • evaluation/src/pcd_dogfooding.cpp — unified benchmark CLI (27 methods)
  • experiments/*_matrix.json — 250 active + 80 pending manifests
  • experiments/results/ — all aggregate results
  • docs/variant_analysis.md — cross-dataset stability + profile impact analysis
  • docs/implementation_notes.md — 27 methods quality audit
  • docs/native_time_provenance.md — HDL-400 time provenance documentation
  • Dockerfile — reproducible build (Ceres 2.1/2.2 compatible)

- add manifest-level benchmark weighting support
- expand profile coverage across multiple method manifests
- refresh aggregates, docs, and paper assets
- update HDL Graph SLAM short-sequence variants
- add CLINS fast/dense evaluator profiles
- add a public ROS1 HDL-400 synth-time CLINS manifest
- refresh aggregates, docs, paper assets, and handoff notes
- update README benchmark scope and selector references
- clarify public ROS1 HDL-400 synth-time versus reference/native-time windows
- refresh paper outline, claims, and table checklist counts
- mark the full-sequence HDL Graph SLAM policy as resolved
- record 0009_full as default-only and 0061_full as skipped
- refresh PLAN.md and handoff notes to reflect the current draft PR state
- record the current branch, CI, counts, and commit stack
- add a deeper Claude handoff section with resolved and unresolved work

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Task specification for Codex: create docs/native_time_provenance.md, spelling out the distinction between synthetic-time and native-time, the source-artifact conditions required for exact reproduction, and the unblocking checklist.
docs/native_time_provenance.md documents the two time tracks in the HDL-400 benchmark (native per-point time vs. synthetic time), the conditions required for exact reproduction, the reason for the blocking status, and the unblocking checklist.
Task 2: Expand 1-manifest methods to all dataset windows (~84 new manifests)
Task 3: Full ctest pass + implementation quality audit (docs/implementation_notes.md)
- docs/implementation_notes.md: LOC, test coverage, fidelity for all 27 methods
- Full ctest verified: 38/38 passed (49.85s)
- All Python scripts pass syntax check
- Add expand_manifests.py to auto-generate missing window manifests
- 162 new manifests created, all 330 validated as correct JSON
- Every LiDAR method now covers Istanbul/HDL-400/MCD/KITTI windows
- Handles KITTI-profile args, no-gt-seed, reference_role correctly
- 18 MCD manifests (6 methods × 3 windows)
- 16 KITTI 200f manifests (8 methods × 2 windows)
- 40 HDL-400 manifests (20 methods × 2 windows)
- Move 80 data-less manifests to experiments/pending/
- Full docs refresh: 248 ready + 1 blocked + 1 skipped
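The manifest-expansion step performed by expand_manifests.py can be sketched as follows. This is an illustration only: the window names and manifest fields below are hypothetical, not the repository's actual *_matrix.json schema.

```python
import copy
import json

# Hypothetical dataset windows, for illustration only.
WINDOWS = ["istanbul_200f", "hdl400_200f", "mcd_200f", "kitti_200f"]

def expand_manifests(base_manifest):
    """Yield one manifest per dataset window from a base manifest.

    base_manifest: dict with at least "method" and "window" keys
    (an assumed schema, not the repo's real manifest format).
    """
    for window in WINDOWS:
        m = copy.deepcopy(base_manifest)
        m["window"] = window
        m["name"] = f'{m["method"]}_{window}'
        yield m

base = {"method": "kiss_icp", "window": "istanbul_200f", "gt_seed": False}
expanded = list(expand_manifests(base))
print(json.dumps(expanded[0], indent=2))
```

Generating windows programmatically from one base manifest per method is what keeps 330 JSON files mutually consistent (same flags, same seeding policy) while only the window field varies.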
@rsasaki0109 changed the title from "Expand experiment profiles and refresh study docs" to "27 methods × 8 windows variant-first benchmark (248 ready problems)" on Apr 13, 2026
@rsasaki0109 (Owner, Author) commented:

Benchmark Visualizations

1. Pareto Front: ATE vs FPS (248 ready defaults)

[Figure: Pareto front, ATE vs FPS (248 ready defaults)]


2. Default Variant Instability

[Figure: default variant instability]


3. Core Methods Comparison

[Figure: core methods comparison]

@rsasaki0109 (Owner, Author) commented:

Benchmark Plots

Pareto Front (248 defaults)

Default Variant Instability

Core Methods

@rsasaki0109 (Owner, Author) commented:

Updated Plots (label clutter fixed)

Pareto Front (248 defaults, 27 methods color-coded)

Default Variant Instability

Core Methods

@rsasaki0109 (Owner, Author) commented:

Updated: Clean Pareto Plot (1 point per method)

Best Default per Method (27 methods)

Default Variant Instability

@rsasaki0109 rsasaki0109 marked this pull request as ready for review April 14, 2026 01:50
@rsasaki0109 rsasaki0109 merged commit c0ee1e6 into main Apr 14, 2026
2 checks passed