
27 methods × 8 windows variant-first benchmark (248 ready problems) #1

Merged
rsasaki0109 merged 15 commits into main from wip/profile-expansion-refresh on Apr 14, 2026

Conversation

@rsasaki0109 (Owner) commented on Apr 9, 2026

Summary

27 LiDAR localization methods compared across 4 public dataset families with variant-first benchmarking.

  • 248 ready + 1 blocked + 1 skipped benchmark problems
  • 38 from-scratch paper implementations, 27 of them integrated into a unified CLI
  • 4 dataset families: Istanbul, HDL-400, MCD (Multi-Campus), KITTI Raw
  • GT-seeded vs pure odometry ablation (--no-gt-seed)
  • Full sequence evaluation (up to 703 frames)
  • Docker reproducible build verified
  • 38/38 unit tests pass

Key Results

Pareto Front: Accuracy vs Throughput (248 defaults)

[Figure: Pareto front of ATE vs throughput (248 ready defaults)]
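The Pareto front above keeps only methods that are not dominated on both axes (lower ATE, higher throughput). A minimal sketch of that filter, with illustrative numbers that are not taken from the benchmark:

```python
def pareto_front(results):
    """Return the (name, ate_m, fps) tuples not dominated on
    (lower ATE, higher FPS).

    A point is dominated if some other point is at least as good
    on both axes and strictly better on at least one.
    """
    front = []
    for name, ate, fps in results:
        dominated = any(
            (a <= ate and f >= fps) and (a < ate or f > fps)
            for _, a, f in results
        )
        if not dominated:
            front.append((name, ate, fps))
    # Sort by ATE so the front reads left-to-right on the plot.
    return sorted(front, key=lambda p: p[1])

# Illustrative data only: "C" is dominated by "B" (worse ATE, lower FPS).
demo = [("A", 0.3, 5.0), ("B", 1.5, 20.0), ("C", 2.0, 10.0)]
print(pareto_front(demo))
```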

Default Variant Instability Across Datasets

[Figure: default variant instability across datasets]

No method has a universal "best" variant — the optimal profile changes depending on dataset characteristics.

Variant Fronts by Method Family

[Figure: variant fronts by method family]

GT-Seed Ablation Finding

| Method   | GT-seeded ATE | No-GT ATE | Comment                    |
|----------|---------------|-----------|----------------------------|
| LiTAMIN2 | 1.05 m        | 122.28 m  | Diverges without GT        |
| GICP     | 1.46 m        | 2.21 m    | Robust                     |
| NDT      | 0.28 m        | 122.66 m  | Diverges without GT        |
| KISS-ICP | 2.41 m        | 2.41 m    | Pure odometry (unaffected) |
| CT-ICP   | 1.66 m        | 1.66 m    | Pure odometry (unaffected) |
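The ATE numbers in the table are trajectory errors after rigid alignment of the estimate onto ground truth. A minimal sketch of that metric using the standard Horn/Umeyama closed-form SE(3) alignment (this illustrates the metric, not the benchmark's actual evaluation code; NumPy is assumed):

```python
import numpy as np

def ate_rmse(gt, est):
    """ATE RMSE after rigid (SE(3)) alignment of est onto gt.

    gt, est: (N, 3) arrays of corresponding trajectory positions.
    """
    gt_c = gt - gt.mean(axis=0)
    est_c = est - est.mean(axis=0)
    # Optimal rotation from the SVD of the cross-covariance matrix.
    U, _, Vt = np.linalg.svd(est_c.T @ gt_c)
    S = np.eye(3)
    S[2, 2] = np.sign(np.linalg.det(U @ Vt))  # guard against reflection
    R = Vt.T @ S @ U.T
    t = gt.mean(axis=0) - R @ est.mean(axis=0)
    aligned = (R @ est.T).T + t
    err = np.linalg.norm(aligned - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```

A trajectory that differs from ground truth only by a rigid transform scores (numerically) zero under this metric, which is why GT seeding cannot hide a divergence like LiTAMIN2's: the no-GT run produces a genuinely different shape, not just a shifted one.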

What's Included

  • evaluation/src/pcd_dogfooding.cpp — unified benchmark CLI (27 methods)
  • experiments/*_matrix.json — 250 active + 80 pending manifests
  • experiments/results/ — all aggregate results
  • docs/variant_analysis.md — cross-dataset stability + profile impact analysis
  • docs/implementation_notes.md — 27 methods quality audit
  • docs/native_time_provenance.md — HDL-400 time provenance documentation
  • Dockerfile — reproducible build (Ceres 2.1/2.2 compatible)

- add manifest-level benchmark weighting support
- expand profile coverage across multiple method manifests
- refresh aggregates, docs, and paper assets
- update HDL Graph SLAM short-sequence variants
- add CLINS fast/dense evaluator profiles
- add a public ROS1 HDL-400 synth-time CLINS manifest
- refresh aggregates, docs, paper assets, and handoff notes
- update README benchmark scope and selector references
- clarify public ROS1 HDL-400 synth-time versus reference/native-time windows
- refresh paper outline, claims, and table checklist counts
- mark the full-sequence HDL Graph SLAM policy as resolved
- record 0009_full as default-only and 0061_full as skipped
- refresh PLAN.md and handoff notes to reflect the current draft PR state
- record the current branch, CI, counts, and commit stack
- add a deeper Claude handoff section with resolved and unresolved work

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Task specification for Codex: create docs/native_time_provenance.md, spelling out the distinction between synthetic-time and native-time, the source-artifact conditions required for exact reproduction, and the unblocking checklist.
docs/native_time_provenance.md documents the two time tracks in the HDL-400 benchmark (native per-point time vs. synthetic time), the conditions required for exact reproduction, the reason for the blocking status, and the unblocking checklist.
Task 2: Expand 1-manifest methods to all dataset windows (~84 new manifests)
Task 3: Full ctest pass + implementation quality audit (docs/implementation_notes.md)
- docs/implementation_notes.md: LOC, test coverage, fidelity for all 27 methods
- Full ctest verified: 38/38 passed (49.85s)
- All Python scripts pass syntax check
- Add expand_manifests.py to auto-generate missing window manifests
- 162 new manifests created, all 330 validated as correct JSON
- Every LiDAR method now covers Istanbul/HDL-400/MCD/KITTI windows
- Handles KITTI-profile args, no-gt-seed, reference_role correctly
- 18 MCD manifests (6 methods × 3 windows)
- 16 KITTI 200f manifests (8 methods × 2 windows)
- 40 HDL-400 manifests (20 methods × 2 windows)
- Move 80 data-less manifests to experiments/pending/
- Full docs refresh: 248 ready + 1 blocked + 1 skipped
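The manifest-expansion step performed by expand_manifests.py can be sketched as follows. This is an illustration only: the window names and manifest fields below are hypothetical, not the repository's actual *_matrix.json schema.

```python
import copy
import json

# Hypothetical dataset windows, for illustration only.
WINDOWS = ["istanbul_200f", "hdl400_200f", "mcd_200f", "kitti_200f"]

def expand_manifests(base_manifest):
    """Yield one manifest per dataset window from a base manifest.

    base_manifest: dict with at least "method" and "window" keys
    (an assumed schema, not the repo's real manifest format).
    """
    for window in WINDOWS:
        m = copy.deepcopy(base_manifest)
        m["window"] = window
        m["name"] = f'{m["method"]}_{window}'
        yield m

base = {"method": "kiss_icp", "window": "istanbul_200f", "gt_seed": False}
expanded = list(expand_manifests(base))
print(json.dumps(expanded[0], indent=2))
```

Generating windows programmatically from one base manifest per method is what keeps 330 JSON files mutually consistent (same flags, same seeding policy) while only the window field varies.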
@rsasaki0109 changed the title from "Expand experiment profiles and refresh study docs" to "27 methods × 8 windows variant-first benchmark (248 ready problems)" on Apr 13, 2026
@rsasaki0109 (Owner, Author) commented:

Benchmark Visualizations

1. Pareto Front: ATE vs FPS (248 ready defaults)

[Figure: Pareto front, ATE vs FPS (248 ready defaults)]


2. Default Variant Instability

[Figure: default variant instability]


3. Core Methods Comparison

[Figure: core methods comparison]

@rsasaki0109 (Owner, Author) commented:

Benchmark Plots

Pareto Front (248 defaults)

Default Variant Instability

Core Methods

@rsasaki0109 (Owner, Author) commented:

Updated Plots (label clutter fixed)

Pareto Front (248 defaults, 27 methods color-coded)

Default Variant Instability

Core Methods

@rsasaki0109 (Owner, Author) commented:

Updated: Clean Pareto Plot (1 point per method)

Best Default per Method (27 methods)

Default Variant Instability

@rsasaki0109 rsasaki0109 marked this pull request as ready for review April 14, 2026 01:50
@rsasaki0109 rsasaki0109 merged commit c0ee1e6 into main Apr 14, 2026
2 checks passed