GitHub
Sparse4D v1: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion
Sparse4D v2: Recurrent Temporal Fusion with Sparse Model
Sparse4D v3: Advancing End-to-End 3D Detection and Tracking
SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation
Chinese Interpretation of the Papers
【2024/06】 Our follow-up project, SparseDrive, an end-to-end planning model built on the sparse framework, has been released! arXiv & GitHub.
These experiments were conducted on 8 RTX 3090 GPUs (24 GB memory each).
model | backbone | pretrain | img size | Epochs | Training time | FPS | NDS | mAP | AMOTA | AMOTP | IDS | config | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sparse4D-T4 | Res101 | FCOS3D | 640x1600 | 24 | 2Day5H | 2.9 | 0.5438 | 0.4409 | - | - | - | cfg | ckpt | log |
Sparse4Dv2 | Res50 | ImageNet | 256x704 | 100 | 15H | 20.3 | 0.5384 | 0.4392 | - | - | - | cfg | ckpt | log |
Sparse4Dv2 | Res101 | nuImage | 512x1408 | 100 | 2Day | 8.4 | 0.5939 | 0.5051 | - | - | - | - | - | - |
Sparse4Dv3 | Res50 | ImageNet | 256x704 | 100 | 22H | 19.8 | 0.5637 | 0.4646 | 0.477 | 1.167 | 456 | cfg | ckpt | log |
Sparse4Dv3 | Res101 | nuImage | 512x1408 | 100 | 2Day | 8.2 | 0.623 | 0.537 | 0.567 | 1.027 | 557 | - | - | - |
model | backbone | img size | NDS | mAP | mATE | mASE | mAOE | mAVE | mAAE | AMOTA | AMOTP | IDS |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sparse4D-T4 | VoV-99 | 640x1600 | 0.595 | 0.511 | 0.533 | 0.263 | 0.369 | 0.317 | 0.124 | - | - | - |
Sparse4Dv2 | VoV-99 | 640x1600 | 0.638 | 0.556 | 0.462 | 0.238 | 0.328 | 0.264 | 0.115 | - | - | - |
Sparse4Dv3 | VoV-99 | 640x1600 | 0.656 | 0.570 | 0.412 | 0.236 | 0.312 | 0.210 | 0.117 | 0.574 | 0.970 | 669 |
Sparse4Dv3-offline | EVA02-large | 640x1600 | 0.719 | 0.668 | 0.346 | 0.234 | 0.279 | 0.142 | 0.145 | 0.677 | 0.761 | 514 |
PS: On the nuScenes leaderboard, Sparse4Dv3 is marked with external data=True because the EVA02-large pretraining used ImageNet, Objects365, and COCO, with CLIP supervision. We therefore regard models pre-trained with EVA02 as incorporating external data. However, we did not use any external 3D detection data for training. This clarification is provided to facilitate fair comparisons.
@misc{2311.11722,
Author = {Xuewu Lin and Zixiang Pei and Tianwei Lin and Lichao Huang and Zhizhong Su},
Title = {Sparse4D v3: Advancing End-to-End 3D Detection and Tracking},
Year = {2023},
Eprint = {arXiv:2311.11722},
}
@misc{2305.14018,
Author = {Xuewu Lin and Tianwei Lin and Zixiang Pei and Lichao Huang and Zhizhong Su},
Title = {Sparse4D v2: Recurrent Temporal Fusion with Sparse Model},
Year = {2023},
Eprint = {arXiv:2305.14018},
}
@misc{2211.10581,
Author = {Xuewu Lin and Tianwei Lin and Zixiang Pei and Lichao Huang and Zhizhong Su},
Title = {Sparse4D: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion},
Year = {2022},
Eprint = {arXiv:2211.10581},
}