Sparse4D: Sparse-based End-to-end Multi-view Temporal Perception

GitHub
Sparse4D v1: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion
Sparse4D v2: Recurrent Temporal Fusion with Sparse Model
Sparse4D v3: Advancing End-to-End 3D Detection and Tracking
Chinese Interpretation of the Papers

Overall Architecture

Overall framework of Sparse4D, which follows an encoder-decoder structure. The inputs consist mainly of three components: multi-view images, newly initialized instances, and instances propagated from the previous frame. The outputs are the refined instances (3D anchor boxes and their corresponding features), which serve as the perception results for the current frame. Additionally, a subset of these refined instances is selected and propagated to the next frame.
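The instance flow described above can be sketched as follows. This is a hypothetical, framework-free illustration: the `Instance` container, the anchor layout, choosing the propagated subset by top-k confidence, and the placeholder `decoder` callable are all assumptions rather than the repository's code. A real implementation would likely also need to transform propagated anchors into the current frame's coordinates, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Instance:
    # Hypothetical container: one 3D anchor box, its feature vector, and a confidence.
    anchor: List[float]   # assumed layout [x, y, z, w, l, h, yaw]
    feature: List[float]
    score: float


def select_for_propagation(refined: List[Instance], k: int) -> List[Instance]:
    """Keep the k highest-scoring refined instances (top-k rule is an assumption)."""
    return sorted(refined, key=lambda inst: inst.score, reverse=True)[:k]


def frame_step(
    new_instances: List[Instance],
    propagated: List[Instance],
    k: int,
    decoder: Callable[[List[Instance]], List[Instance]],
) -> Tuple[List[Instance], List[Instance]]:
    """Run one frame: refine temporal + new instances, then pick the next frame's cache."""
    queries = propagated + new_instances   # both input sources enter the decoder together
    refined = decoder(queries)             # placeholder for the actual decoder
    return refined, select_for_propagation(refined, k)
```

Passing `lambda qs: qs` as the decoder is enough to exercise the data flow: each call returns the refined instances for the current frame together with the cache to feed into the next call.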

Illustration of our Efficient Deformable Aggregation Module. (a) The basic pipeline: we first generate multiple 3D keypoints inside each 3D anchor, then sample multi-scale, multi-view image features for each keypoint, and fuse these features with predicted weights. (b) The parallel implementation: to further improve speed and reduce memory cost, feature sampling and the multi-view/multi-scale weighted sum are fused into a single CUDA operation. This CUDA implementation supports different feature resolutions across views.
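To make step (a) concrete, here is a minimal single-anchor sketch in plain Python. It is not the authors' CUDA operator: the anchor layout [x, y, z, w, l, h, yaw], the fixed unit offsets, the 3x4 projection matrices, the stride-based mapping to feature-map coordinates, and a single softmax over all (keypoint, view, scale) samples are assumptions made for illustration, and the learned offsets and weights are replaced by caller-supplied values.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
FeatureMap = List[List[List[float]]]  # indexed [row][col][channel]


def generate_keypoints(anchor: Sequence[float], unit_offsets: Sequence[Vec3]) -> List[Vec3]:
    """Place keypoints inside a box; anchor layout [x, y, z, w, l, h, yaw] is assumed."""
    x, y, z, width, length, height, yaw = anchor[:7]
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    points = []
    for ox, oy, oz in unit_offsets:                   # offsets in [-0.5, 0.5] box units
        lx, ly, lz = ox * length, oy * width, oz * height  # length lies along the heading
        points.append((x + cos_y * lx - sin_y * ly,   # rotate by yaw, then translate
                       y + sin_y * lx + cos_y * ly,
                       z + lz))
    return points


def project(point: Vec3, proj: Sequence[Sequence[float]]) -> Optional[Tuple[float, float]]:
    """Project a 3D point with a 3x4 camera matrix; None if it lies behind the camera."""
    x, y, z = point
    u, v, d = (r[0] * x + r[1] * y + r[2] * z + r[3] for r in proj)
    if d <= 1e-6:
        return None
    return u / d, v / d


def bilinear_sample(fmap: FeatureMap, u: float, v: float) -> List[float]:
    """Bilinearly interpolate a feature vector at (u, v); out-of-bounds taps read zero."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    x0, y0 = math.floor(u), math.floor(v)
    out = [0.0] * c
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < w and 0 <= yi < h:
                weight = (1 - abs(u - xi)) * (1 - abs(v - yi))
                for ch in range(c):
                    out[ch] += weight * fmap[yi][xi][ch]
    return out


def weighted_fusion(samples: List[List[float]], logits: List[float]) -> List[float]:
    """Softmax the logits over all samples and return the weighted sum of features."""
    if len(samples) != len(logits):
        raise ValueError("need exactly one logit per sample")
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    out = [0.0] * len(samples[0])
    for feat, e in zip(samples, exps):
        for ch, value in enumerate(feat):
            out[ch] += (e / total) * value
    return out


def deformable_aggregate(anchor, unit_offsets, views, logits) -> List[float]:
    """views: list of (3x4 projection, [(feature_map, stride), ...]) per camera."""
    samples = []
    for kp in generate_keypoints(anchor, unit_offsets):
        for proj, levels in views:
            uv = project(kp, proj)
            for fmap, stride in levels:
                if uv is None:                       # keypoint not visible in this view
                    samples.append([0.0] * len(fmap[0][0]))
                else:                                # pixel coords -> feature-map coords
                    samples.append(bilinear_sample(fmap, uv[0] / stride, uv[1] / stride))
    return weighted_fusion(samples, logits)
```

The nested loops are the work that step (b) fuses into one GPU kernel, accumulating each weighted sample directly instead of materializing every sampled feature first. Because each view in this sketch carries its own list of feature maps, views with different resolutions need no special handling.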

nuScenes Benchmark

Results on Validation Split

These experiments were conducted using 8 RTX 3090 GPUs with 24 GB memory.

model	backbone	pretrain	img size	Epochs	Training time	FPS	NDS	mAP	AMOTA	AMOTP	IDS	config	ckpt	log
Sparse4D-T4	Res101	FCOS3D	640x1600	24	2Day5H	2.9	0.5438	0.4409	-	-	-	cfg	ckpt	log
Sparse4Dv2	Res50	ImageNet	256x704	100	15H	20.3	0.5384	0.4392	-	-	-	cfg	ckpt	log
Sparse4Dv2	Res101	nuImage	512x1408	100	2Day	8.4	0.5939	0.5051	-	-	-	-	-	-
Sparse4Dv3	Res50	ImageNet	256x704	100	22H	19.8	0.5637	0.4646	0.477	1.167	456	cfg	ckpt	log
Sparse4Dv3	Res101	nuImage	512x1408	100	2Day	8.2	0.623	0.537	0.567	1.027	557	-	-	-

Results on Test Split

model	backbone	img size	NDS	mAP	mATE	mASE	mAOE	mAVE	mAAE	AMOTA	AMOTP	IDS
Sparse4D-T4	VoV-99	640x1600	0.595	0.511	0.533	0.263	0.369	0.317	0.124	-	-	-
Sparse4Dv2	VoV-99	640x1600	0.638	0.556	0.462	0.238	0.328	0.264	0.115	-	-	-
Sparse4Dv3	VoV-99	640x1600	0.656	0.570	0.412	0.236	0.312	0.210	0.117	0.574	0.970	669
Sparse4Dv3-offline	EVA02-large	640x1600	0.719	0.668	0.346	0.234	0.279	0.142	0.145	0.677	0.761	514
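NDS in these tables is the nuScenes Detection Score, which combines mAP with the five true-positive error metrics as NDS = (5 · mAP + Σ(1 − min(1, mTP))) / 10. The short sketch below recomputes it from a test-split row as a sanity check; the function name `nds` is ours and not part of any toolkit.

```python
from typing import Sequence


def nds(m_ap: float, tp_errors: Sequence[float]) -> float:
    """nuScenes Detection Score from mAP and the five mean true-positive errors."""
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]  # each error capped at 1
    return (5.0 * m_ap + sum(tp_scores)) / 10.0


# Sparse4Dv3, VoV-99, test split (values copied from the table above)
row = {"mAP": 0.570, "mATE": 0.412, "mASE": 0.236, "mAOE": 0.312, "mAVE": 0.210, "mAAE": 0.117}
score = nds(row["mAP"], [row[k] for k in ("mATE", "mASE", "mAOE", "mAVE", "mAAE")])
print(f"{score:.3f}")  # 0.656
```

Applying the same function to the other test-split rows gives 0.595, 0.637, and 0.719; the 0.001 gap for Sparse4Dv2 (reported 0.638) most likely comes from the table's metrics being rounded to three decimals.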

PS: On the nuScenes leaderboard, Sparse4Dv3 is marked as using external data (external data = True) because the EVA02-large pretraining used ImageNet, Objects365, and COCO, with supervision from CLIP. We therefore treat models pretrained with EVA02 as using external data. However, we did not use any external 3D detection data for training. This clarification is provided to facilitate fair comparisons.

Quick Start

Citation

@misc{2311.11722,
    Author = {Xuewu Lin and Zixiang Pei and Tianwei Lin and Lichao Huang and Zhizhong Su},
    Title = {Sparse4D v3: Advancing End-to-End 3D Detection and Tracking},
    Year = {2023},
    Eprint = {arXiv:2311.11722},
}
@misc{2305.14018,
    Author = {Xuewu Lin and Tianwei Lin and Zixiang Pei and Lichao Huang and Zhizhong Su},
    Title = {Sparse4D v2: Recurrent Temporal Fusion with Sparse Model},
    Year = {2023},
    Eprint = {arXiv:2305.14018},
}
@misc{2211.10581,
    Author = {Xuewu Lin and Tianwei Lin and Zixiang Pei and Lichao Huang and Zhizhong Su},
    Title = {Sparse4D: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion},
    Year = {2022},
    Eprint = {arXiv:2211.10581},
}
