Tianwei Yin, Xingyi Zhou, Philipp Krahenbuhl (UT Austin), CVPR 2021
This paper argues that representing 3D objects by their centers, rather than by 3D bounding boxes, makes detection and tracking both simpler and more accurate. The method outperforms previous methods on the Waymo Open Dataset and ranks first among all Lidar-only submissions.
3D objects are commonly represented as 3D boxes in a point cloud, but this representation poses several challenges: point clouds are sparse, and most regions of 3D space contain no measurements; the output 3D box is often not well aligned with any global coordinate frame; and objects come in a wide range of sizes, shapes, and aspect ratios. The paper therefore represents, detects, and tracks 3D objects as points. This has two main advantages: unlike bounding boxes, points have no intrinsic orientation, and a center-based representation simplifies downstream tasks such as tracking. The proposed framework, CenterPoint, first detects the centers of objects using a keypoint detector and then regresses the other attributes, including 3D size, 3D orientation, and velocity. CenterPoint achieves state-of-the-art performance on the nuScenes benchmark for both 3D detection and tracking, with 65.5 NDS and 63.8 AMOTA for a single model.
- First, a 3D point cloud is captured by LiDAR sensors. CenterPoint uses a standard Lidar-based backbone network, i.e., VoxelNet or PointPillars, to build a flattened map-view representation of the input point cloud, that is, the 3D data is projected onto a 2D plane. This map view can then be treated like a regular 2D image.
- Next, a keypoint detector (here, the CenterNet algorithm) takes this image as input and predicts a $w \times h$ heatmap $\hat{Y} \in [0, 1]^{w \times h \times K}$, with one channel for each of the $K$ classes. Each local maximum in the output heatmap corresponds to the center of a detected object, and the object's size, rotation, and velocity are regressed from the features at that center.
- Finally, 3D object tracking simplifies to greedy closest-point matching. The resulting detection and tracking algorithm is simple, efficient, and effective.
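The peak-extraction step in the second bullet can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function name `extract_peaks`, the 3×3 neighbourhood, and the 0.3 score threshold are assumptions made for the example, and it handles only a single class channel of the heatmap.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # one class channel, indexed [row][col]


def extract_peaks(heatmap: Heatmap, threshold: float = 0.3) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for each cell that is a local maximum of its
    3x3 neighbourhood and scores at least `threshold` (hypothetical default)."""
    h = len(heatmap)
    w = len(heatmap[0]) if h else 0
    peaks = []
    for r in range(h):
        for c in range(w):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Collect the up-to-8 neighbours, clipping at the map border.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(score >= n for n in neighbours):
                peaks.append((r, c, score))
    # Highest-confidence centers first.
    peaks.sort(key=lambda p: p[2], reverse=True)
    return peaks


# Toy 4x4 heatmap with two object centers.
toy = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0],
    [0.0, 0.2, 0.1, 0.6],
    [0.0, 0.0, 0.1, 0.2],
]
peaks = extract_peaks(toy)
print(peaks)
```

In a full detector, the size, rotation, and velocity outputs would be read from the regression maps at each returned `(row, col)` location. Note that with the `>=` comparison used here, a flat plateau of equal scores yields several peaks; a real implementation would need a tie-breaking rule.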
Method | mAP ↑ | NDS ↑ | PKL ↓ |
---|---|---|---|
PointPillars | 40.1 | 55.0 | 1.00 |
CVCNet | 55.3 | 64.4 | 0.92 |
CBGS | 52.8 | 63.3 | 0.77 |
PointPainting | 46.4 | 58.1 | 0.89 |
Ours | 58.0 | 65.5 | 0.69 |
Method | MOTA ↑ (Vehicle) | MOTA ↑ (Ped) | MOTP ↓ (Vehicle) | MOTP ↓ (Ped) |
---|---|---|---|---|
AB3D | 42.5 | 38.9 | 18.6 | 34.0 |
Ours | 62.6 | 58.3 | 16.3 | 31.1 |
Method | AMOTA ↑ | FP ↓ | FN ↓ | IDS ↓ |
---|---|---|---|---|
AB3D | 15.1 | 15088 | 75730 | 9027 |
Chiu et al. | 55.0 | 17533 | 33216 | 950 |
Ours | 63.8 | 18612 | 22928 | 760 |
Encoder | Method | Vehicle | Pedestrian | mAPH |
---|---|---|---|---|
VoxelNet | Anchor-based | 66.1 | 54.4 | 60.3 |
VoxelNet | Center-based | 66.5 | 62.7 | 64.6 |
PointPillars | Anchor-based | 64.1 | 50.8 | 57.5 |
PointPillars | Center-based | 66.5 | 57.4 | 62.0 |
Simple: It uses a standard 3D point cloud encoder with a few convolutional layers in the head to produce a bird's-eye-view heatmap and other dense regression outputs, including the offset to centers in the previous frame. Detection is simple local peak extraction with refinement, and tracking is closest-distance matching.
Fast and Accurate: Our best single model achieves 71.9 mAPH on Waymo and 65.5 NDS on nuScenes while running at 11+ FPS.
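The closest-distance matching described under "Simple" can be sketched as below. This is an illustrative sketch under stated assumptions, not the released tracker: the name `greedy_match`, the 2D centers, the `max_dist` gate of 2.0, and the choice to resolve pairs in order of increasing distance are all hypothetical. Each detection carries the predicted offset to its center in the previous frame, which is used to project the detection back before matching.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def greedy_match(
    tracks: Dict[int, Point],
    detections: List[Tuple[Point, Point]],
    max_dist: float = 2.0,  # hypothetical gating distance
) -> Dict[int, int]:
    """Map each detection index to a track id.

    tracks:     track id -> object center in the previous frame.
    detections: (center, offset) pairs for the current frame, where offset is
                the predicted displacement since the previous frame.
    """
    pairs = []
    for d, ((x, y), (dx, dy)) in enumerate(detections):
        # Project the current center back into the previous frame.
        px, py = x - dx, y - dy
        for tid, (tx, ty) in tracks.items():
            dist = math.hypot(px - tx, py - ty)
            if dist <= max_dist:
                pairs.append((dist, d, tid))

    # Greedy step: take the closest remaining pair first (an assumed ordering).
    pairs.sort()
    assignment: Dict[int, int] = {}
    used_tracks = set()
    for _, d, tid in pairs:
        if d in assignment or tid in used_tracks:
            continue
        assignment[d] = tid
        used_tracks.add(tid)

    # Unmatched detections start new tracks with fresh ids.
    next_id = max(tracks, default=-1) + 1
    for d in range(len(detections)):
        if d not in assignment:
            assignment[d] = next_id
            next_id += 1
    return assignment


tracks = {0: (0.0, 0.0), 1: (10.0, 0.0)}
detections = [
    ((10.5, 0.0), (0.4, 0.0)),   # moved from near track 1
    ((1.0, 0.0), (0.9, 0.0)),    # moved from near track 0
    ((30.0, 30.0), (0.0, 0.0)),  # far from every track: new object
]
matches = greedy_match(tracks, detections)
print(matches)
```

Because the matching works on points only, no box overlap or motion model is needed in this sketch; the predicted offset does the job of aligning the two frames.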
Paper: https://paperswithcode.com/paper/center-based-3d-object-detection-and-tracking
Code and pretrained models: https://github.com/tianweiy/CenterPoint