
CornerNet: Detecting Objects as Paired Keypoints #50

Open
howardyclo opened this issue Mar 15, 2019 · 1 comment
Metadata


Summary

Contribution

  • Proposes a novel formulation of object detection: detecting a bounding box as a pair of keypoints (its top-left and bottom-right corners), which does away with anchor boxes.
  • Proposes corner pooling to improve corner localization.
  • Modifies the hourglass network and adds a novel variant of focal loss.
  • Achieves 42.1% AP on MS COCO, outperforming all existing one-stage detectors.

Drawback of Using Anchor Boxes in Traditional One-stage Detector

  • A traditional one-stage detector places anchor boxes (predefined candidate boxes of various sizes and aspect ratios) densely over an image and generates final box predictions by scoring the anchor boxes and refining their coordinates through regression.
  • Training with anchor boxes creates a huge imbalance between positive and negative anchor boxes, which slows down training (cf. Focal Loss).
  • Anchor boxes introduce many hyperparameter choices (how many boxes, their sizes, their aspect ratios).
  • They become especially complicated in multiscale architectures, where a single network makes separate predictions at multiple resolutions and each scale uses different features and its own set of anchor boxes (SSD, Focal Loss, DSSD).

How Does CornerNet Work?


  • Uses a single convolutional network to predict:
    • A set of heatmaps for the top-left corners of all instances of each object category.
    • A set of heatmaps for the bottom-right corners (same as the above).
    • Note: each set of heatmaps has C channels, where C is the number of object categories (no background channel). Each channel is a binary mask indicating the corner locations for one class.
    • An embedding vector for each detected corner (inspired by NIPS'17 associative embedding). The embeddings serve to group the pair of corners that belong to the same object -- the network is trained to predict similar embeddings for them.
    • To produce tighter bounding boxes, the network also predicts offsets that slightly adjust the corner locations.
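The grouping step above can be sketched in a few lines. This is a toy illustration, not the paper's exact decoding procedure: the function name and the `max_dist` threshold are illustrative choices, and real decoding also involves heatmap score thresholding and NMS.

```python
def group_corners(tl_corners, br_corners, tl_embs, br_embs, max_dist=0.5):
    """Pair top-left and bottom-right corner candidates by embedding distance.

    Toy sketch: two corners are grouped into one detection when their
    (1-D) embeddings differ by less than `max_dist` (an illustrative
    threshold, not from the paper) and their geometry forms a valid box,
    i.e. the top-left corner lies above and to the left of the
    bottom-right corner. Corners are given as (y, x) pairs.
    """
    boxes = []
    for (y1, x1), e1 in zip(tl_corners, tl_embs):
        for (y2, x2), e2 in zip(br_corners, br_embs):
            if abs(e1 - e2) < max_dist and y2 > y1 and x2 > x1:
                boxes.append((x1, y1, x2, y2))
    return boxes

# Two objects: embeddings of matching corners are close, others are far apart.
tl = [(0, 0), (5, 5)]
br = [(4, 4), (9, 9)]
boxes = group_corners(tl, br, tl_embs=[0.1, 0.9], br_embs=[0.12, 0.88])
# -> [(0, 0, 4, 4), (5, 5, 9, 9)]
```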

How Does Corner Pooling Work?

  • Motivation: a corner of a bounding box is often outside the object, so it cannot be localized from local evidence alone. For the top-left corner, we need to look horizontally towards the right to find the topmost boundary of the object, and vertically towards the bottom to find the leftmost boundary.
  • Corner pooling takes in two feature maps; at each pixel location it max-pools all feature vectors to the right in the first feature map, max-pools all feature vectors directly below in the second feature map, and then adds the two pooled results together.
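The operation above can be implemented with cumulative maxima. A minimal NumPy sketch for top-left corner pooling (function and argument names are mine; the real network applies this to learned feature maps, one channel at a time):

```python
import numpy as np

def top_left_corner_pool(f_right, f_below):
    """Top-left corner pooling on two 2-D feature maps.

    At each location (i, j): max over all entries of f_right at or to the
    right of (i, j), plus max over all entries of f_below at or below (i, j).
    Implemented as cumulative maxima scanned from the right edge and from
    the bottom edge, respectively.
    """
    # Cumulative max from the rightmost column leftward:
    # pooled_right[i, j] == max(f_right[i, j:])
    pooled_right = np.maximum.accumulate(f_right[:, ::-1], axis=1)[:, ::-1]
    # Cumulative max from the bottom row upward:
    # pooled_below[i, j] == max(f_below[i:, j])
    pooled_below = np.maximum.accumulate(f_below[::-1, :], axis=0)[::-1, :]
    return pooled_right + pooled_below

f_right = np.array([[3., 0.], [0., 1.]])
f_below = np.array([[0., 1.], [2., 0.]])
out = top_left_corner_pool(f_right, f_below)
# out == [[5., 1.], [3., 1.]]
```

Bottom-right corner pooling is symmetric: scan leftward and upward instead.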

How Is Corner Pooling Used in CornerNet?

Losses

  1. Corner detection loss: a variant of focal loss on the corner heatmaps, with the penalty on negative locations reduced within a radius around each ground-truth corner.
  2. Corner offset loss: a smooth-L1 loss on the predicted offsets that recover the precision lost to downsampling.
  3. Grouping corner loss: a "pull" loss trains the network to group the two corners of the same object (similar embeddings), and a "push" loss separates the corners of different objects.
  4. Note: the pull and push loss terms are weighted by 0.1, since setting their weights to 1 leads to poor performance.
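The equation images did not survive here; reconstructed from the published paper, the three losses are (with $p_{cij}$ the predicted score at location $(i,j)$ for class $c$, $y_{cij}$ the Gaussian-reduced ground truth, $N$ the number of objects, and $e_{t_k}$, $e_{b_k}$ the embeddings of object $k$'s top-left and bottom-right corners):

```latex
% Detection loss: focal-loss variant over corner heatmaps
L_{det} = \frac{-1}{N} \sum_{c=1}^{C} \sum_{i=1}^{H} \sum_{j=1}^{W}
\begin{cases}
  (1 - p_{cij})^{\alpha} \log(p_{cij}) & \text{if } y_{cij} = 1 \\
  (1 - y_{cij})^{\beta} (p_{cij})^{\alpha} \log(1 - p_{cij}) & \text{otherwise}
\end{cases}

% Offset loss: smooth-L1 on the fractional part lost to downsampling by n
o_k = \left( \frac{x_k}{n} - \left\lfloor \frac{x_k}{n} \right\rfloor,\;
             \frac{y_k}{n} - \left\lfloor \frac{y_k}{n} \right\rfloor \right),
\qquad
L_{off} = \frac{1}{N} \sum_{k=1}^{N} \text{SmoothL1}\left(o_k, \hat{o}_k\right)

% Grouping losses: pull corners of one object together, push objects apart
e_k = \frac{e_{t_k} + e_{b_k}}{2}, \qquad
L_{pull} = \frac{1}{N} \sum_{k=1}^{N}
  \left[ (e_{t_k} - e_k)^2 + (e_{b_k} - e_k)^2 \right]

L_{push} = \frac{1}{N(N-1)} \sum_{k=1}^{N} \sum_{\substack{j=1 \\ j \neq k}}^{N}
  \max\left(0,\; \Delta - |e_k - e_j|\right), \qquad \Delta = 1
```

The paper sets $\alpha = 2$, $\beta = 4$, and weights $L_{pull}$ and $L_{push}$ by 0.1 in the total loss.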

Ablation Study

  1. Corner pooling
  2. Reducing penalty to negative locations
  3. Error analysis

Comparisons with State-of-the-art Detectors
