Identity box coder, similarity calculator, target assigner #8962
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,105 @@ | ||
# Copyright 2017 The TensorFlow Authors. All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# ============================================================================== | ||
|
||
"""Faster RCNN box coder. | ||
|
||
Faster RCNN box coder follows the coding schema described below: | ||
ty = (y - ya) / ha | ||
tx = (x - xa) / wa | ||
th = log(h / ha) | ||
tw = log(w / wa) | ||
where x, y, w, h denote the box's center coordinates, width and height | ||
respectively. Similarly, xa, ya, wa, ha denote the anchor's center | ||
coordinates, width and height. tx, ty, tw and th denote the anchor-encoded | ||
center, width and height respectively. | ||
|
||
See http://arxiv.org/abs/1506.01497 for details. | ||
""" | ||
|
||
import tensorflow.compat.v1 as tf | ||
|
||
from object_detection.core import box_coder | ||
from object_detection.core import box_list | ||
|
||
EPSILON = 1e-8 | ||
|
||
|
||
class DETRBoxCoder(box_coder.BoxCoder): | ||
"""Faster RCNN box coder.""" | ||
|
||
def __init__(self, scale_factors=None): | ||
"""Constructor for FasterRcnnBoxCoder. | ||
|
||
Args: | ||
scale_factors: List of 4 positive scalars to scale ty, tx, th and tw. | ||
If set to None, does not perform scaling. For Faster RCNN, | ||
the open-source implementation recommends using [10.0, 10.0, 5.0, 5.0]. | ||
""" | ||
if scale_factors: | ||
assert len(scale_factors) == 4 | ||
for scalar in scale_factors: | ||
assert scalar > 0 | ||
self._scale_factors = scale_factors | ||
|
||
@property | ||
def code_size(self): | ||
return 4 | ||
|
||
def _encode(self, boxes, anchors): | ||
"""Encode a box collection with respect to anchor collection. | ||
|
||
Args: | ||
boxes: BoxList holding N boxes to be encoded. | ||
anchors: BoxList of anchors. | ||
|
||
Returns: | ||
a tensor representing N anchor-encoded boxes of the format | ||
Why are there references to anchors and Faster RCNN for DETR? As far as I understand, one of the big advantages of the model is the simplification, e.g. through dropping anchors. Is there any requirement to utilize

The box coder is left in for now because the target assigner uses it to convert the groundtruth to the right format. This avoids changing the internals of the target assigner and gives some flexibility. All this box coder does is apply the identity function so it can plug into the framework nicely. We may choose to remove it, but that's the reason it's here.

Okay, thanks for the explanation. Then I would adjust the docstrings and remove all the Faster RCNN references, e.g. in the constructor or line 40, just to avoid confusion with anchors etc.

Will do, thanks.

Box Coders have been removed.
||
[ty, tx, th, tw]. | ||
""" | ||
# Convert anchors to the center coordinate representation. | ||
ycenter, xcenter, h, w = boxes.get_center_coordinates_and_sizes() | ||
# Avoid NaN in division and log below. | ||
h += EPSILON | ||
w += EPSILON | ||
|
||
tx = xcenter | ||
ty = ycenter | ||
tw = w #tf.log(w) | ||
th = h #tf.log(h) | ||
|
||
return tf.transpose(tf.stack([ty, tx, th, tw])) | ||
|
||
def _decode(self, rel_codes, anchors): | ||
"""Decode relative codes to boxes. | ||
|
||
Args: | ||
rel_codes: a tensor representing N anchor-encoded boxes. | ||
anchors: BoxList of anchors. | ||
|
||
Returns: | ||
boxes: BoxList holding N bounding boxes. | ||
""" | ||
ty, tx, th, tw = tf.unstack(tf.transpose(rel_codes)) | ||
|
||
w = tw | ||
h = th | ||
ycenter = ty | ||
xcenter = tx | ||
ymin = ycenter - h / 2. | ||
xmin = xcenter - w / 2. | ||
ymax = ycenter + h / 2. | ||
xmax = xcenter + w / 2. | ||
return box_list.BoxList(tf.transpose(tf.stack([ymin, xmin, ymax, xmax]))) | ||
|
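For reference, the identity coding in this file can be sketched without TensorFlow. This is a minimal pure-Python illustration under stated assumptions, not the PR's implementation: `encode_box` and `decode_box` are hypothetical helpers that mirror `_encode` and `_decode` for a single box, corner boxes are assumed to be in `[ymin, xmin, ymax, xmax]` order, and the `EPSILON` added to `h` and `w` is omitted.

```python
def encode_box(box):
    """Convert corners [ymin, xmin, ymax, xmax] to [ycenter, xcenter, h, w]."""
    ymin, xmin, ymax, xmax = box
    h = ymax - ymin
    w = xmax - xmin
    # No anchor offsets and no log scaling: the code is the box itself,
    # expressed in center/size form.
    return [ymin + h / 2.0, xmin + w / 2.0, h, w]


def decode_box(code):
    """Convert [ycenter, xcenter, h, w] back to corners."""
    ty, tx, th, tw = code
    return [ty - th / 2.0, tx - tw / 2.0, ty + th / 2.0, tx + tw / 2.0]


box = [0.25, 0.25, 0.75, 1.0]
code = encode_box(box)
restored = decode_box(code)
```

Decoding an encoded box returns the original corners, which is why the coder can sit between the groundtruth and the target assigner without changing the values it passes through.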
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -446,7 +446,7 @@ def create_target_assigner(reference, stage=None, | |
elif reference == 'DETR': | ||
We don't need to add it here. In the DETR meta architecture you can directly create
||
similarity_calc = sim_calc.DETRSimilarity() | ||
matcher = hungarian_matcher.HungarianBipartiteMatcher() | ||
- box_coder_instance = None | ||
+ box_coder_instance = detr_box_coder.DETRBoxCoder() | ||
|
||
else: | ||
raise ValueError('No valid combination of reference and stage.') | ||
|
@@ -481,6 +481,7 @@ def batch_assign(target_assigner, | |
function (which have shape [num_gt_boxes, d_1, d_2, ..., d_k]). | ||
gt_weights_batch: A list of 1-D tf.float32 tensors of shape | ||
[num_boxes] containing weights for groundtruth boxes. | ||
+ class_predictions: A | ||
|
||
Returns: | ||
batch_cls_targets: a tensor with shape [batch_size, num_anchors, | ||
|
@@ -521,7 +522,10 @@ def batch_assign(target_assigner, | |
match_list = [] | ||
if gt_weights_batch is None: | ||
gt_weights_batch = [None] * len(gt_class_targets_batch) | ||
- class_predictions = tf.unstack(class_predictions) | ||
+ if class_predictions: | ||
+ class_predictions = tf.unstack(class_predictions) | ||
+ else: | ||
+ class_predictions = [None] * len(gt_class_targets_batch) | ||
for anchors, gt_boxes, gt_class_targets, gt_weights, class_preds in zip( | ||
anchors_batch, gt_box_batch, gt_class_targets_batch, gt_weights_batch, | ||
class_predictions): | ||
|
As discussed offline, it might be best to follow CenterNet's pattern here to create a single class that does the target assignment.
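The fallback in the `batch_assign` hunk, where a missing `class_predictions` becomes one `None` per image so the `zip` still lines up, can be sketched in plain Python. This is an illustration only: `per_image_predictions` is a hypothetical helper, plain lists stand in for tensors, and `list(...)` stands in for `tf.unstack`. It tests `is not None` rather than truthiness, on the assumption that `class_predictions` may be a tensor, whose truth value TensorFlow does not allow to be evaluated in graph mode.

```python
def per_image_predictions(class_predictions, batch_size):
    """Return one entry per image, or one None per image when absent."""
    if class_predictions is not None:
        # Stand-in for tf.unstack: split the batch into per-image entries.
        return list(class_predictions)
    return [None] * batch_size


# Hypothetical batch of two images with placeholder groundtruth boxes.
gt_box_batch = [["box_a"], ["box_b", "box_c"]]
pairs = list(zip(gt_box_batch,
                 per_image_predictions(None, len(gt_box_batch))))
```

Each image is still paired with an entry, so the per-image loop runs unchanged whether or not predictions were supplied.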