Add SSD architecture with VGG16 backbone #3403

Merged Apr 30, 2021 (113 commits)

@datumbox datumbox commented Feb 15, 2021

Resolves #440 and partially resolves #1422

This PR implements SSD with VGG16 backbone as described in the original paper.

Trained using the code committed at a167edc. The current best pre-trained model was trained with (using latest git hash):

python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
    --dataset coco --model ssd300_vgg16 --epochs 120\
    --lr-steps 80 110 --aspect-ratio-group-factor 3 --lr 0.002 --batch-size 4\
    --weight-decay 0.0005 --data-augmentation ssd

Submitted batch job 40773612
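
A minimal sketch of the schedule that `--lr 0.002 --lr-steps 80 110` would imply, assuming the reference script applies a multi-step decay that multiplies the learning rate by 0.1 at each listed epoch (the factor is an assumption, it is not stated in this PR). It also works out the effective batch size, assuming `--batch-size 4` is per process. The helper `lr_at_epoch` is a hypothetical name used only for illustration.

```python
from bisect import bisect_right


def lr_at_epoch(epoch, base_lr=0.002, milestones=(80, 110), gamma=0.1):
    """Learning rate in effect during `epoch` (0-indexed) under step decay.

    gamma=0.1 is an assumed decay factor, not a value taken from the PR.
    """
    # Number of milestones already reached at this epoch.
    drops = bisect_right(milestones, epoch)
    return base_lr * gamma ** drops


# 8 processes (--nproc_per_node=8) times 4 images each (--batch-size 4),
# assuming the batch size is counted per process.
effective_batch = 8 * 4

if __name__ == "__main__":
    print("effective batch size:", effective_batch)
    for epoch in (0, 79, 80, 109, 110, 119):
        print(epoch, lr_at_epoch(epoch))
```

Under these assumptions the rate stays at its base value for epochs 0 to 79, drops once at epoch 80 and once more at epoch 110.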

Accuracy metrics:

Epoch 119:
0: IoU metric: bbox
0:  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.251
0:  Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.415
0:  Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.262
0:  Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.055
0:  Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.268
0:  Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.435
0:  Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.239
0:  Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.344
0:  Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.365
0:  Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.088
0:  Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.406
0:  Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.602

Validated with:

python -m torch.distributed.launch --nproc_per_node=2 --use_env train.py\
   --dataset coco --model ssd300_vgg16 --pretrained --test-only

Speed benchmark:
0.83 sec per image on CPU

codecov bot commented Feb 15, 2021

Codecov Report

Merging #3403 (eed06f4) into master (ad9cc62) will decrease coverage by 1.38%.
The diff coverage is 9.04%.

❗ Current head eed06f4 differs from pull request most recent head 5a2e22c. Consider uploading reports for the commit 5a2e22c to get more accurate results

@@            Coverage Diff             @@
##           master    #3403      +/-   ##
==========================================
- Coverage   79.73%   78.34%   -1.39%     
==========================================
  Files         105      106       +1     
  Lines        9818    10003     +185     
  Branches     1579     1614      +35     
==========================================
+ Hits         7828     7837       +9     
- Misses       1513     1688     +175     
- Partials      477      478       +1     
Impacted Files                                  Coverage Δ
torchvision/models/detection/ssd.py             0.00% <0.00%> (ø)
torchvision/models/detection/anchor_utils.py    63.63% <13.04%> (-31.04%) ⬇️
torchvision/models/detection/retinanet.py       90.27% <100.00%> (+0.03%) ⬆️
torchvision/models/detection/transform.py       78.57% <100.00%> (+0.11%) ⬆️
torchvision/transforms/functional_pil.py        68.12% <0.00%> (-1.14%) ⬇️
torchvision/datasets/voc.py                     94.44% <0.00%> (-0.24%) ⬇️
torchvision/__init__.py                         61.11% <0.00%> (ø)
torchvision/ops/boxes.py                        91.26% <0.00%> (ø)
torchvision/io/__init__.py                      86.20% <0.00%> (ø)
torchvision/ops/roi_align.py                    70.96% <0.00%> (ø)
... and 5 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ad9cc62...5a2e22c. Read the comment docs.

@NicolasHug (Member) commented:

@datumbox @fmassa , following the discussion on #3611 (comment): should we start introducing every attribute, function, method and class here with a leading underscore (apart from those we want to explicitly expose)? I would go as far as to also introduce new file names with underscores, so that the public API is very clear: every object that isn't explicitly exposed in an __init__ file somewhere is private (because it can't be imported without a leading underscore somewhere).

It takes a bit of self-discipline but it can be very helpful to us in the long run, to avoid issues like the one in #3611 where we unfortunately can't make a seemingly harmless change. Leading underscores also have somewhat of a self-documenting flavour, which is helpful when reading / reviewing code.
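
A small sketch of the convention being proposed, using an in-memory module so it runs on its own. The module name `detection_sketch` and both function names are hypothetical and do not correspond to anything in torchvision. It relies on one standard Python behaviour: without `__all__`, `from module import *` skips names that start with an underscore.

```python
import sys
import types

# Hypothetical module body: one private helper, one public entry point.
source = """
def _match_default_boxes(boxes):
    return sorted(boxes)

def build_detector():
    return _match_default_boxes([3, 1, 2])
"""

detection = types.ModuleType("detection_sketch")
exec(source, detection.__dict__)
sys.modules["detection_sketch"] = detection


def public_names(module):
    """Names a reader would treat as public: no leading underscore."""
    return sorted(name for name in vars(module) if not name.startswith("_"))


# A star import only picks up the names without a leading underscore.
namespace = {}
exec("from detection_sketch import *", namespace)
star_imported = sorted(name for name in namespace if name != "__builtins__")

if __name__ == "__main__":
    print(public_names(detection))
    print(star_imported)
```

The private helper is still reachable as `detection._match_default_boxes`, so the underscore marks intent rather than enforcing access control.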

@datumbox datumbox changed the title [WIP] Add SSD architecture Add SSD architecture Apr 29, 2021
@datumbox datumbox requested a review from fmassa April 30, 2021 08:33
@datumbox datumbox changed the title Add SSD architecture Add SSD architecture with VGG16 backbone Apr 30, 2021
@fmassa fmassa left a comment

Looks great to me, thanks a lot for all your work Vasilis!

I've only one minor comment regarding the doc, otherwise good to merge!

torchvision/models/detection/ssd.py (outdated, resolved)
@@ -65,14 +73,16 @@ class GeneralizedRCNNTransform(nn.Module):
It returns a ImageList for the inputs, and a List[Dict[Tensor]] for the targets
"""

-    def __init__(self, min_size, max_size, image_mean, image_std):
+    def __init__(self, min_size, max_size, image_mean, image_std, size_divisible=32, fixed_size=None):

The way I see this in the future is that we will have different transforms for different models:

SSDTransform(...)
GeneralizedRCNNTransform(...)
DETRTransform(...)

and the way to avoid too much code duplication will be by having nice abstractions for the joint transforms, so that each one of those will be able to be easily implemented. Something like

GeneralizedRCNNTransform = Compose(ResizeWithMinSize(...), RandomFlip(...), Normalize(...),)

But we are not there yet
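
A rough sketch of what such a joint-transform abstraction could look like, written in plain Python so it runs without torch. Every transform takes and returns an `(image, target)` pair, and `Compose` chains them. The `HorizontalFlip` and `Normalize` classes here are toy stand-ins operating on nested lists, not torchvision classes, and the names in the comment above (`ResizeWithMinSize`, `RandomFlip`) remain the reviewer's hypothetical ones.

```python
from typing import Dict, List, Optional, Tuple

Image = List[List[float]]               # toy H x W single-channel image
Target = Dict[str, List[List[float]]]   # {"boxes": [[x1, y1, x2, y2], ...]}


class Compose:
    """Applies joint transforms in order, threading image and target through."""

    def __init__(self, *transforms) -> None:
        self.transforms = transforms

    def __call__(self, image: Image, target: Optional[Target] = None
                 ) -> Tuple[Image, Optional[Target]]:
        for transform in self.transforms:
            image, target = transform(image, target)
        return image, target


class HorizontalFlip:
    """Always flips; a real version would flip with some probability."""

    def __call__(self, image, target=None):
        width = len(image[0])
        image = [row[::-1] for row in image]
        if target is not None:
            # Mirror the x coordinates and keep x1 <= x2.
            boxes = [[width - x2, y1, width - x1, y2]
                     for x1, y1, x2, y2 in target["boxes"]]
            target = {**target, "boxes": boxes}
        return image, target


class Normalize:
    """Per-pixel (value - mean) / std; leaves the target untouched."""

    def __init__(self, mean: float, std: float) -> None:
        self.mean, self.std = mean, std

    def __call__(self, image, target=None):
        image = [[(p - self.mean) / self.std for p in row] for row in image]
        return image, target


pipeline = Compose(HorizontalFlip(), Normalize(mean=1.0, std=2.0))
out_image, out_target = pipeline(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    {"boxes": [[0.0, 0.0, 1.0, 2.0]]},
)

if __name__ == "__main__":
    print(out_image)
    print(out_target)
```

With this shape, a model-specific transform becomes a particular `Compose(...)` instance rather than a separate class with duplicated logic.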

Comment on lines +114 to +118
boxes = target["boxes"][is_within_crop_area]
ious = torchvision.ops.boxes.box_iou(boxes, torch.tensor([[left, top, right, bottom]],
                                     dtype=boxes.dtype, device=boxes.device))
if ious.max() < min_jaccard_overlap:
    continue
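
For readers following the check in the lines above, here is a torch-free sketch of the same idea: compute the Jaccard overlap (IoU) between each remaining box and the candidate crop, and reject the crop when the best overlap is below the threshold. The helper names `iou` and `crop_is_acceptable` are hypothetical. Treating an empty box list as a rejection is an assumption of this sketch.

```python
def box_area(box):
    """Area of an [x1, y1, x2, y2] box; degenerate boxes have zero area."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def crop_is_acceptable(boxes, crop, min_jaccard_overlap):
    """True when at least one box overlaps the crop by the required IoU."""
    if not boxes:
        return False  # assumption: no boxes means the crop is rejected
    return max(iou(box, crop) for box in boxes) >= min_jaccard_overlap


if __name__ == "__main__":
    crop = [1.0, 1.0, 3.0, 3.0]
    boxes = [[0.0, 0.0, 2.0, 2.0]]
    print(iou(boxes[0], crop))
    print(crop_is_acceptable(boxes, crop, 0.1))
    print(crop_is_acceptable(boxes, crop, 0.3))
```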

I double-checked the logic and it seems good to me.

For the future, we might be able to avoid some of the excessive continue statements by more carefully selecting the sampling.

For example, in the first block we can sample the aspect ratio in log-scale so that the aspect ratio will be correct from the beginning, and then sample one value for the scale.
The same can be done for the crop (sampling so that none of the values are zero after rounding).
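
One possible reading of this suggestion, sketched in plain Python: draw the aspect ratio uniformly in log space so it is inside the allowed range by construction, then draw a single scale for the height and derive the width. This is an interpretation, not the implementation in the PR. The function name `sample_crop_size` is hypothetical and the default bounds are illustrative assumptions. When the derived width would exceed the image, the sketch shrinks both sides uniformly, which preserves the ratio but can push the height below `min_scale`.

```python
import math
import random


def sample_crop_size(orig_w, orig_h, rng,
                     min_scale=0.3, max_scale=1.0,
                     min_aspect_ratio=0.5, max_aspect_ratio=2.0):
    """Return (new_w, new_h) whose ratio is valid without rejection sampling."""
    # Uniform in log space, so a ratio r and its inverse 1/r are equally likely.
    log_ratio = rng.uniform(math.log(min_aspect_ratio), math.log(max_aspect_ratio))
    aspect_ratio = math.exp(log_ratio)

    # A single scale value fixes the height; the width follows from the ratio.
    scale = rng.uniform(min_scale, max_scale)
    new_h = scale * orig_h
    new_w = aspect_ratio * new_h

    # If the crop is wider than the image, shrink both sides by the same factor.
    if new_w > orig_w:
        shrink = orig_w / new_w
        new_w, new_h = orig_w, new_h * shrink

    # Round to whole pixels and never return a zero-sized side.
    return max(1, round(new_w)), max(1, round(new_h))


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        w, h = sample_crop_size(640, 480, rng)
        print(w, h, round(w / h, 3))
```

Rounding to integers can move the final ratio slightly, so a strict implementation would still need a tolerance or a final check.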

@datumbox (Contributor, Author) replied:

Yeap, this can definitely be improved. I implemented it exactly as originally described and cross-referenced it with the original implementation to be sure, as many similar implementations online are buggy. I would not touch this until proper unit tests are in place to ensure we maintain the same behaviour, since this transform was crucial for hitting the accuracy reported in the paper.

@datumbox datumbox merged commit 730c5e1 into pytorch:master Apr 30, 2021
@datumbox datumbox deleted the models/ssd branch April 30, 2021 16:30
facebook-github-bot pushed a commit that referenced this pull request May 4, 2021
Summary:
* Early skeleton of API.

* Adding MultiFeatureMap and vgg16 backbone.

* Making vgg16 backbone same as paper.

* Making code generic to support all vggs.

* Moving vgg's extra layers to a separate class + L2 scaling.

* Adding header vgg layers.

* Fix maxpool patching.

* Refactoring code to allow for support of different backbones & sizes:
- Skeleton for Default Boxes generator class
- Dynamic estimation of configuration when possible
- Addition of types

* Complete the implementation of DefaultBox generator.

* Replace randn with empty.

* Minor refactoring

* Making clamping between 0 and 1 optional.

* Change xywh to xyxy encoding.

* Adding parameters and reusing objects in constructor.

* Temporarily inherit from Retina to avoid dup code.

* Implement forward methods + temp workarounds to inherit from retina.

* Inherit more methods from retinanet.

* Fix type error.

* Add Regression loss.

* Fixing JIT issues.

* Change JIT workaround to minimize new code.

* Fixing initialization bug.

* Add classification loss.

* Update todos.

* Add weight loading support.

* Support SSD512.

* Change kernel_size to get output size 1x1

* Add xavier init and refactoring.

* Adding unit-tests and fixing JIT issues.

* Add a test for dbox generator.

* Remove unnecessary import.

* Workaround on GeneralizedRCNNTransform to support fixed size input.

* Remove unnecessary random calls from the test.

* Remove more rand calls from the test.

* change mapping and handling of empty labels

* Fix JIT warnings.

* Speed up loss.

* Convert 0-1 dboxes to original size.

* Fix warning.

* Fix tests.

* Update comments.

* Fixing minor bugs.

* Introduce a custom DBoxMatcher.

* Minor refactoring

* Move extra layer definition inside feature extractor.

* handle no bias on init.

* Remove fixed image size limitation

* Change initialization values for bias of classification head.

* Refactoring and update test file.

* Adding ResNet backbone.

* Minor refactoring.

* Remove inheritance of retina and general refactoring.

* SSD should fix the input size.

* Fixing messages and comments.

* Silently ignoring exception if test-only.

* Update comments.

* Update regression loss.

* Restore Xavier init everywhere, update the negative sampling method, change the clipping approach.

* Fixing tests.

* Refactor to move the losses from the Head to the SSD.

* Removing resnet50 ssd version.

* Adding support for best performing backbone and its config.

* Refactor and clean up the API.

* Fix lint

* Update todos and comments.

* Adding RandomHorizontalFlip and RandomIoUCrop transforms.

* Adding necessary checks to our transforms.

* Adding RandomZoomOut.

* Adding RandomPhotometricDistort.

* Moving Detection transforms to references.

* Update presets

* fix lint

* leave compose and object

* Adding scaling for completeness.

* Adding params in the repr

* Remove unnecessary import.

* minor refactoring

* Remove unnecessary call.

* Give better names to DBox* classes

* Port num_anchors estimation in generator

* Remove rescaling and fix presets

* Add the ability to pass a custom head and refactoring.

* fix lint

* Fix unit-test

* Update todos.

* Change mean values.

* Change the default parameter of SSD to train the full VGG16 and remove the catch of exception for eval only.

* Adding documentation

* Adding weights and updating readmes.

* Update the model weights with a more performing model.

* Adding doc for head.

* Restore import.

Reviewed By: NicolasHug

Differential Revision: D28169152

fbshipit-source-id: cec34141fad09538e0a29c6eb7834b24e2d8528e

xiaohu2015 commented Nov 30, 2021

@datumbox @oke-aditya hi, a question about the augmentation (https://github.com/pytorch/vision/blob/main/references/detection/transforms.py) of SSD:

class RandomZoomOut(nn.Module):
    def __init__(
        self, fill: Optional[List[float]] = None, side_range: Tuple[float, float] = (1.0, 4.0), p: float = 0.5
    ):
        super().__init__()
        if fill is None:
            fill = [0.0, 0.0, 0.0]
        self.fill = fill
        self.side_range = side_range
        if side_range[0] < 1.0 or side_range[0] > side_range[1]:
            raise ValueError(f"Invalid canvas side range provided {side_range}.")
        self.p = p

    @torch.jit.unused
    def _get_fill_value(self, is_pil):
        # type: (bool) -> int
        # We fake the type to make it work on JIT
        return tuple(int(x) for x in self.fill) if is_pil else 0

    def forward(
        self, image: Tensor, target: Optional[Dict[str, Tensor]] = None
    ) -> Tuple[Tensor, Optional[Dict[str, Tensor]]]:
        if isinstance(image, torch.Tensor):
            if image.ndimension() not in {2, 3}:
                raise ValueError(f"image should be 2/3 dimensional. Got {image.ndimension()} dimensions.")
            elif image.ndimension() == 2:
                image = image.unsqueeze(0)

        # Apply the transform with probability p; otherwise return the input unchanged.
        if torch.rand(1) >= self.p:
            return image, target

        orig_w, orig_h = F.get_image_size(image)

        r = self.side_range[0] + torch.rand(1) * (self.side_range[1] - self.side_range[0])
        canvas_width = int(orig_w * r)
        canvas_height = int(orig_h * r)

        r = torch.rand(2)
        left = int((canvas_width - orig_w) * r[0])
        top = int((canvas_height - orig_h) * r[1])
        right = canvas_width - (left + orig_w)
        bottom = canvas_height - (top + orig_h)

        if torch.jit.is_scripting():
            fill = 0
        else:
            fill = self._get_fill_value(F._is_pil_image(image))

        image = F.pad(image, [left, top, right, bottom], fill=fill)
        # maybe the following code is redundant?
        if isinstance(image, torch.Tensor):
            v = torch.tensor(self.fill, device=image.device, dtype=image.dtype).view(-1, 1, 1)
            image[..., :top, :] = image[..., :, :left] = image[..., (top + orig_h) :, :] = image[
                ..., :, (left + orig_w) :
            ] = v

        if target is not None:
            target["boxes"][:, 0::2] += left
            target["boxes"][:, 1::2] += top

        return image, target

Since F.pad has already padded the image, why do you do another fill operation for torch.Tensor?
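
Reading only the quoted snippet: on the tensor path the value passed to `F.pad` is the scalar 0 (both the scripting branch and `_get_fill_value` with `is_pil=False` give 0), so the later assignment appears to be what writes the per-channel `self.fill` into the border. This is an observation about the code as quoted, not an answer from the maintainers. The torch-free sketch below mimics the two steps on nested lists. The function `zoom_out` and its explicit `ratio`, `rx`, `ry` arguments (standing in for the random draws) are hypothetical.

```python
def zoom_out(image, boxes, fill, ratio, rx, ry):
    """image: C x H x W nested lists; boxes: [[x1, y1, x2, y2], ...].

    ratio, rx and ry replace the random draws so the result is reproducible.
    """
    channels = len(image)
    orig_h, orig_w = len(image[0]), len(image[0][0])
    canvas_w, canvas_h = int(orig_w * ratio), int(orig_h * ratio)
    left = int((canvas_w - orig_w) * rx)
    top = int((canvas_h - orig_h) * ry)

    # Step 1: pad with a single scalar 0, as the tensor path of the snippet does.
    canvas = [[[0.0] * canvas_w for _ in range(canvas_h)] for _ in range(channels)]
    for c in range(channels):
        for y in range(orig_h):
            canvas[c][top + y][left:left + orig_w] = image[c][y]

    # Step 2: overwrite the border with the per-channel fill value.
    for c in range(channels):
        for y in range(canvas_h):
            for x in range(canvas_w):
                inside = top <= y < top + orig_h and left <= x < left + orig_w
                if not inside:
                    canvas[c][y][x] = fill[c]

    # Boxes move by the same offset as the image.
    shifted = [[x1 + left, y1 + top, x2 + left, y2 + top]
               for x1, y1, x2, y2 in boxes]
    return canvas, shifted


canvas, shifted = zoom_out(
    image=[[[1.0, 2.0]], [[3.0, 4.0]]],   # 2 channels, 1 x 2 pixels
    boxes=[[0.0, 0.0, 2.0, 1.0]],
    fill=[9.0, 8.0],
    ratio=2.0, rx=0.5, ry=0.5,
)

if __name__ == "__main__":
    print(canvas)
    print(shifted)
```

If step 2 is skipped, the border stays at 0 for every channel, which only matches the requested fill when `fill` is all zeros.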

Successfully merging this pull request may close these issues.

please add other detectors Add SSD in models
6 participants