Add SSD architecture with VGG16 backbone #3403
Conversation
Codecov Report

@@            Coverage Diff             @@
##           master    #3403      +/-   ##
==========================================
- Coverage   79.73%   78.34%    -1.39%
==========================================
  Files         105      106        +1
  Lines        9818    10003      +185
  Branches     1579     1614       +35
==========================================
+ Hits         7828     7837        +9
- Misses       1513     1688      +175
- Partials      477      478        +1

Continue to review full report at Codecov.
- Skeleton for Default Boxes generator class
- Dynamic estimation of configuration when possible
- Addition of types
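As context for the default-boxes item above, a minimal sketch of how the generator introduced in this PR can be instantiated. The module path and arguments below reflect what was merged into torchvision (one aspect-ratio list per feature map, SSD300-style scales and steps); treat the exact values as illustrative, not prescriptive:

import torch
from torchvision.models.detection.anchor_utils import DefaultBoxGenerator

# Each inner list gives the extra aspect ratios for one feature map;
# ratio 1 (plus its additional scale) is always included, as in the paper.
anchor_generator = DefaultBoxGenerator(
    aspect_ratios=[[2], [2, 3], [2, 3], [2, 3], [2], [2]],
    scales=[0.07, 0.15, 0.33, 0.51, 0.69, 0.87, 1.05],
    steps=[8, 16, 32, 64, 100, 300],
)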
@datumbox @fmassa, following the discussion on #3611 (comment): should we start introducing every attribute, function, method and class here with a leading underscore (apart from those we want to explicitly expose)? I would go as far as to also introduce new file names with underscores, so that the public API is very clear: every object that isn't explicitly exposed in an `__init__` file is private. It takes a bit of self-discipline, but it can be very helpful to us in the long run, to avoid issues like the one in #3611 where we unfortunately can't make a seemingly harmless change. Leading underscores also have somewhat of a self-documenting flavour, which is helpful when reading / reviewing code.
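For illustration, a hypothetical sketch of that convention (all names below are invented, not code from this PR):

# In a private module such as _box_utils.py:

def _estimate_scales(num_feature_maps):    # leading underscore: internal helper,
    return [0.1 * (i + 1) for i in range(num_feature_maps)]  # free to change later

class DefaultBoxGenerator:                 # the one class intended for public use
    def __init__(self, aspect_ratios):
        self.aspect_ratios = aspect_ratios

# The package-level __init__.py would then re-export only the public names:
#     from ._box_utils import DefaultBoxGenerator
#     __all__ = ["DefaultBoxGenerator"]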
Change the default parameter of SSD to train the full VGG16 and remove the catch of exception for eval only.
Looks great to me, thanks a lot for all your work Vasilis!
I've only one minor comment regarding the doc, otherwise good to merge!
@@ -65,14 +73,16 @@ class GeneralizedRCNNTransform(nn.Module):
     It returns a ImageList for the inputs, and a List[Dict[Tensor]] for the targets
     """

-    def __init__(self, min_size, max_size, image_mean, image_std):
+    def __init__(self, min_size, max_size, image_mean, image_std, size_divisible=32, fixed_size=None):
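A sketch of how the new parameters might be used. GeneralizedRCNNTransform is an internal class, so treat this as illustrative; the mean/std values are placeholders, not necessarily what SSD ships with:

import torch
from torchvision.models.detection.transform import GeneralizedRCNNTransform

# fixed_size pins every image to exactly 300x300, as SSD expects;
# min_size/max_size are effectively bypassed when fixed_size is set.
transform = GeneralizedRCNNTransform(
    min_size=300, max_size=300,
    image_mean=[0.48235, 0.45882, 0.40784],  # illustrative values
    image_std=[1.0, 1.0, 1.0],
    size_divisible=1,   # no padding to a stride multiple; keep exactly 300x300
    fixed_size=(300, 300),
)

images, targets = transform([torch.rand(3, 480, 640)], None)
print(images.tensors.shape)  # expected: torch.Size([1, 3, 300, 300])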
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I see this in the future is that we will have different transforms for different models:
SSDTransform(...)
GeneralizedRCNNTransform(...)
DETRTransform(...)
and the way to avoid too much code duplication will be by having nice abstractions for the joint transforms, so that each one of those will be able to be easily implemented. Something like
GeneralizedRCNNTransform = Compose(ResizeWithMinSize(...), RandomFlip(...), Normalize(...))
But we are not there yet.
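A minimal sketch of that idea, assuming hypothetical joint-transform classes (ResizeWithMinSize, RandomFlip, Normalize are placeholders, not existing torchvision APIs); the key point is that each transform takes and returns an (image, target) pair so boxes stay in sync with the image:

class Compose:
    """Chains joint transforms that each take and return (image, target)."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target

# A model-specific pipeline would then be just a composition:
# GeneralizedRCNNTransform = Compose([
#     ResizeWithMinSize(min_size=800, max_size=1333),
#     RandomFlip(p=0.5),
#     Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
# ])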
boxes = target["boxes"][is_within_crop_area] | ||
ious = torchvision.ops.boxes.box_iou(boxes, torch.tensor([[left, top, right, bottom]], | ||
dtype=boxes.dtype, device=boxes.device)) | ||
if ious.max() < min_jaccard_overlap: | ||
continue |
I double-checked the logic and it seems good to me.
For the future, we might be able to avoid some of the excessive continue statements by more carefully selecting the sampling.
For example, in the first block we can sample the aspect ratio in log-scale so that the aspect ratio will be correct from the beginning, and then sample one value for the scale.
The same can be done for the crop (sampling so that none of the values are zero after rounding).
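A sketch of the log-scale idea (a hypothetical refactor, not code from this PR): sampling log(aspect_ratio) uniformly guarantees the ratio constraint by construction, so no rejection via continue is needed for it:

import math
import torch

# Sample log(r) uniformly in [log(0.5), log(2.0)] so the crop's aspect
# ratio r is valid by construction instead of rejection-sampled.
log_r = torch.empty(1).uniform_(math.log(0.5), math.log(2.0)).item()
r = math.exp(log_r)

# One scale sample s then determines both sides: w/h == r and w*h == s**2 * W * H.
s = torch.empty(1).uniform_(0.3, 1.0).item()
W, H = 640, 480  # hypothetical image size
area = (s ** 2) * W * H
w = int(round(math.sqrt(area * r)))
h = int(round(math.sqrt(area / r)))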
Yeap, this can definitely be improved. I've implemented it exactly as originally described, cross-referencing it with the original implementation to be sure, since many similar implementations online are buggy. I would not touch this until there are proper unit-tests in place to ensure we maintain the same behaviour, as this transform was crucial for hitting the accuracy reported in the paper.
Summary:
* Early skeleton of API.
* Adding MultiFeatureMap and vgg16 backbone.
* Making vgg16 backbone same as paper.
* Making code generic to support all vggs.
* Moving vgg's extra layers a separate class + L2 scaling.
* Adding header vgg layers.
* Fix maxpool patching.
* Refactoring code to allow for support of different backbones & sizes:
  - Skeleton for Default Boxes generator class
  - Dynamic estimation of configuration when possible
  - Addition of types
* Complete the implementation of DefaultBox generator.
* Replace randn with empty.
* Minor refactoring
* Making clamping between 0 and 1 optional.
* Change xywh to xyxy encoding.
* Adding parameters and reusing objects in constructor.
* Temporarily inherit from Retina to avoid dup code.
* Implement forward methods + temp workarounds to inherit from retina.
* Inherit more methods from retinanet.
* Fix type error.
* Add Regression loss.
* Fixing JIT issues.
* Change JIT workaround to minimize new code.
* Fixing initialization bug.
* Add classification loss.
* Update todos.
* Add weight loading support.
* Support SSD512.
* Change kernel_size to get output size 1x1
* Add xavier init and refactoring.
* Adding unit-tests and fixing JIT issues.
* Add a test for dbox generator.
* Remove unnecessary import.
* Workaround on GeneralizedRCNNTransform to support fixed size input.
* Remove unnecessary random calls from the test.
* Remove more rand calls from the test.
* change mapping and handling of empty labels
* Fix JIT warnings.
* Speed up loss.
* Convert 0-1 dboxes to original size.
* Fix warning.
* Fix tests.
* Update comments.
* Fixing minor bugs.
* Introduce a custom DBoxMatcher.
* Minor refactoring
* Move extra layer definition inside feature extractor.
* handle no bias on init.
* Remove fixed image size limitation
* Change initialization values for bias of classification head.
* Refactoring and update test file.
* Adding ResNet backbone.
* Minor refactoring.
* Remove inheritance of retina and general refactoring.
* SSD should fix the input size.
* Fixing messages and comments.
* Silently ignoring exception if test-only.
* Update comments.
* Update regression loss.
* Restore Xavier init everywhere, update the negative sampling method, change the clipping approach.
* Fixing tests.
* Refactor to move the losses from the Head to the SSD.
* Removing resnet50 ssd version.
* Adding support for best performing backbone and its config.
* Refactor and clean up the API.
* Fix lint
* Update todos and comments.
* Adding RandomHorizontalFlip and RandomIoUCrop transforms.
* Adding necessary checks to our tranforms.
* Adding RandomZoomOut.
* Adding RandomPhotometricDistort.
* Moving Detection transforms to references.
* Update presets
* fix lint
* leave compose and object
* Adding scaling for completeness.
* Adding params in the repr
* Remove unnecessary import.
* minor refactoring
* Remove unnecessary call.
* Give better names to DBox* classes
* Port num_anchors estimation in generator
* Remove rescaling and fix presets
* Add the ability to pass a custom head and refactoring.
* fix lint
* Fix unit-test
* Update todos.
* Change mean values.
* Change the default parameter of SSD to train the full VGG16 and remove the catch of exception for eval only.
* Adding documentation
* Adding weights and updating readmes.
* Update the model weights with a more performing model.
* Adding doc for head.
* Restore import.

Reviewed By: NicolasHug

Differential Revision: D28169152

fbshipit-source-id: cec34141fad09538e0a29c6eb7834b24e2d8528e
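A quick usage sketch of the model this PR adds; the ssd300_vgg16 entry point matches what was merged into torchvision.models.detection, and the pretrained flag assumes the released weights are available:

import torch
import torchvision

# Build SSD300 with a VGG16 backbone, loading the COCO-trained weights.
model = torchvision.models.detection.ssd300_vgg16(pretrained=True)
model.eval()

# Inference takes a list of 3xHxW float tensors in [0, 1]; the internal
# transform resizes everything to the fixed 300x300 input from the paper.
images = [torch.rand(3, 480, 640)]
with torch.no_grad():
    predictions = model(images)

# Each prediction is a dict with "boxes", "labels" and "scores".
print(predictions[0]["boxes"].shape, predictions[0]["scores"][:5])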
@datumbox @oke-aditya hi, a question about the augmentation (https://github.com/pytorch/vision/blob/main/references/detection/transforms.py) of SSD:
since the operation of …
Resolves #440 and partially resolves #1422
This PR implements SSD with VGG16 backbone as described in the original paper.
Trained using the code committed at a167edc. The current best pre-trained model was trained with (using latest git hash):
Submitted batch job 40773612
Accuracy metrics:
Validated with:
Speed benchmark:
0.83 sec per image on CPU
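For reference, a hedged sketch of how such a CPU timing number could be reproduced (the warm-up-then-average methodology is assumed, not taken from the PR):

import time
import torch
import torchvision

model = torchvision.models.detection.ssd300_vgg16(pretrained=True).eval()
image = [torch.rand(3, 480, 640)]

with torch.no_grad():
    model(image)  # warm-up pass, excluded from the measurement
    start = time.perf_counter()
    n = 10
    for _ in range(n):
        model(image)
print(f"{(time.perf_counter() - start) / n:.2f} sec per image on CPU")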