
How to add another detection head? #1418

Closed
Edwardmark opened this issue Nov 17, 2020 · 42 comments · Fixed by #4608
Labels
question Further information is requested

Comments

@Edwardmark

❔Question

In yolov5s.yaml there are only three detection layers [P3/8, P4/16, P5/32]. How can I add another layer with stride 64 to detect really big objects?

Additional context

Could you please kindly give me some guidance? Thanks. @glenn-jocher

@Edwardmark Edwardmark added the question Further information is requested label Nov 17, 2020
@glenn-jocher
Member

@Edwardmark design modifications are up to you. Start from the existing yamls and modify as you see fit.
https://github.com/ultralytics/yolov5/tree/master/models

@JoJoliking

@Edwardmark Have you solved the problem? I also want to add and modify the detection head, but I can't find the location of the detection head's code.

@Edwardmark
Author

Edwardmark commented Nov 20, 2020

@glenn-jocher Could you please explain the parameters in yolov5l.yaml a little? Say we want to add a head that aims to detect large objects (e.g. 640x640 big objects): what should be added to the anchors, backbone and head in yolov5l.yaml?
Thanks. The model definition is hard for me to understand, so please help me out. Thanks in advance.

@glenn-jocher
Member

glenn-jocher commented Nov 20, 2020

@Edwardmark @JoJoliking sure no problem. The current models output P3-P5 layers supporting strides 8-32. You want to export a P6 layer with stride 64.

You can export from any layer of the model you want simply by adding it to the input list of Detect(). This is one of the major advancements we made in YOLOv5 above and beyond the previous cfg architectures:

[[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)

So all you need to do is build the additional structure you want and then add the output layer you want to this list. You could then add another set of P6/64 anchors manually to the model, or you could simply delete the manual anchors and put a number instead, like anchors: 3, to tell the model to compute 3 of its own anchors at each output.

anchors:
- [10,13, 16,30, 33,23] # P3/8
- [30,61, 62,45, 59,119] # P4/16
- [116,90, 156,198, 373,326] # P5/32
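For reference, each anchor row above is a flat list of widths and heights that gets grouped into (w, h) pairs, so 6 numbers define 3 anchors for that output layer. A minimal sketch of that grouping (my own illustration, not the repo's parsing code):

```python
# Anchor rows as they appear in the yaml: flat [w1,h1, w2,h2, ...] lists.
anchors = [
    [10, 13, 16, 30, 33, 23],       # P3/8
    [30, 61, 62, 45, 59, 119],      # P4/16
    [116, 90, 156, 198, 373, 326],  # P5/32
]

def anchor_pairs(row):
    """Group a flat [w1,h1, w2,h2, ...] row into (w, h) tuples."""
    return list(zip(row[::2], row[1::2]))

for name, row in zip(("P3/8", "P4/16", "P5/32"), anchors):
    print(name, anchor_pairs(row))
```

Adding a P6/64 output manually would just mean appending one more such row.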

To build the additional structure, you can simply repeat the steps from P4 to P5:

[-1, 1, Conv, [256, 3, 2]],
[[-1, 14], 1, Concat, [1]], # cat head P4
[-1, 3, BottleneckCSP, [512, False]], # 20 (P4/16-medium)
[-1, 1, Conv, [512, 3, 2]],
[[-1, 10], 1, Concat, [1]], # cat head P5
[-1, 3, BottleneckCSP, [1024, False]], # 23 (P5/32-large)
[[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)

In terms of P6, there's no 64-stride layers earlier to concat, so you could simply do something like this for the easiest P6/64 output. If you wanted to get fancier you could have the backbone travel down to P6/64, and then concat that layer with the head (same as P5 is handled).

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)

   [-1, 1, Conv, [1024, 3, 2]],
   [-1, 3, BottleneckCSP, [2048, False]],  # 25 (P6/64-xlarge)

   [[17, 20, 23, 25], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5, P6)
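As a sanity check on what the extra output buys you, the grid each layer predicts over is just the input size divided by its stride. A quick back-of-envelope sketch (assuming a 640x640 input):

```python
# Output grid size = input size / stride, so each extra P-level halves
# the grid in each dimension.
img_size = 640
strides = {"P3": 8, "P4": 16, "P5": 32, "P6": 64}

grids = {name: img_size // s for name, s in strides.items()}
for name, g in grids.items():
    print(f"{name}: {g}x{g} grid ({g * g} cells)")
```

At 640x640 the P6 grid is only 10x10, which is why a P6 output mostly pays off at larger image sizes.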

@glenn-jocher
Member

glenn-jocher commented Nov 20, 2020

By the way, you should be aware that P6 outputs will mainly benefit larger image sizes. So you are travelling down a road of larger models applied to larger images (i.e. longer training, more CUDA memory usage, etc.).

If you wanted to go the other way and create models that work better on smaller images you might output a P2/4 (stride 4) layer instead. P2 output layers incur minimal size increases, but many more FLOPS as the convolutions are applied over larger denser grids, slowing inference significantly.
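The FLOPS point can be made concrete by counting grid cells (a sketch, again assuming a 640x640 input):

```python
# Halving the stride quadruples the number of cells the per-cell
# convolutions run over, which is where the P2 slowdown comes from.
img_size = 640
cells = {f"P{i}": (img_size // s) ** 2 for i, s in [(2, 4), (3, 8), (4, 16), (5, 32)]}
print(cells)

# P2 alone has more cells than the other three outputs combined:
print(cells["P2"], ">", cells["P3"] + cells["P4"] + cells["P5"])
```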

@JoJoliking

@glenn-jocher Excellent answer.
About the different yamls: 5s, 5l and 5x all use the YOLOv5 head, while yolov5-fpn and yolov5-panet correspond to the FPN and PANet heads. These head structures should be different, but there is no definition of FPN or PANet under common.py, and it seems FPN and PANet are not generated in the YOLOv5 head either (I don't know if I missed it).
Right?

@glenn-jocher
Member

@JoJoliking the four YOLOv5 models s/m/l/x are all built from yolov5-panet.yaml with different compound scaling constants. I experimented to find the best constants ratio, starting from the EfficientDet scaling equations, and these are used now for the four sizes.

FPN heads (like in YOLOv3) perform worse and are no longer used, though yolov5-fpn.yaml is archived for historical reasons (and to show how to modify the head structure from FPN to PANet).

@glenn-jocher
Member

Also, common.py and experimental.py define the low-level modules that are used to create FPN or PANet heads. The heads themselves are only created and defined in the yamls.

@JoJoliking

@glenn-jocher All right, I understand the relationship between the network structures now.
By the way, if I want to increase the network's output dimension, e.g. add 4 offsets for the bounding box, which places should I modify? How should I add a convolutional branch to each of the three detection heads to achieve this? I tried to modify the yaml and the Detect module, but failed. Forgive me for not having a deep understanding of the code. Sorry.

@glenn-jocher
Member

@JoJoliking I don't understand what you are asking.

@JoJoliking

@glenn-jocher
Sorry, I should describe my problem more clearly. The current detection output is 85 values: the prediction probabilities for the 80 classes, the xywh of the bounding box, and the objectness score, right? My idea: do not change the existing output, but add a four-dimensional output on top, the corresponding offsets of xywh (implemented with a fully connected or convolutional layer).

@glenn-jocher
Member

The three n-to-255 convolutions are contained inside the Detect() layer, you can apply any modifications you want there.

Though applying offsets/gains to the existing offsets and gains may overdetermine some of the parameters, i.e. fitting two offsets for one value is not typical in parameter estimation, as there is only 1 degree of freedom there.
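For reference, the 255 in those convolutions is na x (nc + 5), and any extra per-anchor outputs widen it accordingly. A sketch of the channel arithmetic (my own illustration, not the repo code):

```python
# Detect() conv output channels per layer:
# anchors * (class scores + box xywh + objectness)
na, nc = 3, 80        # anchors per layer, COCO classes
no = nc + 5           # 85 outputs per anchor
out = na * no
print(out)            # 255, the familiar COCO output width

# Adding 4 extra offset outputs per anchor, as proposed above, would give:
extra = na * (no + 4)
print(extra)          # 267
```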

@Edwardmark
Author

@glenn-jocher Thanks for your kind reply. It helps me a lot.
Best,
Edward.

@mary-0830

Hello, I tried to add some extra anchor box parameters after the anchors attribute in yolov5s.yaml, but an overflow error is displayed.

Is this not allowed? Or is there something I haven't changed?

The parameters I added are like this.
anchors:
- [93,72, 116,90, 125,158, 156,198, 298,261, 373,326, 448,391] # P5/32
- [24,49, 30,61, 50,36, 62,45, 47,95, 59,119, 71,143] # P4/16
- [8,10, 10,13, 13,24, 16,30, 26,18, 33,23, 40,28] # P3/8

@glenn-jocher
Member

@mary-0830 you're free to modify anchors as you see fit. The only constraint is each output layer requires the same number of anchors.

If autoanchor doesn't like your new anchors, it will create new ones on its own, based on the number you supplied initially. You can disable autoanchor with python train.py --noautoanchor.

You can also simply specify a number here instead of anchor vectors:
anchors: 3
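The equal-count constraint is easy to check before training. A small sketch (a hypothetical helper of my own, applied to the expanded rows from the comment above):

```python
def anchors_per_layer(rows):
    """Return the shared anchor count, or raise if the rows are invalid."""
    if any(len(row) % 2 for row in rows):
        raise ValueError("each row needs an even number of values (w,h pairs)")
    counts = {len(row) // 2 for row in rows}
    if len(counts) != 1:
        raise ValueError(f"unequal anchor counts per layer: {sorted(counts)}")
    return counts.pop()

# The expanded rows above: 14 values -> 7 anchors per layer, which is valid.
rows = [
    [93, 72, 116, 90, 125, 158, 156, 198, 298, 261, 373, 326, 448, 391],  # P5/32
    [24, 49, 30, 61, 50, 36, 62, 45, 47, 95, 59, 119, 71, 143],           # P4/16
    [8, 10, 10, 13, 13, 24, 16, 30, 26, 18, 33, 23, 40, 28],              # P3/8
]
print(anchors_per_layer(rows))  # 7
```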

@Edwardmark
Author

Edwardmark commented Nov 27, 2020

@glenn-jocher If I add a head, what should I modify in the compute_loss function? How should I set balance in compute_loss?
Thanks.

@glenn-jocher
Member

@Edwardmark modifications are up to you.

@JoJoliking

@glenn-jocher Hello, if I want to load multiple datasets for training at the same time (they will be placed in the same subfolder), how should I modify the LoadImagesAndLabels function?

@glenn-jocher
Member

@JoJoliking coco128.yaml already explains how to load multiple datasets. Do not modify the code.

yolov5/data/coco128.yaml

Lines 12 to 15 in 97a5227

# train and val data as 1) directory: path/images/, 2) file: path/images.txt, or 3) list: [path1/images/, path2/images/]
train: ../coco128/images/train2017/ # 128 images
val: ../coco128/images/train2017/ # 128 images

@JoJoliking

OK, I will have a try. Thank you for your previous reply to my question; it worked for me.

@JoJoliking

@glenn-jocher
Hello, dear YOLOv5 author! If I only want YOLOv5 to recognize humans, how should the anchor sizes and anchor_t (default 4.0) be modified? Can you give me some advice?

@glenn-jocher
Member

glenn-jocher commented Nov 29, 2020

@JoJoliking I would recommend training with all default settings (no modification). To start see:
https://docs.ultralytics.com/yolov5/tutorials/train_custom_data

@JoJoliking

@glenn-jocher OK, thanks. I will try my ideas.

@glenn-jocher
Member

glenn-jocher commented Nov 29, 2020

@JoJoliking ok! Also remember COCO models already offer human detection. You can also filter detections by class to only show human detections like this, so in reality I would not even train a new model if all you want is human detection:

python detect.py --classes 0

@JoJoliking

@glenn-jocher
Yes. Dear glenn-jocher
In fact, I will use other human datasets for training. These datasets only have one class (zero for human). At the same time, I notice that the cls loss is always zero. I think this is normal, because the network only needs to distinguish background and person. Right?

@glenn-jocher
Member

@JoJoliking yes, this is normal. Single-class datasets do not have any classification loss as there is no classification task, only objectness loss.
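A simplified sketch of why the reported cls loss sits at zero for a single class (illustrative only, not the actual compute_loss code):

```python
# Illustrative only: with one class there is nothing to discriminate,
# so the classification term is skipped and only box + objectness
# losses train the model.
def total_loss(box, obj, cls, nc):
    if nc > 1:  # classification loss only exists with multiple classes
        return box + obj + cls
    return box + obj

print(total_loss(1, 2, 3, nc=80))  # 6: all three terms
print(total_loss(1, 2, 3, nc=1))   # 3: cls term dropped
```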

@WANGCHAO1996

WANGCHAO1996 commented Jan 27, 2021

By the way, you should be aware that P6 outputs will mainly benefit larger image sizes. So you are travelling down a road of a larger models applied to larger images (i.e. longer training, more CUDA memory usage, etc.).

If you wanted to go the other way and create models that work better on smaller images you might output a P2/4 (stride 4) layer instead. P2 output layers incur minimal size increases, but many more FLOPS as the convolutions are applied over larger denser grids, slowing inference significantly.

Hello author, I want to add a detection layer for detecting small targets, but the latest code has been modified. How should I modify it? @glenn-jocher

anchors:
- [10,13, 16,30, 33,23] # P3/8
- [30,61, 62,45, 59,119] # P4/16
- [116,90, 156,198, 373,326] # P5/32

# YOLOv5 backbone
backbone:
# [from, number, module, args]

[[-1, 1, Focus, [64, 3]], # 0-P1/2
[-1, 1, Conv, [128, 3, 2]], # 1-P2/4
[-1, 3, BottleneckCSP, [128]],
[-1, 1, Conv, [256, 3, 2]], # 3-P3/8
[-1, 9, BottleneckCSP, [256]],
[-1, 1, Conv, [512, 3, 2]], # 5-P4/16
[-1, 9, BottleneckCSP, [512]],
[-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
[-1, 1, SPP, [1024, [5, 9, 13]]],
[-1, 3, BottleneckCSP, [1024, False]], # 9
]

# YOLOv5 head

head:
[[-1, 1, Conv, [512, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 6], 1, Concat, [1]], # cat backbone P4
[-1, 3, BottleneckCSP, [512, False]], # 13

[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 4], 1, Concat, [1]], # cat backbone P3
[-1, 3, BottleneckCSP, [256, False]], # 17 (P3/8-small)

[-1, 1, Conv, [256, 3, 2]],
[[-1, 14], 1, Concat, [1]], # cat head P4
[-1, 3, BottleneckCSP, [512, False]], # 20 (P4/16-medium)

[-1, 1, Conv, [512, 3, 2]],
[[-1, 10], 1, Concat, [1]], # cat head P5
[-1, 3, BottleneckCSP, [1024, False]], # 23 (P5/32-large)

[[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
]

@glenn-jocher
Member

@WANGCHAO1996 YOLOv5-p2 adds an extra small detection head (P2, stride 4):
https://github.com/ultralytics/yolov5/blob/master/models/hub/yolov5-p2.yaml

@YukunXia
Contributor

@WANGCHAO1996 YOLOv5-p2 adds an extra small detection head (P2, stride 4):
https://github.com/ultralytics/yolov5/blob/master/models/hub/yolov5-p2.yaml

@glenn-jocher

Should this line

[[24, 27, 30], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)

include 21 in its input list?

Besides, shouldn't the anchors then get one more row of sizes?

@glenn-jocher
Member

@YukunXia actually yes, the Detect input list should probably include 21 as well. You can train either way: with the current setup the outputs use the same strides, but the PANet head dips down into P2-stride convolutions to help add accuracy to the P3 output.

It's been a while since I made this, so I can't remember whether the 21 omission was intentional.

Can you submit a PR with the 21 addition to this yaml? Thanks!

@YukunXia
Contributor

YukunXia commented Aug 30, 2021

OK, the PR is submitted.

1db6554

@JoJoliking

@glenn-jocher Dear author, by the way, have you tried using BiFPN in YOLOv5 instead of PANet? My experiments show that BiFPN with 3 stacked layers reaches a better mAP on other datasets!

@xengst

xengst commented Nov 20, 2021

@glenn-jocher If I want to add another model e.g VGG16 to the backbone, what is the right way to do that?

@glenn-jocher
Member

@xengst you could try to create backbone modifications in a yaml file, though be aware that the head and backbone are connected at many different places by shortcut connections and not just at the end of the backbone.

@xengst

xengst commented Nov 20, 2021

@glenn-jocher Thank you for your reply.

So I should first define the VGG16 layers as classes in common.py, then add them to model.yaml?

@glenn-jocher
Member

@xengst yes. Remember the head needs skip connections from P3, P4, P5 (layers 4, 6 and 10 here):

# YOLOv5 v6.0 head
head:
[[-1, 1, Conv, [512, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 6], 1, Concat, [1]], # cat backbone P4
[-1, 3, C3, [512, False]], # 13
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 4], 1, Concat, [1]], # cat backbone P3
[-1, 3, C3, [256, False]], # 17 (P3/8-small)
[-1, 1, Conv, [256, 3, 2]],
[[-1, 14], 1, Concat, [1]], # cat head P4
[-1, 3, C3, [512, False]], # 20 (P4/16-medium)
[-1, 1, Conv, [512, 3, 2]],
[[-1, 10], 1, Concat, [1]], # cat head P5
[-1, 3, C3, [1024, False]], # 23 (P5/32-large)
[[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
]
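The channel bookkeeping behind those skip connections is easy to verify by hand: Concat joins feature maps along the channel axis, so its output width is the sum of its inputs' channels. A sketch (channel counts are the yolov5l base values, before any width_multiple scaling):

```python
# Concat output channels = sum of input channels (torch.cat on dim 1).
def concat_channels(*chs):
    return sum(chs)

# First head Concat: upsampled 512-ch head feature + 512-ch backbone P4,
# which the following C3 then squeezes back down to 512.
print(concat_channels(512, 512))  # 1024

# P3 branch: upsampled 256-ch feature + 256-ch backbone P3.
print(concat_channels(256, 256))  # 512
```

Any replacement backbone (VGG16 included) has to expose features with channel counts the head's Concat/C3 stages can consume at these points.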

@xengst

xengst commented Nov 20, 2021

@glenn-jocher So what I am trying to do wouldn't work? Is it not possible to test a different head and backbone?

@glenn-jocher
Member

@xengst how would I know if your experiment will 'work' or not?

@myasser63

@glenn-jocher Can I directly add a layer to Detect from the backbone?

@glenn-jocher
Member

glenn-jocher commented Dec 27, 2021

@myasser63 yes Detect can accept inputs from any part of the model. If you update Detect inputs you probably also want to set anchors: 3. This tells AutoAnchor to evolve 3 anchors for each Detect input.

@myasser63

Thanks @glenn-jocher for your explanation.

@myasser63

@glenn-jocher I want to understand the concept behind choosing ch_out for the head Conv layers. Is it found through testing, or is there a relation with the concatenated layers?


head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
