
How to add another detection head? #1418

Closed
Edwardmark opened this issue Nov 17, 2020 · 42 comments · Fixed by #4608
Labels
question Further information is requested

Comments

@Edwardmark

❔Question

In yolov5s.yaml there are only three detection layers [P3/8, P4/16, P5/32]. How can I add another layer with stride 64 to detect really big objects?

Additional context

Could you please kindly give me some guidance? Thanks. @glenn-jocher

@Edwardmark Edwardmark added the question Further information is requested label Nov 17, 2020
@glenn-jocher
Member

@Edwardmark design modifications are up to you. Start from the existing yamls and modify as you see fit.
https://github.com/ultralytics/yolov5/tree/master/models

@JoJoliking

@Edwardmark Have you solved the problem? I also want to add and modify the detection head, but I can't find the location of the detection head's code.

@Edwardmark
Author

Edwardmark commented Nov 20, 2020

@glenn-jocher Could you please explain the parameters in yolov5l.yaml a little? Say we want to add a head that aims to detect large objects (e.g. 640x640 big objects): what should be added to the anchors, backbone and head in yolov5l.yaml?
Thanks. The model definition is hard for me to understand, so please help me out. Thanks in advance.

@glenn-jocher
Member

glenn-jocher commented Nov 20, 2020

@Edwardmark @JoJoliking sure no problem. The current models output P3-P5 layers supporting strides 8-32. You want to export a P6 layer with stride 64.

You can export from any layer of the model you want simply by adding it to the input list of Detect(). This is one of the major advancements we made in YOLOv5 above and beyond the previous cfg architectures:

[[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)

So all you need to do is build the additional structure you want and then add the output layer you want to this list. You could then add another set of P6/64 anchors manually to the model, or you could simply delete the manual anchors and put a number instead, like anchors: 3, to tell the model to compute 3 of its own anchors at each output.

anchors:
- [10,13, 16,30, 33,23] # P3/8
- [30,61, 62,45, 59,119] # P4/16
- [116,90, 156,198, 373,326] # P5/32
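For reference, each anchor row above is a flat list of widths and heights that gets grouped into (w, h) pairs, so 6 numbers define 3 anchors for that output layer. A minimal sketch of that grouping (my own illustration, not the repo's parsing code):

```python
# Anchor rows as they appear in the yaml: flat [w1,h1, w2,h2, ...] lists.
anchors = [
    [10, 13, 16, 30, 33, 23],       # P3/8
    [30, 61, 62, 45, 59, 119],      # P4/16
    [116, 90, 156, 198, 373, 326],  # P5/32
]

def anchor_pairs(row):
    """Group a flat [w1,h1, w2,h2, ...] row into (w, h) tuples."""
    return list(zip(row[::2], row[1::2]))

for name, row in zip(("P3/8", "P4/16", "P5/32"), anchors):
    print(name, anchor_pairs(row))
```

Adding a P6/64 output manually would just mean appending one more such row.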

To build the additional structure, you can simply repeat the steps from P4 to P5:

[-1, 1, Conv, [256, 3, 2]],
[[-1, 14], 1, Concat, [1]], # cat head P4
[-1, 3, BottleneckCSP, [512, False]], # 20 (P4/16-medium)
[-1, 1, Conv, [512, 3, 2]],
[[-1, 10], 1, Concat, [1]], # cat head P5
[-1, 3, BottleneckCSP, [1024, False]], # 23 (P5/32-large)
[[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)

In terms of P6, there's no 64-stride layers earlier to concat, so you could simply do something like this for the easiest P6/64 output. If you wanted to get fancier you could have the backbone travel down to P6/64, and then concat that layer with the head (same as P5 is handled).

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)

   [-1, 1, Conv, [1024, 3, 2]],
   [-1, 3, BottleneckCSP, [2048, False]],  # 25 (P6/64-xlarge)

   [[17, 20, 23, 25], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5, P6)
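As a sanity check on what the extra output buys you, the grid each layer predicts over is just the input size divided by its stride. A quick back-of-envelope sketch (assuming a 640x640 input):

```python
# Output grid size = input size / stride, so each extra P-level halves
# the grid in each dimension.
img_size = 640
strides = {"P3": 8, "P4": 16, "P5": 32, "P6": 64}

grids = {name: img_size // s for name, s in strides.items()}
for name, g in grids.items():
    print(f"{name}: {g}x{g} grid ({g * g} cells)")
```

At 640x640 the P6 grid is only 10x10, which is why a P6 output mostly pays off at larger image sizes.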

@glenn-jocher
Member

glenn-jocher commented Nov 20, 2020

By the way, you should be aware that P6 outputs will mainly benefit larger image sizes. So you are travelling down a road of larger models applied to larger images (i.e. longer training, more CUDA memory usage, etc.).

If you wanted to go the other way and create models that work better on smaller images you might output a P2/4 (stride 4) layer instead. P2 output layers incur minimal size increases, but many more FLOPS as the convolutions are applied over larger denser grids, slowing inference significantly.
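The FLOPS point can be made concrete by counting grid cells (a sketch, again assuming a 640x640 input):

```python
# Halving the stride quadruples the number of cells the per-cell
# convolutions run over, which is where the P2 slowdown comes from.
img_size = 640
cells = {f"P{i}": (img_size // s) ** 2 for i, s in [(2, 4), (3, 8), (4, 16), (5, 32)]}
print(cells)

# P2 alone has more cells than the other three outputs combined:
print(cells["P2"], ">", cells["P3"] + cells["P4"] + cells["P5"])
```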

@JoJoliking

@glenn-jocher Excellent answer.
About the different yamls: 5s, 5l and 5x all use the YOLOv5 head, while yolov5-fpn and yolov5-panet correspond to the FPN and PANet heads. These head structures should be different, but there is no definition of FPN or PANet under common.py, and it seems FPN and PANet are not generated in the YOLOv5 head either (I don't know if I missed it).
Right?

@glenn-jocher
Member

@JoJoliking the four YOLOv5 models s/m/l/x are all built from yolov5-panet.yaml with different compound scaling constants. I experimented to find the best constants ratio, starting from the EfficientDet scaling equations, and these are used now for the four sizes.

FPN heads (like in YOLOv3) perform worse and are no longer used, though yolov5-fpn.yaml is archived for historical reasons (and to show how to modify the head structure from FPN to PANet).

@glenn-jocher
Member

Also, common.py and experimental.py define the low-level modules that are used to create FPN or PANet heads. The heads themselves are only created and defined in the yamls.

@JoJoliking

@glenn-jocher All right, I understand the relationship between the network structures now.
By the way, if I want to increase the network's output dimension, e.g. add 4 offsets for the bounding box, which places should I modify? How should I add a convolutional branch to each of the three detection heads to achieve this? I tried to modify the yaml and the Detect module, but failed. Forgive me for not having a deep understanding of the code. Sorry.

@glenn-jocher
Member

@JoJoliking I don't understand what you are asking.

@JoJoliking

@glenn-jocher
Sorry, I should describe my problem more clearly. The current detection output is 85 values: the prediction probabilities for the 80 classes, the xywh of the bounding box, and the objectness score, right? My idea: do not change the existing output, but add a four-dimensional output on top, the corresponding offsets of xywh (implemented with a fully connected or convolutional layer).

@glenn-jocher
Member

The three n-to-255 convolutions are contained inside the Detect() layer, you can apply any modifications you want there.

Though applying offsets/gains to the existing offsets and gains may overdetermine some of the parameters, i.e. fitting two offsets for one value is not typical in parameter estimation, as there is only 1 degree of freedom there.
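For reference, the 255 in those convolutions is na x (nc + 5), and any extra per-anchor outputs widen it accordingly. A sketch of the channel arithmetic (my own illustration, not the repo code):

```python
# Detect() conv output channels per layer:
# anchors * (class scores + box xywh + objectness)
na, nc = 3, 80        # anchors per layer, COCO classes
no = nc + 5           # 85 outputs per anchor
out = na * no
print(out)            # 255, the familiar COCO output width

# Adding 4 extra offset outputs per anchor, as proposed above, would give:
extra = na * (no + 4)
print(extra)          # 267
```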

@Edwardmark
Author

@glenn-jocher Thanks for your kind reply. It helps me a lot.
Best,
Edward.

@mary-0830

Hello, I tried to add some extra anchor box parameters after the anchors attribute in yolov5s.yaml, but an overflow error is displayed.

Is this not allowed? Or is there something I haven't changed?

The parameters I added are like this.
anchors:
- [93,72, 116,90, 125,158, 156,198, 298,261, 373,326, 448,391] # P5/32
- [24,49, 30,61, 50,36, 62,45, 47,95, 59,119, 71,143] # P4/16
- [8,10, 10,13, 13,24, 16,30, 26,18, 33,23, 40,28] # P3/8

@glenn-jocher
Member

@mary-0830 you're free to modify anchors as you see fit. The only constraint is each output layer requires the same number of anchors.

If autoanchor doesn't like your new anchors, it will create new ones on its own, based on the number you supplied initially. You can disable autoanchor with python train.py --noautoanchor.

You can also simply specify a number here instead of anchor vectors:
anchors: 3
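The equal-count constraint is easy to check before training. A small sketch (a hypothetical helper of my own, applied to the expanded rows from the comment above):

```python
def anchors_per_layer(rows):
    """Return the shared anchor count, or raise if the rows are invalid."""
    if any(len(row) % 2 for row in rows):
        raise ValueError("each row needs an even number of values (w,h pairs)")
    counts = {len(row) // 2 for row in rows}
    if len(counts) != 1:
        raise ValueError(f"unequal anchor counts per layer: {sorted(counts)}")
    return counts.pop()

# The expanded rows above: 14 values -> 7 anchors per layer, which is valid.
rows = [
    [93, 72, 116, 90, 125, 158, 156, 198, 298, 261, 373, 326, 448, 391],  # P5/32
    [24, 49, 30, 61, 50, 36, 62, 45, 47, 95, 59, 119, 71, 143],           # P4/16
    [8, 10, 10, 13, 13, 24, 16, 30, 26, 18, 33, 23, 40, 28],              # P3/8
]
print(anchors_per_layer(rows))  # 7
```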

@Edwardmark
Author

Edwardmark commented Nov 27, 2020

@glenn-jocher If I add a head, what should I modify in the compute_loss function? How should I set balance in compute_loss?
Thanks.

@glenn-jocher
Member

@Edwardmark modifications are up to you.

@JoJoliking

@glenn-jocher Hello, if I want to load multiple datasets for training at the same time (they will be placed in the same subfolder), how should I modify the LoadImagesAndLabels function?

@glenn-jocher
Member

@JoJoliking coco128.yaml already explains how to load multiple datasets. Do not modify the code.

yolov5/data/coco128.yaml

Lines 12 to 15 in 97a5227

# train and val data as 1) directory: path/images/, 2) file: path/images.txt, or 3) list: [path1/images/, path2/images/]
train: ../coco128/images/train2017/ # 128 images
val: ../coco128/images/train2017/ # 128 images

@JoJoliking

OK, I will have a try. Thank you for your previous reply to my question; it worked for me.

@JoJoliking

@glenn-jocher
Hello, dear YOLOv5 author! If I only want YOLOv5 to recognize humans, how should the anchor sizes and anchor_t (default 4.0) be modified? Can you give me some advice?

@glenn-jocher
Member

glenn-jocher commented Nov 29, 2020

@JoJoliking I would recommend training with all default settings (no modification). To start see:
https://docs.ultralytics.com/yolov5/tutorials/train_custom_data

@JoJoliking

@glenn-jocher OK, thanks. I will try my ideas.

@glenn-jocher
Member

glenn-jocher commented Nov 29, 2020

@JoJoliking ok! Also remember COCO models already offer human detection. You can also filter detections by class to only show human detections like this, so in reality I would not even train a new model if all you want is human detection:

python detect.py --classes 0

@JoJoliking

@glenn-jocher
Yes. Dear glenn-jocher
In fact, I will use other human datasets for training. These datasets only have one class (zero for human). At the same time, I notice that the cls loss is always zero. I think this is normal, because the network only needs to distinguish background and person. Right?

@glenn-jocher
Member

@JoJoliking yes, this is normal. Single-class datasets do not have any classification loss as there is no classification task, only objectness loss.
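A simplified sketch of why the reported cls loss sits at zero for a single class (illustrative only, not the actual compute_loss code):

```python
# Illustrative only: with one class there is nothing to discriminate,
# so the classification term is skipped and only box + objectness
# losses train the model.
def total_loss(box, obj, cls, nc):
    if nc > 1:  # classification loss only exists with multiple classes
        return box + obj + cls
    return box + obj

print(total_loss(1, 2, 3, nc=80))  # 6: all three terms
print(total_loss(1, 2, 3, nc=1))   # 3: cls term dropped
```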

@WANGCHAO1996

WANGCHAO1996 commented Jan 27, 2021

By the way, you should be aware that P6 outputs will mainly benefit larger image sizes. So you are travelling down a road of a larger models applied to larger images (i.e. longer training, more CUDA memory usage, etc.).

If you wanted to go the other way and create models that work better on smaller images you might output a P2/4 (stride 4) layer instead. P2 output layers incur minimal size increases, but many more FLOPS as the convolutions are applied over larger denser grids, slowing inference significantly.

Hello author, I want to add a detection layer for detecting small targets, but the latest code has been modified. How should I modify it? @glenn-jocher

anchors:
- [10,13, 16,30, 33,23] # P3/8
- [30,61, 62,45, 59,119] # P4/16
- [116,90, 156,198, 373,326] # P5/32

# YOLOv5 backbone
backbone:
# [from, number, module, args]

[[-1, 1, Focus, [64, 3]], # 0-P1/2
[-1, 1, Conv, [128, 3, 2]], # 1-P2/4
[-1, 3, BottleneckCSP, [128]],
[-1, 1, Conv, [256, 3, 2]], # 3-P3/8
[-1, 9, BottleneckCSP, [256]],
[-1, 1, Conv, [512, 3, 2]], # 5-P4/16
[-1, 9, BottleneckCSP, [512]],
[-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
[-1, 1, SPP, [1024, [5, 9, 13]]],
[-1, 3, BottleneckCSP, [1024, False]], # 9
]

# YOLOv5 head

head:
[[-1, 1, Conv, [512, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 6], 1, Concat, [1]], # cat backbone P4
[-1, 3, BottleneckCSP, [512, False]], # 13

[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 4], 1, Concat, [1]], # cat backbone P3
[-1, 3, BottleneckCSP, [256, False]], # 17 (P3/8-small)

[-1, 1, Conv, [256, 3, 2]],
[[-1, 14], 1, Concat, [1]], # cat head P4
[-1, 3, BottleneckCSP, [512, False]], # 20 (P4/16-medium)

[-1, 1, Conv, [512, 3, 2]],
[[-1, 10], 1, Concat, [1]], # cat head P5
[-1, 3, BottleneckCSP, [1024, False]], # 23 (P5/32-large)

[[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
]

@glenn-jocher
Member

@WANGCHAO1996 YOLOv5-p2 adds an extra small detection head (P2, stride 4):
https://github.com/ultralytics/yolov5/blob/master/models/hub/yolov5-p2.yaml

@YukunXia
Contributor

@WANGCHAO1996 YOLOv5-p2 adds an extra small detection head (P2, stride 4):
https://github.com/ultralytics/yolov5/blob/master/models/hub/yolov5-p2.yaml

@glenn-jocher

Should this line

[[24, 27, 30], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)

include 21 in its input list?

Besides, shouldn't the anchors then get one more row of sizes?

@glenn-jocher
Member

@YukunXia actually yes, the Detect input list should probably include 21 as well. You can train either way: with the current setup the outputs use the same strides, but the PANet head dips down into P2-stride convolutions to help add accuracy to the P3 output.

It's been a while since I made this, so I can't remember whether the 21 omission was intentional.

Can you submit a PR with the 21 addition to this yaml? Thanks!

@YukunXia
Contributor

YukunXia commented Aug 30, 2021

OK, the PR is submitted.

1db6554

@JoJoliking

@glenn-jocher Dear author, by the way, have you tried using BiFPN in YOLOv5 instead of PANet? My experiments show that BiFPN with 3 stacked layers reaches a better mAP on other datasets!

@xengst

xengst commented Nov 20, 2021

@glenn-jocher If I want to add another model e.g VGG16 to the backbone, what is the right way to do that?

@glenn-jocher
Member

@xengst you could try to create backbone modifications in a yaml file, though be aware that the head and backbone are connected at many different places by shortcut connections and not just at the end of the backbone.

@xengst

xengst commented Nov 20, 2021

@glenn-jocher Thank you for your reply.

So I should first define the VGG16 layers as classes in common.py, then add them to model.yaml?

@glenn-jocher
Member

@xengst yes. Remember the head needs skip connections from P3, P4, P5 (layers 4, 6 and 10 here):

# YOLOv5 v6.0 head
head:
[[-1, 1, Conv, [512, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 6], 1, Concat, [1]], # cat backbone P4
[-1, 3, C3, [512, False]], # 13
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 4], 1, Concat, [1]], # cat backbone P3
[-1, 3, C3, [256, False]], # 17 (P3/8-small)
[-1, 1, Conv, [256, 3, 2]],
[[-1, 14], 1, Concat, [1]], # cat head P4
[-1, 3, C3, [512, False]], # 20 (P4/16-medium)
[-1, 1, Conv, [512, 3, 2]],
[[-1, 10], 1, Concat, [1]], # cat head P5
[-1, 3, C3, [1024, False]], # 23 (P5/32-large)
[[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
]
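The channel bookkeeping behind those skip connections is easy to verify by hand: Concat joins feature maps along the channel axis, so its output width is the sum of its inputs' channels. A sketch (channel counts are the yolov5l base values, before any width_multiple scaling):

```python
# Concat output channels = sum of input channels (torch.cat on dim 1).
def concat_channels(*chs):
    return sum(chs)

# First head Concat: upsampled 512-ch head feature + 512-ch backbone P4,
# which the following C3 then squeezes back down to 512.
print(concat_channels(512, 512))  # 1024

# P3 branch: upsampled 256-ch feature + 256-ch backbone P3.
print(concat_channels(256, 256))  # 512
```

Any replacement backbone (VGG16 included) has to expose features with channel counts the head's Concat/C3 stages can consume at these points.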

@xengst

xengst commented Nov 20, 2021

@glenn-jocher So what I am trying to do wouldn't work? Is it not possible to test a different head and backbone?

@glenn-jocher
Member

@xengst how would I know if your experiment will 'work' or not?

@myasser63

@glenn-jocher Can I directly add a layer to Detect from the backbone?

@glenn-jocher
Member

glenn-jocher commented Dec 27, 2021

@myasser63 yes Detect can accept inputs from any part of the model. If you update Detect inputs you probably also want to set anchors: 3. This tells AutoAnchor to evolve 3 anchors for each Detect input.

@myasser63

Thanks @glenn-jocher for your explanation.

@myasser63

@glenn-jocher I want to understand the concept behind choosing ch_out for the head Conv layers. Is it found through testing, or is there a relation with the concatenated layers?


head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
