Train models from scratch #60

zhiqwang · 2021-02-12T20:11:08Z

🚀 Feature

Support training models from scratch; this is a follow-up to #16.

Motivation

Test whether the trainer mechanism works.

Pitch


kartik4949 · 2021-02-18T19:34:36Z

@zhiqwang Hi, the library looks good, so I was thinking of contributing to make it as flexible as possible by adding support for more backbones, losses, and FPNs,
and even adding our own architecture tweaked for performance.
Let me know if you have a Slack channel or another platform to discuss the above.
Thanks

zhiqwang · 2021-02-19T02:15:39Z

Hi @kartik4949

Some of the modular design does require more careful consideration. We would be glad of your help; please join us on Slack here.

stereomatchingkiss · 2021-04-17T13:08:44Z

Could I train the model with yolov5-rt on a custom dataset? Or do I need to train the model with yolov5 v4.0 and then convert the weights with:

import torch
from yolort.utils import update_module_state_from_ultralytics

# Update module state from ultralytics
model = update_module_state_from_ultralytics(arch='yolov5s', version='v4.0', custom_path_or_model=torch.load('path/to/model.pt'), num_classes=1)
# Save updated module
torch.save(model.state_dict(), 'yolov5s_updated.pt')

Thanks

zhiqwang · 2021-04-18T00:19:50Z

Hi @stereomatchingkiss, both approaches are feasible, but for now I recommend the second one.

Training with yolort is still experimental; see the following for more details.

https://github.com/zhiqwang/yolov5-rt-stack/blob/2125d06f8cf8726401211a152890e46e3b3416e6/test/test_engine.py#L101-L110

zhiqwang · 2021-04-18T04:31:22Z

FYI, I aim to release a version that supports training before May 7th. I expect it won't train as well as ultralytics, but it will be friendlier to use 😄

Tomakko · 2021-06-04T11:10:14Z

Hi @zhiqwang, thanks for your awesome repo! Do you have any news on the training release? I started from your codebase to implement training myself. It is working now, i.e. I can run training steps, but I am running into one issue.

When I apply default_train_transforms in your data modules, it happens that no targets are left after the transform, probably because they lie outside of the crop.

Can you give me some hints on how best to deal with empty targets in box_head.py? Particularly in these functions:

targets_cls, targets_box, indices, anchors = self.select_training_samples(head_outputs, targets)
losses = self.compute_loss(head_outputs, targets_cls, targets_box, indices, anchors)

Thanks a lot in advance!
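On the loss side, one common way to cope with an empty target set is sketched below in plain Python (not yolort's actual `box_head.py`): when there are no matched targets, the box and classification terms are simply zero, while the objectness term is still computed against all-zero targets, so the model keeps learning "no object here". This roughly mirrors how ultralytics' YOLOv5 loss only computes box/class terms when matches exist. `bce_with_logits` and `loss_sketch` are hypothetical helpers, and the per-match box/class losses are passed in precomputed to keep the example small.

```python
import math
from typing import List, Tuple


def bce_with_logits(logit: float, target: float) -> float:
    # Numerically stable binary cross-entropy for a single logit.
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def loss_sketch(
    obj_logits: List[float],
    matches: List[Tuple[int, float, float]],
) -> Tuple[float, float, float]:
    """Hypothetical loss aggregation.

    obj_logits: objectness logit for every anchor/grid cell (assumed non-empty).
    matches: (cell_index, box_loss, cls_loss) per positive sample; may be empty
    when augmentation removed every target.
    """
    obj_targets = [0.0] * len(obj_logits)
    box_loss = 0.0
    cls_loss = 0.0
    for cell, l_box, l_cls in matches:
        obj_targets[cell] = 1.0
        box_loss += l_box
        cls_loss += l_cls
    if matches:  # only average box/cls terms when positives exist
        box_loss /= len(matches)
        cls_loss /= len(matches)
    # Objectness is always computed, against zeros when there are no targets.
    obj_loss = sum(
        bce_with_logits(z, t) for z, t in zip(obj_logits, obj_targets)
    ) / len(obj_logits)
    return box_loss, cls_loss, obj_loss
```

With `matches=[]` the function returns zero box and class losses and a finite objectness loss, so backpropagation still works on a batch without targets.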

zhiqwang · 2021-06-05T06:27:35Z

Hi @Tomakko

Thanks for the careful debugging information. I guess it is due to the poor implementation of the data augmentation; as you mentioned, the default_train_transforms in

https://github.com/zhiqwang/yolov5-rt-stack/blob/b0af4a1b17805543f415df705deb66f398b10170/yolort/data/data_module.py#L81-L92

filters out most of the targets.

I think we should fix this augmentation to make sure at least one target is left when the losses are computed in

https://github.com/zhiqwang/yolov5-rt-stack/blob/b0af4a1b17805543f415df705deb66f398b10170/yolort/models/box_head.py#L129-L130
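A minimal sketch of "keep at least one target" on the augmentation side, assuming boxes are `(x1, y1, x2, y2)` pixel tuples: sample a random crop, keep only boxes whose centers fall inside it (a rule similar to the one torchvision's SSD-style `RandomIoUCrop` uses), and retry until at least one box survives, falling back to the uncropped image after `max_trials` attempts. `crop_boxes`, `random_crop_keep_targets`, and the `min_scale`/`max_trials` parameters are hypothetical, not part of yolort's `default_train_transforms`.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def crop_boxes(boxes: List[Box], crop: Box) -> List[Box]:
    """Keep boxes whose centers lie inside the crop, clipped and shifted to crop coordinates."""
    cx1, cy1, cx2, cy2 = crop
    kept = []
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        if cx1 < cx < cx2 and cy1 < cy < cy2:
            kept.append((max(x1, cx1) - cx1, max(y1, cy1) - cy1,
                         min(x2, cx2) - cx1, min(y2, cy2) - cy1))
    return kept


def random_crop_keep_targets(boxes: List[Box], width: float, height: float,
                             min_scale: float = 0.3, max_trials: int = 50,
                             rng=random) -> Tuple[Box, List[Box]]:
    """Hypothetical random crop that never drops every target."""
    for _ in range(max_trials):
        w = rng.uniform(min_scale, 1.0) * width
        h = rng.uniform(min_scale, 1.0) * height
        x1 = rng.uniform(0, width - w)
        y1 = rng.uniform(0, height - h)
        crop = (x1, y1, x1 + w, y1 + h)
        kept = crop_boxes(boxes, crop)
        if kept:  # accept the crop only if at least one target survives
            return crop, kept
    # Fall back to the full image so the sample keeps all of its targets.
    return (0.0, 0.0, float(width), float(height)), list(boxes)
```

The fallback means a sample only leaves this transform without targets if it had none to begin with; the loss still has to handle that case separately.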

> Do you have any news on the training release?

My next plan is to learn from the data augmentation implementation in torchvision: they recently uploaded the augmentation methods used for training their SSD models, and we can borrow some of that code here to make the augmentation acceptable.

Your feedback is very important to me. Feel free to file new issues about the trainer here, and let's train a good model together. 🚀

Tomakko · 2021-06-11T13:48:26Z

Thank you @zhiqwang! In the short term I need to realize an embedded YOLO model, so I am training with ultralytics for now, but afterwards I would be willing to contribute here. The training pipeline in ultralytics is just super cumbersome ;)

denguir · 2022-06-16T10:35:27Z

Hi @zhiqwang, thanks for the awesome work!
I was wondering how to load a pretrained model if the number of classes differs from the default, something like this:

from yolort.models import yolov5s
model = yolov5s(pretrained=True, score_thresh=0.45, num_classes=5)

This piece of code throws the following error due to dimension mismatch:

RuntimeError: Error(s) in loading state_dict for YOLO:
	size mismatch for head.head.0.weight: copying a param with shape torch.Size([255, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([30, 128, 1, 1]).
	size mismatch for head.head.0.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([30]).
	size mismatch for head.head.1.weight: copying a param with shape torch.Size([255, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([30, 256, 1, 1]).
	size mismatch for head.head.1.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([30]).
	size mismatch for head.head.2.weight: copying a param with shape torch.Size([255, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([30, 512, 1, 1]).
	size mismatch for head.head.2.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([30]).

As we can see, only the weights and biases of head.head mismatch, and I think the formula for that first dimension is (num_classes + 5) * 3.
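The formula can be checked against the shapes in the error message. Assuming 3 anchors per detection level and, per anchor, 4 box offsets, 1 objectness score, and one score per class, a quick calculation reproduces both numbers (`head_out_channels` is a hypothetical helper):

```python
def head_out_channels(num_classes: int, num_anchors: int = 3) -> int:
    # Each anchor predicts 4 box offsets + 1 objectness score + one score per class.
    return num_anchors * (num_classes + 5)


print(head_out_channels(80))  # 255, the COCO checkpoint
print(head_out_channels(5))   # 30, the model built with num_classes=5
```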

Is there any function/method I'm not aware of that would allow us to match these dimensions? Something that would work like this (if integrated in the YOLO class):

def load_state_dict(self, state_dict, num_classes):
    weights_to_skip = [f"head.head.{i}.weight" for i in range(3)]
    bias_to_skip = [f"head.head.{i}.bias" for i in range(3)]
    for weight in weights_to_skip + bias_to_skip:
        # Truncate each head conv from 255 to (num_classes + 5) * 3 output channels
        state_dict[weight] = state_dict[weight][:(num_classes + 5) * 3, ...]
    super().load_state_dict(state_dict)
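Note that slicing the first `(num_classes + 5) * 3` rows ignores how the channels are grouped. Assuming the head's output channels are laid out anchor by anchor, each block being `[x, y, w, h, obj, cls_0, ..., cls_79]` (85 rows for COCO), a plain prefix slice would give anchors 1 and 2 rows that belonged to anchor 0's class scores. A sketch that instead keeps the first `num_classes + 5` rows of every anchor block (`head_rows_to_keep` is a hypothetical helper):

```python
from typing import List


def head_rows_to_keep(num_classes_new: int, num_classes_old: int = 80,
                      num_anchors: int = 3) -> List[int]:
    """Row indices of a pretrained head conv to keep for a smaller class count.

    Assumes output channels are grouped per anchor as
    [x, y, w, h, obj, cls_0, ..., cls_{C-1}], i.e. blocks of C + 5 rows.
    """
    if num_classes_new > num_classes_old:
        raise ValueError("this trick can only shrink the head")
    old_block = num_classes_old + 5
    new_block = num_classes_new + 5
    rows = []
    for a in range(num_anchors):
        start = a * old_block
        # Keep box, objectness, and the first num_classes_new class rows of this anchor.
        rows.extend(range(start, start + new_block))
    return rows
```

With PyTorch tensors, `state_dict[name] = state_dict[name][rows]` then selects those rows for both the weight (`[255, C_in, 1, 1]`) and the bias (`[255]`). The kept class rows still come from arbitrary COCO classes, so they only serve as an initialization.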

Currently the only way I found to load a YOLO model with a different number of classes is the load_from_yolov5 method, which requires us to already have a checkpoint.

zhiqwang · 2022-06-16T11:35:22Z

Hi @denguir, thanks for raising this question.

> Is there any function/method that I'm not aware of that would allow us to match these dimensions.

We don't currently offer a solution to this problem, but I guess you can load only the backbone parts to partially solve it. (I modified the snippet from https://discuss.pytorch.org/t/how-to-load-part-of-pre-trained-model/1113/3)

from yolort.models import yolov5s
from yolort.utils import load_state_dict_from_url

model = yolov5s(pretrained=False, score_thresh=0.45, num_classes=5)

checkpoint_path = "/home/user/.cache/torch/hub/checkpoints/yolov5_darknet_pan_s_r60_coco-9f44bf3f.pt"
pretrained_dict = load_state_dict_from_url(checkpoint_path)

# 1. filter out unnecessary keys
pretrained_dict = {k: v for k, v in pretrained_dict.items() if "backbone" in k}
# 2. load the filtered state dict
model.model.load_state_dict(pretrained_dict, strict=False)
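A variation on the backbone-only filter, sketched here as a hypothetical helper: keep every pretrained entry whose name exists in the new model and whose shape matches. For a checkpoint like the one above this should keep the backbone and neck weights and drop only the `head.head.*` convs whose output channels changed. It works with anything exposing `.shape`, such as PyTorch tensors.

```python
from typing import Any, Dict, Mapping


def filter_matching_params(pretrained: Mapping[str, Any],
                           model_state: Mapping[str, Any]) -> Dict[str, Any]:
    """Keep pretrained entries whose key exists in the model and whose shape matches."""
    return {
        k: v
        for k, v in pretrained.items()
        if k in model_state and tuple(v.shape) == tuple(model_state[k].shape)
    }
```

Usage in the snippet above would be `model.model.load_state_dict(filter_matching_params(pretrained_dict, model.model.state_dict()), strict=False)`, applied to the unfiltered checkpoint dict.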

BTW, the training mechanism of yolort is still not well developed, and any kind of contribution is welcome here.

denguir · 2022-06-16T19:40:07Z

Thanks @zhiqwang, I will definitely explore the training process of yolort further and try to help there.

zhiqwang added the enhancement (New feature or request), help wanted (Extra attention is needed), and API (Library use interface) labels Feb 12, 2021

zhiqwang mentioned this issue Feb 22, 2021

Initialize bias into YoloHead #67

Merged

zhiqwang self-assigned this Jun 11, 2021

This was referenced Jun 11, 2021

Fixing data augmentation #115

Merged

Add anchor / target assignment visualization notebook #116

Merged

This was referenced Jun 25, 2021

Fixing normalization in dataloader #124

Merged

Refactorring skeleton of SetCriterion #125

Merged

Fixing trainer with lightning #128

Merged

zhiqwang mentioned this issue Jul 10, 2021

torchscripted model for training phase #131

Open

zhiqwang mentioned this issue Jul 19, 2021

Fixing build_targets in SetCriterion #143

Merged

zhiqwang mentioned this issue Sep 10, 2021

Upgrade PT to 1.9.0 in unit-test #161

Merged

zhiqwang mentioned this issue Mar 11, 2022

Copy the dataloader and trainer from yolov5 #350

Merged

zhiqwang linked a pull request Jun 5, 2022 that will close this issue

Add trainer and dataloader from yolov5 #408

Open

zhiqwang mentioned this issue Dec 28, 2022

Loading pre-trained model is not supported for num_classes != 80 #475

Open
