Question about the high GPU memory usage of fuse_ab #687

Open
4 tasks done
yang-0201 opened this issue Jan 15, 2023 · 5 comments
Labels
question Further information is requested

Comments

@yang-0201

Before Asking

  • I have read the README carefully.

  • I want to train my custom dataset, and I have read the tutorials for training custom data carefully and organized my dataset correctly. (FYI: We recommend using the xx_finetune.py config files to train custom datasets.)

  • I have pulled the latest code from the main branch and run it again, and the problem still exists.

Search before asking

  • I have searched the YOLOv6 issues and found no similar questions.

Question

OOM RuntimeError is raised due to the huge memory cost during label assignment
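Previously, with the small YOLOv6 model from v0.2.0 on a 3090 card with 24 GB of memory, training on the COCO dataset with batch-size 64 almost never hit this error. Now that the small model uses fuse ab, the error keeps appearing even with batch size 36. Is fuse ab consuming too much GPU memory? With batch size 24 the error no longer occurs, but training is more than twice as slow. Is there any way to make up for this?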

Additional

No response

@yang-0201 yang-0201 added the question Further information is requested label Jan 15, 2023
@Chilicyy
Collaborator

@yang-0201 Hello, we have optimized the GPU memory usage. Please pull the latest code and try again.

@ysc703

ysc703 commented Feb 1, 2023

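@Chilicyy Hello, could you tell me where the latest code was changed to address this problem? I am seeing the same behavior (with the latest code): the batch size basically has to be reduced from 64 to 24.
Issue #656 mentions `self.accumulate = max(1, round(64 / self.batch_size))`, and the latest code already contains this line.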

Thank you very much!
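
For reference, the accumulation formula quoted from issue #656 can be worked through for the batch sizes mentioned in this thread. This is only a minimal sketch of the arithmetic, not the trainer code itself: the function names and the `nominal_batch_size` parameter are hypothetical, and only the expression `max(1, round(64 / batch_size))` comes from the thread.

```python
def accumulation_steps(batch_size: int, nominal_batch_size: int = 64) -> int:
    """Mini-batches to accumulate before one optimizer step.

    Mirrors the expression quoted in the thread:
    max(1, round(64 / batch_size)). The parameter name is an assumption.
    """
    return max(1, round(nominal_batch_size / batch_size))


def effective_batch_size(batch_size: int, nominal_batch_size: int = 64) -> int:
    """Number of samples that contribute to each optimizer step."""
    return batch_size * accumulation_steps(batch_size, nominal_batch_size)


if __name__ == "__main__":
    # Batch sizes mentioned in this issue: 64 (v0.2.0), 36 (OOM), 24 (works).
    for bs in (64, 36, 24):
        steps = accumulation_steps(bs)
        print(f"batch_size={bs} accumulate={steps} "
              f"effective={effective_batch_size(bs)}")
```

With batch size 24 this gives 3 accumulation steps, so each optimizer update still sees roughly the nominal 64 samples (72 here). Accumulation of this kind keeps the effective batch size close to the nominal one, but it does not lower the per-step memory of label assignment and does not by itself recover the lost throughput.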

@Chilicyy
Collaborator

Chilicyy commented Feb 1, 2023

@ysc703 This PR fixed the high GPU memory usage. Could you check whether this part of the code is up to date on your side?

@ysc703

ysc703 commented Feb 1, 2023

@Chilicyy Thanks! I just checked, and the latest code already includes the changes from this PR. That is indeed a bit strange.

@ysc703

ysc703 commented Feb 1, 2023

BTW, I am training from scratch with yolov6n.py. I am not using yolov6n_finetune.py or a pretrained model.
