Unable to reproduce results #21

Open
GeraintLi opened this issue Jan 15, 2024 · 6 comments

Comments

@GeraintLi

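Hello author, I trained with the code you provided, but because of hardware limits I can only use two 3090s, with batch_size_per_gpu set to 4, and the trained model performs very poorly. Since my batch size is smaller than the original setting, I ran it for 600,000 iterations, but the results are still very poor. Is the poor training performance caused by my batch size being too small?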

@kkkls
Owner

kkkls commented Jan 15, 2024 via email

@GeraintLi
Author

May I ask: can I use gradient accumulation to increase the effective batch size, so that I don't need to adjust the learning rate?
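For readers unfamiliar with the technique asked about here, the following is a minimal sketch of gradient accumulation on a toy 1-D linear model in plain Python. It is not the repository's training loop; the model, the data, and the function names `mse_grad` and `accumulated_grad` are all illustrative assumptions. It only shows that averaging the gradients of equally sized micro-batches reproduces the gradient of the full batch.

```python
# Minimal sketch: gradient accumulation for a toy model y = w * x with
# mean-squared-error loss. Every name and number here is illustrative.

def mse_grad(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, data, micro_batch_size):
    """Average gradient built up from equally sized micro-batches."""
    micro_batches = [
        data[i:i + micro_batch_size]
        for i in range(0, len(data), micro_batch_size)
    ]
    accum_steps = len(micro_batches)
    grad = 0.0
    for mb in micro_batches:
        # Dividing by accum_steps keeps the result an average, not a sum.
        grad += mse_grad(w, mb) / accum_steps
    return grad

data = [(float(x), 3.0 * x + 1.0) for x in range(16)]  # synthetic samples
w = 0.5
full = mse_grad(w, data)                                # one batch of 16
accum = accumulated_grad(w, data, micro_batch_size=4)   # four micro-batches of 4
print(abs(full - accum) < 1e-9)
```

The equivalence holds for losses that are plain averages over samples. Layers that depend on batch statistics, such as batch normalization, still see only the micro-batch, so accumulation is not always identical to a genuinely larger batch.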

@kkkls
Owner

kkkls commented Jan 16, 2024

Yes, that works. But given that you only have two 3090s, I would still recommend lowering the learning rate a little.
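One common way to pick a lower learning rate is the linear scaling heuristic, which scales the learning rate in proportion to the effective batch size. The sketch below is an assumption for illustration and not something stated in this thread: `reference_lr` and `reference_batch` are placeholder values, not the repository's actual settings, and the helper names are hypothetical. Only the 2 GPUs and batch_size_per_gpu of 4 come from the issue.

```python
# Sketch of the linear scaling heuristic for the learning rate.
# reference_lr and reference_batch are PLACEHOLDERS, not the repo's config.

def effective_batch_size(num_gpus, batch_size_per_gpu, accum_steps=1):
    """Number of samples contributing to one optimizer step."""
    return num_gpus * batch_size_per_gpu * accum_steps

def scaled_lr(reference_lr, reference_batch, actual_batch):
    """Scale the learning rate in proportion to the batch size."""
    return reference_lr * actual_batch / reference_batch

# Setup described in the issue: two GPUs, batch_size_per_gpu = 4.
actual = effective_batch_size(num_gpus=2, batch_size_per_gpu=4)

# Hypothetical reference configuration, for illustration only.
lr = scaled_lr(reference_lr=2e-4, reference_batch=32, actual_batch=actual)
print(actual, lr)
```

With gradient accumulation, `accum_steps` raises the effective batch size, so the scaled learning rate moves back toward the reference value. Treat the result as a starting point to tune rather than a guaranteed setting.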

@GeraintLi
Author

OK, thank you for the suggestion. I'll give it a try.

@hfw6310

hfw6310 commented Jan 21, 2024

Hello author, may I ask: with only a single 3090, is this work basically impossible to do? Transformers seem really GPU-hungry.

@kkkls
Owner

kkkls commented Jan 21, 2024

> Hello author, may I ask: with only a single 3090, is this work basically impossible to do? Transformers seem really GPU-hungry.

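Hello. At the moment, reaching SOTA still requires training with a fairly large batch size. If your resources are limited, you could work on lightweight variants instead.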


3 participants