Accuracy reproduction problem for GENet #3
Comments
We will update our draft this week to include more detailed training parameters. We use cosine lr decay, 5 warm-up epochs, weight decay 4e-5, lr = 0.1, batch size 256.
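The schedule quoted above (cosine decay with a 5-epoch linear warm-up) can be sketched as a small helper. Note the total epoch count is not stated by the authors in this reply; `total_epochs=360` below is taken from the question further down and should be treated as an assumption.

```python
import math

def lr_at_epoch(epoch, total_epochs=360, warmup_epochs=5, base_lr=0.1):
    """Cosine LR decay with linear warm-up, using the settings quoted
    above (lr=0.1, 5 warm-up epochs). total_epochs=360 is an assumed
    value from the original question, not confirmed by the authors."""
    if epoch < warmup_epochs:
        # linear warm-up from base_lr/warmup_epochs up to base_lr
        return base_lr * (epoch + 1) / warmup_epochs
    # cosine decay from base_lr down to 0 over the remaining epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

With these defaults the learning rate ramps to 0.1 by epoch 5 and decays smoothly to near zero by the final epoch.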
How much improvement did distillation bring to the results in the paper?
The main purpose of the teacher network is to help the student network escape bad local minima. There is about a 1% accuracy drop in the early training stages without the help of the teacher network.
So distillation has no effect on the final accuracy? It just makes training converge faster?
Without the teacher network, training quickly gets stuck around epoch 60. With the teacher network, the accuracy keeps increasing as you train longer. It seems that which teacher network you use is not important, which is weird to us too.
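The teacher-guided training discussed above is typically implemented as a knowledge-distillation loss in the style of Hinton et al. The exact distillation objective GENet uses is not specified in this thread, so the sketch below is a generic version; `temperature=4.0` and `alpha=0.5` are illustrative defaults, not values from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled, numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """Generic KD loss: a soft cross-entropy against the teacher's
    temperature-softened outputs, mixed with a hard cross-entropy
    against the ground-truth label. Hyperparameters are illustrative
    assumptions, not the GENet paper's settings."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # soft-target term, scaled by T^2 as in Hinton et al.
    soft_loss = -sum(t * math.log(s)
                     for t, s in zip(soft_teacher, soft_student))
    hard_probs = softmax(student_logits)
    hard_loss = -math.log(hard_probs[true_label])
    return alpha * temperature ** 2 * soft_loss + (1 - alpha) * hard_loss
```

A student whose logits match both the teacher and the label incurs a lower loss than one that disagrees with them, which is the gradient signal that pulls the student away from the bad local minima mentioned above.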
I saw your GENet and am very interested in reproducing the results in the paper, but I found that the training details are not very clear. Using batch size 1024, lr 0.5, weight decay 1e-4, 360 epochs, a 5-epoch warm-up, cosine learning-rate decay, and no dropout, I only trained the GENet-normal architecture to 76.1 accuracy.
Could you share the training recipe for GENet-normal, e.g. lr, batch size, weight decay, dropout rate, epochs, the learning-rate decay schedule, and whether warm-up was used? Looking forward to your help!