Fix ineffective no_decay bug when using BERTAdam #32
Conversation
thanks!
Fix ineffective no_decay bug when using BERTAdam
Question - wouldn't … therefore requiring slightly smarter conditions than just …?
```diff
@@ -503,8 +503,8 @@ def main():
     param_optimizer = list(model.named_parameters())
     no_decay = ['bias', 'gamma', 'beta']
     optimizer_grouped_parameters = [
-        {'params': [p for n, p in param_optimizer if n not in no_decay], 'weight_decay_rate': 0.01},
-        {'params': [p for n, p in param_optimizer if n in no_decay], 'weight_decay_rate': 0.0}
+        {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay_rate': 0.01},
+        {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay_rate': 0.0}
```
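Not part of the PR, just for illustration: a minimal self-contained sketch of how the corrected grouping behaves. The real code runs over the BERT model's `named_parameters()`; the toy module and parameter names below are assumptions.

```python
import torch

# Toy stand-in for the BERT model; its parameter names become
# '0.weight', '0.bias', '1.weight', '1.bias'. Note that torch.nn.LayerNorm
# names its scale 'weight', whereas the original BERT layer norm used
# 'gamma'/'beta', hence those entries in no_decay.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.LayerNorm(4))

param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'gamma', 'beta']

optimizer_grouped_parameters = [
    {'params': [p for n, p in param_optimizer
                if not any(nd in n for nd in no_decay)], 'weight_decay_rate': 0.01},
    {'params': [p for n, p in param_optimizer
                if any(nd in n for nd in no_decay)], 'weight_decay_rate': 0.0},
]

# The two bias parameters land in the zero-decay group; the weights keep decay.
print([n for n, _ in param_optimizer if any(nd in n for nd in no_decay)])
# ['0.bias', '1.bias']
```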
I think `all(nd not in n for nd in no_decay)` would be clearer.
Don't mind my comment; I tested it further this morning and everything seems to work as expected!
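Both spellings of the condition are equivalent by De Morgan's law (`not any(x)` is the same as `all(not x)`), so the choice is purely about readability. A quick standalone check, using hypothetical parameter names, illustrates this:

```python
no_decay = ['bias', 'gamma', 'beta']
# Hypothetical names of the dotted form that named_parameters() yields.
names = ['encoder.layer.0.attention.self.query.weight',
         'encoder.layer.0.attention.self.query.bias',
         'embeddings.LayerNorm.beta']

for n in names:
    # not any(...)  <=>  all(not ...), so either condition selects the same group.
    assert (not any(nd in n for nd in no_decay)) == all(nd not in n for nd in no_decay)
```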
With the original code, all parameters are decayed, because the condition `parameter_name in no_decay` is never satisfied: `named_parameters()` yields full dotted names (e.g. `bert.embeddings.LayerNorm.gamma`), which never exactly equal the short strings `'bias'`, `'gamma'`, or `'beta'`. The fix instead checks whether any of those strings occurs as a substring of the parameter name.
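A short string-only demonstration of the bug; the parameter names below are hypothetical but have the dotted shape that `named_parameters()` produces:

```python
no_decay = ['bias', 'gamma', 'beta']
names = ['bert.encoder.layer.0.attention.self.query.weight',
         'bert.encoder.layer.0.attention.self.query.bias',
         'bert.embeddings.LayerNorm.gamma']

# Original condition: exact membership never matches a full dotted name,
# so the zero-decay group is always empty and every parameter is decayed.
print([n for n in names if n in no_decay])                    # []

# Fixed condition: substring matching picks out the bias/LayerNorm parameters.
print([n for n in names if any(nd in n for nd in no_decay)])  # the .bias and .gamma names
```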