Fix ineffective no_decay bug when using BERTAdam #32

Merged
merged 1 commit into huggingface:master on Nov 20, 2018

Conversation

xiaoda99 (Contributor)

With the original code, all parameters get weight decay, because the condition "parameter_name in no_decay" is never satisfied: named_parameters() returns full dotted names such as 'bert.embeddings.LayerNorm.bias', which are never literally equal to the short strings 'bias', 'gamma', or 'beta' in no_decay.
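(A minimal sketch of the problem, using a small stand-in torch module rather than the actual BERT model; the point is only that full dotted parameter names never equal the short strings in no_decay, so a substring check is needed:)

import torch

# Small stand-in module (not the real BERT model); named_parameters() still
# yields full dotted names such as '1.bias'.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.LayerNorm(4))
no_decay = ['bias', 'gamma', 'beta']
param_optimizer = list(model.named_parameters())

# Original condition: a full name like '1.bias' is never equal to 'bias',
# so this group is always empty and every parameter gets weight decay.
assert [n for n, p in param_optimizer if n in no_decay] == []

# Condition from this PR: substring match on the parameter name.
decay_group    = [n for n, p in param_optimizer if not any(nd in n for nd in no_decay)]
no_decay_group = [n for n, p in param_optimizer if any(nd in n for nd in no_decay)]
print(decay_group)     # ['0.weight', '1.weight']
print(no_decay_group)  # ['0.bias', '1.bias']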

xiaoda99 changed the title from "Fix ineffective no_decay bug when using BertAdam" to "Fix ineffective no_decay bug when using BERTAdam" on Nov 18, 2018
thomwolf merged commit 061eeca into huggingface:master on Nov 20, 2018
thomwolf (Member)

thanks!

xelibrion

Question: wouldn't .named_parameters() for the model return tuples of (name, param_tensor), where the names look similar to these:

['bert.embeddings.word_embeddings.weight',
 'bert.embeddings.position_embeddings.weight',
 'bert.embeddings.token_type_embeddings.weight',
 'bert.embeddings.LayerNorm.weight',
 'bert.embeddings.LayerNorm.bias',
 'bert.encoder.layer.0.attention.self.query.weight',
 'bert.encoder.layer.0.attention.self.query.bias',
 'bert.encoder.layer.0.attention.self.key.weight',
 'bert.encoder.layer.0.attention.self.key.bias',
 'bert.encoder.layer.0.attention.self.value.weight',
 'bert.encoder.layer.0.attention.self.value.bias',
 'bert.encoder.layer.0.attention.output.dense.weight',
 'bert.encoder.layer.0.attention.output.dense.bias',
 'bert.encoder.layer.0.attention.output.LayerNorm.weight',
 'bert.encoder.layer.0.attention.output.LayerNorm.bias',
...
...
'classifier.linear.weight',
'classifier.linear.bias']

therefore requiring slightly smarter conditions than a plain in check? Something along these lines:

[p for n, p in param_optimizer if any(True for x in no_decay if n.endswith(x))]
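(A quick illustrative check, not code from the repo: on dotted names like those listed above, the endswith-based filter and the substring-based filter from the merged change select the same parameters, since 'bias', 'gamma', and 'beta' only ever appear as the last name component here:)

names = ['bert.embeddings.LayerNorm.weight',
         'bert.embeddings.LayerNorm.bias',
         'bert.encoder.layer.0.attention.self.query.weight',
         'bert.encoder.layer.0.attention.self.query.bias']
no_decay = ['bias', 'gamma', 'beta']

# endswith-based filter suggested above
by_endswith = {n for n in names if any(n.endswith(x) for x in no_decay)}
# substring-based filter from the merged change
by_substring = {n for n in names if any(nd in n for nd in no_decay)}

assert by_endswith == by_substring == {
    'bert.embeddings.LayerNorm.bias',
    'bert.encoder.layer.0.attention.self.query.bias',
}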

@@ -503,8 +503,8 @@ def main():
     param_optimizer = list(model.named_parameters())
     no_decay = ['bias', 'gamma', 'beta']
     optimizer_grouped_parameters = [
-        {'params': [p for n, p in param_optimizer if n not in no_decay], 'weight_decay_rate': 0.01},
-        {'params': [p for n, p in param_optimizer if n in no_decay], 'weight_decay_rate': 0.0}
+        {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay_rate': 0.01},
+        {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay_rate': 0.0}


I think all(nd not in n for nd in no_decay) would be clearer
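(The two phrasings are equivalent by De Morgan's law, so the choice is purely about readability; a tiny sanity check with made-up names:)

no_decay = ['bias', 'gamma', 'beta']
for n in ['bert.embeddings.LayerNorm.bias',
          'bert.encoder.layer.0.attention.self.query.weight']:
    # not (any nd is in n)  <=>  (every nd is not in n)
    assert (not any(nd in n for nd in no_decay)) == all(nd not in n for nd in no_decay)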

xelibrion

Don't mind my comment, tested it further this morning and everything seems to work as expected!
