-
Notifications
You must be signed in to change notification settings - Fork 27.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[s2s] distill t5-large -> t5-small #8376
Merged
sshleifer
merged 25 commits into
huggingface:master
from
sbhaktha:add_student_base_model
Nov 11, 2020
+108
−67
Merged
Changes from 15 commits
Commits
Show all changes
25 commits
Select commit
Hold shift + click to select a range
1ff83eb
Implement student model using different base model
sbhaktha 758883a
changes
sbhaktha 0ccf184
Merge branch 'master' of https://github.com/huggingface/transformers …
sbhaktha ddf4d5d
Merging changes from master
sbhaktha ee878fe
Use teacher encoder outputs while calling teacher decoder
sbhaktha 58a9a6e
Merge branch 'master' of https://github.com/huggingface/transformers …
sbhaktha cfbc357
Return 0 tensor when hidden loss is not applicable, rename student_ba…
sbhaktha 73c0a54
Merge branch 'master' of https://github.com/huggingface/transformers …
sbhaktha 8bc75fd
Do not create student model in eval mode
sbhaktha 708e457
Remove debug print
sbhaktha 7bc3baa
Fix bugs causing failed tests
sbhaktha 02fa544
Adding unit tests. Including code refactor per request on PR.
sbhaktha fb7a6f0
Merge branch 'master' of https://github.com/huggingface/transformers …
sbhaktha 9432e67
Formatting changes per make fixup
sbhaktha 65945d7
style
sshleifer 937082e
Selectively unpack teacher encoder output and hidden states
sbhaktha 9f3bc7c
Merge branch 'master' of https://github.com/huggingface/transformers …
sbhaktha 65fc338
Merge branch 'add_student_base_model' of https://github.com/sbhaktha/…
sbhaktha d42ccbe
Merge branch 'master' into add_student_base_model
sshleifer 2157d4d
style
sshleifer fcb2de3
fixup
sshleifer f7495de
fixed return_dict issue
sshleifer f313883
Merge branch 'master' into add_student_base_model
sshleifer 26b3b91
style
sshleifer 9858290
use_cache=False
sshleifer File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the old case of distilling to a copied student, this change uses more memory.
Trying to find a clean solution locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Not sure what part of the change makes it use more memory? Is there any way I can help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used to only extract hidden states, now we extract hidden states and encoder outputs.
Your application doesn't need hidden states, the other application doesn't need encoder outputs, so splitting the call to
self.teacher.get_encoder()(
to only get what's needed is the next step.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it.I am attempting a fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not true that the other application doesn't need encoder outputs. When calling the teacher decoder, if student and teacher encoders are different, the right thing to do is to pass the teacher encoder outputs, and as we can see from these lines, even in the original case, we could have
self.different_encoder
be true if not all layers were copied from teacher to student:transformers/examples/seq2seq/distillation.py
Lines 60 to 61 in 02bdfc0
So I think we do need to extract teacher encoder outputs inside this
if different_encoder
block.However, we can extract hidden states optionally based on whether we need to calculate hidden loss or not. I have the following modified code which passes the unit tests. Please lmk what you think.
That said, the same boolean
self.do_calc_hidden_loss
can be used to setoutput_hidden_states
toTrue
orFalse
in the call to the teacher decoder as well . If the above makes sense to you I'll make a similar change here as well and push a commit.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, thanks for the clarification! I'll take another look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I have another fix that's passing unit tests. Please lmk what you think. I renamed the
self.do_calc_hidden_loss
toself.different_base_models
because really that's what it was keeping track of, and the boolean value is flipped from the earlier one.self.different_base_models
is initialized in__init__
in the same place as the earlierself.do_calc_hidden_loss
as follows:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes push that, I'll make some changes and get it merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushed!