Fine-tuning bug fix #51
Conversation
As far as I understand, this wouldn't affect the behaviour when training crashed and was resumed, correct? That case would still continue by loading the optimiser parameters. (TBH, I haven't tested whether that case even works.)
Snakefile (Outdated)
```
@@ -91,14 +92,16 @@ align_dir = f"{data_dir}/alignment"

# models
models_dir = f"{data_root_dir}/models/{src}-{trg}/{experiment}"
teacher_dir = f"{models_dir}/teacher"
teacher_all_dir = f"{models_dir}/teacher-all"
teacher_parallel_dir = f"{models_dir}/teacher-parallel"
```
From reading the source, I don't understand what teacher_parallel_dir should contain. What is a parallel teacher model?
Teacher all - the model is trained on all available data.
Teacher parallel - an optional model that is fine-tuned on parallel data only, used if the data was augmented with back-translations.
Would it be easier to understand if I rename them to teacher and teacher-finetuned?
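A commented sketch of the directory variables from the diff above, with the roles as described in this reply (the comments are an interpretation of the explanation, not verified pipeline behaviour):

```python
# Illustrative only: placeholder values; in the Snakefile these come from the pipeline config.
data_root_dir, src, trg, experiment = "data", "en", "ru", "test"

models_dir = f"{data_root_dir}/models/{src}-{trg}/{experiment}"

# Teacher trained on all available data (original parallel corpus plus back-translations).
teacher_all_dir = f"{models_dir}/teacher-all"

# Optional teacher fine-tuned on the original parallel corpus only,
# used when the training data was augmented with back-translations.
teacher_parallel_dir = f"{models_dir}/teacher-parallel"
```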
Yes, that would be easier to understand, but this is a very minor point.
This will work because I removed protection from the output file. However, this is an irregular situation and not desirable. The pipeline is designed to work end to end without interruptions.
It causes fewer problems when teacher training on a parallel dataset happens in a separate directory. Model weights are initialized using the --pretrained-model Marian parameter (see the sketch at the end of this description).
Fixes: Teacher does not continue training if training on augmented data was early stopped #49
Fixes a new bug with student fine-tuning: weight initialization was missing (it was lost during refactoring).
Fixes usage of a pretrained backward model + vocab.
Fixes: Wrong tcol when cleaning with Bicleaner #56
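A minimal sketch of the fine-tuning step, assuming Marian's --pretrained-model option and hypothetical file and corpus paths (the real command is assembled by the pipeline scripts):

```python
import subprocess

# Hypothetical paths; in the pipeline these are derived from the Snakefile variables.
teacher_all_model = "data/models/en-ru/test/teacher-all/model.npz.best-ce-mean-words.npz"
teacher_parallel_dir = "data/models/en-ru/test/teacher-parallel"

# Fine-tune on the original parallel corpus only, initializing weights from the
# teacher trained on all (augmented) data. --pretrained-model loads parameters only;
# it does not restore optimizer or training state.
cmd = [
    "marian",
    "--model", f"{teacher_parallel_dir}/model.npz",
    "--pretrained-model", teacher_all_model,
    "--train-sets", "corpus.en.gz", "corpus.ru.gz",
    "--vocabs", "vocab.spm", "vocab.spm",
]
subprocess.run(cmd, check=True)
```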