Initial pipeline #1

Merged
merged 72 commits into from
Jun 17, 2021
Commits
c6a96dc  Copy students scripts. Fix install, download and clean steps (eu9ene, May 7, 2021)
d8fce65  Teacher scripts (eu9ene, May 8, 2021)
7b75925  Fix install scripts (eu9ene, May 12, 2021)
6403742  Fix install reproducibility (eu9ene, May 12, 2021)
81f41ab  Datasets downloading automation (eu9ene, May 13, 2021)
9e1e32b  Cleaning fixes (eu9ene, May 13, 2021)
0a08add  Fix run (eu9ene, May 13, 2021)
83a3f07  Fix cleaning (eu9ene, May 13, 2021)
bc996e8  Fix python path (eu9ene, May 13, 2021)
047f602  Fix clean scripts (eu9ene, May 14, 2021)
6684801  Fix naming of clean scripts (eu9ene, May 14, 2021)
c05b0ab  Refactor folders structure (eu9ene, May 17, 2021)
dc3a809  Remove unused submodules (eu9ene, May 17, 2021)
fdefe78  Move submodules to a separate dir (eu9ene, May 17, 2021)
52b7ab9  Refactor model training scripts (eu9ene, May 18, 2021)
96b6f83  Fix relative paths (eu9ene, May 18, 2021)
760b8e2  Fix paths in install scripts (eu9ene, May 21, 2021)
8381fb8  Downloading and training fixes (eu9ene, May 21, 2021)
625ba9d  Add mono downloading, fix training paths (eu9ene, May 21, 2021)
4c273fa  Add back translations (eu9ene, May 26, 2021)
c61f8f7  Fix translations scripts (eu9ene, May 27, 2021)
8c59a0d  Refactorings (eu9ene, May 28, 2021)
9895024  Add snakepit runner (eu9ene, May 28, 2021)
e8ed869  Fix translation (eu9ene, Jun 1, 2021)
5193478  Add ce filter (eu9ene, Jun 1, 2021)
8dae039  Ce-filter fixes (eu9ene, Jun 2, 2021)
492d7c0  Use pigz (eu9ene, Jun 2, 2021)
218ded5  Add more docs (eu9ene, Jun 2, 2021)
e5cd54d  Fix alignment (eu9ene, Jun 2, 2021)
63b5a88  Fix alignment (eu9ene, Jun 2, 2021)
ea3291e  Fix ce-filter (eu9ene, Jun 2, 2021)
9823dcd  Use more memory for sorting (eu9ene, Jun 2, 2021)
8eefb27  Use local tmp dir for alignments (eu9ene, Jun 2, 2021)
9fc031d  Fix alignment scripts (eu9ene, Jun 3, 2021)
d213e54  Fix shortlist scripts (eu9ene, Jun 4, 2021)
9a5c252  Copy teacher vocab (eu9ene, Jun 4, 2021)
a1e5595  Fix tensorboard (eu9ene, Jun 4, 2021)
4f0b1aa  Add quantization (eu9ene, Jun 4, 2021)
1175d2c  Refactorings (eu9ene, Jun 4, 2021)
fba3257  Move finetuning to a separate folder (eu9ene, Jun 7, 2021)
6176e25  Refactor training scripts (eu9ene, Jun 7, 2021)
1a2c910  Fix quantization (eu9ene, Jun 7, 2021)
926e59e  Fix evaluation of quantized model (eu9ene, Jun 7, 2021)
d30b97c  Add export (eu9ene, Jun 8, 2021)
c4b0cc1  Fix export (eu9ene, Jun 8, 2021)
7ffed00  Fix evaluation (eu9ene, Jun 8, 2021)
c2fa1e8  Format code (eu9ene, Jun 9, 2021)
77b2c07  Pin python packages (eu9ene, Jun 9, 2021)
3a4902b  Add more logging (eu9ene, Jun 9, 2021)
125f389  Refactor directories (eu9ene, Jun 9, 2021)
eb8a37e  Fix config (eu9ene, Jun 10, 2021)
72ba220  Add more docs (eu9ene, Jun 10, 2021)
82f057f  Add datasets docs (eu9ene, Jun 10, 2021)
ed49f3c  Fix corpus downloading edge case (eu9ene, Jun 10, 2021)
e80b5e0  Change paracrawl prefix (eu9ene, Jun 10, 2021)
e464620  Fix corpus downloading (eu9ene, Jun 10, 2021)
4207b6c  Fix the main script (eu9ene, Jun 10, 2021)
2d29cd3  Fix GPUS arg (eu9ene, Jun 10, 2021)
741a476  Add more docs (eu9ene, Jun 10, 2021)
9ce07ce  Fix ce filtering (eu9ene, Jun 11, 2021)
c06c023  Add condition to evaluation (eu9ene, Jun 11, 2021)
342c6f8  Use augmented dataset for the teacher (eu9ene, Jun 11, 2021)
7ee1d2a  Fix corpus translation (eu9ene, Jun 11, 2021)
232de97  Add mode documentation (eu9ene, Jun 11, 2021)
83e0274  Add architecture section (eu9ene, Jun 11, 2021)
717d1f2  Ignore pipefail for mono corpus shuffling (eu9ene, Jun 12, 2021)
e1d27a6  Unquote positional arguments (eu9ene, Jun 15, 2021)
bb32b18  Add ability to skip corpus augmentation with back-translations (eu9ene, Jun 15, 2021)
4203e87  Add instructions how to run tensorboard (eu9ene, Jun 15, 2021)
9efb3a5  Add more checks to cleaning scripts (eu9ene, Jun 15, 2021)
db997ee  Extract variable for teacher corpus (eu9ene, Jun 15, 2021)
68ba306  Add experiment name to directories structure (eu9ene, Jun 16, 2021)
Fix run
eu9ene committed May 13, 2021
commit 0a08add339317531810d60c6ac0a25543e5b02f9
4 changes: 2 additions & 2 deletions run.sh
@@ -35,13 +35,13 @@ conda activate bergamot-training-env
 original=${DATA_DIR}/original
 . ./pipeline/1-data/download-corpus.sh ${original}/corpus $TRAIN_DATASETS
 . ./pipeline/1-data/download-corpus.sh ${original}/devset $DEVTEST_DATASETS
-if [! -z "${MONO_DATASETS}" ]; then
+if [[ ${MONO_DATASETS} ]]; then
 . ./pipeline/1-data/download-mono.sh ${SRC} $MONO_MAX_SENTENCES ${original}/mono $MONO_DATASETS
 fi

 clean=${DATA_DIR}/clean
 . ./pipeline/2-clean/clean-corpus.sh ${original}/corpus ${clean}/corpus
-if [-e ${DATA_DIR}/original/mono.en.gz ]; then
+if [[ -e ${DATA_DIR}/original/mono.en.gz ]]; then
 . ./pipeline/2-clean/clean-mono.sh en ${original}/mono ${clean}/mono
 fi

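The two conditionals changed in this commit go from `[! -z ... ]` and `[-e ... ]` to the `[[ ... ]]` form. A minimal sketch of why that matters, assuming the script runs under bash: without a space after `[`, the shell treats `[!` (or `[-e`) as a command name, fails to find it, and the branch never runs. The helper names and the dataset value below are hypothetical and exist only for illustration; they are not part of `run.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of run.sh.

# Fixed form: [[ ${value} ]] is true when the string is non-empty.
is_set() {
  [[ ${1} ]]
}

# Fixed form: [[ -e path ]] is true when the path exists.
path_exists() {
  [[ -e ${1} ]]
}

# Old form: without a space after "[", bash looks up a command literally
# named "[!" and fails with "command not found" (exit status 127),
# so the condition is false regardless of the variable's value.
old_check_status() {
  local value="$1" status=0
  { [! -z "${value}" ]; } 2>/dev/null || status=$?
  echo "${status}"
}

MONO_DATASETS="example-dataset"   # placeholder value, an assumption
if is_set "${MONO_DATASETS}"; then
  echo "would download mono datasets: ${MONO_DATASETS}"
fi
echo "old check exit status: $(old_check_status "${MONO_DATASETS}")"
```

Since `[[ ... ]]` is a bash keyword and not a POSIX `test` feature, this form relies on the script being sourced or executed by bash.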