- reference code for Hugging Face's transformers
- simple examples
- NER task
- GLUE task
- fine-tuning or training RoBERTa from scratch
- training DistilBERT
* python >= 3.6
$ pip install -r requirements.txt
$ python example1.py
$ python example2.py
$ python example3.py
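* for reference, a minimal sketch of the kind of usage these example scripts demonstrate (not the literal contents of example1.py; the pretrained model name is an assumption):
# illustrative sketch: masked-LM inference with a pretrained RoBERTa model.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained('roberta-base')
model = AutoModelForMaskedLM.from_pretrained('roberta-base')
model.eval()

inputs = tokenizer("The capital of France is <mask>.", return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs)[0]                       # (1, seq_len, vocab_size)
mask_pos = (inputs['input_ids'] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(logits[0, mask_pos].argmax(dim=-1)))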
- train and evaluate
$ cp -rf ../transformers/examples/token-classification .
* roberta-base
$ ./train-ner.sh -v -v
...
10/29/2020 20:57:47 - INFO - __main__ - ***** Eval results *****
10/29/2020 20:57:47 - INFO - __main__ - eval_loss = 0.044062983244657516
10/29/2020 20:57:47 - INFO - __main__ - eval_accuracy_score = 0.9920563840971924
10/29/2020 20:57:47 - INFO - __main__ - eval_precision = 0.9507676902536716
10/29/2020 20:57:47 - INFO - __main__ - eval_recall = 0.9587680915516661
10/29/2020 20:57:47 - INFO - __main__ - eval_f1 = 0.9547511312217194
10/29/2020 20:57:47 - INFO - __main__ - epoch = 8.0
10/29/2020 20:57:47 - INFO - __main__ - total_flos = 15050482511624640
...
$ ./eval-ner.sh -v -v
...
10/29/2020 21:05:16 - INFO - __main__ - eval_loss = 0.16634580492973328
10/29/2020 21:05:16 - INFO - __main__ - eval_accuracy_score = 0.9797351135996555
10/29/2020 21:05:16 - INFO - __main__ - eval_precision = 0.8886010362694301
10/29/2020 21:05:16 - INFO - __main__ - eval_recall = 0.9109419263456091
10/29/2020 21:05:16 - INFO - __main__ - eval_f1 = 0.8996328029375766
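* once trained, the model can be loaded for inference; a hedged sketch (the output dir 'engeval-model' is inferred from the tensorboard command below and may differ from what train-ner.sh actually uses):
# load the fine-tuned NER model and run it through a token-classification pipeline.
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_dir = 'engeval-model'    # assumption: the output_dir of train-ner.sh
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForTokenClassification.from_pretrained(model_dir)
ner = pipeline('ner', model=model, tokenizer=tokenizer)
print(ner('Hugging Face is based in New York City .'))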
- tensorboardX
$ tensorboard --logdir engeval-model/runs/ --port port-number --bind_all
- run
$ cp -rf ../transformers/examples/text-classification .
$ ./run-glue.sh -v -v
...
10/29/2020 14:10:47 - INFO - __main__ - ***** Eval results sst2 *****
10/29/2020 14:10:47 - INFO - __main__ - eval_loss = 0.2560681700706482
10/29/2020 14:10:47 - INFO - __main__ - eval_accuracy = 0.9243119266055045
10/29/2020 14:10:47 - INFO - __main__ - epoch = 3.0
10/29/2020 14:10:47 - INFO - __main__ - total_flos = 16988827310258688
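* a hedged inference sketch for the resulting SST-2 classifier ('sst2-model' is a placeholder for whatever output_dir run-glue.sh uses):
# score a sentence with the fine-tuned SST-2 classifier.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_dir = 'sst2-model'       # placeholder for the actual output_dir
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

inputs = tokenizer('a gripping and deeply satisfying film', return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs)[0]
print(torch.softmax(logits, dim=-1))   # [negative, positive] probabilities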
- download raw corpus
- train
$ cp -rf ../transformers/examples/language-modeling .
$ ./finetune-roberta.sh -v -v
* troubleshooting
...
File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_roberta.py", line 98, in forward
position_ids = create_position_ids_from_input_ids(input_ids, self.padding_idx).to(input_ids.device)
File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_roberta.py", line 1333, in create_position_ids_from_input_ids
mask = input_ids.ne(padding_idx).int()
...
ne() received an invalid combination of arguments - got (NoneType)
fix: set "pad_token_id": 1 in config-roberta-base/config.json
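* a quick sanity check for this fix (reads config-roberta-base/config.json):
# verify pad_token_id so create_position_ids_from_input_ids() gets a valid padding_idx.
from transformers import AutoConfig

config = AutoConfig.from_pretrained('config-roberta-base')
assert config.pad_token_id == 1, 'set "pad_token_id": 1 in config-roberta-base/config.json'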
- train
* prepare data
* split data if necessary
* $ python split.py --data_path=korean/all.txt --base_path=korean/data.txt --ratio=1000
$ cp -rf ../transformers/examples/language-modeling .
* edit vocab_size in config-roberta-base/config.json (see the tokenizer sketch below)
$ ./train-roberta.sh -v -v
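* a hedged sketch of how the vocab for config-roberta-base could be built and its size read (the vocab_size value and file paths here are examples, not the repo's actual settings):
# train a byte-level BPE tokenizer on the corpus with the 'tokenizers' library.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(files=['korean/data.txt'], vocab_size=50265, min_frequency=2,
                special_tokens=['<s>', '<pad>', '</s>', '<unk>', '<mask>'])
tokenizer.save_model('config-roberta-base')   # writes vocab.json / merges.txt
print(tokenizer.get_vocab_size())             # use this value for vocab_size in config.json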
- train
$ cp -rf ../transformers/examples/distillation .
$ cp distillation/training_configs/distilbert-base-cased.json distilbert-base.json
* place the teacher model in the current dir, ex) `pytorch.all.bpe.4.8m_step`
* modify distilbert-base.json, train-distilbert.sh : `vocab_size`
{
"activation": "gelu",
"attention_dropout": 0.1,
"dim": 768,
"dropout": 0.1,
"hidden_dim": 3072,
"initializer_range": 0.02,
"max_position_embeddings": 512,
"n_heads": 12,
"n_layers": 6,
"sinusoidal_pos_embds": true,
"tie_weights_": true,
"vocab_size": 202592
}
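* a hedged sanity check that student and teacher agree on vocab_size (the student reuses the teacher's vocabulary):
# compare the teacher config with distilbert-base.json.
import json
from transformers import AutoConfig

teacher = AutoConfig.from_pretrained('pytorch.all.bpe.4.8m_step')   # the teacher dir above
with open('distilbert-base.json') as f:
    student = json.load(f)
print(teacher.vocab_size, student['vocab_size'])                    # both should be 202592 here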
* modify distillation/train.py : `max_model_input_size`
args.max_model_input_size = 512
$ ./train-distilbert.sh -v -v
...
06/17/2020 21:37:02 - INFO - transformers.configuration_utils - PID: 2470 - Configuration saved in korean/kor-distil-bpe-bert/config.json
06/17/2020 21:37:04 - INFO - utils - PID: 2470 - Training is finished
06/17/2020 21:37:04 - INFO - utils - PID: 2470 - Let's go get some drinks.
* training parameters
$ cat korean/kor-distil-bpe-bert/parameters.json
...
"n_epoch": 3,
"batch_size": 5,
"group_by_size": true,
"gradient_accumulation_steps": 50,
...
* tensorboardX
$ tensorboard --logdir korean/kor-distil-bpe-bert/log/train --port port-number --bind_all
* make model archive, ex) kor-distil-bpe-bert.v1
$ cp -rf distilbert-base-uncased kor-distil-bpe-bert.v1
$ cp -rf korean/kor-distil-bpe-bert/config.json kor-distil-bpe-bert.v1
** add the following to kor-distil-bpe-bert.v1/config.json
"architectures": [
"DistilBertModel"
],
** copy vocab
$ cp pytorch.all.bpe.4.8m_step/vocab.txt kor-distil-bpe-bert.v1
** copy model
$ cp korean/kor-distil-bpe-bert/pytorch_model.bin kor-distil-bpe-bert.v1/
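** a quick check that the archive loads cleanly (a hedged sketch):
# load the archived model and run a forward pass.
import torch
from transformers import DistilBertTokenizer, DistilBertModel

tokenizer = DistilBertTokenizer.from_pretrained('kor-distil-bpe-bert.v1')
model = DistilBertModel.from_pretrained('kor-distil-bpe-bert.v1')
inputs = tokenizer('안녕하세요', return_tensors='pt')
with torch.no_grad():
    hidden_states = model(**inputs)[0]
print(hidden_states.shape)   # (1, seq_len, 768)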
- distilling from BERT large?
- 'attention heads', 'hidden size', and 'FFN inner hidden size' differ between BERT large and DistilBERT.
- therefore, we should first train a modified BERT large from scratch with the same 'attention heads', 'hidden size', and 'FFN inner hidden size' as the student.
- and then distil it to DistilBERT, as sketched below.
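- a hedged sketch of such a modified BERT large teacher: keep the 24 layers of BERT large, but force the head count, hidden size, and FFN inner size to match the DistilBERT student above (all values are assumptions about the intended setup):
# build a modified BERT-large teacher whose dimensions match the student.
from transformers import BertConfig, BertForMaskedLM

teacher_config = BertConfig(
    vocab_size=202592,        # same vocabulary as the student
    hidden_size=768,          # matches "dim" in distilbert-base.json
    num_attention_heads=12,   # matches "n_heads"
    intermediate_size=3072,   # matches "hidden_dim"
    num_hidden_layers=24,     # depth of BERT large
    max_position_embeddings=512,
)
teacher = BertForMaskedLM(teacher_config)   # train this from scratch, then distil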