Description
These are the steps I followed to get BioBERT working with the existing Hugging Face PyTorch BERT code.
I downloaded the pre-trained weights 'biobert_pubmed_pmc.tar.gz' from the Releases page.
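If it helps to reproduce this step, the release archive can be unpacked with Python's standard `tarfile` module. This is a minimal sketch: `extract_release` is a hypothetical helper, and the expectation that the archive contains a `pubmed_pmc_470k/` folder is inferred from the paths in the conversion command below, not confirmed.

```python
import tarfile
from pathlib import Path


def extract_release(archive_path: str, dest_dir: str) -> list[str]:
    """Extract a downloaded .tar.gz release and return its member names.

    Hypothetical helper; the folder layout inside the archive is assumed.
    """
    dest = Path(dest_dir).resolve()
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Refuse entries that would be written outside dest_dir.
            target = (dest / member.name).resolve()
            if dest != target and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    return [m.name for m in members]
```

For example, `extract_release("biobert_pubmed_pmc.tar.gz", "biobert")` would unpack into `biobert/`, and the returned names show which files (checkpoint, config, vocabulary) the release actually ships.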
I ran this command to convert the TensorFlow checkpoint to a PyTorch model:
python pytorch-pretrained-BERT/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py --tf_checkpoint_path="biobert/pubmed_pmc_470k/biobert_model.ckpt.index" --bert_config_file="biobert/pubmed_pmc_470k/bert_config.json" --pytorch_dump_path="biobert/pubmed_pmc_470k/Pytorch/biobert.model"
This created a file 'biobert.model' in the specified path.
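A TensorFlow checkpoint is stored as several files that share one prefix (here `biobert_model.ckpt`), and the command above points the converter at the `.index` file. A minimal sketch of a sanity check, using a hypothetical helper name, that lists the files belonging to that checkpoint and fails if the data shard is missing:

```python
from pathlib import Path


def checkpoint_files(path: str) -> list[Path]:
    """List the files of a TF checkpoint given its prefix or its .index path.

    e.g. biobert_model.ckpt.index, biobert_model.ckpt.data-00000-of-00001,
    and optionally biobert_model.ckpt.meta all share the prefix
    biobert_model.ckpt.
    """
    p = Path(path)
    # Strip a trailing ".index" to recover the shared prefix.
    prefix = p.with_suffix("") if p.suffix == ".index" else p
    files = sorted(prefix.parent.glob(prefix.name + ".*"))
    if not any(f.name.startswith(prefix.name + ".data-") for f in files):
        raise FileNotFoundError(f"no data shard found for prefix {prefix}")
    return files
```

Calling `checkpoint_files("biobert/pubmed_pmc_470k/biobert_model.ckpt.index")` should list all parts of the checkpoint that the conversion reads from.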
As mentioned in this link, I compressed the 'biobert.model' file created above and 'biobert/pubmed_pmc_470k/bert_config.json' together into 'biobert_model.tar.gz'.
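The archive described here contains only the weights and the config, while the error reported further down comes from the tokenizer, which needs a vocabulary file. A minimal sketch of packaging all three with the standard `tarfile` module, under two assumptions: that pytorch-pretrained-BERT looks for the names `pytorch_model.bin`, `bert_config.json` and `vocab.txt` inside a model archive, and that the BioBERT release ships a `vocab.txt`. `package_model` is a hypothetical helper.

```python
import tarfile

# Assumed names the loaders look for; adjust if your library version differs.
ARCHIVE_NAMES = {
    "weights": "pytorch_model.bin",
    "config": "bert_config.json",
    "vocab": "vocab.txt",
}


def package_model(weights: str, config: str, vocab: str, out_path: str) -> list[str]:
    """Bundle weights, config and vocab into a .tar.gz at the archive root.

    Each file is stored under its assumed expected name, regardless of its
    name on disk (e.g. biobert.model is stored as pytorch_model.bin).
    Returns the member names of the written archive.
    """
    sources = {"weights": weights, "config": config, "vocab": vocab}
    with tarfile.open(out_path, "w:gz") as tar:
        for key, src in sources.items():
            tar.add(src, arcname=ARCHIVE_NAMES[key])
    with tarfile.open(out_path, "r:gz") as tar:
        return tar.getnames()
```

Usage would look like `package_model("biobert/pubmed_pmc_470k/Pytorch/biobert.model", "biobert/pubmed_pmc_470k/bert_config.json", "biobert/pubmed_pmc_470k/vocab.txt", "biobert_model.tar.gz")`, where the `vocab.txt` path is hypothetical.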
I then ran Hugging Face's BERT example run_classifier.py with the following command, passing the tar.gz created above as --bert_model:
python pytorch-pretrained-BERT/examples/run_classifier.py --data_dir="Data/" --bert_model="biobert_model.tar.gz" --task_name="qqp" --output_dir="OutputModels/Pretrained/" --do_train --do_eval --do_lower_case
I get the error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
at the line
tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case)
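The byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), which suggests that `BertTokenizer.from_pretrained` is opening `biobert_model.tar.gz` itself and reading it as a plain-text vocabulary file. The stdlib-only sketch below reproduces the same message by decoding gzip-compressed bytes as UTF-8; `decode_as_text` is a hypothetical stand-in for the tokenizer's file reading, not the library's code.

```python
import gzip


def decode_as_text(raw: bytes) -> str:
    """Decode bytes as UTF-8, the way a plain-text vocabulary reader would."""
    return raw.decode("utf-8")


# gzip output always starts with the magic bytes 0x1f 0x8b.
compressed = gzip.compress(b"[PAD]\n[UNK]\n")
assert compressed[:2] == b"\x1f\x8b"

try:
    decode_as_text(compressed)
except UnicodeDecodeError as err:
    # Byte 0 (0x1f) is valid ASCII; byte 1 (0x8b) is not a valid UTF-8 start byte.
    print(err)  # 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
```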
Am I doing something wrong?
I just want to run the run_classifier.py code provided by Hugging Face with the BioBERT pre-trained weights, the same way we run it with BERT. Is there a way to do this?