Skip to content

Load Biobert pre-trained weights into Bert model with Pytorch bert hugging face run_classifier.py code #457

@sheetalsh456

Description

@sheetalsh456

These are the steps I followed to get Biobert working with the existing Bert hugging face pytorch code.

  1. I downloaded the pre-trained weights 'biobert_pubmed_pmc.tar.gz' from the Releases page.

  2. I ran this command to convert the tf checkpoint to pytorch model

python pytorch-pretrained-BERT/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py --tf_checkpoint_path="biobert/pubmed_pmc_470k/biobert_model.ckpt.index" --bert_config_file="biobert/pubmed_pmc_470k/bert_config.json" --pytorch_dump_path="biobert/pubmed_pmc_470k/Pytorch/biobert.model"

This created a file 'biobert.model' in the specified path.

  1. As mentioned in this link , I compressed 'biobert.model' created above and 'biobert/pubmed_pmc_470k/bert_config.json' together into a biobert_model.tar.gz

  2. I then ran the run_classifier.py of hugging face bert with the following command, using the tar.gz created above.

python pytorch-pretrained-BERT/examples/run_classifier.py --data_dir="Data/" --bert_model="biobert_model.tar.gz" --task_name="qqp" --output_dir="OutputModels/Pretrained/" --do_train --do_eval --do_lower_case

I get the error

'UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte' 

in the line

tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case)

Am I doing something wrong?

I just wanted to run run_classifier.py code provided by hugging face with biobert pretrained weights in the same way that we run bert with it. Is there a way to do this?

Metadata

Metadata

Assignees

No one assigned

    Labels

    DiscussionDiscussion on a topic (keep it focused or open a new issue though)wontfix

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions