Cannot find vocabulary file for Chinese model #34
After I convert the TF model to a PyTorch model, I run a classification task on a new Chinese dataset, but get this:

```
CUDA_VISIBLE_DEVICES=3 python run_classifier.py --task_name weibo --do_eval --do_train --bert_model chinese_L-12_H-768_A-12 --max_seq_length 128 --train_batch_size 32 --learning_rate 2e-5 --num_train_epochs 3.0 --output_dir bert_result
```

```
11/18/2018 21:56:59 - INFO - __main__ - device cuda n_gpu 1 distributed training False
11/18/2018 21:56:59 - INFO - pytorch_pretrained_bert.tokenization - loading vocabulary file chinese_L-12_H-768_A-12
Traceback (most recent call last):
  File "run_classifier.py", line 661, in <module>
    main()
  File "run_classifier.py", line 508, in main
    tokenizer = BertTokenizer.from_pretrained(args.bert_model)
  File "/home/lin/jpmorgan/pytorch-pretrained-BERT/pytorch_pretrained_bert/tokenization.py", line 141, in from_pretrained
    tokenizer = cls(resolved_vocab_file, do_lower_case)
  File "/home/lin/jpmorgan/pytorch-pretrained-BERT/pytorch_pretrained_bert/tokenization.py", line 94, in __init__
ValueError: Can't find a vocabulary file at path 'chinese_L-12_H-768_A-12'. To load the vocabulary from a Google pretrained model use `tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)`
```
Comments

You need to specify the path of vocab.txt, for example as sketched below.
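A minimal sketch of that suggestion, assuming the extracted `chinese_L-12_H-768_A-12` directory from the issue contains `vocab.txt`; the exact call is an illustration, not the commenter's elided snippet:

```python
from pytorch_pretrained_bert import BertTokenizer

# Point from_pretrained at the vocab file itself (a directory containing
# a file named vocab.txt also works); adjust the path to your setup.
tokenizer = BertTokenizer.from_pretrained("chinese_L-12_H-768_A-12/vocab.txt")
```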
@zlinao, I tried to load the vocab using the following code; however, I get errors. Do you have the same problem?

Hi,
File "run_classifier.py", line 508, in main
tokenizer = BertTokenizer.from_pretrained(args.bert_model)
File "/home/lin/jpmorgan/pytorch-pretrained-BERT/pytorch_pretrained_bert/tokenization.py", line 141, in from_pretrained
tokenizer = cls(resolved_vocab_file, do_lower_case)
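For reference, a minimal sketch of loading by shortcut name; `bert-base-chinese` is one of the library's built-in shortcut names, so the vocabulary is downloaded and cached automatically and no local path is involved:

```python
from pytorch_pretrained_bert import BertTokenizer

# Shortcut names resolve to hosted vocab files, avoiding local-path issues.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
print(tokenizer.tokenize("今天天气很好"))  # sample sentence for illustration
```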
File "/home/lin/jpmorgan/pytorch-pretrained-BERT/pytorch_pretrained_bert/tokenization.py", line 94, in init
"model use
tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)
".format(vocab_file))ValueError: Can't find a vocabulary file at path 'chinese_L-12_H-768_A-12'. To load the vocabulary from a Google pretrained model use
tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)
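If you read the vocabulary yourself, a minimal sketch with an explicit UTF-8 encoding; the `load_vocab` helper here is illustrative, assuming the standard BERT vocab layout of one token per line:

```python
import collections

def load_vocab(vocab_file):
    """Read a BERT vocab file (one token per line) into an OrderedDict."""
    vocab = collections.OrderedDict()
    # encoding="utf-8" is the important part: without it, Chinese tokens
    # can raise UnicodeDecodeError on platforms with a non-UTF-8 default.
    with open(vocab_file, "r", encoding="utf-8") as reader:
        for index, line in enumerate(reader):
            vocab[line.rstrip("\n")] = index
    return vocab

vocab = load_vocab("chinese_L-12_H-768_A-12/vocab.txt")
```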