Description
I preprocessed the data with this command:
python preprocess.py -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo
then tried to load the GloVe embeddings with this command:
python ./tools/embeddings_to_torch.py -emb_file_enc "glove_dir/glove.6B.100d.txt" -emb_file_dec "glove_dir/glove.6B.100d.txt" -dict_file "data/demo.vocab.pt" -output_file "data/demo_gloveembeddings"
but got the following error:
Traceback (most recent call last):
  File "./tools/embeddings_to_torch.py", line 125, in <module>
    main()
  File "./tools/embeddings_to_torch.py", line 83, in main
    enc_vocab, dec_vocab = get_vocabs(opt.dict_file)
  File "./tools/embeddings_to_torch.py", line 20, in get_vocabs
    enc_vocab = fields['src'][0][1].vocab
AttributeError: 'TextMultiField' object has no attribute 'vocab'
Did I miss something, or is this an incompatibility in the vocab file format between the current versions of preprocess.py and embeddings_to_torch.py?
I looked into this a bit more. It looks like at some point the onmt.inputters.text_dataset.TextMultiField class changed: it no longer has a "vocab" attribute, only a "fields" attribute. The vocab is still reachable through "fields":
>>> import torch
>>> fields = torch.load("data/demo.vocab.pt")
>>> print(fields['src'][0][1])
<onmt.inputters.text_dataset.TextMultiField object at 0x7fb527440860>
>>> print(fields['src'][0][1].fields[0][1].vocab)
<torchtext.vocab.Vocab object at 0x7fb4c778f7f0>
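
Based on the session above, a possible workaround is to look the vocab up in a way that tolerates both layouts. This is only a sketch, not the project's fix: `extract_vocab` is a hypothetical helper name, and the stand-in objects below just mimic the two shapes seen here (a field with `.vocab` directly, and a wrapper whose `.fields` is a list of (name, field) pairs) so the snippet runs without OpenNMT-py installed.

```python
from types import SimpleNamespace


def extract_vocab(field):
    """Return the vocab of a field, whichever of the two layouts it uses.

    Hypothetical helper for illustration; not part of OpenNMT-py.
    """
    # Older layout: the field object carries `.vocab` directly.
    if hasattr(field, "vocab"):
        return field.vocab
    # Newer layout, as observed in the session above: a wrapper whose
    # `.fields` is a list of (name, field) pairs. Take the first pair's field.
    _name, base_field = field.fields[0]
    return base_field.vocab


# Stand-ins for the real objects (assumption: they only need these attributes).
old_style = SimpleNamespace(vocab="old-vocab")
new_style = SimpleNamespace(fields=[("src", SimpleNamespace(vocab="new-vocab"))])

print(extract_vocab(old_style))
print(extract_vocab(new_style))
```

If this holds for the real objects, the failing line in get_vocabs could be changed to something like `enc_vocab = extract_vocab(fields['src'][0][1])`. I have not checked this against every version of the vocab file.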