
AttributeError: 'TextMultiField' object has no attribute 'vocab' #1249

Closed
@cocoxu

Description

I preprocessed the data with this command:
python preprocess.py -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo

Then I tried to load the GloVe embeddings with this command:
python ./tools/embeddings_to_torch.py -emb_file_enc "glove_dir/glove.6B.100d.txt" -emb_file_dec "glove_dir/glove.6B.100d.txt" -dict_file "data/demo.vocab.pt" -output_file "data/demo_gloveembeddings"

but got the following error:

Traceback (most recent call last):
  File "./tools/embeddings_to_torch.py", line 125, in <module>
    main()
  File "./tools/embeddings_to_torch.py", line 83, in main
    enc_vocab, dec_vocab = get_vocabs(opt.dict_file)
  File "./tools/embeddings_to_torch.py", line 20, in get_vocabs
    enc_vocab = fields['src'][0][1].vocab
AttributeError: 'TextMultiField' object has no attribute 'vocab'

Did I miss something, or is this a compatibility problem between the vocab files written by the current preprocess.py and what embeddings_to_torch.py expects?

I looked into this a bit more. It seems that at some point the onmt.inputters.text_dataset.TextMultiField class changed: it no longer has a "vocab" attribute, only a "fields" attribute, and the vocab now lives on the field stored inside it.

>>> import torch
>>> fields = torch.load("data/demo.vocab.pt")
>>> print(fields['src'][0][1])
<onmt.inputters.text_dataset.TextMultiField object at 0x7fb527440860>
>>> print(fields['src'][0][1].fields[0][1].vocab)
<torchtext.vocab.Vocab object at 0x7fb4c778f7f0>
