Vocabulary size cannot be specified for word features

When using word features, it is useful to limit the vocabulary size of each feature. This matters most when lemmas are used as a feature, since the number of distinct lemmas can be very large.

However, `preprocess.py` does not accept a list of integers as the vocabulary size (one per feature), unlike the [Lua version](http://opennmt.net/OpenNMT/data/word_features/).

~I understand that it should be possible to get the same effect by extracting the feature vocabulary externally (pruning the vocabulary by frequency) and then supplying the vocabulary to `preprocess.py` with the parameter `--features_vocabs_prefix`.~

I just realized that `--features_vocabs_prefix` is completely ignored. Therefore, I understand that there is no way to control the feature vocabulary size; is that correct?
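To make the intent concrete, here is a minimal sketch of the external, frequency-based pruning I had in mind. It assumes the training text annotates each token as `word￨feat1￨feat2` using the `￨` (U+FFE8) separator. The function name `pruned_feature_vocabs` and its `max_sizes` parameter are hypothetical and not part of OpenNMT-py. The sketch only computes the pruned vocabularies; it does not write them in any format that `preprocess.py` is known to read.

```python
from collections import Counter

# Separator between a token and its features (U+FFE8), as assumed above.
FEATURE_SEP = "\uffe8"


def pruned_feature_vocabs(lines, max_sizes):
    """Return one frequency-pruned vocabulary (list of strings) per feature.

    max_sizes[i] is the maximum number of entries kept for the i-th feature.
    This is a hypothetical helper, not an OpenNMT-py function.
    """
    counters = [Counter() for _ in max_sizes]
    for line in lines:
        for token in line.split():
            # The first field is the word itself; the rest are its features.
            features = token.split(FEATURE_SEP)[1:]
            for counter, feature in zip(counters, features):
                counter[feature] += 1
    # Keep only the most frequent entries of each feature.
    return [
        [feature for feature, _ in counter.most_common(size)]
        for counter, size in zip(counters, max_sizes)
    ]


def annotate(word, lemma, pos):
    """Build one annotated token for the toy corpus below."""
    return FEATURE_SEP.join([word, lemma, pos])


corpus = [
    " ".join([annotate("dogs", "dog", "NOUN"), annotate("run", "run", "VERB")]),
    " ".join([annotate("dog", "dog", "NOUN"), annotate("ran", "run", "VERB")]),
    annotate("cats", "cat", "NOUN"),
]

# Keep at most 2 lemmas and at most 10 part-of-speech tags.
lemma_vocab, pos_vocab = pruned_feature_vocabs(corpus, max_sizes=[2, 10])
```

With this toy corpus the lemma `cat` is dropped because it is the least frequent, which is the effect a per-feature vocabulary size would give.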

Vocabulary size cannot be specified for word features #1452