When using word features, it is useful to be able to limit the feature vocabulary size, especially when the features are lemmas, since the number of distinct lemmas can be very large.
However, unlike the Lua version, preprocess.py does not accept a list of integers for the vocabulary size (one size for the words and one per feature).
I understand that it should be possible to get the same effect by extracting the feature vocabulary externally (pruning the vocabulary by frequency) and then supplying the vocabulary to preprocess.py with the parameter --features_vocabs_prefix.
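For concreteness, the external pruning could look roughly like the minimal sketch below. It assumes that each token is written as word￨feature (the separator, the helper names count_features and prune_vocab, and the sample corpus are all assumptions made for illustration), and it makes no claim about the vocabulary file format that preprocess.py would expect.

```python
from collections import Counter

# Assumed word/feature separator (U+FFE8); adjust if your data uses another one.
FEATURE_SEP = "\uffe8"


def count_features(lines, feature_index=0):
    """Count the values of one feature column over tokenized lines.

    feature_index=0 selects the first feature after the word itself.
    """
    counts = Counter()
    for line in lines:
        for token in line.split():
            parts = token.split(FEATURE_SEP)
            # parts[0] is the word; features start at parts[1].
            if len(parts) > feature_index + 1:
                counts[parts[feature_index + 1]] += 1
    return counts


def prune_vocab(counts, max_size):
    """Keep the max_size most frequent feature values."""
    return [value for value, _ in counts.most_common(max_size)]


# Hypothetical sample corpus; "|" is swapped for the assumed separator.
corpus = [
    line.replace("|", FEATURE_SEP)
    for line in (
        "the|the cats|cat sat|sit",
        "the|the cat|cat sits|sit",
        "a|a dog|dog sat|sit",
    )
]

counts = count_features(corpus)
vocab = prune_vocab(counts, max_size=2)
print(vocab)
```

Anything outside the pruned list (here the rare lemmas) would then have to be mapped to an unknown symbol by whatever consumes the vocabulary.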
However, I have just realized that --features_vocabs_prefix is completely ignored. Am I right that there is therefore currently no way to control the feature vocabulary size?