Pre-trained word vectors

We are publishing pre-trained word vectors for the Russian language. These vectors were trained on a joint corpus of Russian Wikipedia and Lenta.ru.

All vectors are 300-dimensional. We trained them with fastText skip-gram (see Bojanowski et al. (2016)), using various preprocessing options (see below).

You can get the vectors in either binary (bin) or text (vec) format, for both fastText and GloVe.
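
As a rough illustration of the text (vec) format, the sketch below parses it with the Python standard library only. It assumes the usual fastText text layout: a header line with the word count and dimension, then one line per word with the token followed by its components. The function name `load_vec` and the in-memory sample are hypothetical and are not part of the published files.

```python
import io
from typing import Dict, Iterable, List, Tuple


def load_vec(lines: Iterable[str]) -> Tuple[int, Dict[str, List[float]]]:
    """Parse word vectors from an iterable of lines in the text (vec) layout.

    Assumed layout: a header "<word count> <dimension>", then one line per
    word: the token followed by its space-separated components.
    """
    it = iter(lines)
    header = next(it).split()
    dim = int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if parts == [""]:
            continue  # skip blank lines
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            raise ValueError(f"expected {dim} components for {word!r}, got {len(values)}")
        vectors[word] = [float(v) for v in values]
    return dim, vectors


# Tiny made-up sample with 3-dimensional vectors (the real ones are 300-dimensional).
SAMPLE = "2 3\nмосква 0.1 0.2 0.3\nгород -0.5 0.0 0.25\n"
dim, vectors = load_vec(io.StringIO(SAMPLE))
print(dim, vectors["город"])
```

For a real file, you would pass an open file handle (`open(path, encoding="utf-8")`) instead of the sample. The binary (bin) files are not plain text and need the fastText library itself to read them.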

License

The pre-trained word vectors are distributed under the Apache License 2.0.

Downloads

The models can be downloaded from:

Model	Preprocessing	Vectors
fastText (skipgram)	tokenize (nltk word_tokenize), lemmatize (pymorphy2)	bin, vec
fastText (skipgram)	tokenize (nltk word_tokenize), lowercasing	bin, vec
fastText (skipgram)	tokenize (nltk wordpunct_tokenize)	bin, vec
fastText (skipgram)	tokenize (nltk word_tokenize)	bin, vec
fastText (skipgram)	tokenize (nltk word_tokenize), remove stopwords	bin, vec
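
To get a feel for what some of these preprocessing options do to a sentence, here is a standard-library approximation. It is not the pipeline used to build the models. It mimics `wordpunct_tokenize` with the regular expression that NLTK's `WordPunctTokenizer` is based on, and it uses that tokenizer for every option, whereas the table pairs lowercasing and stopword removal with `word_tokenize`. The stopword list is a tiny hypothetical one, and lemmatization with pymorphy2 is not reproduced.

```python
import re
from typing import FrozenSet, List

# Runs of word characters, or runs of non-space punctuation.
_WORDPUNCT = re.compile(r"\w+|[^\w\s]+")

# Hypothetical stopword list, for illustration only.
STOPWORDS: FrozenSet[str] = frozenset({"и", "в", "на", "не"})


def wordpunct_tokenize(text: str) -> List[str]:
    """Split text into alphanumeric and punctuation tokens."""
    return _WORDPUNCT.findall(text)


def preprocess(
    text: str,
    lowercase: bool = False,
    stopwords: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Tokenize, optionally lowercase, and optionally drop stopwords."""
    tokens = wordpunct_tokenize(text)
    if lowercase:
        tokens = [t.lower() for t in tokens]
    # Stopwords are compared case-insensitively.
    return [t for t in tokens if t.lower() not in stopwords]


sentence = "Москва и Санкт-Петербург, в России."
plain = preprocess(sentence)
cleaned = preprocess(sentence, lowercase=True, stopwords=STOPWORDS)
print(plain)
print(cleaned)
```

Whichever variant you download, queries should be preprocessed the same way as the training corpus. For example, look up lemmas in the lemmatized model and lowercased tokens in the lowercased one.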

Word vectors training parameters

These word vectors were trained with the following parameters ([...] denotes a default value):

fastText (skipgram)

lr [0.1]
lrUpdateRate [100]
dim 300
ws [5]
epoch [5]
neg [5]
loss [softmax]
pretrainedVectors []
saveOutput [0]
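
The parameters above map onto fastText command-line flags. The sketch below only assembles such a command and does not run it. The helper `build_command` and the paths `corpus.txt` and `model` are hypothetical, and `pretrainedVectors` and `saveOutput` are omitted because the list shows them at their defaults.

```python
from typing import Dict, List, Union

# Values copied from the parameter list above.
PARAMS: Dict[str, Union[int, float, str]] = {
    "lr": 0.1,
    "lrUpdateRate": 100,
    "dim": 300,
    "ws": 5,
    "epoch": 5,
    "neg": 5,
    "loss": "softmax",
}


def build_command(
    input_path: str,
    output_prefix: str,
    params: Dict[str, Union[int, float, str]] = PARAMS,
    binary: str = "fasttext",
) -> List[str]:
    """Assemble a `fasttext skipgram` invocation as an argument list."""
    cmd = [binary, "skipgram", "-input", input_path, "-output", output_prefix]
    for name, value in params.items():
        cmd += [f"-{name}", str(value)]
    return cmd


# Hypothetical paths: a preprocessed corpus and an output prefix.
cmd = build_command("corpus.txt", "model")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where the `fasttext` binary is installed. fastText then writes the binary and text vectors next to the output prefix.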
