Bucket argument in FastText not working as expected? #1765
Description
Hi, this concerns the native FastText implementation in gensim.
My understanding of the hashing trick is that if bucket is smaller than the total number of subwords (character ngrams), collisions must occur and some subwords will be mapped to the same integer. Am I wrong?
However, that is not what I see on a toy example:
import gensim
from gensim.models.fasttext import FastText
sent = [['lol', 'dds', 'sdsf'], ['anticonsti']]
model = FastText(min_count=1, bucket=20)
model.build_vocab(sentences=sent)
model.train(sentences=sent, epochs=1, report_delay=1.0)
model.wv.ngrams
Expected Results
A dictionary mapping each ngram to an integer between 0 and 19 (bucket = 20).
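To make the expectation concrete, here is a minimal sketch of the hashing trick that does not use gensim at all. It assumes ngram lengths of 3 to 6 and a 32-bit FNV-1a hash; the helper names `char_ngrams` and `fnv1a_32` are hypothetical, and gensim's internal hash function may differ. The point is only that 56 distinct ngrams reduced modulo 20 buckets have to share indices.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character ngrams of '<word>' for every n in [min_n, max_n]."""
    extended = "<" + word + ">"
    return [
        extended[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(extended) - n + 1)
    ]


def fnv1a_32(text):
    """32-bit FNV-1a hash of the UTF-8 bytes of text (assumed hash, for illustration)."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h


sent = [['lol', 'dds', 'sdsf'], ['anticonsti']]
bucket = 20

# Collect the distinct ngrams of the toy corpus.
ngrams = sorted({ng for sentence in sent for word in sentence for ng in char_ngrams(word)})

# Hashing trick: every ngram is reduced to an index in [0, bucket).
ngram_to_index = {ng: fnv1a_32(ng) % bucket for ng in ngrams}

print(len(ngrams))                            # 56 distinct ngrams
print(max(ngram_to_index.values()) < bucket)  # True: indices stay below bucket
print(len(set(ngram_to_index.values())))      # at most 20, so collisions are unavoidable
```

With this kind of mapping the dictionary values can never reach 55, which is why the output reported below looks surprising.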
Actual Results
A dictionary mapping each ngram to an integer between 0 and 55 (there are 56 ngrams here), so the indices are not limited to the 20 buckets.
Versions
import platform; print(platform.platform())
Windows-10-10.0.14393-SP0
import sys; print("Python", sys.version)
Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
import numpy; print("NumPy", numpy.__version__)
NumPy 1.13.3
import scipy; print("SciPy", scipy.__version__)
SciPy 1.0.0
import gensim; print("gensim", gensim.__version__)
gensim 3.1.0
from gensim.models import word2vec; print("FAST_VERSION", word2vec.FAST_VERSION)
FAST_VERSION 0