Sigmoid-table behavior in FastText, etc. code is fishy #2725
Description
Our implementation of FastText training error-backpropagation does some fishy things that deviate from the FB reference implementation.
For example, at..
...we simply `continue` (skip to the next loop iteration) when an exponent is out of the desired range. (The same approach appears in the Word2Vec and Doc2Vec Cython code as well.)
However, the seemingly analogous code in Facebook's FastText instead clips the values to 0.0/1.0 in these cases, allowing backprop to proceed. See:
Our deviation from Facebook's practice is suspicious on both correctness and consistency grounds. This simple `continue` does, however, match the behavior we copied long ago from word2vec.c.
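To illustrate what the two out-of-range policies mean for a single update, here is a minimal Python sketch. It assumes a simplified one-pair gradient of the form `(label - sigmoid(f)) * lr`, and the helpers `build_table`, `lookup`, `grad_skip` and `grad_clip` are hypothetical names for illustration, not gensim's or Facebook's actual code. The table layout follows the word2vec.c-style recipe (slot `i` holds the sigmoid of `(i / size * 2 - 1) * max_exp`).

```python
import math

# Hypothetical helpers for illustration; not gensim's or Facebook's actual code.
# Table layout follows the word2vec.c-style recipe: slot i stores
# sigmoid(x) for x = (i / size * 2 - 1) * max_exp.


def build_table(size, max_exp):
    return [1.0 / (1.0 + math.exp(-((i / size * 2 - 1) * max_exp)))
            for i in range(size)]


def lookup(table, max_exp, f):
    """Approximate sigmoid(f) for f strictly inside (-max_exp, max_exp)."""
    size = len(table)
    idx = int((f + max_exp) * (size / max_exp / 2))
    # Guard against float rounding pushing an index just below the top edge to `size`.
    return table[min(idx, size - 1)]


def grad_skip(table, max_exp, f, label, lr):
    """Skip policy: an out-of-range score produces no update at all."""
    if f <= -max_exp or f >= max_exp:
        return None  # the training loop would `continue` here
    return (label - lookup(table, max_exp, f)) * lr


def grad_clip(table, max_exp, f, label, lr):
    """Clip policy: saturate the sigmoid to 0.0/1.0 and still backpropagate."""
    if f <= -max_exp:
        sig = 0.0
    elif f >= max_exp:
        sig = 1.0
    else:
        sig = lookup(table, max_exp, f)
    return (label - sig) * lr


table = build_table(1000, 6)
# A negative sample (label 0) that the model scores far too high:
print(grad_skip(table, 6, 7.0, 0, 0.025))  # None: no correction applied
print(grad_clip(table, 6, 7.0, 0, 0.025))  # -0.025: full-strength correction
```

For saturated scores that are already correct (e.g. `f = 7.0` with label 1), clipping yields a zero update, which matches skipping. So in this sketch the two policies differ only for confidently wrong predictions, which the skip discards entirely.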
Other, perhaps more superficial, differences are that FB's code makes its lookup tables 512 slots long instead of 1000, but allows exponents up to 8 instead of 6:
Again, our FT implementation seems to have copied the choices in our word2vec.c-derived code rather than those of the reference FB implementation. If anything, it could make more sense to update the word2vec-derived code with these newer choices, as they at least plausibly represent practices refined by experience.
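To make the size/range trade-off concrete, here is a small Python sketch comparing the two parameter sets (1000 slots up to 6 vs 512 slots up to 8). The helper `table_stats` is hypothetical. For simplicity it uses the same word2vec.c-style table layout for both settings, which isolates the effect of size and range. Facebook's actual table construction may differ in detail.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def table_stats(size, max_exp, samples=100_000):
    """Return (slot width, saturation gap at the edge, max sampled lookup error).

    Hypothetical helper; assumes the word2vec.c-style layout in which slot i
    stores sigmoid((i / size * 2 - 1) * max_exp).
    """
    table = [sigmoid((i / size * 2 - 1) * max_exp) for i in range(size)]
    # Width of the x interval covered by one table slot.
    slot_width = 2 * max_exp / size
    # Clipping to 1.0 beyond max_exp leaves a discontinuity of at least this size.
    edge_gap = 1.0 - sigmoid(max_exp)
    worst = 0.0
    for k in range(samples):
        # Sample points strictly inside (-max_exp, max_exp).
        f = -max_exp + (k + 0.5) * (2 * max_exp / samples)
        idx = min(int((f + max_exp) * (size / max_exp / 2)), size - 1)
        worst = max(worst, abs(table[idx] - sigmoid(f)))
    return slot_width, edge_gap, worst


for name, size, max_exp in [("word2vec.c-style", 1000, 6), ("FB-style", 512, 8)]:
    width, gap, err = table_stats(size, max_exp)
    print(f"{name}: slot width {width:.5f}, edge gap {gap:.6f}, max lookup error {err:.6f}")
```

Under these assumptions, the 512/8 table has coarser slots (0.03125 vs 0.012 in x, so a larger worst-case lookup error). In exchange it covers a wider range and has a much smaller jump where values get clipped: `1 - sigmoid(8)` is about 0.00034, versus about 0.0025 for `1 - sigmoid(6)`.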