Retrofit should not lowercase words #11

Karryanna · 2019-04-16T09:38:49Z

Hello. I am not sure whether there are any plans to maintain this project further, but in case there are, or in case anyone else wants to use this tool and checks the issues in advance, I would like to report that retrofitting lowercases the words in the model by calling .lower() while reading it.

While I agree that, in general, it may be desirable to lowercase words before training their vectors, I don't think a tool that works with already trained vectors should do that, at least not by default. I tried to retrofit a model trained on lemmata, some of which are not lowercase. Moreover, the lowercased and non-lowercased forms sometimes have different meanings, so retrofitting hurt the model quality for that reason alone.

(Of course, the fix is easy and I will simply fix my copy. And yes, apart from that, retrofitting does help, so thanks for it anyway!)
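
For anyone else who runs into this, here is a minimal sketch of the kind of change described above. It is not the project's actual reader: the function name `read_word_vectors`, the `lowercase` parameter, and the plain-text "word v1 v2 ..." input format are assumptions made for illustration. The idea is simply to make lowercasing opt-in instead of unconditional.

```python
def read_word_vectors(lines, lowercase=False):
    """Parse 'word v1 v2 ...' lines into a dict mapping word -> list of floats.

    Hypothetical helper: lowercasing is opt-in, so a model trained on
    case-sensitive tokens (e.g. lemmata) keeps its vocabulary intact.
    """
    vectors = {}
    for line in lines:
        parts = line.strip().split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # Only fold case when the caller explicitly asks for it.
        word = parts[0].lower() if lowercase else parts[0]
        vectors[word] = [float(x) for x in parts[1:]]
    return vectors


# Made-up two-word vocabulary in which case carries meaning.
sample = ["Polish 1.0 0.0", "polish 0.0 1.0"]
kept = read_word_vectors(sample)                    # both entries survive
merged = read_word_vectors(sample, lowercase=True)  # entries collide
```

With lowercasing enabled, the two entries collapse into one key and the later line overwrites the earlier one, which is the kind of silent loss that hurts a case-sensitive model.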

Lduignan1 · 2023-06-04T14:30:28Z

Interestingly, when we implemented the algorithm, we also seemed to get better results on word similarity and sentiment analysis tasks when we did not lowercase words...
