
Retrofit should not lowercase words #11

Open
Karryanna opened this issue Apr 16, 2019 · 1 comment

Comments

@Karryanna

Hello. I am not sure whether there are any plans to maintain this project further, but in case there are, or in case anyone else wants to use this tool and checks the issues first: I would like to report that retrofitting lowercases the words in the model, because `.lower()` is called while the model is being read.

While I agree that it may generally be desirable to lowercase words before training their vectors, I don't think a tool that works with already-trained vectors should do so, at least not by default. I tried to retrofit my model trained on lemmata, and some of those lemmata are not lowercase. Moreover, the lowercased and original-case versions of a word sometimes have different meanings, so retrofitting hurt the model's quality for that reason alone.
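To make the failure mode concrete, here is a minimal Python sketch, assuming the model is loaded into a dict keyed by word (as the `.lower()` call suggests). When keys are lowercased on load, two case-distinct lemmata collapse into one entry and the later vector silently replaces the earlier one. The word pair and vectors are made-up examples, not taken from any real model.

```python
# Hypothetical lemmata whose meaning depends on case:
# "Polish" (the nationality) vs. "polish" (to shine). Vectors are made up.
entries = [("Polish", [0.9, 0.1]), ("polish", [0.1, 0.9])]

# Keeping words as-is: both lemmata survive as separate keys.
case_preserved = {word: vec for word, vec in entries}

# Lowercasing on load: both map to "polish", and the later vector
# silently overwrites the earlier one.
lowercased = {word.lower(): vec for word, vec in entries}

print(len(case_preserved), len(lowercased))  # 2 1
print(lowercased["polish"])                  # [0.1, 0.9]
```

No error is raised, so the lost vector is easy to miss unless the vocabulary sizes are compared before and after loading.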

(Of course, the fix is easy, and I will simply patch my copy. Apart from that, retrofitting does generally help, so thanks for it anyway!)
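A sketch of the kind of fix mentioned above, assuming a Python reader over a whitespace-separated text embedding file ("word v1 v2 ..." per line): case is preserved by default, lowercasing becomes opt-in, and any collisions it causes are reported instead of hidden. `read_word_vecs`, `parse_args`, and the `--input`/`--lowercase` flags are hypothetical names for illustration, not the project's actual interface. If the tool also normalizes the lexicon it retrofits against, the same option would presumably need to apply there so that keys still match.

```python
import argparse
import io

import numpy as np


def read_word_vecs(lines, lowercase=False):
    """Parse "word v1 v2 ..." lines into {word: vector}.

    Hypothetical helper: case is preserved unless `lowercase=True`.
    Returns the vectors plus a list of keys that were overwritten.
    """
    vectors = {}
    collisions = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word = parts[0].lower() if lowercase else parts[0]
        if word in vectors:
            collisions.append(word)  # an earlier vector is about to be replaced
        vectors[word] = np.asarray(parts[1:], dtype=float)
    return vectors, collisions


def parse_args(argv=None):
    """Hypothetical CLI: lowercasing is off unless --lowercase is given."""
    parser = argparse.ArgumentParser(description="Retrofitting (sketch)")
    parser.add_argument("--input", required=True, help="text file of word vectors")
    parser.add_argument("--lowercase", action="store_true",
                        help="fold words to lowercase while reading (default: off)")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--input", "vectors.txt"])
    # In-memory stand-in for the file named by --input.
    sample = io.StringIO("Polish 0.9 0.1\npolish 0.1 0.9\n")
    vecs, clashes = read_word_vecs(sample, lowercase=args.lowercase)
    print(sorted(vecs), clashes)  # ['Polish', 'polish'] []
```

Returning the collisions lets a caller who does opt into lowercasing decide whether to warn, fail, or keep going, rather than silently losing vectors.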

@Lduignan1

Interestingly, when we implemented the algorithm, we also seemed to get better results on word similarity and sentiment analysis tasks when we didn't lowercase words...
