Too many false positives on multi-lexeme lemmas #5

Open
collinfb opened this issue Jul 19, 2020 · 0 comments
What is the status of multi-lexeme lemmas, like "make (someone's) acquaintance" in the Make_acquaintance frame? The synset matching seems to match on make alone, which is causing lots of false matches. The problem is that many multiword expressions (MWEs), like take a break, get away, and make a fool of (oneself), have a very common verb as one element. It would be OK if these matched on break, away, and fool respectively, but matches on take, get, and make are usually meaningless.
If I remember correctly, we simplified the matching by matching only on the first lemma. It's actually fairly hard to figure out the real semantic head of an MWE, but one possibility would be to look for the lowest-frequency lemma in the MWE, on the assumption that it carries the most information. That would require a lookup in a table of wordforms by frequency; such tables should be easy to get for most major languages. Of course, this adds complexity to the alignment, but it would only require looking up a few thousand wordforms once for each language. It wouldn't affect the speed of the visualizer, and I think it would produce more accurate alignments.
However, I suspect this is going to be low priority for the time being.
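
For reference, here is a rough sketch of the lowest-frequency heuristic described above. It is not the project's current code. The names `FREQ`, `content_lemmas`, and `pick_head` are hypothetical, and the counts in `FREQ` are made-up placeholders standing in for a real per-language frequency table. It also assumes that parenthesized slots like (someone's) are dropped before matching, and that lemmas missing from the table are ignored rather than treated as rare.

```python
import re

# Hypothetical frequency table (lemma -> count). The numbers are invented
# placeholders for illustration, not real corpus counts.
FREQ = {
    "a": 20_000_000, "of": 30_000_000,
    "get": 1_200_000, "make": 900_000, "take": 850_000,
    "away": 150_000, "break": 60_000, "fool": 9_000,
    "acquaintance": 3_000,
}

def content_lemmas(mwe):
    """Split an MWE into lowercase tokens, dropping slots like "(someone's)"."""
    stripped = re.sub(r"\([^)]*\)", " ", mwe)
    return stripped.lower().split()

def pick_head(mwe, freq=FREQ):
    """Return the lowest-frequency lemma of the MWE as its assumed head."""
    lemmas = content_lemmas(mwe)
    known = [lemma for lemma in lemmas if lemma in freq]
    if not known:
        # Nothing found in the table: fall back to the first lemma
        # (the simplified behavior mentioned above), or None if empty.
        return lemmas[0] if lemmas else None
    return min(known, key=lambda lemma: freq[lemma])

if __name__ == "__main__":
    for mwe in ["make (someone's) acquaintance", "take a break",
                "get away", "make a fool of (oneself)"]:
        print(mwe, "->", pick_head(mwe))
```

Since the head only needs to be computed once per MWE per language, the result could be cached alongside the lexical unit, so the lookup would not have to run at alignment or display time.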
