Too many false positives on multi-lexeme lemmas

What is the status of multi-lexeme lemmas, like "_make_ (someone's) _acquaintance_" in the Make_acquaintance frame? It seems to be matching on _make_ alone in the synset matching, which is causing lots of false matches. The problem is that many multiword expressions (MWEs), like _take a break_, _get away_, and _make a fool of_ (oneself), have a very common verb as one element. It would be OK if each of these matched on _break_, _away_, and _fool_ respectively, but matches on _take_, _get_, and _make_ are usually meaningless.
If I remember correctly, we simplified the matching by matching only on the first lemma. It's actually fairly hard to figure out the real semantic head of an MWE, but one possibility would be to look for the **lowest-frequency** lemma in the MWE, on the assumption that it carries the most information. That would require a lookup in a table of wordforms by frequency; such tables should be easy to get for most major languages. Of course, this adds complexity to the alignment, but it would only require looking up a few thousand wordforms once per language. It wouldn't affect the speed of the visualizer, and I think it would produce more accurate alignments.
However, I suspect this is going to be low priority for the time being.
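A minimal Python sketch of the lowest-frequency idea, contrasted with the first-lemma matching described in this issue. Everything here is hypothetical: the function names (`lemma_tokens`, `first_lemma_key`, `rarest_lemma_key`) are illustrative, the counts in `TOY_FREQUENCIES` are made-up placeholders rather than corpus data, and stripping parenthesized slots such as (someone's) or (oneself) is an assumption about how the MWE lemmas are written.

```python
import re

# Hypothetical toy frequency table: illustrative counts only, NOT real corpus
# figures. A real table would come from a per-language word-frequency list.
TOY_FREQUENCIES: dict[str, int] = {
    "get": 1_000_000,
    "make": 900_000,
    "take": 850_000,
    "a": 5_000_000,
    "of": 4_000_000,
    "away": 300_000,
    "break": 60_000,
    "fool": 8_000,
    "acquaintance": 3_000,
}


def lemma_tokens(mwe: str) -> list[str]:
    """Split an MWE lemma into lowercase tokens.

    Assumption: parenthesized slots like "(someone's)" or "(oneself)" are
    placeholders, not fixed parts of the expression, so they are dropped.
    """
    without_slots = re.sub(r"\([^)]*\)", " ", mwe)
    return without_slots.lower().split()


def first_lemma_key(mwe: str) -> str:
    """Simplified behaviour described in the issue: match on the first lemma."""
    return lemma_tokens(mwe)[0]


def rarest_lemma_key(mwe: str, freq: dict[str, int]) -> str:
    """Proposed heuristic: match on the lowest-frequency lemma.

    Assumption: tokens missing from the table are treated as rarest
    (frequency 0). Ties go to the earliest token, since min() returns
    the first minimal element it encounters.
    """
    return min(lemma_tokens(mwe), key=lambda tok: freq.get(tok, 0))


if __name__ == "__main__":
    examples = [
        "make (someone's) acquaintance",
        "take a break",
        "get away",
        "make a fool of (oneself)",
    ]
    for mwe in examples:
        print(f"{mwe!r}: first={first_lemma_key(mwe)!r}, "
              f"rarest={rarest_lemma_key(mwe, TOY_FREQUENCIES)!r}")
```

With these toy counts, the first-lemma key yields _make_, _take_, _get_, and _make_, while the lowest-frequency key yields _acquaintance_, _break_, _away_, and _fool_. Treating unknown tokens as rarest is one possible design choice; an alternative would be to fall back to the first lemma when no token is found in the table.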

Too many false positives on multi-lexeme lemmas #5

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Too many false positives on multi-lexeme lemmas #5

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions