Too many false positives on multi-lexeme lemmas #5

Open
@collinfb


What is the status of multi-lexeme lemmas, like "make (someone's) acquaintance" in the Make_acquaintance frame? The synset matching seems to match on make alone, which is causing lots of false matches. The problem is that many MWEs, like take a break, get away, and make a fool of (oneself), have a very common verb as one element. It would be OK if these matched on break, away, and fool respectively, but matches on take, get, and make are usually meaningless.
If I remember correctly, we simplified the matching by matching only on the first lemma. It's actually fairly hard to figure out the real semantic head of an MWE, but one possibility would be to pick the lowest-frequency lemma in the MWE, on the assumption that it carries the most information. That would require a lookup in a table of wordforms by frequency; these should be easy to get for most major languages. Of course, this adds complexity to the alignment, but it would only require looking up a few thousand wordforms once for each language. It wouldn't affect the speed of the visualizer, and I think it would produce more accurate alignments.
However, I suspect this is going to be low priority for the time being.
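
To make the lowest-frequency idea above concrete, here is a minimal sketch in Python. It is not taken from the project's code: the names `FREQ`, `tokenize`, and `pick_head` are hypothetical, the frequency counts are invented for illustration, and it assumes that parenthesized slots such as (someone's) should be ignored and that an MWE with no lemma in the table falls back to the current first-lemma behavior.

```python
import re

# Hypothetical frequency table for one language (lemma -> corpus count).
# These numbers are made up for illustration; a real table would be loaded
# once per language from a word-frequency list.
FREQ = {
    "a": 5000, "of": 4800, "make": 1000, "get": 950, "take": 900,
    "away": 200, "break": 120, "fool": 30, "acquaintance": 5,
}


def tokenize(mwe):
    """Split an MWE into lowercase tokens, dropping parenthesized slots."""
    # Assumption: "(someone's)" and "(oneself)" are slot fillers, not lemmas.
    without_slots = re.sub(r"\([^)]*\)", " ", mwe)
    return re.findall(r"[^\W\d_]+", without_slots.lower())


def pick_head(mwe, freq):
    """Return the lowest-frequency token of the MWE as its matching head."""
    tokens = tokenize(mwe)
    if not tokens:
        raise ValueError(f"no tokens found in {mwe!r}")
    known = [t for t in tokens if t in freq]
    if not known:
        # Assumption: with no frequency data, keep the first-lemma behavior.
        return tokens[0]
    # min() returns the first token on ties, so order in the MWE breaks ties.
    return min(known, key=lambda t: freq[t])


if __name__ == "__main__":
    for mwe in ["take a break", "get away", "make a fool of (oneself)",
                "make (someone's) acquaintance"]:
        print(mwe, "->", pick_head(mwe, FREQ))
```

Since the head only needs to be computed once per lexical unit, the result could be cached alongside the lemma before alignment, so the lookup cost would stay out of the visualizer.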
