Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary length. Can be used in languages other than English.
Updated Jul 15, 2019 - Python
STREUSLE: a corpus with comprehensive lexical semantic annotation (multiword expressions, supersenses)
Code for the NAACL 2019 paper "Bridging the Gap: Attending to Discontinuity in Identification of Multiword Expressions"
Data for the DiMSUM shared task at SemEval 2016
A Python package for Exploratory Data Analysis (EDA) for text-based data.
Comparison between various noun compound embeddings
Data and code for the paper "ID10M: Idiom Identification in 10 Languages" (NAACL 2022).
Repo for the paper "MWE as WSD: Solving Multi-Word Expression Identification with Word Sense Disambiguation"
A spaCy pipeline component for MWE identification
Python implementation of Substitution-driven Measures of Association