
Python implementation of Substitution-driven Measures of Association


meghdadFar/SDMA


Substitution-driven Measures of Association (SDMAs) for extracting collocations

SDMAs can be used as an alternative to measures such as PMI and chi-squared to identify collocations in a corpus of text. Unlike PMI and other purely statistical measures, which are blind to the meaning of words, SDMAs measure statistical association while taking into account the degree of semantic non-substitutability of word sequences. Non-substitutability is a linguistic test that measures the fixedness of a phrase. It has been shown that SDMAs can considerably outperform association measures such as Pointwise Mutual Information at identifying collocations. You can read more about the theory behind this measure in this Jupyter notebook.
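For reference, the purely statistical baseline mentioned above can be sketched in a few lines. The snippet below is a minimal illustration of bigram PMI only, not of SDMA and not of this repository's API: the function name `bigram_pmi` and the toy sentence are hypothetical, and probabilities are simple relative frequencies with no smoothing.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI} for every adjacent word pair in `tokens`.

    Hypothetical helper for illustration; PMI = log2(P(w1, w2) / (P(w1) * P(w2))).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi          # relative frequency of the pair
        p_x = unigrams[w1] / n_uni   # relative frequency of the first word
        p_y = unigrams[w2] / n_uni   # relative frequency of the second word
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy corpus (an assumption for the example, not data from the project).
tokens = "new york is big and new york is busy".split()
scores = bigram_pmi(tokens)

# Print pairs from highest to lowest PMI.
for pair, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, round(score, 2))
```

Note that this score depends only on counts: it has no notion of whether a word in the pair could be swapped for a near-synonym, which is the information SDMAs are described as adding.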

Applications

Like PMI, SDMAs can be used to identify collocations or multiword expressions.

Usage