Substitution-driven Measures of Association (SDMAs) for extracting collocations

SDMAs can be used as an alternative to measures such as Pointwise Mutual Information (PMI) and chi-squared for identifying collocations in a corpus of text. Unlike PMI and other purely statistical measures, which are blind to the meaning of words, SDMAs measure statistical association while taking into account the degree of semantic non-substitutability of word sequences. Non-substitutability is a linguistic test of how fixed a phrase is: in a collocation such as "strong tea", replacing a word with a near-synonym ("powerful tea") yields an expression that speakers rarely use. SDMAs have been shown to considerably outperform association measures such as PMI at identifying collocations. You can read more about the theory behind this measure in this Jupyter notebook.
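The contrast between a purely statistical score and a substitution-aware one can be illustrated with a toy example. The sketch below is not the SDMA formula from this project, which is not reproduced in this README. It computes standard bigram PMI, then a hypothetical `substitution_share` score: the fraction of occurrences that use the original bigram rather than a variant in which one word is swapped for a near-synonym. The function names, the toy corpus, and the hand-written `SUBSTITUTES` dictionary are all assumptions made for illustration; a real system would draw substitutes from a thesaurus or word embeddings.

```python
import math
from collections import Counter

# Toy corpus (hypothetical, for illustration only).
TOKENS = ("strong tea is good . strong tea is hot . powerful computer is fast . "
          "strong computer is fast . powerful computer is new").split()

# Hypothetical near-synonym lists; a real system would use a thesaurus or embeddings.
SUBSTITUTES = {"strong": ["powerful"], "powerful": ["strong"]}


def bigram_pmi(tokens):
    """Return PMI (log base 2) for every adjacent bigram in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi          # joint probability of the bigram
        p_x = unigrams[w1] / n_uni   # marginal probability of the first word
        p_y = unigrams[w2] / n_uni   # marginal probability of the second word
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


def substitution_share(bigram, substitutes, bigram_counts):
    """Hypothetical non-substitutability score in [0, 1].

    Counts the original bigram against variants in which either word is
    replaced by one of its listed substitutes. A value near 1.0 means the
    variants are rarely observed, i.e. the phrase behaves as fixed.
    """
    w1, w2 = bigram
    original = bigram_counts[bigram]
    variants = sum(bigram_counts[(s, w2)] for s in substitutes.get(w1, []))
    variants += sum(bigram_counts[(w1, s)] for s in substitutes.get(w2, []))
    total = original + variants
    return original / total if total else 0.0


if __name__ == "__main__":
    pmi = bigram_pmi(TOKENS)
    counts = Counter(zip(TOKENS, TOKENS[1:]))
    for bg in [("strong", "tea"), ("powerful", "computer")]:
        share = substitution_share(bg, SUBSTITUTES, counts)
        print(" ".join(bg), f"PMI={pmi[bg]:.3f}", f"share={share:.3f}")
```

On this toy corpus, "strong tea" and "powerful computer" receive identical PMI scores because their counts and word frequencies match. The substitution share separates them: 1.0 for "strong tea" (no "powerful tea" is observed) versus about 0.67 for "powerful computer" (one "strong computer" is observed). This is the kind of signal that a meaning-blind measure misses.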

Applications

Similar to PMI, SDMAs can be used to identify collocations or multiword expressions.

Usage