Generate minimal pairs (and minimal sets) for US English words.
In phonology, minimal pairs are pairs of words or phrases in a particular language, spoken or signed, that differ in only one phonological element and have distinct meanings.
>>> import minpair
>>> minpair.vowel_minpair(['AE', 'EH'])[:4]
[{'AE': 'al', 'EH': 'l'}, {'AE': 'axe', 'EH': 'x'}, {'AE': 'bad', 'EH': 'bed'}, {'AE': 'bag', 'EH': 'beg'}]
pip install -U minpair
>>> import minpair
Vowel minimal pairs are words whose pronunciations differ in exactly one vowel phoneme, for example "bad" and "bed". Vowels are given as ARPAbet symbols such as 'AE' and 'EH'.
>>> minpair.vowel_minpair(['AE', 'EH'])[:4]
[{'AE': 'al', 'EH': 'l'}, {'AE': 'axe', 'EH': 'x'}, {'AE': 'bad', 'EH': 'bed'}, {'AE': 'bag', 'EH': 'beg'}]
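To make the idea concrete, here is a minimal sketch of how vowel minimal pairs could be found from a pronunciation dictionary. It is not the package's implementation: `TOY_PRONUNCIATIONS` and `toy_vowel_minpair` are hypothetical names, and the tiny hand-written ARPAbet dictionary (stress markers omitted) stands in for cmudict.

```python
from collections import defaultdict

# Hypothetical toy pronunciations in ARPAbet, stress markers omitted.
# The real package reads pronunciations from NLTK's cmudict instead.
TOY_PRONUNCIATIONS = {
    "bad": ["B", "AE", "D"],
    "bed": ["B", "EH", "D"],
    "bag": ["B", "AE", "G"],
    "beg": ["B", "EH", "G"],
    "bid": ["B", "IH", "D"],
    "cat": ["K", "AE", "T"],
}


def toy_vowel_minpair(vowels, pronunciations=TOY_PRONUNCIATIONS):
    """Return one dict per frame that has a word for every requested vowel."""
    frames = defaultdict(dict)
    for word, phones in pronunciations.items():
        for i, phone in enumerate(phones):
            if phone in vowels:
                # A frame is the pronunciation with one vowel slot blanked out.
                frame = tuple(phones[:i]) + ("_",) + tuple(phones[i + 1:])
                # Keep the first word seen for this vowel in this frame.
                frames[frame].setdefault(phone, word)
    results = [
        {v: words[v] for v in vowels}
        for words in frames.values()
        if all(v in words for v in vowels)
    ]
    # Sort by the word of the first requested vowel for a stable order.
    return sorted(results, key=lambda entry: entry[vowels[0]])


print(toy_vowel_minpair(["AE", "EH"]))
# [{'AE': 'bad', 'EH': 'bed'}, {'AE': 'bag', 'EH': 'beg'}]
```

Passing three vowels, such as `["AE", "EH", "IH"]`, yields a minimal set in the same way: only the frame "B _ D" has a word for all three in the toy data.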
This package depends on several NLTK corpora: brown, cmudict, universal_tagset, and words. By default, it downloads any of these that are missing into the NLTK data directory.
To disable the automatic download of corpus data:
>>> minpair.generator(download_corpus=False).vowel_minpair(['AE', 'EH'])[:4]
[{'AE': 'al', 'EH': 'l'}, {'AE': 'axe', 'EH': 'x'}, {'AE': 'bad', 'EH': 'bed'}, {'AE': 'bag', 'EH': 'beg'}]
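With automatic download disabled, it can help to check up front which resources are missing. The sketch below is one way to do that, under stated assumptions: `missing_corpora` and `REQUIRED_RESOURCES` are hypothetical names, not part of minpair, and the resource paths are assumed to match where NLTK installs each item. The lookup function is passed in so the helper itself does not import NLTK.

```python
# Assumed NLTK resource paths for the corpora named above.
REQUIRED_RESOURCES = {
    "brown": "corpora/brown",
    "cmudict": "corpora/cmudict",
    "universal_tagset": "taggers/universal_tagset",
    "words": "corpora/words",
}


def missing_corpora(find):
    """Return the names of required resources that `find` cannot locate.

    `find` is expected to behave like nltk.data.find: it returns normally
    when the resource exists and raises LookupError when it does not.
    """
    missing = []
    for name, path in REQUIRED_RESOURCES.items():
        try:
            find(path)
        except LookupError:
            missing.append(name)
    return missing
```

With NLTK installed, `missing_corpora(nltk.data.find)` would return the names still to be fetched, and each can then be installed manually with `nltk.download(name)`.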
This package relies on a part-of-speech (POS) tagger to restrict results to words from meaningful lexical categories. The possible POS tags are those of NLTK's universal tagset. By default, only words tagged as 'ADJ', 'NOUN' or 'VERB' are returned.
To use a different set of POS tags:
>>> minpair.generator(pos=['VERB']).vowel_minpair(['AE', 'EH'])[:4]
[{'AE': 'bag', 'EH': 'beg'}, {'AE': 'bat', 'EH': 'bet'}, {'AE': 'blast', 'EH': 'blest'}, {'AE': 'capped', 'EH': 'kept'}]
Alternatively, using method chaining:
>>> minpair.generator().pos(['VERB']).vowel_minpair(['AE', 'EH'])[:4]
[{'AE': 'bag', 'EH': 'beg'}, {'AE': 'bat', 'EH': 'bet'}, {'AE': 'blast', 'EH': 'blest'}, {'AE': 'capped', 'EH': 'kept'}]
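As a rough illustration of the POS filter described above, the sketch below keeps a pair only when every word in it carries at least one allowed tag. This is an assumption about the filtering rule, not the package's actual code; `TOY_TAGS` and `filter_pairs_by_pos` are hypothetical, and the hand-written tag sets stand in for tags derived from the brown corpus.

```python
# Hypothetical word-to-tags mapping using universal-tagset-style labels.
TOY_TAGS = {
    "and": {"CONJ"},
    "end": {"NOUN", "VERB"},
    "bad": {"ADJ"},
    "bed": {"NOUN"},
    "bag": {"NOUN", "VERB"},
    "beg": {"VERB"},
}


def filter_pairs_by_pos(pairs, allowed, tags=TOY_TAGS):
    """Keep pairs in which every word has at least one allowed POS tag."""
    allowed = set(allowed)
    return [
        pair
        for pair in pairs
        # Words missing from the mapping are treated as having no tags.
        if all(tags.get(word, set()) & allowed for word in pair.values())
    ]


pairs = [
    {"AE": "and", "EH": "end"},
    {"AE": "bad", "EH": "bed"},
    {"AE": "bag", "EH": "beg"},
]

print(filter_pairs_by_pos(pairs, ["VERB"]))
# [{'AE': 'bag', 'EH': 'beg'}]
```

With the default-style tag list `["ADJ", "NOUN", "VERB"]`, the same toy data keeps both the bad/bed and bag/beg pairs and drops and/end, because "and" is tagged only as a conjunction.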