A C++ library for working with the CMU Pronouncing Dictionary. Now with Python bindings.
The CMU Pronouncing Dictionary (CMUdict) is
an open-source machine-readable pronunciation dictionary for North American English that contains over 134,000 words and their pronunciations.
Various tools have been built around it, e.g. this lovely Python library. But I was working in C++ and WebAssembly, so I ended up writing my own interface for using CMUdict.
This repository includes CMUdict as a Git submodule. To load that data, run:
git submodule update --init --recursive
To compile this project with CMake:
mkdir build
cd build
cmake ..
make
This also builds the Catch2 test binary at build/tests/test_phonetic.
This library can also be compiled to WebAssembly using Emscripten:
mkdir build
cd build
emcmake cmake ..
emmake make
This generates phonetic.js, phonetic.wasm, and phonetic.data. With these you can call any of the Phonetic class methods straight from JavaScript.
Use this library to convert English words into:
- Possible pronunciations, as ARPABET, which encodes phonemes as one- or two-letter ASCII sequences.
- Possible patterns of syllabic stress, as strings of the digits 0, 1, and 2, where 0 is unstressed, 1 is primary stress, and 2 is secondary stress.
- Possible syllable counts (counting the number of vowel phones).
Note that if a word has multiple pronunciations, stress patterns, or syllable counts, all of these will be returned.
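To make the relationship between these three outputs concrete, here is a minimal sketch of how a stress pattern and a syllable count can be derived from a single ARPABET pronunciation. It relies on the CMUdict convention that every vowel phone ends in a stress digit (0, 1, or 2) while consonants carry none. The function names `stress_pattern` and `syllable_count` are hypothetical helpers written for this illustration; they are not the Phonetic class API and may not match how the library does it internally.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not part of this library's API): collect the stress
// digit of each vowel phone in a space-separated ARPABET pronunciation,
// e.g. "HH AH0 L OW1".
std::string stress_pattern(const std::string& pronunciation) {
    std::istringstream phones(pronunciation);
    std::string phone;
    std::string pattern;
    while (phones >> phone) {
        // Tokens read with >> are never empty, so back() is safe here.
        const char last = phone.back();
        // Only vowel phones end in a stress digit; consonants are skipped.
        if (last == '0' || last == '1' || last == '2') {
            pattern.push_back(last);
        }
    }
    return pattern;
}

// Hypothetical helper: one syllable per vowel phone, so the syllable count
// equals the length of the stress pattern.
std::size_t syllable_count(const std::string& pronunciation) {
    return stress_pattern(pronunciation).size();
}
```

For the pronunciation "HH AH0 L OW1", `stress_pattern` gives "01" and `syllable_count` gives 2. A word with several pronunciations would need one call per pronunciation, which is why a lookup can return several patterns or counts.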
The tests (test/test_phonetic.cpp) offer a chance to see all of the methods in action.
- Search by phones.
- Search by stress.
- Search by syllable count.