Splits words that are not recognized by pyenchant (a spell checker) into the largest possible compounds.
Make sure you have enchant installed before proceeding.
Then install the package with pip:
pip install compound-word-splitter
Note that the languages that are available by default depend on your operating system's configuration and could be, for example:
['en', 'en_CA', 'en_GB', 'en_US']
If you would like to use a different language, like de_de in the example below, you will have to install the myspell dictionary for it (myspell-de-de).
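To check up front whether a language tag is usable, one option is to compare it against the list of tags that pyenchant reports. The sketch below is an illustration only: `find_language` is a hypothetical helper, not part of compound-word-splitter, and it takes the available tags as an argument so that it runs without enchant installed. With pyenchant available, `enchant.list_languages()` could supply that list.

```python
def find_language(requested, available):
    """Return the tag in `available` that matches `requested` ignoring case, else None."""
    wanted = requested.lower()
    for tag in available:
        if tag.lower() == wanted:
            return tag
    return None


# Example tags copied from the list above; the real list depends on your system.
available = ['en', 'en_CA', 'en_GB', 'en_US']

print(find_language('en_gb', available))  # prints: en_GB
print(find_language('de_de', available))  # prints: None
```

A result of `None` for a tag such as de_de suggests that the matching dictionary still has to be installed.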
import splitter
splitter.split('artfactory')
returns ['art', 'factory'].
splitter.split('Glossarelement', 'de_de')
returns ['Glossar', 'Element'].
If the word cannot be split into compounds pyenchant recognizes as words, the splitter returns an empty string.
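The idea of splitting into the largest possible compounds can be sketched as follows. This is a minimal illustration, not the package's actual implementation: `split_compound`, `is_word`, and `KNOWN_WORDS` are hypothetical names, a plain set stands in for the pyenchant dictionary, and the sketch returns an empty list (rather than an empty string) when no split is found.

```python
def split_compound(word, is_word):
    """Split `word` into parts accepted by `is_word`, trying the longest leading part first.

    Returns a list of parts, or an empty list if no split is found.
    The caller is assumed to have checked already that `word` itself is not recognized.
    """
    # Walk the cut point from right to left so the first part is as long as possible.
    for cut in range(len(word) - 1, 0, -1):
        head, tail = word[:cut], word[cut:]
        if not is_word(head):
            continue
        if is_word(tail):
            return [head, tail]
        # The tail is not a word on its own, so try to split it further.
        rest = split_compound(tail, is_word)
        if rest:
            return [head] + rest
    return []


# Stand-in word list for illustration; pyenchant would supply a real dictionary.
KNOWN_WORDS = {"art", "factory", "fact", "or", "fire", "place"}

print(split_compound("artfactory", KNOWN_WORDS.__contains__))    # prints: ['art', 'factory']
print(split_compound("artfireplace", KNOWN_WORDS.__contains__))  # prints: ['art', 'fire', 'place']
print(split_compound("xyzzy", KNOWN_WORDS.__contains__))         # prints: []
```

With pyenchant installed, the `check` method of an `enchant.Dict` object could be passed as `is_word` in place of the set lookup, though the results would then depend on the installed dictionary.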