TimKam/compound-word-splitter

compound-word-splitter

Splits words that pyenchant (a spell-checking library) does not recognize into the largest possible compounds.

Installation

Make sure the Enchant library is installed before proceeding, since pyenchant depends on it.

Then run

pip install compound-word-splitter

Note that the languages that are available by default depend on your operating system's configuration and could be, for example:

['en', 'en_CA', 'en_GB', 'en_US']

To use a different language, such as de_de in the example below, install the matching myspell dictionary for it (here, myspell-de-de).
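Before splitting in a given language, it can help to check whether a matching dictionary tag is available. The helper below is a minimal sketch and not part of compound-word-splitter: find_language is a hypothetical name, and the list of tags is passed in by hand so the example runs without pyenchant. With pyenchant installed, the list could instead come from enchant.list_languages().

```python
from typing import Iterable, Optional


def find_language(requested: str, available: Iterable[str]) -> Optional[str]:
    """Return the available tag matching `requested`, or None.

    Hypothetical helper: compares tags case-insensitively and treats
    '-' and '_' as equivalent, so 'de_de' matches 'de_DE'.
    """
    def normalise(tag: str) -> str:
        return tag.replace("-", "_").lower()

    wanted = normalise(requested)
    for tag in available:
        if normalise(tag) == wanted:
            return tag
    return None


# Example tags, copied from the list above; real systems will differ.
available = ['en', 'en_CA', 'en_GB', 'en_US']

print(find_language('en_us', available))  # a matching tag is found
print(find_language('de_de', available))  # no German dictionary in this list
```

If the lookup returns None, the dictionary for that language most likely still needs to be installed.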

Usage

import splitter

splitter.split('artfactory')

returns ['art', 'factory'].

splitter.split('Glossarelement', 'de_de')

returns ['Glossar', 'Element'].

If the word cannot be split into compounds that pyenchant recognizes as words, the splitter returns an empty string.
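To illustrate the general idea of splitting into the largest possible compounds, here is a minimal sketch. It is not the library's implementation: split_compound, is_word, and min_len are hypothetical names, and a plain set of words stands in for the pyenchant dictionary lookup. Unlike the library, which returns an empty string on failure, this sketch returns an empty list, and it returns a recognized word unchanged as a single-element list.

```python
from typing import Callable, List


def split_compound(word: str,
                   is_word: Callable[[str], bool],
                   min_len: int = 2) -> List[str]:
    """Split `word` into known parts, preferring the longest left-hand part.

    Returns [] when no complete split exists (assumption: the real
    library returns an empty string in that case).
    """
    if is_word(word):
        return [word]
    # Try the longest candidate prefix first, then shorter ones.
    for cut in range(len(word) - min_len, min_len - 1, -1):
        left, right = word[:cut], word[cut:]
        if is_word(left):
            rest = split_compound(right, is_word, min_len)
            if rest:  # the remainder could be split completely
                return [left] + rest
    return []


# A toy vocabulary standing in for a spell-checker dictionary.
vocabulary = {"art", "factory", "fact", "or"}

print(split_compound("artfactory", vocabulary.__contains__))
print(split_compound("xyzzy", vocabulary.__contains__))
```

Trying the longest prefix first is what favors large compounds such as "factory" over smaller pieces such as "fact" and "or".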