Multilingual library for accurate and deterministic hyphenation and syllable counting without relying on dictionaries.
- 🇬🇧 English (eng)
- 🇷🇺 Russian (rus)
- 🇷🇸 Serbian Cyrillic (srp-cyrl)
- 🇷🇸 Serbian Latin (srp-latn)
- 🇹🇷 Turkish (tur)
- 🇬🇪 Georgian (kat)
- 🇩🇪 German (deu)
- 🇫🇷 French (fra)
- 🇷🇴 Romanian (ron)
- 🇪🇸 Spanish (spa)
- 🇵🇹 Portuguese (por)
- 🇵🇱 Polish (pol)
- 🏛️ Latin (lat)
When no language is specified, the library automatically detects the most likely language:
>>> from syllabreak import Syllabreak
>>> s = Syllabreak("-")
>>> s.syllabify("hello")
'hel-lo'
>>> s.syllabify("здраво") # Serbian Cyrillic
'здра-во'
>>> s.syllabify("привет") # Russian
'при-вет'
You can specify the language code for more predictable results:
>>> s = Syllabreak("-")
>>> s.syllabify("problem", lang="eng") # Force English rules
'prob-lem'
>>> s.syllabify("problem", lang="srp-latn") # Force Serbian Latin rules
'pro-blem'
This is useful when:
- The text could match multiple languages
- You want consistent rules for a specific language
- Processing text in a known language
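Pinning the language also makes derived values, such as syllable counts, reproducible. The sketch below is one possible way to count syllables from the hyphenated output. `count_syllables` is a hypothetical helper, not part of the library (which may offer its own way to do this). It assumes the callable accepts `(word, lang=...)`, as `syllabify` does in the examples above, and that the word itself contains no literal separator.

```python
from typing import Callable


def count_syllables(
    word: str,
    syllabify: Callable[..., str],
    lang: str,
    sep: str = "-",
) -> int:
    """Count syllables by counting separators in the hyphenated output.

    Hypothetical helper: `syllabify` is any callable taking (word, lang=...),
    for example the bound method `Syllabreak("-").syllabify`.
    """
    if not word:
        return 0
    hyphenated = syllabify(word, lang=lang)
    # Assumption: `word` has no literal `sep`, so each `sep` marks one break.
    return hyphenated.count(sep) + 1
```

With the library installed, passing `Syllabreak("-").syllabify` as the callable and `sep="-"` keeps the separator used for counting in sync with the one used for hyphenation.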
The detect_language method returns all matching language codes, sorted by confidence with the most likely language first:
>>> from syllabreak import Syllabreak
>>> s = Syllabreak()
>>> s.detect_language("hello")
['eng', 'srp-latn', 'tur'] # Matches English, Serbian Latin and Turkish
>>> s.detect_language("čovek")
['srp-latn', 'eng', 'tur'] # Serbian Latin has highest confidence due to č
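Because detection returns a ranked list rather than a single answer, a caller can combine it with an explicit `lang` argument. The following is a minimal sketch under stated assumptions: `syllabify_with_preference` is a hypothetical helper, and `s` is any object exposing `detect_language(text)` and `syllabify(text, lang=...)` in the way the examples above show.

```python
from typing import Any, Sequence


def syllabify_with_preference(s: Any, text: str, preferred: Sequence[str]) -> str:
    """Hyphenate `text`, favouring the caller's preferred languages.

    Hypothetical helper: `s` is assumed to behave like a Syllabreak instance.
    """
    candidates = list(s.detect_language(text))
    # Use the first preferred language that detection also considers a match.
    for lang in preferred:
        if lang in candidates:
            return s.syllabify(text, lang=lang)
    # Otherwise fall back to the highest-confidence candidate, if any.
    if candidates:
        return s.syllabify(text, lang=candidates[0])
    # No candidates at all: let the object decide on its own.
    return s.syllabify(text)
```

For text known to be mostly Serbian Latin, for instance, calling this with `preferred=["srp-latn"]` would apply Serbian Latin rules even to words such as "hello", where English ranks first in detection.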