
Multilingual library for accurate and deterministic hyphenation and syllable counting without relying on dictionaries.


apakabarfm/syllabreak-python


syllabreak


Supported Languages

  • 🇬🇧 English (eng)
  • 🇷🇺 Russian (rus)
  • 🇷🇸 Serbian Cyrillic (srp-cyrl)
  • 🇷🇸 Serbian Latin (srp-latn)
  • 🇹🇷 Turkish (tur)
  • 🇬🇪 Georgian (kat)
  • 🇩🇪 German (deu)
  • 🇫🇷 French (fra)
  • 🇷🇴 Romanian (ron)
  • 🇪🇸 Spanish (spa)
  • 🇵🇹 Portuguese (por)
  • 🇵🇱 Polish (pol)
  • 🏛️ Latin (lat)
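The codes in this list appear to be ISO 639-3 language codes, with a script suffix (`cyrl` or `latn`) separating the two Serbian alphabets. If codes come from user input or configuration, a small lookup table can reject unsupported values before they reach `lang=`. This is a minimal sketch: `SUPPORTED_LANGUAGES` and `check_lang` are hypothetical names and are not part of syllabreak. The lowercasing step assumes the library expects codes written exactly as they appear in this list.

```python
# Hypothetical lookup table mirroring the "Supported Languages" list above.
SUPPORTED_LANGUAGES = {
    "eng": "English",
    "rus": "Russian",
    "srp-cyrl": "Serbian Cyrillic",
    "srp-latn": "Serbian Latin",
    "tur": "Turkish",
    "kat": "Georgian",
    "deu": "German",
    "fra": "French",
    "ron": "Romanian",
    "spa": "Spanish",
    "por": "Portuguese",
    "pol": "Polish",
    "lat": "Latin",
}


def check_lang(code: str) -> str:
    """Normalize a language code and raise ValueError if it is not listed."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized


print(check_lang(" SRP-Latn "))  # srp-latn
```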

Usage

Auto-detect language

When no language is specified, the library automatically detects the most likely language:

>>> from syllabreak import Syllabreak
>>> s = Syllabreak("-")
>>> s.syllabify("hello")
'hel-lo'
>>> s.syllabify("здраво")  # Serbian Cyrillic
'здра-во'
>>> s.syllabify("привет")  # Russian
'при-вет'
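These examples use only `syllabify`, which joins syllables with the separator passed to the constructor. You can therefore derive a syllable count by splitting its output on that separator. This minimal sketch assumes the separator never occurs in the input word itself. A word that already contains `"-"` could be miscounted, so a separator such as the soft hyphen `"\u00ad"` is a safer choice. `count_syllables` is a hypothetical helper and is not a syllabreak function.

```python
def count_syllables(syllabified: str, sep: str = "-") -> int:
    """Count syllables in a string that a syllabifier has already split.

    Assumes `sep` is the separator given to the syllabifier and that it
    does not occur in the original word.
    """
    if not syllabified:
        return 0
    return len(syllabified.split(sep))


# With syllabreak installed, usage would look like:
#   s = Syllabreak("-")
#   count_syllables(s.syllabify("hello"))  # 'hel-lo' per the example above
print(count_syllables("hel-lo"))   # 2
print(count_syllables("здра-во"))  # 2
```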

Specify language explicitly

You can specify the language code for more predictable results:

>>> s = Syllabreak("-")
>>> s.syllabify("problem", lang="eng")  # Force English rules
'pro-blem'
>>> s.syllabify("problem", lang="srp-latn")  # Force Serbian Latin rules
'prob-lem'

This is useful when:

  • The text could match multiple languages
  • You want consistent rules for a specific language
  • You are processing text in a known language
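The `problem` example shows why the language matters even when the spelling is identical. Both results give each syllable one vowel, but they split the consonant cluster `bl` in different places. The toy vowel-nucleus syllabifier below illustrates that idea. It is not syllabreak's algorithm, and the vowel set and the per-language onset tables (`ONSETS`, `lang_a`, `lang_b`) are invented for illustration. For each consonant cluster between two vowels, the next syllable receives the longest suffix that is a permitted onset, and the remaining consonants close the previous syllable.

```python
VOWELS = set("aeiou")

# Hypothetical per-language sets of consonant clusters allowed to start a
# syllable. Single consonants are always allowed.
ONSETS = {
    "lang_a": {"bl", "br", "pl", "pr", "tr"},  # treats "bl" as a valid onset
    "lang_b": {"br", "pr", "tr"},              # does not
}


def toy_syllabify(word: str, lang: str, sep: str = "-") -> str:
    """Toy rule-based syllabifier: one vowel per nucleus, maximal legal onset."""
    onsets = ONSETS[lang]
    lower = word.lower()
    # Every vowel is treated as its own nucleus (diphthongs are ignored).
    nuclei = [i for i, ch in enumerate(lower) if ch in VOWELS]
    cuts = []
    for left, right in zip(nuclei, nuclei[1:]):
        cluster = lower[left + 1:right]
        # Smallest k means the longest suffix cluster[k:] that is a legal onset;
        # it moves to the next syllable, the rest stays with the previous one.
        for k in range(len(cluster) + 1):
            onset = cluster[k:]
            if len(onset) <= 1 or onset in onsets:
                cuts.append(left + 1 + k)
                break
    bounds = zip([0] + cuts, cuts + [len(word)])
    return sep.join(word[start:end] for start, end in bounds)


print(toy_syllabify("problem", "lang_a"))  # pro-blem
print(toy_syllabify("problem", "lang_b"))  # prob-lem
```

Real rules also have to handle diphthongs, syllabic consonants, and language-specific exceptions, all of which this sketch leaves out.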

Language Detection

`detect_language` returns every matching language code, sorted from highest to lowest confidence:

>>> from syllabreak import Syllabreak
>>> s = Syllabreak()
>>> s.detect_language("hello")  # Matches English, Serbian Latin and Turkish
['eng', 'srp-latn', 'tur']
>>> s.detect_language("čovek")  # Serbian Latin ranks first because of č
['srp-latn', 'eng', 'tur']
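The `čovek` example suggests that a letter found in only one alphabet pushes that language to the top. The sketch below is a toy version of that idea and does not reproduce syllabreak's actual scoring. It scores each language by how many of the word's letters its alphabet contains, keeps every language with a nonzero score, and breaks ties alphabetically. The alphabets cover only three languages, and `ALPHABETS` and `toy_detect` are hypothetical names.

```python
import string
import unicodedata

# Hypothetical, simplified alphabets for three of the supported languages.
ALPHABETS = {
    "eng": set(string.ascii_lowercase),
    "srp-latn": set("abcčćdđefghijklmnoprsštuvzž"),
    "tur": set("abcçdefgğhıijklmnoöprsştuüvyz"),
}


def toy_detect(word: str) -> list[str]:
    """Rank languages by how many of the word's letters their alphabet covers."""
    # NFC turns a decomposed "c" + combining caron into the single letter "č".
    text = unicodedata.normalize("NFC", word).lower()
    letters = [ch for ch in text if ch.isalpha()]
    scores = {
        code: sum(ch in alphabet for ch in letters)
        for code, alphabet in ALPHABETS.items()
    }
    matching = [code for code, score in scores.items() if score > 0]
    # Highest coverage first; ties broken alphabetically for deterministic output.
    return sorted(matching, key=lambda code: (-scores[code], code))


print(toy_detect("hello"))  # ['eng', 'srp-latn', 'tur']
print(toy_detect("čovek"))  # ['srp-latn', 'eng', 'tur']
```

Because this ranking is score-based, it keeps partial matches instead of discarding them. For `čovek`, that means `eng` and `tur` still appear, which is consistent with the output shown above.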


