TranslitKit is a framework for Hebrew-English transliteration.
gem install translit_kit
# in your Gemfile
gem 'translit_kit'
Requires Ruby 2.2 or later.
Basic transliteration
require 'translit_kit'
word = HebrewWord.new "אַברָהָם"
word.transliterate(:single)
# => ["avrohom"]
# Shortcut
word.t(:single)
# => ["avrohom"]
Transliteration is powered by phoneme maps: files that map between Hebrew phonemes (units of sound) and English characters (see below).
Three phoneme maps are provided: :long, :short, and :single.
You can easily add your own (see below).
word.t(:single)
# => ["avrohom"]
word.t(:short)
# => ["avroom", "avroam", "avroem", "avrohom", "avroham",
# "avrohem", "avraom", "avraam", "avraem", "avrahom",
# "avraham", "avrahem", "avreom", "avream", "avreem",
# "avrehom", "avreham", "avrehem" ]
word.t(:long)
# => ["avroom", "avrooom", "avroohm", ... ] # 5,997 more!
The default is :short:
word.t == word.t(:short)
# => true
To get the total permutation count, call HebrewWord#inspect:
word.inspect
# => "אַברָהָם: Permutations: 1 single | 18 short | 6000 long"
Phoneme maps are simply JSON files, placed in the lib/phoneme_maps directory.
Each file should map every phoneme (a String) to an Array of replacement characters.
{
"ב": ["v"],
"בּ": ["b", "bb"]
}
A phoneme can be a Hebrew character (א), a nekuda (ָ), or a character with modifiers, such as a dagesh (בּ). Keep in mind that many characters will be normalized (see below).
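As a rough illustration of how a map of this shape can yield several results for one word, the sketch below expands a list of phonemes into every combination of their replacements. It is not TranslitKit's actual implementation: the `expand` and `permutation_count` helpers and the kamatz entry in the sample map are assumptions made for this example.

```ruby
# Hypothetical sample map in the same shape as the JSON above.
# Unicode escapes are used so the combining marks stay unambiguous.
SAMPLE_MAP = {
  "\u05D1"       => ["v"],        # bet without dagesh
  "\u05D1\u05BC" => ["b", "bb"],  # bet with dagesh
  "\u05B8"       => ["o", "a"]    # kamatz (assumed replacements)
}

# Build every combination of replacements for a sequence of phonemes.
def expand(phonemes, map)
  choices = phonemes.map { |phoneme| map.fetch(phoneme) }
  first, *rest = choices
  return [] if first.nil?
  first.product(*rest).map(&:join)
end

# The number of results is the product of the replacement counts.
def permutation_count(phonemes, map)
  phonemes.map { |phoneme| map.fetch(phoneme).size }.reduce(1, :*)
end

phonemes = ["\u05D1\u05BC", "\u05B8", "\u05D1"]
puts expand(phonemes, SAMPLE_MAP).inspect
puts permutation_count(phonemes, SAMPLE_MAP)
```

Under this model the result count grows multiplicatively with the number of replacements per phoneme, which would explain why a map with more alternatives per entry produces far more permutations.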
To install your custom map, place the file in lib/resources.
Your file will be available as the symbol :<filename>, without the .json extension.
Example: klingon.json becomes :klingon
Now you can use it anywhere:
word.transliterate(:klingon)
# => (Results)
At present, your map will not display results in HebrewWord#inspect.
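Before dropping a custom map into place, it can help to check its name and shape. The sketch below is only an illustration using Ruby's standard library: `map_symbol` and `valid_phoneme_map?` are hypothetical helpers, not part of TranslitKit, and the path shown is just the example location mentioned above.

```ruby
require 'json'

# Derive the symbol a map file would be exposed as,
# e.g. "klingon.json" -> :klingon (hypothetical helper).
def map_symbol(path)
  File.basename(path, ".json").to_sym
end

# Check that parsed JSON maps each String phoneme to a
# non-empty Array of String replacements (hypothetical helper).
def valid_phoneme_map?(data)
  return false unless data.is_a?(Hash)
  data.all? do |phoneme, replacements|
    phoneme.is_a?(String) &&
      replacements.is_a?(Array) &&
      !replacements.empty? &&
      replacements.all? { |r| r.is_a?(String) }
  end
end

# JSON \u escapes keep the Hebrew characters unambiguous in this example.
json = '{ "\u05D1": ["v"], "\u05D1\u05BC": ["b", "bb"] }'
data = JSON.parse(json)

puts map_symbol("lib/resources/klingon.json").inspect
puts valid_phoneme_map?(data)
```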
TranslitKit is currently maintained by @AnalyzePlatypus.
Contributions welcome!
When a word is transliterated, it is pre-processed to normalize certain characters. Specifically:
- Whitespace is stripped
- The final letters [םןךףץ] are normalized to their standard forms
- CHATAF nekudos ['ֲ','ֳ','ֱ'] are normalized to their standard forms
- Full CHIRIK, TZEIREI, and CHOLOM nekudos have their letters removed
- DAGESH characters are removed from all but the characters [בוכפת]
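The steps above can be sketched in plain Ruby as follows. This is a minimal approximation, not the gem's own code, and it rests on several assumptions: `normalize_hebrew` is a hypothetical name, "standard form" of a chataf nekuda is taken to mean the matching plain vowel, a "full" chirik or tzeirei is a vowel followed by an unpointed yud, a "full" cholom is a vav carrying the cholom point, and whitespace is removed everywhere rather than only at the ends.

```ruby
FINAL_LETTERS    = "\u05DD\u05DF\u05DA\u05E3\u05E5" # final mem, nun, kaf, pe, tsadi
STANDARD_LETTERS = "\u05DE\u05E0\u05DB\u05E4\u05E6" # mem, nun, kaf, pe, tsadi
CHATAF_NEKUDOS   = "\u05B2\u05B3\u05B1"             # chataf patach, kamatz, segol
PLAIN_NEKUDOS    = "\u05B7\u05B8\u05B6"             # patach, kamatz, segol
DAGESH           = "\u05BC"
KEEPS_DAGESH     = "\u05D1\u05D5\u05DB\u05E4\u05EA" # bet, vav, kaf, pe, tav

# Hypothetical helper approximating the pre-processing described above.
def normalize_hebrew(text)
  result = text.gsub(/\s+/, "")                        # remove whitespace
  result = result.tr(FINAL_LETTERS, STANDARD_LETTERS)  # final -> standard letters
  result = result.tr(CHATAF_NEKUDOS, PLAIN_NEKUDOS)    # chataf -> plain vowels
  # Full chirik / tzeirei: drop a yud that follows the vowel
  # and carries no point of its own.
  result = result.gsub(/([\u05B4\u05B5])\u05D9(?![\u05B0-\u05BC])/, '\1')
  # Full cholom: drop the vav that carries the cholom point.
  # (A consonantal vav with cholom would also match; this is a simplification.)
  result = result.gsub(/\u05D5\u05B9/, "\u05B9")
  # Remove a dagesh unless its base letter is one of bet, vav, kaf, pe, tav.
  result.gsub(/([\u05D0-\u05EA])([\u05B0-\u05BB]*)\u05BC/) do
    letter = Regexp.last_match(1)
    points = Regexp.last_match(2)
    KEEPS_DAGESH.include?(letter) ? "#{letter}#{points}#{DAGESH}" : "#{letter}#{points}"
  end
end

puts normalize_hebrew(" \u05D0\u05DD ") == "\u05D0\u05DE"
```

Using Unicode escapes instead of literal Hebrew characters keeps the combining marks explicit, which makes it easier to see exactly which code points each step touches.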