This repository stores the source code used in the research project "Phonological Principles and Automatic Phonemic and Phonetic Transcription of Khmer Words", which was presented in partial fulfillment of the requirements for the degree of Master of Arts in Linguistics at the International College of Payap University, Thailand, in 2016.
There are two kinds of source code:
- Ruby code for the data preparation processes
- Thrax code for the conversion processes
The repository contains:
- data --containing 18,948 entries from the Khmer-Khmer Dictionary (1967)
- cleanup.rb --removing stray characters, prefixes and duplicate entries
- filter1.rb --removing Pali/Sanskrit loanwords using etymological tags
- filter2.rb --removing P/S loanwords using diacritics and independent vowels
- filter3.rb --removing P/S loanwords using pronunciation field
- syl_group.rb --grouping native Khmer words into their respective syllable groups
- automator_phonemic.grm --taking orthographic words one at a time and converting each into a phonemic transcription
- automator_phonetic.grm --taking phonemic transcriptions one at a time and converting each into a phonetic transcription
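As a rough illustration of the kind of work the cleanup step does, here is a minimal Ruby sketch of removing stray characters and duplicate entries. It is not the code in cleanup.rb. The input layout (one entry per line, headword in the first tab-separated field), the set of characters treated as stray (zero-width space and byte order mark), and the names `STRAY_CHARS`, `clean_headword` and `cleanup` are all assumptions made for this example. Prefix removal is left out because its rules are not described here.

```ruby
# Hypothetical sketch only: the real cleanup.rb and the data layout may differ.
# Assumption: one entry per line, headword in the first tab-separated field.

# Assumption: "stray" characters are the zero-width space and the BOM.
STRAY_CHARS = /[\u200B\uFEFF]/

# Remove stray characters and surrounding whitespace from a headword.
def clean_headword(word)
  word.gsub(STRAY_CHARS, "").strip
end

# Clean each entry and keep only the first entry for every cleaned headword.
def cleanup(lines)
  seen = {}
  lines.each_with_object([]) do |line, kept|
    headword, rest = line.chomp.split("\t", 2)
    next if headword.nil?                      # skip blank lines
    headword = clean_headword(headword)
    next if headword.empty? || seen[headword]  # skip empty or duplicate headwords
    seen[headword] = true
    kept << [headword, rest].compact.join("\t")
  end
end

# Made-up sample entries, not taken from the dictionary data.
entries = ["ក\u200B\tnoun", "ក\tnoun", " ខ្មែរ \tnoun"]
puts cleanup(entries)
```

Duplicates are detected after cleaning, so two entries that differ only by a stray character count as the same headword.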
All inquiries should be directed to makara_sok@hotmail.com.