My thesis on "Open Source Code and Low Resource Languages" for an MSc in Language Science and Technology at Saarland University.
Of the roughly seven thousand languages currently spoken, fewer than fifty have a significant digital presence. In order for a language to be used digitally and to survive in the long term, its speakers may need to develop computational resources: orthographies, dictionaries, grammars, spell checkers, parsers, and more. Instead of depending on large providers, researchers and communities can leverage the open source code methodology as a means of bootstrapping digital language development. In this thesis, I discuss the state of the field for low resource languages, what open source licensing means, and how it can help language communities. I provide two case studies, looking in detail at Gaelic and Naskapi, and I describe a decentralised, crowd-sourced database I have developed to catalogue open source code which can be used for low resource languages. Looking to the future, I suggest steps for developing and using such code.
My specific contributions in this thesis include not only the first published analysis of the state of the field for open source code specifically regarding low resource languages, and an exposition of the only database of solely open source resources, but also independent fieldwork on Naskapi that pertains to its current digital presence on the web. I also outline how researchers and developers can change their processes to help make their work more effective in the long term.
Adapted from abstract.tex
I use TeXShop to build, and I use apalike-refs as the bibliography style. Make sure to put apalike-refs.bst where your system can read it. In my case, ~/Library/texmf/bibtex/bst/ worked.
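If you would rather do that from a terminal, here is a minimal sketch. It assumes apalike-refs.bst sits in the directory you run it from, and install_bst is a made-up helper name, not something shipped with this repository. The default destination is simply the directory that worked for me on macOS; other TeX distributions may look elsewhere.

```shell
# Hypothetical helper: copy a BibTeX style file into a user texmf tree.
install_bst() {
  src="$1"                                      # path to the .bst file
  dest="${2:-$HOME/Library/texmf/bibtex/bst}"   # assumed default; pass another directory if needed
  mkdir -p "$dest" && cp "$src" "$dest/"
}

# Usage (assumes the .bst file is in the current directory):
#   install_bst apalike-refs.bst
#   kpsewhich apalike-refs.bst   # prints the resolved path if TeX can find the file
```

If kpsewhich prints nothing, the file is probably in a directory your TeX installation does not search.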
Or just read thesis.pdf. It's easier.
Please feel free to contribute. Note that this is largely an academic work, and I'd probably prefer to talk to you first before merging anything. Send me an email, or open an issue!
CC-BY-SA 4.0 License © Richard Littauer 2018