
Scripts for extracting inflectional paradigms from Apertium transducers for Turkic languages and converting them to the UniMorph schema


ryskina/apertium2unimorph


apertium2unimorph

Scripts for extracting verbal and nominal inflectional paradigms from Apertium transducers for Turkic languages and converting them to the UniMorph schema. This code was used to generate the UniMorph data for Sakha and Tuvan, which was included in the SIGMORPHON 2021 Shared Task 0. Note: the shared task data was generated using the transducer versions from March 2021.
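One way to picture the extraction step is to enumerate the tag combinations that make up a paradigm and query the transducer's generator with one Apertium stream-format string per cell. The sketch below only builds those query strings. The tag inventories (`NUMBER`, `POSSESSIVE`, `CASE`), their ordering, and the function name `noun_generator_inputs` are illustrative assumptions, not taken from this repository or from the Tuvan or Sakha transducers.

```python
from itertools import product
from typing import List

# Hypothetical, deliberately small tag inventories for a Turkic noun paradigm.
# An empty string means "no tag" (unmarked singular / no possessive).
NUMBER = ["", "<pl>"]
POSSESSIVE = ["", "<px1sg>", "<px2sg>"]
CASE = ["<nom>", "<gen>", "<acc>", "<dat>", "<loc>", "<abl>"]


def noun_generator_inputs(lemma: str) -> List[str]:
    """Build one Apertium stream-format query (^lemma<tags>$) per paradigm cell."""
    queries = []
    # Cartesian product of the slots = every cell of the (toy) paradigm.
    for num, px, case in product(NUMBER, POSSESSIVE, CASE):
        queries.append(f"^{lemma}<n>{num}{px}{case}$")
    return queries


if __name__ == "__main__":
    for q in noun_generator_inputs("lemma")[:3]:
        print(q)
```

With two number values, three possessive values and six cases, this yields 36 queries per lemma; the generator's output for each query would then be paired with its tag string to form one row of the paradigm.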

The scripts currently work only for Tuvan and Sakha, but they should be relatively straightforward to extend to other Turkic languages represented in Apertium.
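The conversion to UniMorph can be thought of as a lookup from Apertium tags to UniMorph features, and a per-language table like this is one plausible place where an extension to another Turkic language would need adjusting. This is a minimal sketch, assuming analyses of the form `lemma<tag1><tag2>...` and the standard UniMorph row layout (lemma, form, `;`-separated features, tab-separated). The mapping `APERTIUM_TO_UNIMORPH`, the function `to_unimorph_row`, and the rule that unmarked nouns are singular are hypothetical, not this repository's actual conversion.

```python
import re
from typing import Dict, List

# Hypothetical Apertium -> UniMorph mapping (illustrative subset only).
APERTIUM_TO_UNIMORPH: Dict[str, str] = {
    "n": "N", "pl": "PL",
    "px1sg": "PSS1S", "px2sg": "PSS2S",
    "nom": "NOM", "gen": "GEN", "acc": "ACC",
    "dat": "DAT", "loc": "LOC", "abl": "ABL",
}


def to_unimorph_row(analysis: str, form: str) -> str:
    """Turn 'lemma<tag>...' plus a surface form into a UniMorph TSV row."""
    lemma = analysis.split("<", 1)[0]
    tags = re.findall(r"<([^>]+)>", analysis)
    # Unmapped tags raise KeyError on purpose, so gaps in the table surface early.
    feats: List[str] = [APERTIUM_TO_UNIMORPH[t] for t in tags]
    # Assumption: a noun without <pl> is singular, so add SG right after N.
    if "N" in feats and "PL" not in feats:
        feats.insert(feats.index("N") + 1, "SG")
    return "\t".join([lemma, form, ";".join(feats)])


if __name__ == "__main__":
    print(to_unimorph_row("lemma<n><pl><loc>", "FORM"))
```

Failing loudly on unknown tags is a design choice: a silent fallback would produce UniMorph rows with missing features that are hard to spot later.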

Please contact mryskina@cs.cmu.edu with any questions.

Requirements

The Apertium analyzers for the target languages must be installed (apertium-tyv for Tuvan, apertium-sah for Sakha). Installation instructions are in the respective repositories.

Other requirements:

  • Python >= 3.6
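If you want to fail fast before launching the pipeline, a version check mirroring the requirement above is enough. This is a hypothetical pre-flight helper (`python_ok`), not part of the repository; it only encodes the stated minimum of Python 3.6.

```python
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in this README


def python_ok(version_info=sys.version_info) -> bool:
    """Return True if (major, minor) is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_ok():
        sys.exit("Python >= 3.6 is required, found {}.{}".format(*sys.version_info[:2]))
    print("Python version OK")
```

Comparing tuples rather than version strings avoids the classic bug where "3.10" sorts before "3.6".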

Usage

To run the extraction and conversion pipeline end-to-end, use:

./run.sh {tyv|sah} path/to/apertium/

where path/to/apertium/ is the directory one level above the transducer directory, i.e. the directory that contains apertium-{tyv|sah} (so the transducer lives at path/to/apertium/apertium-{tyv|sah}).
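The relationship between the two `run.sh` arguments and the transducer location can be sketched as follows. The helper name `transducer_dir` and the `SUPPORTED` tuple are hypothetical; the sketch only restates the directory layout described above and does not check that the directory exists.

```python
from pathlib import Path

SUPPORTED = ("tyv", "sah")  # Tuvan and Sakha, the languages this README supports


def transducer_dir(apertium_root: str, lang: str) -> Path:
    """Return the transducer directory implied by run.sh's two arguments."""
    if lang not in SUPPORTED:
        raise ValueError(f"unsupported language {lang!r}; expected one of {SUPPORTED}")
    # The path argument is the directory ONE LEVEL ABOVE apertium-{lang}.
    return Path(apertium_root) / f"apertium-{lang}"


if __name__ == "__main__":
    print(transducer_dir("path/to/apertium/", "tyv"))
```

A common mistake this makes visible: passing the transducer directory itself (e.g. `path/to/apertium/apertium-tyv`) would point to a nonexistent `apertium-tyv/apertium-tyv`.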
