apertium2unimorph

Scripts for extracting verbal and nominal inflectional paradigms from Apertium transducers for Turkic languages and converting them to the UniMorph schema. This code was used to generate the UniMorph data for Sakha and Tuvan, which was included in the SIGMORPHON 2021 Shared Task 0. Note: the shared task data was generated using the transducer versions from March 2021.

The scripts currently work only for Tuvan and Sakha, but they should be relatively straightforward to extend to other Turkic languages represented in Apertium.
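To illustrate what extracting a paradigm and converting it to UniMorph involves, the sketch below does two things. It enumerates tag combinations for a noun lemma in Apertium's stream notation (^lemma<tag1><tag2>$), and it maps an analysis plus its surface form onto a UniMorph line (lemma, form, semicolon-separated features). This is a minimal sketch under stated assumptions, not the method used by get_paradigms.sh or convert.py. The slot inventory, the TAG_MAP table and the helper names are hypothetical and far smaller than a real Turkic analyzer's tag set (real analyzers may, for example, leave singular unmarked). The example form is illustrative rather than actual transducer output.

```python
import itertools
import re

# Hypothetical mapping from a few Apertium tags to UniMorph features.
TAG_MAP = {
    "n": "N",
    "sg": "SG",
    "pl": "PL",
    "nom": "NOM",
    "acc": "ACC",
    "gen": "GEN",
    "dat": "DAT",
}

# Hypothetical paradigm slots for a noun: number x case.
NOUN_SLOTS = [["sg", "pl"], ["nom", "acc", "gen", "dat"]]


def generator_inputs(lemma, pos, slots):
    """Yield one Apertium stream-format string per combination of slot values."""
    for combo in itertools.product(*slots):
        tags = "".join("<{}>".format(t) for t in (pos,) + combo)
        yield "^{}{}$".format(lemma, tags)


def to_unimorph(analysis, form):
    """Turn '^lemma<t1><t2>...$' plus a surface form into a UniMorph TSV line."""
    body = analysis.strip("^$")
    lemma = body.split("<", 1)[0]
    tags = re.findall(r"<([^>]+)>", body)
    # Tags without a mapping are silently dropped in this sketch.
    feats = [TAG_MAP[t] for t in tags if t in TAG_MAP]
    return "\t".join([lemma, form, ";".join(feats)])
```

For example, generator_inputs("ат", "n", NOUN_SLOTS) yields eight strings, starting with ^ат<n><sg><nom>$. Each string would then be passed to the language's generator to obtain a surface form. Dropping unmapped tags keeps the sketch short, but a real converter would more likely report them so that gaps in the mapping table are noticed.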

Please contact mryskina@cs.cmu.edu with any questions.

Requirements

The corresponding Apertium analyzers must be installed. You can find the installation instructions at the respective repositories:

Tuvan: apertium-tyv
Sakha: apertium-sah

Other requirements:

Python >= 3.6

Usage

To run the extraction and conversion pipeline end-to-end, use:

./run.sh {tyv|sah} path/to/apertium/

where path/to/apertium/ is the directory one level above the transducer directory, i.e. the directory that contains apertium-{tyv|sah} (so the transducer itself is at path/to/apertium/apertium-{tyv|sah}).
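Because the path argument is the parent of the transducer directory rather than the transducer directory itself, it is easy to pass the wrong one. The following is a small hypothetical Python sketch of the check this rule implies. Given the parent directory and a language code, it derives apertium-{tyv|sah}, verifies that the directory exists, and builds the argument list for run.sh. The helper names transducer_dir and run_pipeline_command are illustrative. The sketch does not check that the transducer inside the directory has actually been compiled.

```python
import os

SUPPORTED_LANGS = ("tyv", "sah")


def transducer_dir(apertium_root, lang):
    """Return path/to/apertium/apertium-<lang>, raising if it cannot be used."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError("unsupported language code: {!r}".format(lang))
    path = os.path.join(apertium_root, "apertium-" + lang)
    if not os.path.isdir(path):
        raise FileNotFoundError("expected transducer directory at " + path)
    return path


def run_pipeline_command(apertium_root, lang):
    """Build the ./run.sh argument list after validating the directory layout."""
    # Validate only: run.sh is given the parent directory, not apertium-<lang>.
    transducer_dir(apertium_root, lang)
    # Add a trailing separator to match the path/to/apertium/ form in the usage line.
    root = os.path.join(apertium_root, "")
    return ["./run.sh", lang, root]
```

The resulting list can be passed to subprocess.run(..., check=True) from the repository root, so that a non-zero exit status from run.sh raises an exception instead of failing silently.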
