This repository contains the code used to generate and evaluate the concordance-style annotation files for Wao Terero, Croatian, and English in the ComputEL 5 paper "A Word-and-Paradigm Workflow for Fieldwork Annotation" by Copot M., Court S., Diewald N., Antetomaso S., and Elsner M.
It includes an unsupervised model, adapted from the SIGMORPHON 2020 Task 2 baseline, that clusters lemmas and paradigm cells from unannotated input text; scripts to construct datasets for annotation; and a regular expression search tool that generates additional candidate examples when annotating lemmas.
For annotation guidelines and instructions for using the regular expression tool, please see "annotation-guidelines.pdf".
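
As a rough illustration of what a regular expression search over a corpus can look like, here is a minimal, self-contained sketch. It is not the tool shipped in this repository: the function name `find_candidates`, the `context` parameter, and the toy English corpus are all hypothetical, and whitespace tokenization is assumed purely for simplicity. It returns each matching token with a few words of context on either side, in the style of a concordance line.

```python
import re


def find_candidates(sentences, pattern, context=3):
    """Collect (left context, token, right context) for tokens matching pattern.

    Hypothetical helper for illustration only; tokens are split on whitespace.
    """
    regex = re.compile(pattern)
    hits = []
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            # fullmatch requires the whole token to match, not just a substring
            if regex.fullmatch(token):
                left = " ".join(tokens[max(0, i - context):i])
                right = " ".join(tokens[i + 1:i + 1 + context])
                hits.append((left, token, right))
    return hits


# Toy corpus (made up for this example)
corpus = [
    "she walks to the market",
    "they walked home and talked",
    "the walker rests",
]

# Look for possible forms of the lemma "walk"
for left, token, right in find_candidates(corpus, r"walk(s|ed|ing)?"):
    print(f"{left:>20} [{token}] {right}")
```

An annotator could widen or narrow the pattern to trade recall against the number of spurious candidates; here `walker` and `talked` are excluded because the pattern must match the entire token.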