Preprocess code for Cross-Corpora Evaluation of Grammatical Error Correction

This repository contains the preprocessing code for the cross-corpora evaluation of grammatical error correction described in:

Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata and Kentaro Inui. 2019. Cross-Corpora Evaluation and Analysis of Grammatical Error Correction Models — Is Single-Corpus Evaluation Enough?. In Proceedings of the 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics. Minneapolis, USA.

If you make use of this code, please cite the above paper.

Pre-requisites

We only support Python 2. It is safest to install everything in a clean virtualenv.

The dependencies can be installed as follows:

pip install -r requirements.txt

(NOTE: To reproduce the exact data, you may need to use NLTK v2.0b7 for tokenization.)

Data Preparation for Cross-Corpora Evaluation

To convert the raw data (.xml) to M2 format, use the following preprocessing scripts.

## CLC-FCE
python2 clcfce_to_m2.py -in dataset/ -out output

## KJ/ICNALE
python2 kj_to_m2.py -in kj_all.raw -out output_file
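
As a quick sanity check on the converted output, the sketch below reads M2-formatted text back into Python structures. It assumes the standard M2 layout (an `S` line with the tokenized sentence, followed by `A` lines of the form `start end|||type|||correction|||REQUIRED|||-NONE-|||annotator`); the function name `parse_m2` and the sample sentence are hypothetical and not part of this repository.

```python
def parse_m2(text):
    """Parse M2-formatted text into a list of (tokens, edits) pairs."""
    entries = []
    # Sentences are separated by blank lines in M2 files.
    for block in text.strip().split("\n\n"):
        if not block.strip():
            continue
        tokens, edits = [], []
        for line in block.splitlines():
            if line.startswith("S "):
                # Source sentence, already tokenized on whitespace.
                tokens = line[2:].split()
            elif line.startswith("A "):
                # Annotation: token span, error type, correction, annotator id.
                fields = line[2:].split("|||")
                start, end = (int(i) for i in fields[0].split())
                edits.append({
                    "start": start,
                    "end": end,
                    "type": fields[1],
                    "correction": fields[2],
                    "annotator": int(fields[-1]),
                })
        entries.append((tokens, edits))
    return entries


# Hypothetical example entry, not taken from any corpus.
sample = (
    "S This are a sentence .\n"
    "A 1 2|||SVA|||is|||REQUIRED|||-NONE-|||0\n"
)

for tokens, edits in parse_m2(sample):
    print(" ".join(tokens))
    for edit in edits:
        print(edit["start"], edit["end"], edit["type"], edit["correction"])
```

To check a real file, read its contents and pass them to `parse_m2` in place of `sample`.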

Preprocessing Scripts for the Other Corpora

For the other corpora we used, such as CoNLL-2013, CoNLL-2014, and JFLEG, you can use the following official preprocessing scripts.

Evaluation

You can evaluate your systems using the following scorers (m2scorer and GLEU).
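
For reference, M2-based evaluation is usually reported as precision, recall, and F0.5 over edits. The sketch below shows that arithmetic only; it is not the m2scorer implementation, the function name `f_beta` is hypothetical, and the handling of empty edit sets (scoring them as 1.0) is an assumption.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """Return (precision, recall, F_beta) from edit-level counts.

    tp: proposed edits that match a gold edit
    fp: proposed edits with no gold match
    fn: gold edits that were not proposed
    """
    # Assumption: an empty set of proposed (or gold) edits scores 1.0.
    precision = tp / float(tp + fp) if tp + fp else 1.0
    recall = tp / float(tp + fn) if tp + fn else 1.0
    denom = beta ** 2 * precision + recall
    f = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f


# Hypothetical counts: 2 correct edits, 0 spurious, 2 missed.
p, r, f = f_beta(2, 0, 2)
print("P=%.4f R=%.4f F0.5=%.4f" % (p, r, f))
```

With beta = 0.5, precision is weighted twice as heavily as recall, so a system that proposes few but correct edits scores higher than one with the same counts reversed.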
