Human Evaluation Benchmark for Text Simplification and System Outputs from the paper:
Simple and Effective Text Simplification using Semantic and Neural Models
Elior Sulem, Omri Abend and Ari Rappoport
Proc. of ACL 2018
1. Human Evaluation Benchmark and Corresponding System Outputs
./human_evaluation_benchmark_acl2018.ods
Human evaluation scores assigned by 3 annotators for the 4 elicitation questions described in the paper. Each annotator scored the same 1,600 (input, output) pairs.
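The spreadsheet can be loaded programmatically, e.g. with pandas. Below is a minimal Python sketch, assuming the odfpy package is installed for .ods support; the column names used here ("question", "score") are hypothetical and should be adapted to the actual header row of the file.

    import pandas as pd

    # Load the benchmark spreadsheet (.ods files need the "odfpy" package).
    df = pd.read_excel("human_evaluation_benchmark_acl2018.ods", engine="odf")

    # Hypothetical column names -- adapt to the actual header row.
    # Average score per elicitation question, across annotators and pairs.
    print(df.groupby("question")["score"].mean())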
./Evaluation_system_outputs
Outputs of the systems for the 70 manually evaluated sentences; these are the first 70 sentences of the test corpus of Xu et al. (2016). All outputs were uniformly tokenized and truecased using the Moses toolkit (Koehn et al., 2007).
The source and reference sentences, as well as the outputs of SBMT-SARI (Xu et al., 2016), are available at https://github.com/cocoxu/simplification. The outputs of HYBRID (Narayan and Gardent, 2014) are available at https://github.com/XingxingZhang/dress.
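To reproduce the shared tokenization and truecasing style for a new system's output, the same Moses preprocessing can be applied. The sketch below uses sacremoses, a Python reimplementation of the Moses tokenizer and truecaser, as a stand-in for the original Perl scripts; the corpus and model file names are hypothetical.

    from sacremoses import MosesTokenizer, MosesTruecaser

    mtok = MosesTokenizer(lang="en")
    mtr = MosesTruecaser()

    # Train a truecasing model on a tokenized corpus (file name hypothetical).
    tokenized_docs = [mtok.tokenize(line) for line in open("corpus.en")]
    mtr.train(tokenized_docs, save_to="truecase.model")

    # Normalize a sentence to the shared tokenization and truecasing style.
    tokens = mtok.tokenize("The US Senate passed the bill on Tuesday.", return_str=True)
    print(mtr.truecase(tokens, return_str=True))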
2. Additional System Outputs
./All_system_outputs
Outputs of the systems for the whole test corpus of Xu et al. (2016). All outputs were uniformly tokenized and truecased using the Moses toolkit (Koehn et al., 2007).
Links to the source and reference sentences and to the SBMT-SARI and HYBRID outputs are given in Section 1 above.
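The output files are line-aligned with the test corpus, so source sentences and system outputs can be paired directly. A minimal Python sketch, with hypothetical file names that should be replaced by the actual files in this directory:

    # Pair each source sentence with the corresponding system output.
    # File names are hypothetical -- substitute the files in this directory.
    with open("test.source") as src_f, open("dss.output") as out_f:
        for source, output in zip(src_f, out_f):
            pair = (source.strip(), output.strip())
            print(pair)  # e.g. pass the pairs to an evaluation script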
./WEB-SPLIT_experiment_outputs
Output of our DSS system on the test set of the WEB-SPLIT corpus, Version 0.1 (Narayan et al., 2017). The output was tokenized and truecased using the Moses toolkit (Koehn et al., 2007).
License