HSplit-corpus

Gold-Standard Sentence Splitting Corpus

If you use the corpus, please cite the following paper:

  BLEU is Not Suitable for the Evaluation of Text Simplification
  Elior Sulem, Omri Abend and Ari Rappoport
  Proc. of EMNLP 2018

./HSplit

Gold-standard Sentence Splitting Corpus composed by the generations made by 4 annotators, given the complex side of the test corpus of Xu et al., 2016, following the sentence splitting guidelines. HSplit 1 and 2 correspond to Set 1 guidelines. HSplit 3 and 4 correspond to Set 2 guidelines. The corpus includes 359 sentences.

Uniform tokenization and truecasing styles are obtained using the Moses toolkit (Koehn et al., 2007).

./HSplit_human_evaluation

Human evaluation scores given for the 4 elicitation questions described in the paper. Each HSplit corpus is scored by 3 annotators. The human evaluation concerns the first 70 sentences of HSplit. The scores appear in the ods files. The corresponding sentences appear in the txt files.

The evaluation scores for the simplification systems mentioned in the paper can be found at https://github.com/eliorsulem/simplification-acl2018.

License

Attribution-ShareAlike 3.0 Unported license

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
HSplit		HSplit
HSplit_human_evaluation		HSplit_human_evaluation
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

HSplit-corpus

About

Uh oh!

Releases

Packages

eliorsulem/HSplit-corpus

Folders and files

Latest commit

History

Repository files navigation

HSplit-corpus

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages