This is the original implementation of the AAAI 2020 paper: Semi-Supervised Text Simplification with Back-Translation and Asymmetric Denoising Autoencoders
The non-parallel data is provided by Surya et al. (2018). You can download it from here.
Extract the complex and simple sentences into the data/non_parallel directory. If you want to train in semi-supervised mode, you can also put parallel data such as Wiki-Large or Newsela into data/parallel. We provide several examples in the data directory; see the example setup below.
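A minimal sketch of one possible setup, assuming the downloaded files are plain text with one sentence per line; the file names here are placeholders, so follow the examples shipped in the data directory for the exact naming:

```bash
# Hedged setup sketch -- file names are placeholders, not the repository's actual ones.
mkdir -p data/non_parallel data/parallel

# Non-parallel complex/simple sentences from Surya et al. (2018):
cp /path/to/complex_sentences.txt data/non_parallel/
cp /path/to/simple_sentences.txt  data/non_parallel/

# Optional parallel data (Wiki-Large or Newsela) for semi-supervised training:
cp /path/to/wikilarge.train.* data/parallel/
cp /path/to/newsela.train.*   data/parallel/
```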
Download resource.zip from here and extract it. resource.zip contains:
- Substitution rules extracted from SimplePPDB
- Pretrained BPE embeddings trained with fastText
- A pretrained language model for reward calculation
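One possible way to unpack the archive; the target directory name is an assumption, so adjust it to whatever path run.sh expects:

```bash
# Hedged sketch: download resource.zip from the link above first.
# The target directory "resource/" is an assumption.
unzip resource.zip -d resource/
ls resource/   # SimplePPDB rules, BPE embeddings, language model
```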
Train the model using
bash run.sh
If you want to use reinforcement learning to finetune the model, make sure you set RL_FINTUNE=1 in run.sh.
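For example, assuming run.sh defines the flag on a line of its own (an assumption about the script's layout), you could flip it like this:

```bash
# Hedged example -- assumes run.sh contains a line of the form "RL_FINTUNE=0".
sed -i 's/^RL_FINTUNE=.*/RL_FINTUNE=1/' run.sh
bash run.sh   # fine-tune the pretrained model with the RL reward
```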
In our experiments, we use one GPU for training and several GPUs for back-translation, so you need at least two GPUs to run our experiments. You can use --otf_num_processes
to adjust the number of GPUs used for back-translation.
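As a rough illustration (only --otf_num_processes comes from this README; the script name main.py and the other details are assumptions, and the actual training command lives in run.sh):

```bash
# Hedged sketch of a 3-GPU setup: GPU 0 trains, GPUs 1-2 serve back-translation.
export CUDA_VISIBLE_DEVICES=0,1,2

# Spawn two back-translation worker processes, one per spare GPU:
python main.py --otf_num_processes 2   # plus the remaining arguments from run.sh
```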
Run
bash translate.sh
to generate simplified sentences for evaluation. We use the test sets of Newsela and Wiki-Large in our experiments.
Then run
bash eval.sh
to evaluate the generated sentences.
For corpus-level SARI, the original script provided by Xu et al. (2016) only supports the 8-reference Wiki-Large dataset. Several previous works misused the original script on 1-reference datasets, which may lead to a very low score. We therefore provide a Python implementation of corpus-level SARI in metrics/STAR.py, which produces the same results as the original script on Wiki-Large and correct results on the 1-reference Newsela dataset.
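For illustration, an evaluation call might look like the following; the argument names and paths below are purely hypothetical placeholders, since the actual interface of metrics/STAR.py is not documented here:

```bash
# Hypothetical invocation -- the real argument names of metrics/STAR.py may differ;
# check the script itself before running.
python metrics/STAR.py \
    --src data/parallel/newsela.test.src \
    --hyp output/newsela.test.pred \
    --ref data/parallel/newsela.test.ref
```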