This is the artifact for the paper "Word Closure-Based Metamorphic Testing for Machine Translation". This artifact supplies the replication package and the supplementary material of the paper.
Word Closure is an output comparison unit for Metamorphic Testing (MT) of Machine Translation Systems (MTSs). In MT for MTS, the input comprises a pair of source and follow-up input sentences.
- In /supplements/Illustrations.md, we further illustrate in detail the five motivation examples presented in the paper.
In order to replicate the experiments, please perform the following steps to prepare necessary dependencies:
- Prepare a Python environment:
  - Execute the following commands:

        conda create -n wordclosure python=3.8
        conda activate wordclosure
        pip install -r requirements.txt

  - Specifically, the following libraries in requirements.txt will be installed:

        apex==0.9.10dev
        boto3==1.34.15
        botocore==1.34.15
        filelock==3.13.1
        gensim==4.3.2
        jieba==0.42.1
        nltk==3.8.1
        numpy==1.24.0
        pycorenlp==0.3.0
        requests==2.25.1
        scikit_learn==1.3.2
        tokenizers==0.15.0
        torch==2.1.2
        tqdm==4.61.2
        transformers==4.36.2
- Set up the Stanford CoreNLP server:
  - Download stanford-corenlp-4.5.1.zip from https://stanfordnlp.github.io/CoreNLP/history.html and unzip it into the /stanford-corenlp-4.5.1 folder.
  - Download stanford-corenlp-4.5.1-models-chinese.jar from https://stanfordnlp.github.io/CoreNLP/history.html and move it into the /stanford-corenlp-4.5.1 folder.
  - Execute the following commands to start the CoreNLP parsing servers for English (port 9000) and Chinese (port 9001), respectively. Each server runs in the foreground, so start each one in its own terminal:

        cd stanford-corenlp-4.5.1
        java -Xmx5g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -preload tokenize,ssplit,pos,lemma,ner,parse,depparse -status_port 9000 -port 9000 -timeout 15000
        java -Xmx5g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -serverProperties StanfordCoreNLP-chinese.properties -port 9001 -timeout 15000
- Prepare the word alignment tool AWESOME:
  - Download model_without_co.zip from https://drive.google.com/file/d/1IcQx6t5qtv4bdcGjjVCwXnRkpr67eisJ/view?usp=sharing and unzip it into the /scripts/model_without_co folder.
  - Download all the files from https://github.com/neulab/awesome-align/tree/master/awesome_align and store them in the /scripts/awesome_align folder.
- Prepare the word2vec models:
  - Download GoogleNews-vectors-negative300.bin.gz from https://code.google.com/archive/p/word2vec/ and unzip it into the /scripts/word2vec folder.
  - Download merge_sgns_bigram_char300.txt.bz2 from https://github.com/Embedding/Chinese-Word-Vectors and store it in the /scripts/word2vec folder.
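Before running any experiment, it can help to confirm that the preparation steps above took effect. The following is a minimal sketch, not part of the released scripts: it compares the pins in requirements.txt with the installed versions, checks that the four folders named above exist, and tests whether the two CoreNLP ports accept connections. The function names (`parse_pins`, `version_mismatches`, `missing_dirs`, `port_open`, `preflight`) are hypothetical, and it assumes the folders are located relative to the repository root and that the servers run on `localhost`.

```python
import socket
from importlib import metadata
from pathlib import Path

# Folders named in the setup steps, assumed relative to the repository root.
REQUIRED_DIRS = [
    "stanford-corenlp-4.5.1",
    "scripts/model_without_co",
    "scripts/awesome_align",
    "scripts/word2vec",
]
# Ports used by the two CoreNLP servers started above.
CORENLP_PORTS = {"English": 9000, "Chinese": 9001}


def parse_pins(requirements_text):
    """Return {package: version} for every 'name==version' line."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(pins, lookup=installed_version):
    """Return {package: (pinned, found)} for packages whose version differs."""
    found = {name: lookup(name) for name in pins}
    return {n: (pins[n], found[n]) for n in pins if found[n] != pins[n]}


def missing_dirs(root, required=REQUIRED_DIRS):
    """Return the required folders that do not exist under root."""
    return [d for d in required if not (Path(root) / d).is_dir()]


def port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(root="."):
    """Print one line per detected problem; prints nothing if all checks pass."""
    req = Path(root) / "requirements.txt"
    if req.is_file():
        pins = parse_pins(req.read_text())
        for name, (want, got) in version_mismatches(pins).items():
            print(f"version mismatch: {name} pinned {want}, found {got}")
    for folder in missing_dirs(root):
        print(f"missing folder: {folder}")
    for label, port in CORENLP_PORTS.items():
        if not port_open("localhost", port):
            print(f"CoreNLP {label} server not reachable on port {port}")
```

Calling `preflight()` from the repository root inside the `wordclosure` environment prints nothing when every check passes. An open port only shows that something is listening; it does not confirm that the CoreNLP models loaded correctly.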
All the datasets needed for the experiment are stored in the /data folder. We provide detailed information on the structure and contents of the released dataset in /data/Data.md.
Structure of the /data folder:
data
┝━━ Limitation-statistics (False Positives and False Negatives for five existing MT for MTS works)
│ ┝━━ CAT-google-FN.csv
│ ┝━━ CAT-google-FP.csv
│ ┕━━ ...
┝━━ Motivation-examples (Five motivation examples in the paper)
│ ┝━━ CAT-en2zh-motivation.csv
│ ┝━━ CIT-en2zh-motivation.csv
│ ┕━━ ...
┝━━ RQ1 (Metamorphic test case pairs for three MTSs generated by five Metamorphic Relations)
│ ┝━━ CAT-en2zh-google.csv
│ ┝━━ CAT-en2zh-bing.csv
│ ┝━━ CAT-en2zh-youdao.csv
│ ┝━━ CIT-en2zh-google.csv
│ ┕━━ ...
┝━━ RQ2&5 (Metamorphic test case pairs for five Metamorphic Relations)
│ ┝━━ CAT-en2zh-merge.csv
│ ┝━━ CAT-zh2en-merge.csv
│ ┝━━ CIT-en2zh-merge.csv
│ ┕━━ ...
┕━━ RQ3 (Fine-grained violation locating results)
  ┝━━ CAT-en2zh-google-LABEL.txt
  ┝━━ CAT-zh2en-google-LABEL.txt
  ┝━━ CIT-en2zh-google-LABEL.txt
  ┕━━ ...
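Most file names in the listing above appear to follow a `<relation>-<direction>-<tag>` pattern, for example `CAT-en2zh-google.csv`. The sketch below splits such names into their parts so that a folder can be indexed by translation direction. The pattern is inferred from the listing only and is not a documented convention; /data/Data.md remains the authoritative description, and `DatasetName`, `parse_dataset_name`, and `index_folder` are hypothetical helpers.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class DatasetName(NamedTuple):
    relation: str   # first component, e.g. "CAT" or "CIT"
    direction: str  # "en2zh" or "zh2en"
    tag: str        # remainder, e.g. "google", "merge", "google-LABEL"


def parse_dataset_name(filename: str) -> Optional[DatasetName]:
    """Split '<relation>-<direction>-<tag>.<ext>'; return None if it does not fit."""
    parts = Path(filename).stem.split("-", 2)
    if len(parts) != 3 or parts[1] not in ("en2zh", "zh2en"):
        return None
    return DatasetName(*parts)


def index_folder(folder):
    """Group the parsed file names of one /data subfolder by direction."""
    index = {}
    for path in sorted(Path(folder).iterdir()):
        parsed = parse_dataset_name(path.name)
        if parsed is not None:
            index.setdefault(parsed.direction, []).append(parsed)
    return index
```

Names that do not carry a translation direction, such as `CAT-google-FN.csv` in Limitation-statistics, are deliberately returned as `None` rather than guessed at.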
We provide scripts for reproducing our experimental results in the /scripts folder.
Structure of the /scripts folder:
scripts
┝━━ awesome_align/
┝━━ model_without_co/
┝━━ utils/
┝━━ word2vec/
┝━━ Limitation-statistics.py
┝━━ Motivation.py
┝━━ RQ1-en2zh.py
┝━━ RQ1-zh2en.py
┝━━ RQ2-en2zh.py
┝━━ RQ2-zh2en.py
┝━━ RQ3.py
┝━━ RQ4-en2zh.py
┝━━ RQ4-zh2en.py
┝━━ RQ5-en2zh.py
┝━━ RQ5-zh2en.py
┝━━ en.py
┕━━ zh.py
Please follow the steps below to reproduce all the experimental results in our paper.
- To count the number of FPs/FNs due to the limitations in existing methods, please run:

      cd scripts
      python Limitation-statistics.py

  The statistical results will be printed and recorded in the result.txt file in the /scripts/Limitation folder.
- To run our word closure-based comparison method on the five motivation examples in the paper, please execute:

      python Motivation.py

  The detailed execution results of our word closure-based comparison method will be printed.
- To replicate the experiment results of RQ1, please run:

      python RQ1-en2zh.py
      python RQ1-zh2en.py

  The evaluation results will be printed and recorded in the result_en2zh.txt and result_zh2en.txt files in the /scripts/RQ1 folder.
- To replicate the experiment results of RQ2, please run:

      python RQ2-en2zh.py
      python RQ2-zh2en.py

  The comparison results will be printed and recorded in the result_en2zh.txt and result_zh2en.txt files in the /scripts/RQ2 folder.
- To replicate the experiment results of RQ3, please run:

      python RQ3.py

  The evaluation results will be printed and recorded in the result_en2zh.txt and result_zh2en.txt files in the /scripts/RQ3 folder.
- To replicate the experiment results of RQ4, please run:

      python RQ4-en2zh.py
      python RQ4-zh2en.py

  The evaluation results will be printed.
- To replicate the experiment results of RQ5, please run:

      python RQ5-en2zh.py
      python RQ5-zh2en.py

  The evaluation results of our method with its highest F1 score under different configurations will be printed and recorded in the result_en2zh.txt and result_zh2en.txt files in the /scripts/RQ5 folder.
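The steps above can also be chained from a single driver. The following is a minimal sketch, not part of the released scripts: it runs the listed scripts in the same order from inside the /scripts folder and stops at the first failure. `STEPS`, `build_commands`, and `run_all` are hypothetical names, and the sketch assumes that the scripts can run unattended one after another with both CoreNLP servers already started.

```python
import subprocess
import sys
from pathlib import Path

# Script names and order taken from the reproduction steps above.
STEPS = [
    ("Limitation statistics", ["Limitation-statistics.py"]),
    ("Motivation examples", ["Motivation.py"]),
    ("RQ1", ["RQ1-en2zh.py", "RQ1-zh2en.py"]),
    ("RQ2", ["RQ2-en2zh.py", "RQ2-zh2en.py"]),
    ("RQ3", ["RQ3.py"]),
    ("RQ4", ["RQ4-en2zh.py", "RQ4-zh2en.py"]),
    ("RQ5", ["RQ5-en2zh.py", "RQ5-zh2en.py"]),
]


def build_commands(python=sys.executable):
    """Return one [interpreter, script] command per script, in step order."""
    return [[python, script] for _, scripts in STEPS for script in scripts]


def run_all(scripts_dir="scripts", dry_run=False):
    """Run every script with scripts_dir as the working directory.

    With dry_run=True the commands are only printed. Otherwise check=True
    raises CalledProcessError as soon as one script exits with an error.
    """
    scripts_dir = Path(scripts_dir)
    for label, scripts in STEPS:
        for script in scripts:
            cmd = [sys.executable, script]
            print(f"[{label}] {' '.join(cmd)}")
            if not dry_run:
                subprocess.run(cmd, cwd=scripts_dir, check=True)
```

Calling `run_all(dry_run=True)` from the repository root lists the commands without executing them, which is a cheap way to check the order before starting a full run.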
If you have questions, suggestions, or bug reports, please email imjinshuo@whu.edu.cn.