Skip to content

Artifact for the paper "Word Closure-Based Metamorphic Testing for Machine Translation".

License

Notifications You must be signed in to change notification settings

imjinshuo/Word-Closure-Based-MT

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Artifact for the Paper "Word Closure-Based Metamorphic Testing for Machine Translation"

This is the artifact for the paper "Word Closure-Based Metamorphic Testing for Machine Translation". This artifact supplies the replication package and the supplementary material of the paper.

This artifact contains:

Introduction

Word Closure is an output comparison unit for Metamorphic Testing (MT) of Machine Translation Systems (MTSs). In MT for MTS, the input comprises a pair of source and follow-up input sentences $S_s$ and $S_f$ generated through the input transformation of MRs, as well as their translations $T_s$ and $T_f$ (i.e., source and follow-up outputs) returned by MTS. We propose word closure to detect whether MTS violates the MR on the test case pair. The usage of word closure involves two modules for constructing word closures from the test case pair and one module for detecting the violation by evaluating the semantics of the translation counterparts linked by word closures. Please refer to the paper for more details.

Overview

Supplementary Materials

Setup

In order to replicate the experiments, please perform the following steps to prepare necessary dependencies:

  • Prepare a python environment:

    • Execute the following commands:
      conda create -n wordclosure python=3.8
      conda activate wordclosure
      pip install -r requirements.txt
    • Specifically, the following libraries in requirements.txt will be installed:
      apex==0.9.10dev
      boto3==1.34.15
      botocore==1.34.15
      filelock==3.13.1
      gensim==4.3.2
      jieba==0.42.1
      nltk==3.8.1
      numpy==1.24.0
      pycorenlp==0.3.0
      requests==2.25.1
      scikit_learn==1.3.2
      tokenizers==0.15.0
      torch==2.1.2
      tqdm==4.61.2
      transformers==4.36.2
  • Set up the Stanford Corenlp server:

    • Download stanford-corenlp-4.5.1.zip from https://stanfordnlp.github.io/CoreNLP/history.html and unzip it into /stanford-corenlp-4.5.1 folder.
    • Download stanford-corenlp-4.5.1-models-chinese.jar from https://stanfordnlp.github.io/CoreNLP/history.html and move it into /stanford-corenlp-4.5.1 folder.
    • Execute the following commands to start the Corenlp parsing server for English language and Chinese language respectively.
      cd stanford-corenlp-4.5.1
      java -Xmx5g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -preload tokenize,ssplit,pos,lemma,ner,parse,depparse -status_port 9000 -port 9000 -timeout 15000
      java -Xmx5g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -serverProperties StanfordCoreNLP-chinese.properties -port 9001 -timeout 15000
  • Prepare word alignment tool AWESOME:

  • Prepare word2vector models

Dataset

All the datasets needed for the experiment are stored in the /data folder. We have provided detailed information to illustrate the structure and contents of our released dataset in /data/Data.md .

Structure of /data folder:

data
┝━━ Limitation-statistics (False Positives and False Negatives for five existing MT for MTS works)
│   ┝━━ CAT-google-FN.csv
│   ┝━━ CAT-google-FP.csv
│   ┕━━ ...
┝━━ Motivation-examples (Five motivation examples in the paper)
│   ┝━━ CAT-en2zh-motivation.csv
│   ┝━━ CIT-en2zh-motivation.csv
│   ┕━━ ...
┝━━ RQ1 (Metamorphic test case pairs for three MTSs generated by five Metamorphic Relations)
│   ┝━━ CAT-en2zh-google.csv
│   ┝━━ CAT-en2zh-bing.csv
│   ┝━━ CAT-en2zh-youdao.csv
│   ┝━━ CIT-en2zh-google.csv
│   ┕━━ ...
┝━━ RQ2&5 (Metamorphic test case pairs for five Metamorphic Relations)
│   ┝━━ CAT-en2zh-merge.csv
│   ┝━━ CAT-zh2en-merge.csv
│   ┝━━ CIT-en2zh-merge.csv
│   ┕━━ ...
┕━━ RQ3 (Fine-grained violation locating results)
    ┝━━ CAT-en2zh-google-LABEL.txt
    ┝━━ CAT-zh2en-google-LABEL.txt
    ┝━━ CIT-en2zh-google-LABEL.txt
    ┕━━ ...

Scripts

We provide scripts for reproducing our experiment results in the /scripts folder.

Structure of /scripts folder:

scripts
┝━━ awesome_align/
┝━━ model_without_co/
┝━━ utils/
┝━━ word2vec/
┝━━ Limitation-statistics.py
┝━━ Motivation.py
┝━━ RQ1-en2zh.py
┝━━ RQ1-zh2en.py
┝━━ RQ2-en2zh.py
┝━━ RQ2-zh2en.py
┝━━ RQ3.py
┝━━ RQ4-en2zh.py
┝━━ RQ4-zh2en.py
┝━━ RQ5-en2zh.py
┝━━ RQ5-zh2en.py
┝━━ en.py
┕━━ zh.py

Usage Instruction

Please follow the steps below to reproduce all the experimental results in our paper.

Limitation of existing works:

  • To count the number of FPs/FNs due to the limitations in existing methods, please run:
    cd scripts
    python Limitation-statistics.py
    The statistic results will be printed and recorded into the result.txt file in the /scripts/Limitation folder.

Motivation Examples:

  • To run our word closure-based comparison method on the five motivation examples in the paper, please execute:
    python Motivation.py
    The detailed execution results of our word closure-based comparison method will be printed.

RQ1: The overall effectiveness of our approach in identifying violations.

  • To replicate the experiment results of RQ1, please run:
    python RQ1-en2zh.py
    python RQ1-zh2en.py
    The evaluation results will be printed and recorded into the result_en2zh.txt and result_zh2en.txt files in the /scripts/RQ1 folder.

RQ2: Compare with the naive comparison strategies based on native grammar units.

  • To replicate the experiment results of RQ2, please run:
    python RQ2-en2zh.py
    python RQ2-zh2en.py
    The comaprison results will be printed and recorded into the result_en2zh.txt and result_zh2en.txt files in the /scripts/RQ2 folder.

RQ3: The effectiveness of our approach in locating fine-grained violations.

  • To replicate the experiment results of RQ3, please run:
    python RQ3.py
    The evaluation results will be printed and recorded into the result_en2zh.txt and result_zh2en.txt files in the /scripts/RQ3 folder.

RQ4: The efficiency of our approach in identifying violations.

  • To replicate the experiment results of RQ4, please run:
    python RQ4-en2zh.py
    python RQ4-zh2en.py
    The evaluation results will be printed.

RQ5: Configuration selection.

  • To replicate the experiment results of RQ5, please run:
    python RQ5-en2zh.py
    python RQ5-zh2en.py
    The evaluation results of our method with its highest F1 score under different configurations will be printed and recorded into the result_en2zh.txt and result_zh2en.txt files in the /scripts/RQ5 folder.

Contact

If you have questions, suggestions and bug reports, please email imjinshuo@whu.edu.cn.

About

Artifact for the paper "Word Closure-Based Metamorphic Testing for Machine Translation".

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages