Detailed Information about Our Dataset

Limitation-statistics:

Structure of /Limitation-statistics folder:

Limitation-statistics
┝━━ CAT-google-FN.csv
┝━━ CAT-google-FP.csv
┝━━ CIT-google-FN.csv
┝━━ CIT-google-FP.csv
┝━━ PatInv-google-FN.csv
┝━━ PatInv-google-FP.csv
┝━━ Purity-google-FN.csv
┝━━ Purity-google-FP.csv
┝━━ SIT-google-FN.csv
┕━━ SIT-google-FP.csv

In each of the above "[IT]-[SUT]-[FP,FN].csv" files, each row records a False Positive (FP) or False Negative (FN) of the baseline method and contains the following items (see the loading sketch after the list):

  • S_s: the source input sentence.
  • S_f: the follow-up input sentence.
  • T_s: the source output translation.
  • T_f: the follow-up output translation.
  • Limitation-A: whether this FP or FN is due to Limitation-A, i.e., the method cannot compare the two output translations at a fine granularity for precise comparison.
  • Limitation-B: whether this FN is due to Limitation-B, i.e., the method lacks the linkage between a fragment in the source output and its counterpart in the follow-up output that a rigorous comparison requires.
  • Limitation-C: whether this FN is due to Limitation-C, i.e., the method fails to detect incorrect translations of the same input words.
  • Limitation-text: whether this FN is due to the limitation of text-based comparison methods, i.e., they fail to detect incorrect translations that have the same structure as their counterparts.
  • Limitation-structure: whether this FP is due to the limitation of structure-based comparison methods, i.e., they cannot recognize synonyms between the two translations.
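
As an illustration, here is a minimal Python sketch (standard library only) that tallies how many rows in one statistics file are flagged with each limitation. The column names are assumed to match the list above, and the file path is just an example:

```python
import csv
from collections import Counter

# Column headers assumed from the item list above; adjust if the CSVs differ.
LIMITATION_COLUMNS = [
    "Limitation-A", "Limitation-B", "Limitation-C",
    "Limitation-text", "Limitation-structure",
]

def tally_limitations(path):
    """Count, per limitation column, how many rows are flagged with "1"."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            for col in LIMITATION_COLUMNS:
                # Missing columns (e.g., in FP-only files) are skipped gracefully.
                if (row.get(col) or "").strip() == "1":
                    counts[col] += 1
    return counts

print(tally_limitations("Limitation-statistics/SIT-google-FP.csv"))
```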

Motivation-examples:

Structure of /Motivation-examples folder:

Motivation-examples
┝━━ CAT-en2zh-motivation.csv
┝━━ CIT-en2zh-motivation.csv
┝━━ PatInv-en2zh-motivation.csv
┝━━ Purity-en2zh-motivation.csv
┕━━ SIT-en2zh-motivation.csv

Each of the above "[IT]-[Language]-motivation.csv" files contains a motivation example for the corresponding Input Transformation (IT), with the following items (see the reading sketch after the list):

  • S_s: the source input sentence.
  • S_f: the follow-up input sentence.
  • T_s: the source output translation.
  • T_f: the follow-up output translation.
  • Violation: whether this pair of test cases violates the output relation. 1 for violation, 0 for non-violation.
  • Fine-grained Violations in T_s: the tokens in T_s that lead to the violation.
  • Fine-grained Violations in T_f: the tokens in T_f that lead to the violation.
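
The following sketch prints each motivation example together with its fine-grained violation tokens. The column names are assumed from the list above, and the file path is illustrative:

```python
import csv

def show_motivation_examples(path):
    """Print every example in one motivation CSV, plus its violating tokens."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            print("S_s:", row["S_s"])
            print("S_f:", row["S_f"])
            print("T_s:", row["T_s"])
            print("T_f:", row["T_f"])
            if row.get("Violation") == "1":
                print("Violating tokens in T_s:", row.get("Fine-grained Violations in T_s"))
                print("Violating tokens in T_f:", row.get("Fine-grained Violations in T_f"))

show_motivation_examples("Motivation-examples/SIT-en2zh-motivation.csv")
```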

RQ1:

Structure of /RQ1 folder:

RQ1
┝━━ CAT-en2zh-google.csv
┝━━ CAT-en2zh-bing.csv
┝━━ CAT-en2zh-youdao.csv
┝━━ CIT-en2zh-google.csv
┝━━ CIT-en2zh-bing.csv
┝━━ CIT-en2zh-youdao.csv
┝━━ PatInv-en2zh-google.csv
┝━━ PatInv-en2zh-bing.csv
┝━━ PatInv-en2zh-youdao.csv
┝━━ Purity-en2zh-google.csv
┝━━ Purity-en2zh-bing.csv
┝━━ Purity-en2zh-youdao.csv
┝━━ SIT-en2zh-google.csv
┝━━ SIT-en2zh-bing.csv
┝━━ SIT-en2zh-youdao.csv
┝━━ CAT-zh2en-google.csv
┝━━ CAT-zh2en-bing.csv
┝━━ CAT-zh2en-youdao.csv
┝━━ CIT-zh2en-google.csv
┝━━ CIT-zh2en-bing.csv
┝━━ CIT-zh2en-youdao.csv
┝━━ PatInv-zh2en-google.csv
┝━━ PatInv-zh2en-bing.csv
┝━━ PatInv-zh2en-youdao.csv
┝━━ Purity-zh2en-google.csv
┝━━ Purity-zh2en-bing.csv
┝━━ Purity-zh2en-youdao.csv
┝━━ SIT-zh2en-google.csv
┝━━ SIT-zh2en-bing.csv
┕━━ SIT-zh2en-youdao.csv

Each of the above "[IT]-[Language]-[SUT].csv" files contains all the test case pairs of the given language setting (en2zh means English-to-Chinese, zh2en means Chinese-to-English) generated by the IT for the SUT. Each pair contains the following items (see the violation-rate sketch after the list):

  • S_s: the source input sentence.
  • S_f: the follow-up input sentence.
  • T_s: the source output translation.
  • T_f: the follow-up output translation.
  • Violation: whether this pair of test cases violates the output relation. 1 for violation, 0 for non-violation.
  • Fine-grained Violations in T_s: the tokens in T_s that lead to the violation.
  • Fine-grained Violations in T_f: the tokens in T_f that lead to the violation.
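
The following sketch computes the violation rate of one RQ1 file, i.e., the fraction of test case pairs whose Violation flag is 1. The "Violation" header is assumed from the list above, and the path is an example:

```python
import csv

def violation_rate(path):
    """Fraction of rows whose Violation column is 1."""
    total = violations = 0
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            total += 1
            violations += int(row["Violation"])
    return violations / total if total else 0.0

print(f"{violation_rate('RQ1/SIT-en2zh-google.csv'):.2%}")
```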

RQ2&5:

Structure of /RQ2&5 folder:

RQ2&5
┝━━ CAT-en2zh-merge.csv
┝━━ CAT-zh2en-merge.csv
┝━━ CIT-en2zh-merge.csv
┝━━ CIT-zh2en-merge.csv
┝━━ PatInv-en2zh-merge.csv
┝━━ PatInv-zh2en-merge.csv
┝━━ Purity-en2zh-merge.csv
┝━━ Purity-zh2en-merge.csv
┝━━ SIT-en2zh-merge.csv
┕━━ SIT-zh2en-merge.csv

Each of the above "[IT]-[Language]-merge.csv" files contains all the test case pairs of the given language setting (en2zh means English-to-Chinese, zh2en means Chinese-to-English) generated by the IT for the three SUTs (Google, Bing, and Youdao). Each pair contains the following items (see the summary sketch after the list):

  • S_s: the source input sentence.
  • S_f: the follow-up input sentence.
  • T_s: the source output translation.
  • T_f: the follow-up output translation.
  • Violation: whether this pair of test cases violates the output relation. 1 for violation, 0 for non-violation.
  • Fine-grained Violations in T_s: the tokens in T_s that lead to the violation.
  • Fine-grained Violations in T_f: the tokens in T_f that lead to the violation.
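
The following sketch walks every merged file and prints a per-IT, per-language count of violating pairs. The filename pattern and the "Violation" column follow the description above; the folder path is an assumption:

```python
import csv
import glob
import os

for path in sorted(glob.glob("RQ2&5/*-merge.csv")):
    # Filenames follow the "[IT]-[Language]-merge.csv" pattern described above.
    it, language, _ = os.path.basename(path).split("-", 2)
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    violations = sum(row["Violation"] == "1" for row in rows)
    print(f"{it} ({language}): {violations}/{len(rows)} violating pairs")
```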

RQ3:

Structure of /RQ3 folder:

RQ3
┝━━ CAT-en2zh-google-LABEL.txt
┝━━ CAT-zh2en-google-LABEL.txt
┝━━ CIT-en2zh-google-LABEL.txt
┝━━ CIT-zh2en-google-LABEL.txt
┝━━ Purity-en2zh-google-LABEL.txt
┕━━ Purity-zh2en-google-LABEL.txt

Each of the above "[IT]-[Language]-google-LABEL.txt" files contains the True Positives (TPs) of the given language setting (en2zh means English-to-Chinese, zh2en means Chinese-to-English) identified by the baselines and our method. In these files, each test case pair is decomposed into 13 lines (see the parsing sketch after the list):

  • Line 1: the id of this test case pair.
  • Line 2: the source input sentence.
  • Line 3: the follow-up input sentence.
  • Line 4: the source output translation.
  • Line 5: the follow-up output translation.
  • Line 6: the tokens with token indexes of the source output translation.
  • Line 7: the tokens with token indexes of the follow-up output translation.
  • Line 8: the token indexes of the fine-grained violations in T_s located by the baseline methods.
  • Line 9: the token indexes of the fine-grained violations in T_f located by the baseline methods.
  • Line 10: the token indexes of the fine-grained violations in T_s located by our method.
  • Line 11: the token indexes of the fine-grained violations in T_f located by our method.
  • Line 12: the manually labeled token indexes of the true fine-grained violations in T_s.
  • Line 13: the manually labeled token indexes of the true fine-grained violations in T_f.
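
The following sketch parses this 13-line record format. The field names are chosen here for illustration; the grouping into chunks of 13 consecutive lines follows the layout above and assumes records are stored back-to-back with no separator lines:

```python
# Illustrative field names, one per line of the 13-line record layout above.
FIELDS = [
    "id", "S_s", "S_f", "T_s", "T_f",
    "tokens_T_s", "tokens_T_f",
    "baseline_violations_T_s", "baseline_violations_T_f",
    "our_violations_T_s", "our_violations_T_f",
    "labeled_violations_T_s", "labeled_violations_T_f",
]

def parse_label_file(path):
    """Yield one dict per test case pair (13 consecutive lines each)."""
    with open(path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]
    for start in range(0, len(lines) - len(FIELDS) + 1, len(FIELDS)):
        yield dict(zip(FIELDS, lines[start:start + len(FIELDS)]))

for record in parse_label_file("RQ3/SIT-en2zh-google-LABEL.txt"):
    print(record["id"], record["labeled_violations_T_s"])
```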