Improving Explainability of Sentence-level Metrics via Edit-level Attribution for Grammatical Error Correction
This is the official repository for our paper:
@misc{goto2024improvingexplainabilitysentencelevelmetrics,
title={Improving Explainability of Sentence-level Metrics via Edit-level Attribution for Grammatical Error Correction},
author={Takumi Goto and Justin Vasselli and Taro Watanabe},
year={2024},
eprint={2412.13110},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.13110},
}
pip install git+https://github.com/naist-nlp/gec-attribute
python -m spacy download en_core_web_sm
Or
git clone https://github.com/naist-nlp/gec-attribute
cd gec-attribute
pip install -e ./
python -m spacy download en_core_web_sm
- We have attribution classes and metric classes. First, create an instance of the metric, then create an instance of the attribution method, passing the metric as an argument.
- Use get_metric() to get a metric class.
- Use get_method() to get an attribution method class.
- You can use .attribute() to run an attribution; it returns a gec_attribute.methods.AttributionBase.AttributionOutput object.
from gec_attribute import get_metric, get_method
from gec_attribute.methods import AttributionBase
import pprint
metric_cls = get_metric('impara')
metric = metric_cls(metric_cls.Config())
method_cls = get_method('shapley')
attributor = method_cls(method_cls.Config(
metric=metric
))
# This is the example from Table 1 in our paper.
src = 'Further more by these evidence u will agree'
hyp = 'Further more , with this evidence , you will agree .'
output = attributor.attribute(src=src, hyp=hyp)
assert isinstance(output, AttributionBase.AttributionOutput)
pprint.pprint(output)
# Output:
# AttributionOutput(sent_score=-0.027204984799027443,
# src_score=0.027204984799027443,
# attribution_scores=[0.06834496899197498,
# 0.02928952643026908,
# 0.12393252272158858,
# 0.14501388886322578,
# -0.36118950191885235,
# -0.03259630793084701],
# edits=[<errant.edit.Edit object at 0x7fb3dc599890>,
# <errant.edit.Edit object at 0x7fb2f8339d90>,
# <errant.edit.Edit object at 0x7fb3dc5cf610>,
# <errant.edit.Edit object at 0x7fb2dc0ac9d0>,
# <errant.edit.Edit object at 0x7fb2dc0ac990>,
# <errant.edit.Edit object at 0x7fb2dc0ac890>],
# src='Further more by these evidence u will agree')
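As a quick sanity check on the sample output above: exact Shapley values satisfy the efficiency property, so the edit-level attribution scores should sum (up to floating-point noise) to sent_score:

```python
# Values copied from the sample output above.
attribution_scores = [0.06834496899197498,
                      0.02928952643026908,
                      0.12393252272158858,
                      0.14501388886322578,
                      -0.36118950191885235,
                      -0.03259630793084701]
sent_score = -0.027204984799027443

# Efficiency property: the edit-level scores decompose the sentence-level score.
assert abs(sum(attribution_scores) - sent_score) < 1e-6
```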
You can see the available ids for get_metric() via get_metric_ids():
from gec_attribute import get_metric_ids
print(get_metric_ids())
Currently, these keys are available:
- some: SOME [Yoshimura+ 20]. Note that you need to download the pre-trained models in advance from here.
- impara: IMPARA [Maeda+ 22].
- ppl: PPL.
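As a reminder of what the ppl metric measures: perplexity is the exponential of the mean negative log-likelihood of a token sequence. A minimal sketch of the definition (not the repository's implementation, which uses a language model to obtain the log-probabilities):

```python
import math

def perplexity(token_log_probs):
    # exp(-mean log-likelihood) over the tokens of a sentence.
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# If every token has probability 1/4, perplexity is exactly 4.
assert abs(perplexity([math.log(0.25)] * 3) - 4.0) < 1e-9
```

Lower perplexity indicates the sentence is more fluent under the language model.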
You can see the available ids for get_method() via get_method_ids():
from gec_attribute import get_method_ids
print(get_method_ids())
Currently, these keys are available:
- add: Add, one of the baselines.
- sub: Sub, one of the baselines.
- shapley: Shapley values, the proposed method.
- shapleysampling: Shapley sampling values.
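For intuition, the Shapley value of an edit is its marginal contribution to the sentence score, averaged over all subsets of the other edits. A minimal, self-contained sketch with a toy additive scoring function (the repository's implementation instead evaluates a GEC metric on partially edited sentences):

```python
from itertools import combinations
from math import factorial

def shapley_values(n, score):
    """Exact Shapley values for n players (edits), where score maps a
    set of player indices to a real-valued coalition score."""
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                # Standard Shapley weight |S|! (n-|S|-1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                s = set(subset)
                phi += weight * (score(s | {i}) - score(s))
        values.append(phi)
    return values

# Toy score: sum of per-edit gains (real GEC metrics are not additive).
gains = [0.3, -0.1, 0.2]
score = lambda subset: sum(gains[j] for j in subset)
vals = shapley_values(3, score)

# Efficiency: the values sum to score(all edits) - score(no edits).
assert abs(sum(vals) - score({0, 1, 2})) < 1e-9
```

For an additive score each edit's Shapley value is just its own gain; the averaging over subsets only matters when edits interact, which is exactly the case exact Shapley values (and their sampling approximation) are designed to handle.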
The experimental scripts are available in experiments/.
Note that pip install git+... does not install these scripts; you need to git clone the repository instead.
The experiments/corrected/
directory contains the corrected sentences used in our experiments.
experiments/corrected/
├── conll14
│ ├── bart.txt
│ ├── gector-2024.txt
│ ├── gector-roberta.txt
│ ├── gpt-4o-mini.txt
│ └── t5.txt
└── jfleg-dev
├── bart.txt
├── gector-2024.txt
├── gector-roberta.txt
├── gpt-4o-mini.txt
└── t5.txt