This repository contains the code for the paper "FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations".
FactGraph is an adapter-based method for assessing factuality that decomposes the document and the summary into structured meaning representations (MRs).
In FactGraph, the summary and document graphs are encoded by a graph encoder with structure-aware adapters, while the texts are encoded by an adapter-based text encoder. The text and graph encoders share the same pretrained model, and only the adapters are trained.
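To make the adapter idea concrete, here is a minimal sketch of a generic bottleneck adapter (down-projection, nonlinearity, up-projection, residual connection) in plain Python. It is not the FactGraph implementation: the class name `BottleneckAdapter`, the sizes, the omission of bias terms, and the zero initialisation of the up-projection are all assumptions made for illustration, and the structure-aware graph adapters are not modelled here.

```python
import random


class BottleneckAdapter:
    """Generic bottleneck adapter (illustrative only; not the FactGraph code)."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        # Down-projection: hidden_size -> bottleneck_size, small random weights.
        self.w_down = [
            [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
            for _ in range(bottleneck_size)
        ]
        # Up-projection: bottleneck_size -> hidden_size, zero-initialised
        # (an assumption) so the adapter starts out as the identity function.
        self.w_up = [[0.0] * bottleneck_size for _ in range(hidden_size)]

    def trainable_parameter_count(self):
        # Only these weights would be updated; the pretrained model stays frozen.
        return sum(len(row) for row in self.w_down) + sum(len(row) for row in self.w_up)

    def forward(self, hidden):
        # Project down to the bottleneck and apply ReLU.
        down = [max(0.0, sum(w * h for w, h in zip(row, hidden))) for row in self.w_down]
        # Project back up to the hidden size.
        up = [sum(w * d for w, d in zip(row, down)) for row in self.w_up]
        # Residual connection around the adapter.
        return [h + u for h, u in zip(hidden, up)]


adapter = BottleneckAdapter(hidden_size=8, bottleneck_size=2)
frozen_output = [0.5, -1.0, 0.25, 0.0, 1.5, -0.75, 2.0, 0.1]
adapted = adapter.forward(frozen_output)
```

Because the up-projection starts at zero, `adapted` equals `frozen_output` before any training, so the frozen pretrained model's behaviour is preserved until the adapter weights are updated.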
The easiest way to proceed is to create a conda environment:
conda create -n factgraph python=3.7
conda activate factgraph
Next, install PyTorch and PyTorch Geometric:
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install torch-scatter==2.0.9 -f https://data.pyg.org/whl/torch-1.9.0+cu111.html
pip install torch-sparse==0.6.12 -f https://data.pyg.org/whl/torch-1.9.0+cu111.html
pip install torch-geometric==2.0.3
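Since the PyTorch Geometric wheels are built against a specific PyTorch version, it can help to confirm that the pinned versions were actually installed. The following is a small sketch of such a check, not part of the repository: the names `EXPECTED_VERSIONS`, `installed_version`, and `find_mismatches` are hypothetical, and on Python 3.7 it assumes the `importlib_metadata` backport is available.

```python
# Pins copied from the pip commands above (local tags such as +cu111 dropped).
EXPECTED_VERSIONS = {
    "torch": "1.9.0",
    "torchvision": "0.10.0",
    "torchaudio": "0.9.0",
    "torch-scatter": "2.0.9",
    "torch-sparse": "0.6.12",
    "torch-geometric": "2.0.3",
}


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        from importlib import metadata  # standard library on Python 3.8+
    except ImportError:
        import importlib_metadata as metadata  # backport, assumed present on 3.7
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package that deviates from its pin to (expected, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Compare without the local build tag, e.g. "1.9.0+cu111" -> "1.9.0".
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(EXPECTED_VERSIONS)` inside the activated `factgraph` environment returns an empty dictionary when every pin matches, and otherwise lists the expected and installed version for each package that differs.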
Install the required packages:
pip install -r requirements.txt
Finally, create the environment for AMR preprocessing:
cd data/preprocess
./create_envs_preprocess.sh
cd ../../
FactCollect is created by consolidating the following datasets:
Dataset | Datapoints | Source |
---|---|---|
Wang et al. (2020) | 953 | Link |
Kryscinski et al. (2020) | 1434 | Link |
Maynez et al. (2020) | 2500 | Link |
Pagnoni et al. (2021) | 4942 | Link |
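As a quick sanity check on the table, the snippet below sums the listed source sizes and computes each source's share of the combined pool. The variable names are illustrative. Whether the consolidated dataset contains exactly this many examples depends on how `create_dataset.sh` merges the sources, which is not stated here.

```python
# Datapoint counts copied from the table above.
SOURCE_SIZES = {
    "Wang et al. (2020)": 953,
    "Kryscinski et al. (2020)": 1434,
    "Maynez et al. (2020)": 2500,
    "Pagnoni et al. (2021)": 4942,
}

total = sum(SOURCE_SIZES.values())
# Share of each source in the combined pool, as a percentage.
shares = {name: round(100 * size / total, 1) for name, size in SOURCE_SIZES.items()}

print(total)  # 9829
```

Pagnoni et al. (2021) contributes slightly more than half of the listed datapoints, so results on the combined data are likely to be weighted towards that source.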
For generating the FactCollect dataset, execute:
conda activate factgraph
cd data
./create_dataset.sh
cd ..
Convert the dataset into the format required for the model:
cd data/preprocess
./process_dataset_for_model.sh <gpu_id>
cd ../../
This step generates AMR graphs using the SPRING model. See the SPRING repository for more details.
For training FactGraph using the FactCollect dataset, execute:
conda activate factgraph
./train.sh <gpu_id>
For predicting, run:
./predict.sh <checkpoint_folder> <gpu_id>
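If training and prediction are launched from Python, for example from an experiment script, the two commands above can be wrapped as follows. This is only a sketch: `build_command` and `run_pipeline` are hypothetical helpers, the checkpoint path is a placeholder because the folder written by `train.sh` is not stated here, and the `factgraph` environment is assumed to be active already.

```python
import subprocess


def build_command(script, *args):
    """Return the argument list for one of the repository's shell scripts."""
    if not script.startswith("./"):
        script = "./" + script
    return [script] + [str(arg) for arg in args]


def run_pipeline(checkpoint_folder, gpu_id, dry_run=True):
    """Train, then predict. With dry_run=True the commands are only returned."""
    commands = [
        build_command("train.sh", gpu_id),
        build_command("predict.sh", checkpoint_folder, gpu_id),
    ]
    if not dry_run:
        for command in commands:
            # check=True stops the pipeline if a script exits with an error.
            subprocess.run(command, check=True)
    return commands


# "checkpoints/example" is a placeholder; use the folder that train.sh writes.
planned = run_pipeline("checkpoints/example", gpu_id=0)
```

The default `dry_run=True` only returns the commands, which makes it easy to inspect them before starting a long training run. Pass `dry_run=False` from the repository root to execute them.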
Download the files train.tsv and test.tsv from this link provided by Goyal and Durrett (2021). Copy those files to data/edge_level_data.
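Before running the preprocessing script, it may be worth checking that both files are in place. The helper below is a sketch under the assumption that it is run from the repository root. `missing_edge_level_files` is a hypothetical name, not part of the repository.

```python
from pathlib import Path

# File names taken from the download instructions above.
REQUIRED_FILES = ("train.tsv", "test.tsv")


def missing_edge_level_files(data_dir=Path("data") / "edge_level_data"):
    """Return the names of required files that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]
```

An empty list from `missing_edge_level_files()` means both files were found. Any names returned still need to be copied into the folder.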
Convert the dataset into the format required for the model:
cd data/preprocess
./process_dataset_for_edge_model.sh <gpu_id>
cd ../../
For training the edge-level FactGraph model on the edge-level data prepared above, execute:
conda activate factgraph
./train_edgelevel.sh <gpu_id>
For predicting, run:
./predict_edgelevel.sh <checkpoint_folder> <gpu_id>
See CONTRIBUTING for more information.
The documentation is made available under the Creative Commons Attribution-ShareAlike 4.0 International License. See the LICENSE file.
The sample code within this documentation is made available under the MIT-0 license. See the LICENSE-SAMPLECODE file.
@inproceedings{ribeiro-etal-2022-factgraph,
title = "FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations",
author = "Ribeiro, Leonardo F. R. and
Liu, Mengwen and
Gurevych, Iryna and
Dreyer, Markus and
Bansal, Mohit",
booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
year = "2022",
}