This repository contains the data for the paper "Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization" (EMNLP 2023). To cite it:
@inproceedings{temposum23,
title="{Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization}",
author = {Chi Seng Cheang and
Hou Pong Chan and
Derek F. Wong and
Xuebo Liu and
Zhaocong Li and
Yanming Sun and
Shudong Liu and
Lidia S. Chao},
booktitle = "Proceedings of the Conference on Empirical Methods in Natural Language Processing ({EMNLP})",
month = {December},
year = "2023",
publisher = "Association for Computational Linguistics",
}
Run the following commands to load the datasets from the Hugging Face Datasets library.
import datasets
# BBC in-distribution test set
dataset = datasets.load_dataset('chiseng-cheang/TempoSum', 'BBC_in-distribution')
# BBC future test set
dataset = datasets.load_dataset('chiseng-cheang/TempoSum', 'BBC_future')
# CNN in-distribution test set
dataset = datasets.load_dataset('chiseng-cheang/TempoSum', 'CNN_in-distribution')
# CNN future test set
dataset = datasets.load_dataset('chiseng-cheang/TempoSum', 'CNN_future')
All datasets are also available at: https://drive.google.com/drive/folders/1BdeTFqoea8GD240h78PgXBO68e53ea9E?usp=sharing
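
Because each loading command above reassigns the same `dataset` variable, it can be convenient to load all four configurations into one dictionary instead. The following is a minimal sketch, not part of the original repository: the helper names `config_name`, `all_config_names`, and `load_all` are hypothetical, and the only values taken from this README are the repository ID and the four configuration names.

```python
# Values below are copied from the loading commands in this README.
REPO_ID = "chiseng-cheang/TempoSum"
SOURCES = ("BBC", "CNN")
PERIODS = ("in-distribution", "future")


def config_name(source, period):
    """Build a configuration name such as 'BBC_future' (hypothetical helper)."""
    return f"{source}_{period}"


def all_config_names():
    """List the four configuration names in the order used in this README."""
    return [config_name(s, p) for s in SOURCES for p in PERIODS]


def load_all(repo_id=REPO_ID):
    """Load every configuration into a dict keyed by configuration name.

    Requires the `datasets` package and network access; the import is done
    lazily so the helpers above can be used without it.
    """
    import datasets

    return {name: datasets.load_dataset(repo_id, name) for name in all_config_names()}
```

Calling `load_all()` returns a dictionary such as `{"BBC_in-distribution": ..., "BBC_future": ..., ...}`; printing one of its values shows the splits and columns that the configuration provides.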