iFᴀᴄᴇᴛSᴜᴍ is an interactive faceted summarization approach and system for navigating within a large document-set on a topic.
- Paper 📄 https://arxiv.org/pdf/2109.11621.pdf (Proceedings of EMNLP 2021, System Demonstrations)
- Demo 🤩 https://nlp.biu.ac.il/~hirsche5/ifacetsum/
First, git clone the project.
- Run
pip install -r requirements.txt
- Run
python -m spacy download en_core_web_md
- From inside Python, run
import nltk
and then
nltk.download('punkt')
- Run
python WebApp/server/app.py
- Run
cd WebApp/client
- Run
npm install
- Run
npm start
- Open the url
http://localhost:3000
Request access to DUC2006Clean from https://duc.nist.gov/ and place it inside the data/ directory.
- Change Config.py to point to your data directory, including the text files and the cluster files (either JSON or CoNLL format).
To support reproducibility efforts and the addition of custom document-sets, all models used have been released and are available online.
- Create event mentions using the models and scripts in https://github.com/ariecattan/event_extractor.
- Create pairwise mention scores and clusters using CDLM https://github.com/aviclu/CDLM.
- Use agglomerative clustering to combine mentions into clusters.
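
The clustering step above can be sketched as follows. This is a minimal, dependency-free illustration of average-linkage agglomerative clustering over pairwise scores, not the code used by iFᴀᴄᴇᴛSᴜᴍ or CDLM. The function name `agglomerative_clusters`, the `pair_scores` dictionary layout, and the `threshold` value are all assumptions made for the example.

```python
from itertools import combinations


def agglomerative_clusters(mentions, pair_scores, threshold=0.5):
    """Greedy average-linkage agglomerative clustering (illustrative sketch).

    mentions:    list of hashable mention ids.
    pair_scores: dict mapping frozenset({a, b}) -> similarity score;
                 pairs that are missing are treated as 0.0 (an assumption).
    threshold:   merging stops once the best average similarity between
                 any two clusters drops below this value (hypothetical default).
    """
    clusters = [[m] for m in mentions]  # start with one cluster per mention

    def avg_sim(c1, c2):
        total = sum(pair_scores.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the pair of clusters with the highest average similarity.
        (i, j), best = max(
            (((i, j), avg_sim(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        # Merge cluster j into cluster i (i < j, so deleting j is safe).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy example with made-up mentions and scores.
mentions = ["attack", "bombing", "election", "vote"]
pair_scores = {
    frozenset(("attack", "bombing")): 0.9,
    frozenset(("election", "vote")): 0.8,
    frozenset(("attack", "election")): 0.1,
}
clusters = agglomerative_clusters(mentions, pair_scores, threshold=0.5)
```

In practice the pairwise scores would come from the pairwise model, and the stopping threshold would need tuning on held-out data.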
For the end-to-end iFᴀᴄᴇᴛSᴜᴍ entities script (following the above instructions), refer to https://github.com/AlonEirew/wd-plus-srl-extraction#wec-cd-coreference
- Create entity mentions using SpanBERT, accessible from https://docs.allennlp.org/models/main/.
- Use the WEC model to score each mention pair.
- Use agglomerative clustering to combine within-document (WD) and cross-document (CD) mentions into clusters.
- Please refer to https://github.com/oriern/SuperPAL for instructions on extracting propositions using OIE and extracting pairwise scores.
- iFᴀᴄᴇᴛSᴜᴍ's code takes care of converting the pairwise CSV from SuperPAL into clusters.
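
To give a rough idea of what turning a pairwise CSV into clusters can look like, here is a small sketch. It is not the conversion code in this repository: it uses a simple technique (thresholding the scores and taking connected components with union-find), and the column names `span_a`, `span_b`, and `score`, the function name, and the threshold are hypothetical. The actual SuperPAL output layout may differ.

```python
import csv
import io


def clusters_from_pairwise_csv(csv_file, threshold=0.5):
    """Group items linked by a pairwise score >= threshold (connected components).

    csv_file: an open text file (or file-like object) with the hypothetical
              columns span_a, span_b, score.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving; registers unseen items.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for row in csv.DictReader(csv_file):
        root_a, root_b = find(row["span_a"]), find(row["span_b"])
        if float(row["score"]) >= threshold and root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), []).append(item)
    return list(groups.values())


# Toy input with made-up proposition ids and scores.
sample = io.StringIO(
    "span_a,span_b,score\n"
    "p1,p2,0.92\n"
    "p2,p3,0.71\n"
    "p3,p4,0.10\n"
)
csv_clusters = clusters_from_pairwise_csv(sample, threshold=0.5)
```

Connected components merge transitively (p1 and p3 end up together even without a direct high score), which is one reason a real pipeline may prefer a stricter linkage criterion.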
If you find our work useful, please cite the paper as:
@article{hirsch2021ifacetsum,
title={iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration},
author={Hirsch, Eran and Eirew, Alon and Shapira, Ori and Caciularu, Avi and Cattan, Arie and Ernst, Ori and Pasunuru, Ramakanth and Ronen, Hadar and Bansal, Mohit and Dagan, Ido},
journal={Proceedings of the Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
year={2021}
}