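This repository contains the implementation for the paper "CHIME: LLM-Assisted Hierarchical Organization of Scientific Studies for Literature Review Support".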
A Conda environment is recommended for running the code. To create the environment, run:
conda env create -f environment.yml
To activate the environment, run:
conda activate chime
Download the SciSpacy model:
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.4/en_core_sci_sm-0.5.4.tar.gz
Add your API keys as environment variables:
export CLAUDE_API_KEY="YOUR KEY"
export OPENAI_API_KEY="YOUR KEY"
The hierarchy generation pipeline is implemented in chime/src/hierarchical_category_construction.py.
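Before running the pipeline, it can help to confirm that both keys are visible to Python. The following is a minimal sketch and not part of the repository; the helper name `missing_api_keys` is hypothetical, and only the two variable names come from the export commands above.

```python
import os

# Variable names taken from the export commands in this README.
REQUIRED_KEYS = ("CLAUDE_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(env=None):
    """Return the required key names that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All API keys are set.")
```

Run it in the same shell session where the variables were exported, since `export` only affects the current shell and its child processes.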
Turn off the DEBUG flag in chime/src/hierarchical_category_construction.py to run the pipeline on the entire dataset:
cd chime/src
python hierarchical_category_construction.py
Fine-tuning and LLM prediction are implemented in chime/src/flanT5 and chime/src/llm_prediction, respectively.
The fine-tuned model and datasets are available on the Hugging Face hub. The datasets are:
- joe32140/chime-parent-child-relation: the parent-child relation dataset.
- joe32140/chime-sibling-coherence: the sibling coherence dataset.
- joe32140/chime-claim-category: the claim and category relevance dataset.
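The datasets listed above can be pulled with the Hugging Face `datasets` library. The following is a minimal sketch, assuming `datasets` is installed in the environment; the short task names and the helper `load_chime_dataset` are hypothetical, and no split or column names are assumed.

```python
# Dataset IDs as listed in this README; the dictionary keys are illustrative.
CHIME_DATASETS = {
    "parent_child": "joe32140/chime-parent-child-relation",
    "sibling_coherence": "joe32140/chime-sibling-coherence",
    "claim_category": "joe32140/chime-claim-category",
}


def load_chime_dataset(task):
    """Download one CHIME dataset from the Hugging Face hub by task name."""
    if task not in CHIME_DATASETS:
        raise KeyError(f"Unknown task {task!r}; expected one of {sorted(CHIME_DATASETS)}")
    # Imported lazily so the module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(CHIME_DATASETS[task])
```

Calling `print(load_chime_dataset("claim_category"))` should show the available splits and columns, which is a reasonable first step before writing any code that depends on them.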
The fine-tuned model for claim and category relevance prediction is joe32140/flan-t5-large-claim-category.
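The checkpoint can likely be loaded through the `transformers` auto classes, since Flan-T5 is a sequence-to-sequence model. The sketch below assumes `transformers` and PyTorch are installed and that the hub repository ships tokenizer files. The helper names are hypothetical, and the prompt format is deliberately left to the caller: it is defined by the fine-tuning code in chime/src/flanT5 and is not reproduced here.

```python
MODEL_ID = "joe32140/flan-t5-large-claim-category"  # model ID as listed in this README


def load_claim_category_model(model_id=MODEL_ID):
    """Return (tokenizer, model) for the fine-tuned Flan-T5 checkpoint."""
    # Imported lazily so the module can be imported without `transformers` installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def generate(tokenizer, model, prompt, max_new_tokens=8):
    """Generate text for one prompt that is already in the format used in training."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Keeping loading and generation separate means the model is downloaded once and reused across many claim and category pairs.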
We also provide the model's claim and category relevance predictions on hierarchies without human annotation. You can find the predictions here.
See chime/src/parse_generated_hierarchy.py to parse the generated hierarchies into a structured format. Note that 2 of the 474 generated hierarchies cannot be parsed because of their format, which leaves the 472 hierarchies reported in the paper.
- Paper
- resources/raw_generated_hierarchy.csv contains the raw generated hierarchies from Claude-2.
- resources/raw_source_data.csv contains the raw review and study data from the Cochrane Library. The generated claims are also included in this file.
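Since the column layout of these CSV files is not described here, a quick inspection is a safe first step. The following is a minimal sketch using only the standard library; the helper `summarize_csv` is hypothetical and makes no assumption about column names.

```python
import csv


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file without assuming its schema."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Iterating consumes the data rows; the header is read on the first step.
        row_count = sum(1 for _ in reader)
        return list(reader.fieldnames or []), row_count
```

For example, `summarize_csv("resources/raw_generated_hierarchy.csv")` returns the header names and the number of data rows when run from the repository root.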
If you use this code or dataset, please cite the following:
@inproceedings{hsu-etal-2024-chime,
title = "{CHIME}: {LLM}-Assisted Hierarchical Organization of Scientific Studies for Literature Review Support",
author = "Hsu, Chao-Chun and
Bransom, Erin and
Sparks, Jenna and
Kuehl, Bailey and
Tan, Chenhao and
Wadden, David and
Wang, Lucy and
Naik, Aakanksha",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand and virtual meeting",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-acl.8",
pages = "118--132",
}