
CompAct: Compressing Retrieved Documents Actively for Question Answering (EMNLP 2024)

We propose CompAct (Compressing Retrieved Documents Actively for Question Answering), a novel framework that employs an active strategy for compressing extensive documents. CompAct dynamically preserves query-related contexts, focusing on the integration of information across documents. See our paper for more details.

📃 Paper | 🤗 Model

Updates

[July 16, 2024] We have released the code and data.

Installation

To ensure compatibility with other libraries, we recommend using the following versions. You can adjust these based on your environment:

  • Python 3.10.9
  • PyTorch 2.1.2
  • CUDA 11.8

We use the alignment-handbook for training our model. Please install all required packages as per the instructions in the alignment-handbook repository.

# Create and activate a new environment
conda create -n compact python==3.10.9
conda activate compact

# Install pytorch
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118

# alignment-handbook
cd ./alignment-handbook/
python -m pip install .
python -m pip install flash-attn --no-build-isolation # Flash Attention 2

# Install our requirements
pip install -r requirements.txt
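
After installation, you can quickly confirm that the environment matches the recommended versions:

import torch

print(torch.__version__)          # expect 2.1.2
print(torch.version.cuda)         # expect 11.8
print(torch.cuda.is_available())  # True if a GPU is visible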

Quick Usage

Here's a simple example to get you started:

import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from utils import create_prompt, parse_output_without_sentence

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_dir = 'cwyoon99/CompAct-7b'
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_dir)

example = json.load(open('./data/example.json')) # example case with retrieved documents
print(f"question: {example['question']}\nanswer: {example['answer']}")
prev_summary = []
prev_eval = []

# actively compress documents until it finds all necessary evidence
for i, iteration in enumerate(example['iterations']):
    segment = iteration['documents_list']
    document_input = "\n".join(segment)

    # load previous output
    prev_summary_temp = prev_summary[-1] if i != 0 else ""
    prev_eval_temp = prev_eval[-1].replace('[INCOMPLETE]', '').strip() if i != 0 else ""

    # create prompt
    input_prompt = create_prompt(example, iteration, i, document_input, prev_summary_temp, prev_eval_temp, tokenizer, eos_token="", add_generation_prompt=True)
    
    # compress
    with torch.no_grad():
        inputs = tokenizer(input_prompt, return_tensors="pt")
        input_ids = inputs.input_ids.to(device)
        attention_mask = inputs.attention_mask.to(device)
        # greedy decoding
        outputs = model.generate(input_ids=input_ids, attention_mask=attention_mask, max_new_tokens=500, do_sample=False, pad_token_id=tokenizer.eos_token_id)
    iteration['output'] = tokenizer.decode(outputs[0][input_ids.size(1):], skip_special_tokens=True).strip()

    # parsing
    parsed_sections = parse_output_without_sentence(iteration['output'])
    prev_summary.append(parsed_sections['summary'])
    prev_eval.append(parsed_sections['eval'])
    # compressing extensive documents into compact context (under 200 tokens)
    print(f"summary of segment {i}: {iteration['summary']}\ntermination of segment {i}: {iteration['eval']}\n")

    # early termination
    if "[COMPLETE]" in iteration['eval']:
        break
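
As a quick sanity check, you can confirm how compact the final context is by counting its tokens with the same tokenizer (a minimal sketch reusing the variables above):

# The last summary is the compressed context handed to the reader model.
final_summary = prev_summary[-1]
num_tokens = len(tokenizer(final_summary).input_ids)
print(f"compressed context: {num_tokens} tokens")  # typically under 200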

Download

Model

You can download our model from Hugging Face.

Data

We conducted experiments on five question-answering benchmark datasets: HotpotQA, MuSiQue, 2WikiMultihopQA (2WikiMQA), Natural Questions (NQ), and TriviaQA (TQA).

Required data can be downloaded from this Google Drive. Place it in the ./data folder.

  • retrieval: instances with retrieved results for each dataset, produced by different retrievers.
  • preprocessed: 28k preprocessed instances from the HotpotQA train set.
  • demos: few-shot examples for answering questions.
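
Once downloaded, the retrieval files follow the path pattern used by the Inference script below (data/retrieval/[retriever]_[task]/[split].json). Here is a minimal sketch for inspecting one before running inference, assuming the split file is a JSON array of instances (print the keys to see the actual schema):

import json

# Path pattern taken from the inference script below; adjust to your setup.
# Assumption: the split file is a JSON array of QA instances.
with open('./data/retrieval/contriever-msmarco_HotpotQA/dev.json') as f:
    instances = json.load(f)

print(f"{len(instances)} instances loaded")
print(instances[0].keys())  # inspect the schema before running inference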

Inference

After setting up your environment and preparing the data, you can compress retrieved documents and evaluate end-to-end QA performance with the following script.

For convenience, you can easily deploy our model from Hugging Face. If you wish to fine-tune the base model, please refer to the Training section.

We also support batch decoding options (--batch_decoding, --batch_size) to accelerate inference.

CUDA_VISIBLE_DEVICES=0

PRE_DIR="[your repository path]"

ret=contriever-msmarco

comp_name=cwyoon99/CompAct-7b

model_name=meta-llama/Meta-Llama-3-8B
cache_dir="[your caching paths]"

task=HotpotQA # 2wikimultihop, musique, NQ, TQA

if [ "$task" == "TQA" ]; then
    split="test"
else
    split="dev"
fi


iter=6
segment_size=5

CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES python run_prompt.py \
    --task $task \
    --data_path $PRE_DIR/data/retrieval/$ret"_"$task/$split.json \
    --fshot \
    --fshot_path $PRE_DIR/data/demos/fshot_$task.json \
    --compress_output_dir $PRE_DIR/data/experiments/compress/$ret"_"$task/$split \
    --read_output_dir $PRE_DIR/data/experiments/test/$ret"_"$task/$split \
    --compressor_name_or_path $comp_name \
    --model_name_or_path $model_name \
    --cache_dir $cache_dir \
    --batch_decoding \
    --batch_size 20 \
    --read_wo_prev_eval \
    --segment_size $segment_size \
    --max_iteration $iter

If you want to use a self-trained model, specify the following arguments:

  • --compressor_dir, e.g. $PRE_DIR/data/experiments/train/
  • --compressor_name_or_path, e.g. "[name of trained model]"
  • --checkpoint, e.g. checkpoint-378
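
Conceptually, --segment_size sets how many retrieved documents are compressed per step and --max_iteration caps the number of steps (5 documents × 6 iterations covers up to 30 documents above). A toy sketch of that loop, not the repository's actual code:

# Toy illustration of how --segment_size and --max_iteration interact.
documents = [f"document {i}" for i in range(30)]  # placeholder documents
segment_size, max_iteration = 5, 6

for step in range(max_iteration):
    segment = documents[step * segment_size:(step + 1) * segment_size]
    if not segment:
        break
    print(f"iteration {step}: compressing {len(segment)} documents")
    # ... run the compressor on this segment, carrying over the previous
    # summary, and stop early once the output contains [COMPLETE]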

Training

We apply Supervised Fine-Tuning (SFT) using only a subset of HotpotQA. You can change specific hyperparameters and training arguments in ./alignment-handbook/recipes/mistral-7b-instruct-v0.2/sft/config_full.yaml.

For more information about the data and training details, please refer to our paper.

cd CompAct/alignment-handbook

export CUDA_VISIBLE_DEVICES="[GPU_ID]" # e.g. 0,1,2,3
export n_processes="[N_GPU]" # e.g. 4

# ./scripts/run_sft.sh
CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml --num_processes $n_processes scripts/run_sft.py recipes/mistral-7b-instruct-v0.2/sft/config_full.yaml
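
After training, checkpoints are written under your output directory and load like any Hugging Face model. A sketch assuming the example paths from the Inference section (the checkpoint path below is hypothetical):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path, mirroring the --compressor_dir / --checkpoint example above.
ckpt_dir = "./data/experiments/train/[name of trained model]/checkpoint-378"
model = AutoModelForCausalLM.from_pretrained(ckpt_dir, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)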

Citation

@article{yoon2024compact,
      title={CompAct: Compressing Retrieved Documents Actively for Question Answering}, 
      author={Chanwoong Yoon and Taewhoo Lee and Hyeon Hwang and Minbyul Jeong and Jaewoo Kang},
      journal={arXiv preprint arXiv:2407.09014},
      year={2024},
      url={https://arxiv.org/abs/2407.09014}, 
}

Contact

For more information or any questions about our work, feel free to contact me (cwyoon99 (at) korea.ac.kr or gmail.com).
