Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering

🎯 Who Should Pay Attention to Our Work?

Researchers in the field of hallucination detection: The paper introduces a novel invocation discriminant metric, AttenHScore, for detecting hallucinations in SLMs.
Developers working with LLMs and SLMs: The proposed methods aim to balance performance and cost in collaborative LLM-SLM systems. The strategies require no additional model training and are designed to be plug-and-play for transformer-based LMs, which could make them practical to adopt in real applications.
Individuals working on QA systems and RAG: The paper specifically tests its methods on multiple QA datasets and within a RAG context. The re-ranking strategy based on uncertainty evaluation is designed to help SLMs better capture critical information and enhance accuracy in QA tasks.
Organizations looking to optimize resource allocation in AI systems: The collaborative paradigm of large and small LMs aims for optimal resource allocation and efficient task processing by leveraging the strengths of both types of models.
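
The collaborative idea described in the list above can be sketched in a few lines of Python: the SLM answers first, an uncertainty score is computed from its output, and the LLM interface is called only when that score crosses a threshold. This is a minimal sketch under stated assumptions, not the repository's implementation. The function names, the `threshold` parameter, and the demo models are hypothetical, and the score used here (mean negative log-probability) is a simple stand-in, not the AttenHScore formula from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple


def uncertainty_score(token_logprobs: Sequence[float]) -> float:
    """Stand-in uncertainty proxy: mean negative log-probability of the
    generated tokens. NOT the AttenHScore formula (assumption for illustration)."""
    if not token_logprobs:
        raise ValueError("token_logprobs must be non-empty")
    return -sum(token_logprobs) / len(token_logprobs)


def answer_adaptively(
    question: str,
    slm_generate: Callable[[str], Tuple[str, List[float]]],
    llm_generate: Callable[[str], str],
    threshold: float,
) -> Tuple[str, str, float]:
    """Return (answer, source, score); the LLM is called only if the
    SLM's uncertainty score exceeds the (hypothetical) threshold."""
    slm_answer, logprobs = slm_generate(question)
    score = uncertainty_score(logprobs)
    if score > threshold:
        return llm_generate(question), "llm", score
    return slm_answer, "slm", score


# Hypothetical demo models, used only to exercise the routing logic.
def demo_slm(question: str) -> Tuple[str, List[float]]:
    if "capital" in question:
        return "Paris", [math.log(0.9), math.log(0.95)]  # confident tokens
    return "not sure", [math.log(0.2), math.log(0.1)]    # uncertain tokens


def demo_llm(question: str) -> str:
    return "answer from the LLM interface"


if __name__ == "__main__":
    for q in ["What is the capital of France?", "Who wrote this memo?"]:
        answer, source, score = answer_adaptively(q, demo_slm, demo_llm, threshold=1.0)
        print(f"{source}: {answer} (score={score:.3f})")
```

Because the routing only needs the SLM's output statistics, swapping in a different score function leaves `answer_adaptively` unchanged.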

Quick Start

Install dependency packages

pip install -r requirements.txt

Start the milvus-lite service (vector database)

milvus-server

Download the models into their corresponding directories.

Modify _settings.py to match your environment.

Reproduce the overall results of the hallucination detection component.

nohup python -m pipeline.generate \
  --model Meta-Llama-3-8B-Instruct \
  --dataset coqa \
  --device 'cuda:3' \
  --num_generations_per_prompt 10 \
  --temperature 0.5 \
  --top_p 0.99 \
  --top_k 10 \
  --fraction_of_data_to_use 1 \
  --project_ind 0 \
  >> nohup_llama3_8B_coqa.log 2>&1 &

# Performance evaluation
python func/evalFunc.py

Collaborative performance of LLMs and SLMs in QA.

SLM Retrieval-Based QA System

CUDA_VISIBLE_DEVICES=7 nohup python retrieval.py \
  --data_path 'data/multifieldqa_zh.jsonl' \
  --save_file 'qa_nodie/multifieldqa_zh_qwen2_7B_Chunks.json' \
  --docs_path 'data/chunking/multifieldqa_zh_qwen2_7B_Chunks.json' \
  --collection_name 'multifieldqa_zh_qwen2_7B_Chunks' \
  --retrieve_top_k 5 \
  --construct_index \
  >> qa_nodie/multifieldqa_zh_qwen2_7B_Chunks_top5.log 2>&1 &
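
To make the `--retrieve_top_k 5` option above concrete, here is a minimal, dependency-free sketch of top-k chunk retrieval by embedding similarity. It is an illustration only: the repository delegates this step to the Milvus vector database, and the function names and the choice of cosine similarity are assumptions, not taken from `retrieval.py`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (chunk_index, similarity) for the k chunks most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real chunk embeddings.
    chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(retrieve_top_k([1.0, 0.0], chunks, k=2))
```

A vector database performs the same ranking with an index instead of a full scan, which is why the command builds one with `--construct_index`.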

LLM Interface QA System

nohup python tollms_retrieval.py >> qa_nodie/dataset_yiyan.log 2>&1 &

Large-Small LM Collaboration in QA System

CUDA_VISIBLE_DEVICES=1,2 nohup python judge.py >> qa_nodie/multifieldqa_zh_top10.log 2>&1 &

# Performance evaluation
python eval.py

Results

We first comprehensively evaluate, on hallucination benchmarks, the key component of the large-small LM collaborative system: detecting hallucinations in SLMs. We then integrate AttenHScore into the full system and evaluate how accurately it decides when to call the LLM interface, comparing it against various real-time hallucination detection methods.
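
As an illustration of what "accuracy in determining interface calls" can mean, the sketch below scores a thresholded detector against labels that say whether the SLM's answer was actually wrong. It is a hedged example, not the logic of `eval.py`: the function name, the threshold rule, and the two reported metrics are assumptions chosen for clarity.

```python
from typing import Dict, Sequence


def invocation_metrics(
    scores: Sequence[float],
    slm_wrong: Sequence[bool],
    threshold: float,
) -> Dict[str, float]:
    """Compare invoke/skip decisions (score > threshold) with ground truth.

    slm_wrong[i] is True when the SLM's answer to question i was incorrect,
    i.e. when calling the LLM would have been the right decision.
    """
    if len(scores) != len(slm_wrong) or len(scores) == 0:
        raise ValueError("scores and slm_wrong must be non-empty and equal in length")
    decisions = [s > threshold for s in scores]
    correct = sum(d == w for d, w in zip(decisions, slm_wrong))
    return {
        "decision_accuracy": correct / len(scores),    # fraction of right invoke/skip calls
        "invocation_rate": sum(decisions) / len(scores),  # fraction of questions sent to the LLM
    }


if __name__ == "__main__":
    # Toy values for illustration only, not results from the paper.
    print(invocation_metrics([0.1, 0.9, 0.4, 0.8], [False, True, True, False], threshold=0.5))
```

Reporting the invocation rate next to decision accuracy keeps the cost side of the trade-off visible, since a detector that always calls the LLM is never cheap.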

Citation

@article{AttenHScore,
  title={Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering},
  author={Zhao, Jihao and Zhou, Chunlai and Li, DaiXuan and Zu, Shuaishuai and Qin, Biao},
  journal={arXiv preprint arXiv:2505.02311},
  year={2025}
}
