🧪 FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees
Fan Nie, Xiaotian Hou, Shuhang Lin, James Zou, Huaxiu Yao, Linjun Zhang
- 🎉 May 27, 2025: Source code released!
- 🎉 May 28, 2025: All four datasets uploaded!
 
This repository provides tools for testing factuality in Large Language Models with statistical guarantees. Follow the steps below to get started with ParaRel as an example.
# Clone the repository
git clone https://github.com/fannie1208/FactTest.git
cd FactTest
pip install -r requirements.txt

Navigate to the calibration directory:
cd calibration/pararel
python collect_dataset.py --model openlm-research/open_llama_3b

For Vanilla Entropy Score Function:
Initial calibration run (computes and saves scores):
python calculate_vanilla_threshold.py \
    --model openlm-research/open_llama_3b \
    --alpha 0.05 \
    --num_try 15

Reusing Saved Scores:
After the initial run, you can quickly calculate thresholds for different alpha values using the stored scores:
python calculate_vanilla_threshold.py \
    --model openlm-research/open_llama_3b \
    --alpha 0.1 \
    --stored \
    --num_try 15

The --stored flag allows you to experiment with different significance levels without re-running the expensive model evaluation.
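
To illustrate why stored scores make it cheap to try several alpha values, here is a minimal Python sketch of one standard distribution-free construction: the threshold is an order statistic of the certainty scores of incorrectly answered calibration questions, chosen with a binomial tail bound. This is an illustration under stated assumptions, not the code in calculate_vanilla_threshold.py. The function names, the `delta` confidence parameter, and the convention that the certainty score is the negative entropy of the `num_try` sampled answers are all hypothetical.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Entropy of the empirical distribution over sampled answers.

    Assumption: the model is sampled num_try times per question and
    identical answer strings are grouped together.
    """
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def violation_bound(n, k, alpha):
    """P(Binomial(n, 1 - alpha) >= k), computed in log space for stability.

    This bounds the probability that the k-th smallest of n calibration
    scores yields a Type I error above alpha. Requires 0 < alpha < 1.
    """
    total = 0.0
    for j in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * math.log(1 - alpha) + (n - j) * math.log(alpha)
        )
        total += math.exp(log_term)
    return total


def calibrate_threshold(incorrect_scores, alpha, delta=0.05):
    """Smallest order statistic whose violation bound is at most delta.

    incorrect_scores: certainty scores of calibration questions the model
    answered incorrectly (higher means more certain). `delta` is a
    hypothetical confidence parameter, not a flag of the repository scripts.
    Returns math.inf (never accept) if there are too few samples.
    """
    ordered = sorted(incorrect_scores)
    n = len(ordered)
    for k in range(1, n + 1):
        if violation_bound(n, k, alpha) <= delta:
            return ordered[k - 1]
    return math.inf


if __name__ == "__main__":
    # One sampled question: certainty is the negative entropy (assumed convention).
    samples = ["Paris"] * 12 + ["Lyon"] * 3
    print("certainty:", -answer_entropy(samples))

    # Synthetic stand-in for scores saved by an initial calibration run.
    saved_scores = [i / 200 for i in range(200)]
    for alpha in (0.05, 0.1):
        print("alpha", alpha, "threshold", calibrate_threshold(saved_scores, alpha))
```

Only `calibrate_threshold` depends on alpha, so once the scores are stored, trying another significance level costs a sort and a few sums rather than another pass through the model.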
Navigate to the evaluation directory (from the repository root):
cd evaluation/pararel
python evaluate_vanilla.py \
    --model openlm-research/open_llama_3b \
    --num_try 15

After evaluation, compute the metrics using:
python eval.py \
    --model openlm-research/open_llama_3b \
    --num_try 15 \
    --method vanilla \
    --tau <your_threshold>

💡 Note: Replace <your_threshold> with the threshold value obtained from the calibration step.
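
As a rough picture of what a threshold-based evaluation can measure, the Python sketch below accepts an answer when its certainty score exceeds tau and then reports the empirical Type I error (the share of incorrect answers that were still accepted), the accuracy among accepted answers, and the share of questions answered. The function name, the metric names, and the accept rule `score > tau` are assumptions made for illustration. They are not taken from eval.py, whose actual output may differ.

```python
def selective_metrics(scores, correct, tau):
    """Summarize accept/abstain behavior at threshold tau.

    scores:  certainty score per test question (higher means more certain)
    correct: whether the model's answer to that question was correct
    tau:     threshold from the calibration step
    """
    accepted = [s > tau for s in scores]
    pairs = list(zip(accepted, correct))

    n_incorrect = sum(1 for c in correct if not c)
    n_accepted = sum(accepted)
    false_accepts = sum(1 for a, c in pairs if a and not c)
    accepted_correct = sum(1 for a, c in pairs if a and c)

    return {
        # Fraction of incorrect answers that were accepted anyway.
        "type1_error": false_accepts / n_incorrect if n_incorrect else 0.0,
        # Accuracy restricted to the questions the model chose to answer.
        "accepted_accuracy": (
            accepted_correct / n_accepted if n_accepted else float("nan")
        ),
        # Fraction of questions answered rather than abstained on.
        "coverage": n_accepted / len(scores) if scores else 0.0,
    }


if __name__ == "__main__":
    # Tiny synthetic example, for illustration only.
    print(selective_metrics([0.9, 0.8, 0.2, 0.1], [True, False, False, True], 0.5))
```

If the calibration guarantee holds, the Type I error measured this way should stay at or below the alpha used to obtain tau, up to sampling noise in the test set.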
If you find this work useful, please cite our paper:
@misc{nie2024facttest,
      title={FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees}, 
      author={Fan Nie and Xiaotian Hou and Shuhang Lin and James Zou and Huaxiu Yao and Linjun Zhang},
      year={2024},
      eprint={2411.02603},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.02603}, 
}