[🧾 Paper] | [💻 Code] | [🤗 Models]
KLCF is a novel reinforcement learning framework that mitigates hallucinations in long-form generation by explicitly aligning a policy model's expressed knowledge with its parametric knowledge through Dual-Fact Alignment. It jointly optimizes factual recall and precision without relying on external knowledge sources, enabling efficient and scalable training.
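For intuition only, the sketch below shows one way a checklist-based recall signal and a claim-level precision signal could be folded into a single scalar reward. The function name, inputs, and equal weighting are illustrative assumptions, not the released training code.

```python
# Illustrative sketch only; not the released KLCF reward implementation.
def dual_fact_reward(checklist_hits: int, checklist_total: int,
                     claims_supported: int, claims_total: int,
                     alpha: float = 0.5) -> float:
    """Blend a recall-style checklist score with a precision-style claim score."""
    recall = checklist_hits / max(checklist_total, 1)      # coverage of the model's own knowledge checklist
    precision = claims_supported / max(claims_total, 1)    # fraction of extracted claims judged truthful
    return alpha * recall + (1.0 - alpha) * precision

# Example: 8/10 checklist items covered, 18/20 claims verified -> reward 0.85
print(dual_fact_reward(8, 10, 18, 20))
```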
```bash
# Install main dependencies
pip install -r requirements.txt
# Install verl environment
cd verl && pip install -e .
```

Base Models (Download from Hugging Face):
- Qwen2.5-7B/14B/32B
- DeepSeek-R1-Distill-Qwen-7B/14B/32B
- Skywork-Reward-V2-Llama-3.2-1B
RL-Related Models (Deploy with vLLM/sglang):
Download from [🤗 Models]; a download sketch follows the list below.
- KLCF-Qwen2.5-14B-Claim-Extractor
- KLCF-Qwen2.5-14B-Checklist-Verifier
- KLCF-Qwen2.5-7B-Truthfulness-Verifier
- KLCF-Qwen2.5-14B-Truthfulness-Verifier
- KLCF-Qwen2.5-32B-Truthfulness-Verifier
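If you prefer scripting the downloads, a minimal sketch with `huggingface_hub` is shown below; the repository IDs are placeholders, so substitute the actual IDs listed on the [🤗 Models] page.

```python
# Sketch: fetch checkpoints with huggingface_hub. The repo IDs below are
# placeholders; replace them with the actual IDs from the [🤗 Models] page.
from huggingface_hub import snapshot_download

for repo_id in [
    "KLCF-Qwen2.5-14B-Claim-Extractor",
    "KLCF-Qwen2.5-14B-Checklist-Verifier",
    "KLCF-Qwen2.5-14B-Truthfulness-Verifier",
]:
    snapshot_download(repo_id=repo_id, local_dir=f"./models/{repo_id}")
```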
Deploy each model with:
```bash
python -m vllm.entrypoints.openai.api_server \
    --model [MODEL_NAME] \
    --host 0.0.0.0 \
    --port [PORT] \
    --tensor-parallel-size 4 \
    --trust-remote-code \
    --gpu-memory-utilization 0.9 \
    --max-num-seqs 1024
```

Deploy the Skywork-Reward-V2-Llama-3.2-1B reward model with sglang:

```bash
python -m sglang.launch_server \
    --model-path Skywork-Reward-V2-Llama-3.2-1B \
    --mem-fraction-static 0.9 \
    --tp 4 \
    --host 0.0.0.0 \
    --port [PORT] \
    --context-length 16384 \
    --is-embedding
```
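Once a server is up, it speaks vLLM's OpenAI-compatible API, so it can be smoke-tested with the standard `openai` client as sketched below. The prompt is only a placeholder, not the prompt format the KLCF pipeline actually uses.

```python
# Sketch: smoke-test a deployed extractor/verifier through vLLM's
# OpenAI-compatible endpoint. The prompt is a placeholder, not the
# prompt template used by the KLCF training pipeline.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="KLCF-Qwen2.5-14B-Claim-Extractor",
    messages=[{"role": "user",
               "content": "Extract atomic factual claims: Paris is the capital of France."}],
    temperature=0.0,
)
print(resp.choices[0].message.content)
```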
Configure environment variables in scripts:

```bash
export CLAIM_EXTRACTOR_SERVER=127.0.0.1:8000
export CLAIM_EXTRACTOR_PATH=KLCF-Qwen2.5-14B-Claim-Extractor
export CHECKLIST_RM_SERVER=127.0.0.1:8001
export CHECKLIST_RM_PATH=KLCF-Qwen2.5-14B-Checklist-Verifier
export TRUTHFULNESS_RM_SERVER=127.0.0.1:8002
export TRUTHFULNESS_RM_PATH=KLCF-Qwen2.5-14B-Truthfulness-Verifier
export GENERAL_RM_SERVER=127.0.0.1:8003
export GENERAL_RM_PATH=Skywork-Reward-V2-Llama-3.2-1B
```
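Before launching training, it can help to confirm every server is reachable. The snippet below is a hedged helper that reads the variables above and probes each endpoint, assuming the servers expose a `/health` route (recent vLLM and sglang builds do).

```python
# Sketch: verify that each extractor/reward server configured above is up.
# Assumes the servers expose a /health route (true for recent vLLM/sglang).
import os
import requests

for var in ("CLAIM_EXTRACTOR_SERVER", "CHECKLIST_RM_SERVER",
            "TRUTHFULNESS_RM_SERVER", "GENERAL_RM_SERVER"):
    addr = os.environ.get(var)
    if not addr:
        print(f"{var} is not set")
        continue
    try:
        r = requests.get(f"http://{addr}/health", timeout=5)
        print(f"{var} ({addr}): HTTP {r.status_code}")
    except requests.RequestException as exc:
        print(f"{var} ({addr}): unreachable ({exc})")
```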
KLCF-zero (training from base model):

```bash
cd verl
bash run_klcf_zero_14b.sh   # or run_klcf_zero_7b.sh / run_klcf_zero_32b.sh
```

KLCF (training from SFT model):
```bash
cd verl
bash run_klcf_14b.sh   # or run_klcf_7b.sh / run_klcf_32b.sh
```

FActScore Evaluation:
```bash
cd eval
bash run_factscore.sh [model_id] [training_type]
# training_type: base, zero-rl, non-thinking, sft-rl
```

HalluLens-LongWiki Evaluation:
```bash
cd eval
bash run_longwiki.sh [model_id] [training_type]
```

VeriScore Evaluation (LongFact/Factory):
```bash
cd eval
bash run_veriscore.sh [dataset] [model_id] [training_type]
# dataset: longfact_objects_testset.jsonl or factory_testset.jsonl
```

WinRate Evaluation (LongFact/Factory):
```bash
cd eval/WR
bash run.sh [dataset] [base_model_id] [model_id]
# dataset: longfact_objects_testset.jsonl or factory_testset.jsonl
```

Notes:
- Ensure all model servers are running before training
- Adjust ports and paths according to your deployment
- Training types must match the model's training procedure for accurate evaluation
If you find this project useful in your research or work, please consider citing it:
```bibtex
@article{li2025knowledge,
  title={Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality},
  author={Li, Junliang and Wang, Yucheng and Chen, Yan and Ran, Yu and Zhang, Ruiqing and Liu, Jing and Wu, Hua and Wang, Haifeng},
  journal={arXiv preprint arXiv:2509.23765},
  year={2025}
}
```