PyTorch code for "Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality".


1. Overview

[🧾 Paper] | [💻 Code] | [🤗 Models]

KLCF is a novel reinforcement learning framework that mitigates hallucinations in long-form generation by explicitly aligning a policy model's expressed knowledge with its parametric knowledge through Dual-Fact Alignment. It jointly optimizes factual recall and precision without relying on external knowledge sources, enabling efficient and scalable training.
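
As a rough illustration of the dual-fact idea, the sketch below combines a recall-style checklist reward with a precision-style truthfulness reward. The function name, inputs, and weighting are assumptions for illustration, not the repository's actual reward code.

# Hypothetical sketch of a dual-fact reward (illustrative only; not the
# repository's implementation).
def dual_fact_reward(checklist_hits, checklist_size, claim_verdicts, alpha=0.5):
    # Recall side: how much of the model's own parametric-knowledge
    # checklist the response actually expresses.
    recall = checklist_hits / max(checklist_size, 1)
    # Precision side: fraction of claims extracted from the response
    # that the truthfulness verifier judges correct.
    precision = sum(claim_verdicts) / max(len(claim_verdicts), 1)
    # Weighted combination; alpha is an illustrative trade-off knob.
    return alpha * recall + (1 - alpha) * precision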

2. Usage

2.1 Environment Setup

# Install main dependencies
pip install -r requirements.txt

# Install verl environment
cd verl && pip install -e .

2.2 Model Preparation

Base Models (Download from Hugging Face):

  • Qwen2.5-7B/14B/32B
  • DeepSeek-R1-Distill-Qwen-7B/14B/32B
  • Skywork-Reward-V2-Llama-3.2-1B

RL-Related Models (Deploy with vLLM/sglang):

Download from [🤗 Models].

  • KLCF-Qwen2.5-14B-Claim-Extractor
  • KLCF-Qwen2.5-14B-Checklist-Verifier
  • KLCF-Qwen2.5-7B-Truthfulness-Verifier
  • KLCF-Qwen2.5-14B-Truthfulness-Verifier
  • KLCF-Qwen2.5-32B-Truthfulness-Verifier

Deploy each extractor/verifier model with vLLM:

python -m vllm.entrypoints.openai.api_server \
    --model [MODEL_NAME] \
    --host 0.0.0.0 \
    --port [PORT] \
    --tensor-parallel-size 4 \
    --trust-remote-code \
    --gpu-memory-utilization 0.9 \
    --max-num-seqs 1024

Deploy the Skywork general reward model with sglang:

python -m sglang.launch_server \
    --model-path Skywork-Reward-V2-Llama-3.2-1B \
    --mem-fraction-static 0.9 \
    --tp 4 \
    --host 0.0.0.0 \
    --port [PORT] \
    --context-length 16384 \
    --is-embedding
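
Before launching training, it can help to sanity-check that the servers are reachable. A minimal sketch (ports are illustrative and must match your deployment):

import requests

# vLLM's OpenAI-compatible server lists its served models at /v1/models.
print(requests.get("http://127.0.0.1:8000/v1/models").json())

# sglang exposes a /health endpoint (returns 200 when the server is up).
print(requests.get("http://127.0.0.1:8003/health").status_code)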

2.3 Training

Configure the following environment variables in the training scripts:

export CLAIM_EXTRACTOR_SERVER=127.0.0.1:8000
export CLAIM_EXTRACTOR_PATH=KLCF-Qwen2.5-14B-Claim-Extractor

export CHECKLIST_RM_SERVER=127.0.0.1:8001
export CHECKLIST_RM_PATH=KLCF-Qwen2.5-14B-Checklist-Verifier

export TRUTHFULNESS_RM_SERVER=127.0.0.1:8002
export TRUTHFULNESS_RM_PATH=KLCF-Qwen2.5-14B-Truthfulness-Verifier

export GENERAL_RM_SERVER=127.0.0.1:8003
export GENERAL_RM_PATH=Skywork-Reward-V2-Llama-3.2-1B
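
The training code reads these variables to route reward queries. As a minimal sketch of what such a query might look like against a vLLM OpenAI-compatible endpoint (the prompt format below is an assumption, not the repository's actual protocol):

import os
import requests

server = os.environ["TRUTHFULNESS_RM_SERVER"]  # e.g. 127.0.0.1:8002
model = os.environ["TRUTHFULNESS_RM_PATH"]

# vLLM serves the standard /v1/chat/completions route; the prompt below
# is a placeholder, not the verifier's actual input format.
resp = requests.post(
    f"http://{server}/v1/chat/completions",
    json={
        "model": model,
        "messages": [{"role": "user",
                      "content": "Claim: <claim text>. Is this claim truthful?"}],
        "temperature": 0.0,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])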

KLCF-zero (training from the base model):

cd verl
bash run_klcf_zero_14b.sh  # or run_klcf_zero_7b.sh / run_klcf_zero_32b.sh

KLCF (training from the SFT model):

cd verl  
bash run_klcf_14b.sh  # or run_klcf_7b.sh / run_klcf_32b.sh

2.4 Evaluation

FActScore Evaluation:

cd eval
bash run_factscore.sh [model_id] [training_type]
# training_type: base, zero-rl, non-thinking, sft-rl
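
For reference, the core FActScore metric is the precision of atomic facts: the fraction of facts extracted from a generation that the knowledge source supports. A sketch of the unpenalized form (the script may apply additional options such as a length penalty):

def factscore(supported, total):
    # Unpenalized FActScore: share of atomic facts supported by the
    # reference knowledge source.
    return supported / total if total else 0.0

print(factscore(38, 50))  # 0.76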

HalluLens-LongWiki Evaluation:

cd eval
bash run_longwiki.sh [model_id] [training_type]

VeriScore Evaluation (LongFact/Factory):

cd eval
bash run_veriscore.sh [dataset] [model_id] [training_type]
# dataset: longfact_objects_testset.jsonl or factory_testset.jsonl
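
VeriScore-style evaluation extracts verifiable claims and checks them against search results; a common aggregation, as in SAFE-style F1@K, balances claim precision against recall toward a target count K. The sketch below assumes that aggregation; check the script for the exact metric:

def f1_at_k(supported, extracted, k):
    # Precision: supported claims over all extracted claims.
    # Recall@K: supported claims relative to a target count K, capped at 1.
    if extracted == 0 or supported == 0:
        return 0.0
    precision = supported / extracted
    recall = min(supported / k, 1.0)
    return 2 * precision * recall / (precision + recall)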

WinRate Evaluation (LongFact/Factory):

cd eval/WR
bash run.sh [dataset] [base_model_id] [model_id]
# dataset: longfact_objects_testset.jsonl or factory_testset.jsonl
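
Win rate aggregates pairwise judgments between the trained model and the base model. A common convention (an assumption here; see eval/WR for the exact rule) counts ties as half a win:

def win_rate(wins, ties, total):
    # Ties count as half a win; total = wins + ties + losses.
    return (wins + 0.5 * ties) / total

print(win_rate(60, 10, 100))  # 0.65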

Notes

  • Ensure all model servers are running before training
  • Adjust ports and paths according to your deployment
  • Training types must match the model's training procedure for accurate evaluation

Cite

If you find this project useful in your research or work, please consider citing it:

@article{li2025knowledge,
  title={Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality},
  author={Li, Junliang and Wang, Yucheng and Chen, Yan and Ran, Yu and Zhang, Ruiqing and Liu, Jing and Wu, Hua and Wang, Haifeng},
  journal={arXiv preprint arXiv:2509.23765},
  year={2025}
}
