[🧾 Paper] | [💻 Code] | [🤗 Models]
KLCF is a novel reinforcement learning framework that mitigates hallucinations in long-form generation by explicitly aligning a policy model's expressed knowledge with its parametric knowledge through Dual-Fact Alignment. It jointly optimizes factual recall and precision without relying on external knowledge sources, enabling efficient and scalable training.
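For intuition only, the sketch below shows one way a checklist-based recall signal and a claim-level precision signal could be folded into a single scalar reward. The function name, inputs, and equal weighting are illustrative assumptions, not the released training code.

```python
# Illustrative sketch only; not the released KLCF reward implementation.
def dual_fact_reward(checklist_hits: int, checklist_total: int,
                     claims_supported: int, claims_total: int,
                     alpha: float = 0.5) -> float:
    """Blend a recall-style checklist score with a precision-style claim score."""
    recall = checklist_hits / max(checklist_total, 1)      # coverage of the model's own knowledge checklist
    precision = claims_supported / max(claims_total, 1)    # fraction of extracted claims judged truthful
    return alpha * recall + (1.0 - alpha) * precision

# Example: 8/10 checklist items covered, 18/20 claims verified -> reward 0.85
print(dual_fact_reward(8, 10, 18, 20))
```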
```bash
# Install main dependencies
pip install -r requirements.txt
# Install verl environment
cd verl && pip install -e .
```

Base Models (Download from Hugging Face):
- Qwen2.5-7B/14B/32B
- DeepSeek-R1-Distill-Qwen-7B/14B/32B
- Skywork-Reward-V2-Llama-3.2-1B
RL-Related Models (Deploy with vLLM/sglang):
Download from [🤗 Models]; a download sketch follows the list below.
- KLCF-Qwen2.5-14B-Claim-Extractor
- KLCF-Qwen2.5-14B-Checklist-Verifier
- KLCF-Qwen2.5-7B-Truthfulness-Verifier
- KLCF-Qwen2.5-14B-Truthfulness-Verifier
- KLCF-Qwen2.5-32B-Truthfulness-Verifier
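If you prefer scripting the downloads, a minimal sketch with `huggingface_hub` is shown below; the repository IDs are placeholders, so substitute the actual IDs listed on the [🤗 Models] page.

```python
# Sketch: fetch checkpoints with huggingface_hub. The repo IDs below are
# placeholders; replace them with the actual IDs from the [🤗 Models] page.
from huggingface_hub import snapshot_download

for repo_id in [
    "KLCF-Qwen2.5-14B-Claim-Extractor",
    "KLCF-Qwen2.5-14B-Checklist-Verifier",
    "KLCF-Qwen2.5-14B-Truthfulness-Verifier",
]:
    snapshot_download(repo_id=repo_id, local_dir=f"./models/{repo_id}")
```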
Deploy each model with:
```bash
python -m vllm.entrypoints.openai.api_server \
    --model [MODEL_NAME] \
    --host 0.0.0.0 \
    --port [PORT] \
    --tensor-parallel-size 4 \
    --trust-remote-code \
    --gpu-memory-utilization 0.9 \
    --max-num-seqs 1024
```

Deploy the Skywork-Reward-V2-Llama-3.2-1B reward model with sglang:

```bash
python -m sglang.launch_server \
    --model-path Skywork-Reward-V2-Llama-3.2-1B \
    --mem-fraction-static 0.9 \
    --tp 4 \
    --host 0.0.0.0 \
    --port [PORT] \
    --context-length 16384 \
    --is-embedding
```
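Once a server is up, it speaks vLLM's OpenAI-compatible API, so it can be smoke-tested with the standard `openai` client as sketched below. The prompt is only a placeholder, not the prompt format the KLCF pipeline actually uses.

```python
# Sketch: smoke-test a deployed extractor/verifier through vLLM's
# OpenAI-compatible endpoint. The prompt is a placeholder, not the
# prompt template used by the KLCF training pipeline.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="KLCF-Qwen2.5-14B-Claim-Extractor",
    messages=[{"role": "user",
               "content": "Extract atomic factual claims: Paris is the capital of France."}],
    temperature=0.0,
)
print(resp.choices[0].message.content)
```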
Configure environment variables in scripts:

```bash
export CLAIM_EXTRACTOR_SERVER=127.0.0.1:8000
export CLAIM_EXTRACTOR_PATH=KLCF-Qwen2.5-14B-Claim-Extractor
export CHECKLIST_RM_SERVER=127.0.0.1:8001
export CHECKLIST_RM_PATH=KLCF-Qwen2.5-14B-Checklist-Verifier
export TRUTHFULNESS_RM_SERVER=127.0.0.1:8002
export TRUTHFULNESS_RM_PATH=KLCF-Qwen2.5-14B-Truthfulness-Verifier
export GENERAL_RM_SERVER=127.0.0.1:8003
export GENERAL_RM_PATH=Skywork-Reward-V2-Llama-3.2-1B
```
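Before launching training, it can help to confirm every server is reachable. The snippet below is a hedged helper that reads the variables above and probes each endpoint, assuming the servers expose a `/health` route (recent vLLM and sglang builds do).

```python
# Sketch: verify that each extractor/reward server configured above is up.
# Assumes the servers expose a /health route (true for recent vLLM/sglang).
import os
import requests

for var in ("CLAIM_EXTRACTOR_SERVER", "CHECKLIST_RM_SERVER",
            "TRUTHFULNESS_RM_SERVER", "GENERAL_RM_SERVER"):
    addr = os.environ.get(var)
    if not addr:
        print(f"{var} is not set")
        continue
    try:
        r = requests.get(f"http://{addr}/health", timeout=5)
        print(f"{var} ({addr}): HTTP {r.status_code}")
    except requests.RequestException as exc:
        print(f"{var} ({addr}): unreachable ({exc})")
```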
KLCF-zero (training from base model):

```bash
cd verl
bash run_klcf_zero_14b.sh   # or run_klcf_zero_7b.sh / run_klcf_zero_32b.sh
```

KLCF (training from SFT model):
```bash
cd verl
bash run_klcf_14b.sh   # or run_klcf_7b.sh / run_klcf_32b.sh
```

FActScore Evaluation:
```bash
cd eval
bash run_factscore.sh [model_id] [training_type]
# training_type: base, zero-rl, non-thinking, sft-rl
```

HalluLens-LongWiki Evaluation:
```bash
cd eval
bash run_longwiki.sh [model_id] [training_type]
```

VeriScore Evaluation (LongFact/Factory):
```bash
cd eval
bash run_veriscore.sh [dataset] [model_id] [training_type]
# dataset: longfact_objects_testset.jsonl or factory_testset.jsonl
```

WinRate Evaluation (LongFact/Factory):
```bash
cd eval/WR
bash run.sh [dataset] [base_model_id] [model_id]
# dataset: longfact_objects_testset.jsonl or factory_testset.jsonl
```

Notes:
- Ensure all model servers are running before training
- Adjust ports and paths according to your deployment
- Training types must match the model's training procedure for accurate evaluation
If you find this project useful in your research or work, please consider citing it:
```bibtex
@article{li2025knowledge,
  title={Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality},
  author={Li, Junliang and Wang, Yucheng and Chen, Yan and Ran, Yu and Zhang, Ruiqing and Liu, Jing and Wu, Hua and Wang, Haifeng},
  journal={arXiv preprint arXiv:2509.23765},
  year={2025}
}
```