- 🌻Acknowledgement
- 🌟Overview
- 🔧Installation
- 📚Knowledge Base Construction
- 📉Training
- 🧐Evaluation
- 🚩Citation
Our Cold-Start SFT stage is implemented based on the excellent LLaMA-Factory framework. Our reinforcement learning training code is based on TRL and Unsloth. We thank all authors for their great contributions!

Large Language Models (LLMs), particularly slow-thinking models, often exhibit severe hallucinations because they cannot accurately recognize their own knowledge boundaries. To address this, we propose KnowRL, a novel framework that integrates external knowledge into the reinforcement learning process. KnowRL guides models to perform fact-based slow thinking by incorporating a factuality reward directly into the RL training loop, and can be seen as leveraging a form of test-time scaling to reduce hallucinations. This helps models learn their knowledge boundaries and fosters a more reliable, fact-based reasoning process, effectively mitigating hallucinations while maintaining or enhancing strong reasoning capabilities.
We recommend creating a new conda environment to run our project.
conda create -n knowrl python=3.12
conda activate knowrl
git clone https://github.com/zjunlp/KnowRL.git
cd KnowRL
pip install -r requirements.txt
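Optionally, you can check that the core training dependencies resolved correctly. This is a minimal sanity check that assumes torch and trl are pulled in by requirements.txt (TRL is the RL backend acknowledged above):
python -c "import torch, trl; print(torch.__version__, trl.__version__)"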
KnowRL's factuality reward relies on an external knowledge base. You can either download our pre-built version or build it from your own corpus.
Downloading the pre-built database is the easiest way to get started. We have hosted the knowledge_base.db file on Google Drive.
# The target directory for the knowledge base
cd train/reward_function/FActScore/build_knowledge/
# Download the file from Google Drive and name it knowledge_base.db
gdown https://drive.google.com/uc?id=1EVFkzuFvqE8AOEcdfSSm03vvvbVDa7bI
This command will download the database directly into the required folder.
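As an optional sanity check, if the sqlite3 command-line tool is available, you can confirm the downloaded file is a valid SQLite database and list its tables (the exact table names depend on how the database was built):
sqlite3 knowledge_base.db ".tables"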
If you wish to build the knowledge base from your own data source (e.g., a specific Wikipedia dump), follow these steps:
- Place your source data file (e.g., wikipedia.jsonl) in a directory.
- Edit the build_db.sh script to point DATA_PATH to your data file.
- Run the script from the build_knowledge directory to create the SQLite database:
cd train/reward_function/FActScore/build_knowledge/
# Edit DATA_PATH in build_db.sh to point to your source file
bash build_db.sh
This will create the knowledge_base.db file required for the fact_reward function during training.
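The exact input schema expected by build_db.sh is defined in the script itself; as an assumption based on the common FActScore-style corpus layout, the source file is typically one JSON object per line containing a title and its text, for example:
head -n 1 wikipedia.jsonl
# {"title": "Albert Einstein", "text": "Albert Einstein was a German-born theoretical physicist ..."}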
We utilize a Knowledgeable Reinforcement Learning (RL) phase to enhance the model's factuality. Our datasets and models have also been uploaded to Hugging Face.
This stage uses the SFT-tuned model and further trains it with our knowledge-enhanced reward signal. The process is orchestrated by train/train.sh, which launches main.py using the configuration defined in script/grpo.yaml. We train two 7B models, DeepSeek-R1-Distill-Qwen-7B and Skywork-OR1-7B-Preview, each on a single A800 GPU.
a. Environment Variables in train/train.sh:
This script sets up all necessary environment variables and executes the training.
- Set your API keys for services like OpenAI (OPENAI_API_KEY_FACTSCORE, OPENAI_API_KEY_JUDGE).
- Set your SWANLAB_API_KEY for experiment tracking.
- Ensure FACTSCORE_DB_PATH points to the knowledge_base.db file you created.
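An optional pre-flight check (variable names taken from train.sh below) can catch missing configuration before launching:
# Run after exporting the variables in train.sh
test -f "$FACTSCORE_DB_PATH" && echo "knowledge base found" || echo "knowledge_base.db not found at $FACTSCORE_DB_PATH"
[ -n "$OPENAI_API_KEY_FACTSCORE" ] && [ -n "$OPENAI_API_KEY_JUDGE" ] || echo "OpenAI API keys are not set"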
b. Training Parameters in script/grpo.yaml
This file contains all hyperparameters for the RL stage.
- model_name_or_path: Path to the base model for RL training (this should be your SFT-tuned model).
- dataset_id_or_path: Path to your RL training data.
- output_dir: Directory to save the final trained model.
- swanlab_project, swanlab_experiment_name: SwanLab configuration.
- per_device_train_batch_size, learning_rate, max_steps: Standard training hyperparameters.
- beta, num_generations: GRPO-specific algorithm parameters.
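For reference, here is a minimal sketch of what script/grpo.yaml could look like. The field names come from the list above; every value below is only an illustrative placeholder, not a recommended setting, so check the provided grpo.yaml for the actual defaults.
model_name_or_path: /path/to/your/sft-tuned-model
dataset_id_or_path: /path/to/rl_training_data.json
output_dir: ./outputs/knowrl-grpo
swanlab_project: KnowRL
swanlab_experiment_name: grpo-run-1
per_device_train_batch_size: 1
learning_rate: 1.0e-6
max_steps: 500
beta: 0.04
num_generations: 8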
c. Launch RL Training
Once configured, launch the training from the train directory:
cd KnowRL/train/
bash train.sh
The script will set CUDA_VISIBLE_DEVICES, print the configuration, and start the training process.
Click to view train.sh
#!/bin/bash
# ============================================================================
# API Configuration - Replace with your actual credentials
# ============================================================================
export OPENAI_API_KEY_FACTSCORE="your_openai_api_key_here"
export OPENAI_BASE_URL_FACTSCORE="https://api.openai.com/v1"
export OPENAI_API_KEY_JUDGE="your_openai_api_key_here"
export OPENAI_API_BASE_JUDGE="https://api.openai.com/v1"
export SWANLAB_API_KEY="your_swanlab_api_key_here"
# ============================================================================
# Configuration
# ============================================================================
export FACTSCORE_DB_PATH="./FActScore/build_knowledge/knowledge_base.db"
export USE_API_MANAGER_FOR_LLM_EVAL=True
export USE_API_MANAGER_FOR_FACTSCORE=True
# Set GPU device
export CUDA_VISIBLE_DEVICES=0
# Configuration file
CONFIG_FILE="./script/grpo.yaml"
# ============================================================================
# Run Training
# ============================================================================
echo "Starting GRPO training..."
echo "Config: $CONFIG_FILE"
echo "GPU: $CUDA_VISIBLE_DEVICES"
python main.py --config "$CONFIG_FILE"
if [ $? -eq 0 ]; then
echo "✅ Training completed successfully!"
else
echo "❌ Training failed!"
exit 1
fi
If you find this work useful in your research, please consider citing our paper:
@article{ren2025knowrl,
title={KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality},
author={Ren, Baochang and Qiao, Shuofei and Yu, Wenhao and Chen, Huajun and Zhang, Ningyu},
journal={arXiv preprint arXiv:2506.19807},
year={2025}
}