We introduce the Native Parallel Reasoner (NPR), a scalable framework for building models that natively reason in parallel. NPR learns adaptive decomposition and aggregation policies through a teacher-free pipeline that combines self-distilled parallel Supervised Fine-Tuning (SFT) with Native Parallel Reinforcement Learning (RL). This approach lets the model optimize its own branching strategies directly from experience within a shared computation graph, preserving its native reasoning style while maximizing exploration efficiency. Across eight diverse reasoning benchmarks, NPR achieves decisive gains: self-distilled data outperforms prior teacher-generated corpora by 10.1%, and our Parallel RL stage improves over direct RL baselines by 3.0%. Crucially, NPR delivers up to 4.6× inference acceleration over autoregressive baselines and exhibits genuine, non-simulated parallel reasoning behaviors.
```bash
# Create env for NPR-Zero
cd npr-zero
conda create -n zero python=3.11
conda activate zero
conda install nvidia::cuda-nvcc

# Install dependencies
pip install -e .[sglang]
pip install liger-kernel
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
pip uninstall pynvml
pip install "latex2sympy2-extended[antlr4_9_3]"
```
- Download the training dataset ORZ from Hugging Face to the `experiments/raw_data` folder, then preprocess it:

```bash
python examples/data_preprocess/orz.py
python examples/data_preprocess/aime25.py
```

- Download the model Qwen3-4B-Instruct-2507 from Hugging Face.
- Set `RAY_DATA_HOME` and `MODEL_PATH` to your own paths (see the sketch below).
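For illustration, the downloads and paths might look as follows (a sketch: the ORZ dataset repo ID is a placeholder, and the layout is an assumption):

```bash
# Placeholder repo ID for the ORZ dataset; substitute the actual one
huggingface-cli download <ORZ_DATASET_REPO_ID> --repo-type dataset --local-dir experiments/raw_data
huggingface-cli download Qwen/Qwen3-4B-Instruct-2507 --local-dir models/Qwen3-4B-Instruct-2507

# Assumption: set as environment variables or edit them in experiments/run.sh
export RAY_DATA_HOME=$PWD
export MODEL_PATH=$PWD/models/Qwen3-4B-Instruct-2507
```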
```bash
# Run NPR-Zero training
bash experiments/run.sh
```
```bash
# Create env for NPR-Beta
cd npr-Beta
conda create -n warmup python=3.11 -y
conda activate warmup

# Install dependencies
pip install -r requirements.txt
```
```bash
# Perform rejection sampling
bash scripts/sampling.sh
```
Key parameters in `sampling.sh` (a sketch invocation follows the list):

- `MODEL_PATH`: Path to the model checkpoint (Stage 1)
- `OUTPUT_DIR`: Output directory for sampled trajectories
- `--dataset`: Dataset name (default: `ORZ-MATH-57K`)
- `--instruction`: Prompt template file
- `--max_sample_trial`: Max sampling attempts per problem (default: 8)
- `--temperature`: Sampling temperature (default: 1.0)
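The exact override mechanism (environment variables vs. editing the script) is an assumption; a hypothetical invocation might look like:

```bash
# Hypothetical override; MODEL_PATH and OUTPUT_DIR are illustrative paths only
MODEL_PATH=ckpts/npr-zero/global_step_xxx \
OUTPUT_DIR=dataset/math/rejection_sampling \
bash scripts/sampling.sh \
    --dataset ORZ-MATH-57K \
    --max_sample_trial 8 \
    --temperature 1.0
```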
```bash
# Start warmup training
bash train/sft_math.sh
```
Key parameters in `sft_math.sh` (an illustrative configuration follows the list):

- `base_model`: Base model to fine-tune (default: `Qwen3-4B-Instruct`)
- `train_file_path`: Path to training trajectories (default: `dataset/math/rejection_sampling/train`)
- `lr`: Learning rate
- `epochs`: Number of training epochs
- Output checkpoints are saved to `ckpts/NPR-Warmup-4B-Inst-{timestamp}/`
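A hypothetical way to set these, assuming they are plain shell variables near the top of `train/sft_math.sh` (all values illustrative):

```bash
# Illustrative values only; the variable names come from the list above
base_model=models/Qwen3-4B-Instruct-2507               # base model to fine-tune
train_file_path=dataset/math/rejection_sampling/train  # rejection-sampled trajectories
lr=1e-5                                                # assumption: typical SFT learning rate
epochs=3                                               # assumption: illustrative epoch count
```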
```bash
# Create env for NPR-RL
cd npr-rl
conda create -n rl python=3.11
conda activate rl
conda install nvidia::cuda-nvcc

# Install dependencies
pip install -e .
pip install liger-kernel
pip uninstall pynvml
pip install "latex2sympy2-extended[antlr4_9_3]"

# Install the bundled SGLang from source
cd verl/workers/rollout/sglang_rollout/sglang/python
pip install -e .

pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
pip install fire
```
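After installation, a quick import check can confirm the bundled SGLang is the one on your path (a minimal sketch; version output will vary):

```bash
# Optional sanity check (assumes the `rl` env is active)
python -c "import sglang; print('sglang:', sglang.__version__)"
python -c "import flash_attn; print('flash-attn:', flash_attn.__version__)"
```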
- Download the training dataset ORZ from Hugging Face to the `experiments/raw_data` folder, then preprocess it:

```bash
python examples/data_preprocess/orz.py
python examples/data_preprocess/aime.py
```

- Set `RAY_DATA_HOME` and `MODEL_PATH` to your own paths (example below). Note that `MODEL_PATH` is the Stage 2 checkpoint.
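For example (paths illustrative; the Stage 2 checkpoint is the warmup output from NPR-Beta):

```bash
# Assumption: set as environment variables or edit them in experiments/run.sh
export RAY_DATA_HOME=$PWD
export MODEL_PATH=$PWD/../npr-Beta/ckpts/NPR-Warmup-4B-Inst-{timestamp}/  # Stage 2 checkpoint
```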
```bash
# Run native parallel RL
bash experiments/run.sh
```
```bash
# Create env for evaluations
cd evals
conda create -n eval python=3.10
conda activate eval

# Install dependencies
pip install -r requirements.txt
```
- Convert the RL checkpoint to Hugging Face format:

```bash
python convert_to_hf.py verl/experiments/ckpts/project_name/exp_name/global_step_x/actor <STAGE_2_MODEL_PATH> <TARGET_HF_MODEL_PATH>
```

- Or download NPR-4B from Hugging Face.
- Set `<TARGET_HF_MODEL_PATH>` to your own path.
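A hypothetical invocation with the placeholders filled in (all paths illustrative; `project_name`, `exp_name`, and the step number come from your RL run):

```bash
# Illustrative paths only
python convert_to_hf.py \
    verl/experiments/ckpts/npr/npr-rl-4b/global_step_300/actor \
    ckpts/NPR-Warmup-4B-Inst-{timestamp} \
    ckpts/NPR-4B-hf
```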
```bash
# Start evaluation of AIME25
# tp_size x dp_size = 2 x 4 = 8, matching the eight GPUs listed in --cuda
./scripts/eval.sh \
    --cuda 0,1,2,3,4,5,6,7 \
    --tp_size 2 \
    --dp_size 4 \
    --task "AIME25" \
    --max_eval_samples 30 \
    --eval_batch_size 8 \
    --model_path <TARGET_HF_MODEL_PATH> \
    --prompt_path prompts/npr.txt \
    --engine parallel \
    --num_samples 1 \
    --k 1 \
    --max_new_tokens 40000 \
    --temperature 1.0 \
    --top_p 0.7 \
    --top_k -1 \
    --overwrite \
    --apply_chat
```
We report Pass@1 accuracy averaged over 8 samples per problem, as shown below.
- Join the Discussions: Share your insights, provide feedback, or ask questions.
- Report Issues: Submit bugs or request features for the Native-Parallel-Reasoner project.
- Submit Pull Requests: Review open PRs, and submit your own PRs.
Contributing Guidelines
- Fork the Repository: Start by forking the project repository to your GitHub account.
- Clone Locally: Clone the forked repository to your local machine using a git client.

```bash
git clone https://github.com/bigai-nlco/Native-Parallel-Reasoner.git
```

- Create a New Branch: Always work on a new branch, giving it a descriptive name.

```bash
git checkout -b your_name/feature-x
```

- Make Your Changes: Develop and test your changes locally.
- Commit Your Changes: Commit with a clear message describing your updates.

```bash
git commit -m 'Implemented new feature x.'
```

- Push to Your Fork: Push the changes to your forked repository.

```bash
git push origin your_name/feature-x
```

- Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.
- Review: Once your PR is reviewed and approved, it will be merged into the main branch. Congratulations on your contribution!
Native Parallel Reasoner is distributed under the terms described in the LICENSE file.
```bibtex
@misc{nativeparallelreasonerreasoning,
      title={Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning},
      author={Tong Wu and Yang Liu and Jun Bai and Zixia Jia and Shuyi Zhang and Ziyong Lin and Yanting Wang and Song-Chun Zhu and Zilong Zheng},
      year={2025},
      eprint={2512.07461},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07461},
}
```

This codebase is influenced by remarkable projects from the AI community, including verl, sglang, and Multiverse.

