We introduce the Native Parallel Reasoner (NPR), a scalable framework for building models that natively reason in parallel. NPR learns adaptive decomposition and aggregation policies through a teacher-free pipeline that combines self-distilled parallel Supervised Fine-Tuning (SFT) with Native Parallel Reinforcement Learning (RL). This approach lets the model optimize its own branching strategies directly from experience within a shared computation graph, preserving its native reasoning style while maximizing exploration efficiency. Across eight diverse reasoning benchmarks, NPR achieves decisive gains: self-distilled data outperforms prior teacher-generated corpora by 10.1%, and our Parallel RL stage improves over direct RL baselines by 3.0%. Crucially, NPR delivers up to 4.6× inference acceleration over autoregressive baselines and exhibits genuine, non-simulated parallel reasoning behaviors.
```bash
# Create env for NPR-Zero
cd npr-zero
conda create -n zero python=3.11
conda activate zero
conda install nvidia::cuda-nvcc

# Install dependencies
pip install -e .[sglang]
pip install liger-kernel
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
pip uninstall pynvml
pip install "latex2sympy2-extended[antlr4_9_3]"
```
- Download the training dataset ORZ from Hugging Face to the `experiments/raw_data` folder, then preprocess it:

```bash
python examples/data_preprocess/orz.py
python examples/data_preprocess/aime25.py
```

- Download the model Qwen3-4B-Instruct-2507 from Hugging Face.
- Set `RAY_DATA_HOME` and `MODEL_PATH` to your own paths (see the sketch below).
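For illustration, the downloads and paths might look as follows (a sketch: the ORZ dataset repo ID is a placeholder, and the layout is an assumption):

```bash
# Placeholder repo ID for the ORZ dataset; substitute the actual one
huggingface-cli download <ORZ_DATASET_REPO_ID> --repo-type dataset --local-dir experiments/raw_data
huggingface-cli download Qwen/Qwen3-4B-Instruct-2507 --local-dir models/Qwen3-4B-Instruct-2507

# Assumption: set as environment variables or edit them in experiments/run.sh
export RAY_DATA_HOME=$PWD
export MODEL_PATH=$PWD/models/Qwen3-4B-Instruct-2507
```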
```bash
# Run NPR-Zero training
bash experiments/run.sh
```
```bash
# Create env for NPR-Beta
cd npr-Beta
conda create -n warmup python=3.11 -y
conda activate warmup

# Install dependencies
pip install -r requirements.txt
```
```bash
# Perform rejection sampling
bash scripts/sampling.sh
```
Key parameters in `sampling.sh` (a sketch invocation follows the list):

- `MODEL_PATH`: Path to the model checkpoint (Stage 1)
- `OUTPUT_DIR`: Output directory for sampled trajectories
- `--dataset`: Dataset name (default: `ORZ-MATH-57K`)
- `--instruction`: Prompt template file
- `--max_sample_trial`: Max sampling attempts per problem (default: 8)
- `--temperature`: Sampling temperature (default: 1.0)
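The exact override mechanism (environment variables vs. editing the script) is an assumption; a hypothetical invocation might look like:

```bash
# Hypothetical override; MODEL_PATH and OUTPUT_DIR are illustrative paths only
MODEL_PATH=ckpts/npr-zero/global_step_xxx \
OUTPUT_DIR=dataset/math/rejection_sampling \
bash scripts/sampling.sh \
    --dataset ORZ-MATH-57K \
    --max_sample_trial 8 \
    --temperature 1.0
```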
```bash
# Start warmup training
bash train/sft_math.sh
```
Key parameters in `sft_math.sh` (an illustrative configuration follows the list):

- `base_model`: Base model to fine-tune (default: `Qwen3-4B-Instruct`)
- `train_file_path`: Path to training trajectories (default: `dataset/math/rejection_sampling/train`)
- `lr`: Learning rate
- `epochs`: Number of training epochs
- Output checkpoints are saved to `ckpts/NPR-Warmup-4B-Inst-{timestamp}/`
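A hypothetical way to set these, assuming they are plain shell variables near the top of `train/sft_math.sh` (all values illustrative):

```bash
# Illustrative values only; the variable names come from the list above
base_model=models/Qwen3-4B-Instruct-2507               # base model to fine-tune
train_file_path=dataset/math/rejection_sampling/train  # rejection-sampled trajectories
lr=1e-5                                                # assumption: typical SFT learning rate
epochs=3                                               # assumption: illustrative epoch count
```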
```bash
# Create env for NPR-RL
cd npr-rl
conda create -n rl python=3.11
conda activate rl
conda install nvidia::cuda-nvcc

# Install dependencies
pip install -e .
pip install liger-kernel
pip uninstall pynvml
pip install "latex2sympy2-extended[antlr4_9_3]"

# Install the bundled SGLang from source
cd verl/workers/rollout/sglang_rollout/sglang/python
pip install -e .

pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
pip install fire
```
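After installation, a quick import check can confirm the bundled SGLang is the one on your path (a minimal sketch; version output will vary):

```bash
# Optional sanity check (assumes the `rl` env is active)
python -c "import sglang; print('sglang:', sglang.__version__)"
python -c "import flash_attn; print('flash-attn:', flash_attn.__version__)"
```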
- Download the training dataset ORZ from Hugging Face to the `experiments/raw_data` folder, then preprocess it:

```bash
python examples/data_preprocess/orz.py
python examples/data_preprocess/aime.py
```

- Set `RAY_DATA_HOME` and `MODEL_PATH` to your own paths (example below). Note that `MODEL_PATH` is the Stage 2 checkpoint.
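For example (paths illustrative; the Stage 2 checkpoint is the warmup output from NPR-Beta):

```bash
# Assumption: set as environment variables or edit them in experiments/run.sh
export RAY_DATA_HOME=$PWD
export MODEL_PATH=$PWD/../npr-Beta/ckpts/NPR-Warmup-4B-Inst-{timestamp}/  # Stage 2 checkpoint
```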
```bash
# Run native parallel RL
bash experiments/run.sh
```
```bash
# Create env for evaluations
cd evals
conda create -n eval python=3.10
conda activate eval

# Install dependencies
pip install -r requirements.txt
```
- Convert the RL checkpoint to Hugging Face format:

```bash
python convert_to_hf.py verl/experiments/ckpts/project_name/exp_name/global_step_x/actor <STAGE_2_MODEL_PATH> <TARGET_HF_MODEL_PATH>
```

- Or download NPR-4B from Hugging Face.
- Set `<TARGET_HF_MODEL_PATH>` to your own path.
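A hypothetical invocation with the placeholders filled in (all paths illustrative; `project_name`, `exp_name`, and the step number come from your RL run):

```bash
# Illustrative paths only
python convert_to_hf.py \
    verl/experiments/ckpts/npr/npr-rl-4b/global_step_300/actor \
    ckpts/NPR-Warmup-4B-Inst-{timestamp} \
    ckpts/NPR-4B-hf
```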
```bash
# Start evaluation of AIME25
# tp_size x dp_size = 2 x 4 = 8, matching the eight GPUs listed in --cuda
./scripts/eval.sh \
    --cuda 0,1,2,3,4,5,6,7 \
    --tp_size 2 \
    --dp_size 4 \
    --task "AIME25" \
    --max_eval_samples 30 \
    --eval_batch_size 8 \
    --model_path <TARGET_HF_MODEL_PATH> \
    --prompt_path prompts/npr.txt \
    --engine parallel \
    --num_samples 1 \
    --k 1 \
    --max_new_tokens 40000 \
    --temperature 1.0 \
    --top_p 0.7 \
    --top_k -1 \
    --overwrite \
    --apply_chat
```
We report Pass@1 accuracy averaged over 8 samples per problem, as shown below.
- Join the Discussions: Share your insights, provide feedback, or ask questions.
- Report Issues: Submit bugs or request features for the Native-Parallel-Reasoner project.
- Submit Pull Requests: Review open PRs, and submit your own PRs.
Contributing Guidelines
- Fork the Repository: Start by forking the project repository to your GitHub account.
- Clone Locally: Clone the forked repository to your local machine using a git client.

```bash
git clone https://github.com/bigai-nlco/Native-Parallel-Reasoner.git
```

- Create a New Branch: Always work on a new branch, giving it a descriptive name.

```bash
git checkout -b your_name/feature-x
```

- Make Your Changes: Develop and test your changes locally.
- Commit Your Changes: Commit with a clear message describing your updates.

```bash
git commit -m 'Implemented new feature x.'
```

- Push to Your Fork: Push the changes to your forked repository.

```bash
git push origin your_name/feature-x
```

- Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.
- Review: Once your PR is reviewed and approved, it will be merged into the main branch. Congratulations on your contribution!
Native Parallel Reasoner is distributed under the terms described in the LICENSE file.
```bibtex
@misc{nativeparallelreasonerreasoning,
      title={Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning},
      author={Tong Wu and Yang Liu and Jun Bai and Zixia Jia and Shuyi Zhang and Ziyong Lin and Yanting Wang and Song-Chun Zhu and Zilong Zheng},
      year={2025},
      eprint={2512.07461},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07461},
}
```

This codebase is influenced by remarkable projects from the AI community, including verl, sglang, and Multiverse.

