- 2025.05: We have added the related work discussed in the special issue to this repo. These updates will be formally integrated into the next version of the paper. We warmly welcome any discussion or feedback.
- 2025.03: We released a GitHub repo to collect papers related to reasoning economy. Feel free to cite us or open pull requests.
- Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models
- From System 1 to System 2: A Survey of Reasoning Large Language Models
- A Survey on Post-training of Large Language Models
- LLM Post-Training: A Deep Dive into Reasoning Large Language Models
- A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Supervised Fine-tuning
- GPT-4 Technical Report
- Stanford Alpaca: An Instruction-following LLaMA Model
- STaR: Bootstrapping Reasoning With Reasoning
- LIMA: Less Is More for Alignment
- LIMR: Less is More for RL Scaling
- s1: Simple test-time scaling
- Large Language Models Can Self-Improve
- Improving Language Model Reasoning with Self-motivated Learning
- Finetuned Language Models Are Zero-Shot Learners
- Reinforced Self-Training (ReST) for Language Modeling
- Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
Reinforcement Learning
- Proximal Policy Optimization Algorithms
- Group Robust Preference Optimization in Reward-free RLHF
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
- The Lessons of Developing Process Reward Models in Mathematical Reasoning
- QwQ: Reflect Deeply on the Boundaries of the Unknown
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- Kimi k1.5: Scaling Reinforcement Learning with LLMs
- ReFT: Reasoning with Reinforced Fine-Tuning
- Let's Verify Step by Step
- Training Verifiers to Solve Math Word Problems
- Critique-out-Loud Reward Models
- DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
- Improve Mathematical Reasoning in Language Models by Automated Process Supervision
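Most of the RL entries above optimize the policy against outcome or process rewards; one detail worth illustrating is the group-relative advantage popularized by GRPO-style training (as used for DeepSeek-R1). A minimal sketch of that normalization alone, omitting the PPO-style clipping, KL regularization, and token-level credit assignment:

```python
def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of several responses sampled for the same prompt.

    Each response's advantage is its reward relative to the group mean,
    scaled by the group standard deviation. This sketches only the
    group-relative baseline; everything else in the RL objective is omitted.
    """
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```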
Parallel Methods
- Self-Consistency Improves Chain of Thought Reasoning in Language Models
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
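The parallel methods above share one core move: sample several reasoning paths and aggregate their answers, most simply by majority vote (self-consistency). A minimal sketch, where `generate` and `extract_answer` are placeholders for the caller's sampling and answer-parsing routines:

```python
from collections import Counter

def self_consistency(prompt, generate, extract_answer, n_samples=8):
    """Sample several chains of thought and return the majority answer.

    `generate(prompt)` returns one sampled reasoning chain and
    `extract_answer(chain)` parses its final answer; both are assumed
    callables, not part of any fixed API.
    """
    answers = [extract_answer(generate(prompt)) for _ in range(n_samples)]
    votes = Counter(a for a in answers if a is not None)
    return votes.most_common(1)[0][0] if votes else None
```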
Sequential Methods
- Chain-of-thought Prompting Elicits Reasoning in Large Language Models
- Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- Self-Evaluation Guided Beam Search for Reasoning
- Don't throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding
- OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning
- A Long Way to Go: Investigating Length Correlations in RLHF
- Fine-grained human feedback gives better rewards for language model training
- Learning to summarize from human feedback
- Scaling Laws for Reward Model Overoptimization
- Defining and Characterizing Reward Hacking
❗️ Findings of Overly Cautious Reasoning LLMs
- When More is Less: Understanding Chain-of-Thought Length in LLMs
- The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
- The Impact of Reasoning Step Length on Large Language Models
- Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost
- Over-Reasoning and Redundant Calculation of Large Language Models
- Kimi k1.5: Scaling Reinforcement Learning with LLMs
- Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
- Fake Alignment: Are LLMs Really Aligned Well?
- Alignment faking in large language models
- Large Language Models Often Say One Thing and Do Another
❗️ Findings of Fake Thinking Reasoning LLMs
- When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
- Large Language Models Cannot Self-Correct Reasoning Yet
- PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models
- Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought
- Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights
- Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
- Scaling Test-Time Compute Without Verification or RL is Suboptimal
- Adaptive Decoding via Latent Preference Optimization
- Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems
- Cerberus: Efficient Inference with Adaptive Parallel Decoding and Sequential Knowledge Enhancement
- When More is Less: Understanding Chain-of-Thought Length in LLMs
- Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
- s1: Simple test-time scaling
- O1 Replication Journey: A Strategic Progress Report -- Part 1
- LIMA: Less Is More for Alignment
- LIMR: Less is More for RL Scaling
- ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning
- SimPO: Simple Preference Optimization with a Reference-Free Reward
- A Long Way to Go: Investigating Length Correlations in RLHF
- Disentangling Length from Quality in Direct Preference Optimization
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
- Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
- Self-Training Elicits Concise Reasoning in Large Language Models
- O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
- ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models
- Training Language Models to Reason Efficiently
- Token-Budget-Aware LLM Reasoning
- Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
- Enhancing LLM Reasoning with Reward-guided Tree Search
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
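Several of the training-side entries above (e.g. O1-Pruner, ConciseRL, "Training Language Models to Reason Efficiently", L1) share the idea of trading correctness against response length in the reward. A toy sketch of such a reward; the linear penalty and its weight are illustrative choices, not any one paper's formulation:

```python
def length_penalized_reward(is_correct, n_tokens, budget=1024, lam=0.2):
    """Reward correctness, minus a penalty for exceeding a token budget.

    The linear overshoot penalty and lam=0.2 are illustrative assumptions;
    the cited papers each define their own length-aware objectives.
    """
    correctness = 1.0 if is_correct else 0.0
    overshoot = max(0, n_tokens - budget) / budget
    return correctness - lam * overshoot
```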
Explicit Compression
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs
- Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models
- Can Language Models Learn to Skip Steps?
- InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
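The explicit-compression papers above shorten the written chain of thought itself, e.g. TokenSkip prunes low-importance tokens from training traces. A minimal sketch of that pruning step, where the per-token `importance` scores (e.g. from an LLMLingua-style compressor) and the keep ratio are assumptions:

```python
def compress_trace(tokens, importance, keep_ratio=0.6):
    """Keep only the highest-importance fraction of a reasoning trace,
    preserving the original token order.

    `importance` is a hypothetical list of per-token scores; 0.6 is an
    illustrative compression ratio.
    """
    k = max(1, int(len(tokens) * keep_ratio))
    keep = set(sorted(range(len(tokens)),
                      key=lambda i: importance[i], reverse=True)[:k])
    return [tok for i, tok in enumerate(tokens) if i in keep]
```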
Implicit Compression
- Anchor-based Large Language Models
- Think before you speak: Training Language Models With Pause Tokens
- Training Large Language Models to Reason in a Continuous Latent Space
- From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step
- Implicit Chain of Thought Reasoning via Knowledge Distillation
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs
- LightThinker: Thinking Step-by-Step Compression
- Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
- Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
Single Model Routing
Multi-model Cooperation
- Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
- Speculative Decoding with Big Little Decoder
- MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning
- Reward-Guided Speculative Decoding for Efficient LLM Reasoning
- Agents Thinking Fast and Slow: A Talker-Reasoner Architecture
- Synergy-of-Thoughts: Eliciting Efficient Reasoning in Hybrid Language Models
- DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models
- CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing
- Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
- MasRouter: Learning to Route LLMs for Multi-Agent Systems
- Accelerated Test-Time Scaling with Model-Free Speculative Sampling
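A recurring pattern in the cooperation papers above is routing: let a small, fast model answer first and escalate to a large reasoner only when the cheap answer looks unreliable. A minimal sketch, where `small_model`, `large_model`, `confidence`, and the 0.8 threshold are all illustrative placeholders:

```python
def route(prompt, small_model, large_model, confidence, threshold=0.8):
    """Cheap model first; fall back to the expensive reasoner on low confidence."""
    draft = small_model(prompt)
    if confidence(prompt, draft) >= threshold:
        return draft              # fast path: accept the small model's answer
    return large_model(prompt)    # slow path: deliberate reasoning
```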
Knowledge Distillation
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- Distilling System 2 into System 1
- Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
- Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
- What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
- Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning
- Not All Layers of LLMs Are Necessary During Inference
- LaCo: Large Language Model Pruning via Layer Collapse
- Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
- Fast yet Safe: Early-Exiting with Risk Control
- Fast and robust early-exiting framework for autoregressive language models with synchronized parallel decoding
- Following Length Constraints in Instructions
- Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning
- Token-Budget-Aware LLM Reasoning
- Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost
- INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection
- Adaptive Decoding via Latent Preference Optimization
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
- Flaming-hot Initiation with Regular Execution Sampling for Large Language Models
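Among the entries above, the distillation-style papers (DeepSeek-R1's distilled models, "Distilling System 2 into System 1", "Thinking Slow, Fast") typically build the student's SFT data from teacher reasoning traces that are kept only when the final answer is verified. A minimal sketch of that filtering step, with the callables and data layout assumed for illustration:

```python
def build_distillation_set(problems, teacher_generate, extract_answer):
    """Collect (prompt, teacher trace) pairs whose final answer checks out.

    Each problem is assumed to carry a gold 'answer' field for filtering;
    `teacher_generate` samples one reasoning trace per question.
    """
    sft_pairs = []
    for problem in problems:
        trace = teacher_generate(problem["question"])
        if extract_answer(trace) == problem["answer"]:
            sft_pairs.append({"prompt": problem["question"], "response": trace})
    return sft_pairs
```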
Early Stopping
- Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation
- Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning
- Let’s Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs
- Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
- Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling
- Efficient Test-Time Scaling via Self-Calibration
- Reasoning Without Self-Doubt: More Efficient Chain-of-Thought Through Certainty Probing
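Several of the early-stopping entries above (e.g. Adaptive-Consistency, Early-Stopping Self-Consistency) stop sampling once the running vote is already decisive instead of always drawing a fixed number of samples. A minimal sketch with a simple agreement threshold; the stopping rule here is illustrative rather than any one paper's criterion:

```python
from collections import Counter

def early_stop_self_consistency(prompt, generate, extract_answer,
                                max_samples=16, agreement=0.7, min_samples=3):
    """Sample sequentially and stop once one answer dominates the votes.

    `generate` and `extract_answer` are assumed callables; the 0.7
    agreement threshold and the minimum of 3 samples are illustrative.
    """
    votes = Counter()
    for i in range(1, max_samples + 1):
        votes[extract_answer(generate(prompt))] += 1
        top_answer, top_count = votes.most_common(1)[0]
        if i >= min_samples and top_count / i >= agreement:
            return top_answer     # the vote is already decisive: stop sampling
    return votes.most_common(1)[0][0]
```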
Pruning While Searching
- Enhancing LLM Reasoning with Reward-guided Tree Search
- Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
- Path-Consistency: Prefix Enhancement for Efficient Inference in LLM
- Self-Evaluation Guided Beam Search for Reasoning
- OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning
- Fast Best-of-N Decoding via Speculative Rejection
- SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning
- START: Self-taught Reasoner with Tools
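The pruning-while-searching entries above score partial reasoning paths with a value or process reward model and drop weak branches before they consume more compute. A minimal beam-style sketch, where `expand`, `score`, and `is_finished` are assumed stand-ins for step generation, a reward model, and a terminal check:

```python
def reward_guided_beam(question, expand, score, is_finished,
                       beam_width=4, max_depth=8):
    """Keep only the top-scoring partial chains at each search depth."""
    beams = [[]]                                  # each beam is a list of steps
    for _ in range(max_depth):
        candidates = []
        for steps in beams:
            if is_finished(question, steps):
                candidates.append(steps)          # finished chains are kept as-is
                continue
            for step in expand(question, steps):
                candidates.append(steps + [step])
        if not candidates:
            break
        # Prune: retain only the highest-scoring partial chains.
        beams = sorted(candidates, key=lambda s: score(question, s),
                       reverse=True)[:beam_width]
        if all(is_finished(question, s) for s in beams):
            break
    return max(beams, key=lambda s: score(question, s))
```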
Constrained Decoding
- s1: Simple test-time scaling
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
- Large Language Models Cannot Self-Correct Reasoning Yet
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
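The constrained-decoding entries above cap how long the model may think at inference time; s1's budget forcing, for example, truncates the thinking segment at a token budget and appends an end-of-thinking delimiter so the model must commit to an answer. A minimal sketch, where `model_generate(text, max_new_tokens)` is an assumed interface and the delimiter strings are illustrative rather than any specific model's special tokens:

```python
def budget_forced_answer(model_generate, prompt, think_budget=512,
                         end_think="</think>", answer_cue="Final answer:"):
    """Cap the thinking segment, then force the model to answer."""
    thinking = model_generate(prompt, max_new_tokens=think_budget)
    # Close the (possibly truncated) thinking block and request the answer.
    forced = prompt + thinking + f"\n{end_think}\n{answer_cue}"
    return model_generate(forced, max_new_tokens=64)
```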
Efficient Multi-modal Reasoning
- MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
- Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?
- Video-R1: Reinforcing Video Reasoning in MLLMs
- R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
- MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning
- Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
- Efficient Multimodal Large Language Models: A Survey
- LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Efficient Agentic Reasoning
- ATLAS: Agent Tuning via Learning Critical Steps
- Large Reasoning Models in Agent Scenarios: Exploring the Necessity of Reasoning Capabilities
- ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks
- AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration
- Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems
- The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
Evaluation Metrics and Benchmarks
- Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators
- PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models
- DNA Bench: When Silence is Smarter -- Benchmarking Over-Reasoning in Reasoning LLMs
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
- Assessing and Enhancing the Robustness of Large Language Models with Task Structure Variations for Logical Reasoning
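Beyond accuracy, the benchmarks above also ask how much compute an answer costs, so results are often reported as accuracy relative to generated tokens. A toy metric along those lines, not drawn from any specific paper in the list:

```python
def token_efficiency(results):
    """Accuracy per generated token (higher is better).

    `results` is a list of dicts with boolean 'correct' and integer
    'n_tokens' fields; the exact metric is illustrative.
    """
    if not results:
        return 0.0
    accuracy = sum(r["correct"] for r in results) / len(results)
    mean_tokens = sum(r["n_tokens"] for r in results) / len(results)
    return accuracy / mean_tokens if mean_tokens else 0.0
```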
Explainability of Reasoning LLMs
- On the Biology of a Large Language Model
- Circuit Tracing: Revealing Computational Graphs in Language Models
If you find this work useful, please consider citing us:
@misc{wang2025harnessingreasoningeconomysurvey,
title={Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models},
author={Rui Wang and Hongru Wang and Boyang Xue and Jianhui Pang and Shudong Liu and Yi Chen and Jiahao Qiu and Derek Fai Wong and Heng Ji and Kam-Fai Wong},
year={2025},
eprint={2503.24377},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.24377},
}