A curated list of papers about making LLM reasoning more efficient (shorter/faster/better).
🔥 Latest Update: Our paper collection is continuously updated. Feel free to contribute!
If you find this resource helpful, please consider citing this repository:

```bibtex
@misc{awesome-long2short-papers,
  title={Awesome Long2Short Papers},
  author={Community Contributors},
  year={2024},
  publisher={GitHub},
  howpublished={\url{https://github.com/yzhangchuck/awesome-llm-reasoning-long2short-papers}}
}
```
- 📊 Analysis and Understanding
- 🚀 Reasoning Scaling
- ⚡ Inference Intervention
- 🧠 Latent Reasoning
- 📚 Supervised Fine-tuning
- 🎛️ Steering Vector
- 🎯 Reinforcement Learning
- 📄 General Papers
## 📊 Analysis and Understanding

Deep dives into LLM reasoning.
Title | Year | Venue | Paper | Code |
---|---|---|---|---|
How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach | 2025 | arXiv preprint | [Paper] | [Code] |
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning | 2025 | arXiv preprint | [Paper] | - |
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs | 2024 | arXiv preprint | [Paper] | - |
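To make the section's theme concrete, here is a minimal sketch of how one might quantify chain-of-thought compression as a token ratio. The GPT-2 tokenizer and the example chains are arbitrary placeholders, not taken from any paper above.

```python
# Minimal sketch: compare token counts of a verbose vs. a compressed
# chain of thought for the same problem. Tokenizer choice and the
# example strings are illustrative placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

verbose_cot = (
    "First, note that 2 + 3 means combining two and three. "
    "Counting up from 2 by three steps gives 3, 4, 5. So the answer is 5."
)
concise_cot = "2 + 3 = 5."

n_verbose = len(tokenizer.encode(verbose_cot))
n_concise = len(tokenizer.encode(concise_cot))

# Compression ratio: how many tokens the short chain saves.
print(f"verbose: {n_verbose} tokens, concise: {n_concise} tokens")
print(f"compression ratio: {n_concise / n_verbose:.2f}")
```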
## 🚀 Reasoning Scaling

Scaling up reasoning.
Title | Year | Venue | Paper | Code |
---|---|---|---|---|
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback | 2025 | arXiv preprint | [Paper] | [Code] |
Beyond Human Data: Scaling Self-Training for Problem Solving with Language Models | 2024 | TMLR | [Paper] | - |
ReFT: Reasoning with Reinforced Fine-Tuning | 2024 | ACL | [Paper] | [Code] |
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search | 2024 | NeurIPS | [Paper] | [Code] |
Recursive Introspection: Teaching Language Model Agents How to Self-Improve | 2024 | NeurIPS | [Paper] | [Code] |
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters | 2024 | arXiv preprint | [Paper] | - |
Let's Verify Step by Step | 2023 | ICLR | [Paper] | - |
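Much of this line of work builds on sampling many reasoning chains and aggregating them. Below is a minimal self-consistency-style sketch; `sample_answer` is a hypothetical stand-in for a real LLM call.

```python
# Minimal sketch of self-consistency-style test-time scaling: sample N
# reasoning chains and majority-vote over the final answers.
import random
from collections import Counter

def sample_answer(question: str) -> str:
    # Placeholder: a real implementation would sample a chain of thought
    # from an LLM at temperature > 0 and parse out the final answer.
    return random.choice(["5", "5", "5", "6"])  # noisy but mostly right

def majority_vote(question: str, n_samples: int = 16) -> str:
    answers = [sample_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(majority_vote("What is 2 + 3?"))
```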
## ⚡ Inference Intervention

Interventions applied during the model's inference process.
Title | Year | Venue | Paper | Code |
---|---|---|---|---|
When More is Less: Understanding Chain-of-Thought Length in LLMs | 2025 | arXiv preprint | [Paper] | - |
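A simple form of inference-time intervention is capping the token budget for reasoning and then forcing an answer. The sketch below illustrates the idea with GPT-2; the budget, prompt wording, and "Final answer:" cue are illustrative choices, not any specific paper's recipe.

```python
# Minimal sketch of a length intervention at inference time: cap the
# number of "thinking" tokens, then force the model to answer. Real
# reasoning models use special delimiters this toy setup does not have.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Q: What is 2 + 3? Let's think step by step."
inputs = tokenizer(prompt, return_tensors="pt")

# Intervention: allow at most 64 reasoning tokens, whatever the model
# would otherwise produce.
thought = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Then append a cue that forces a final answer once the budget is spent.
forced = tokenizer.decode(thought[0]) + "\nFinal answer:"
answer_inputs = tokenizer(forced, return_tensors="pt")
answer = model.generate(**answer_inputs, max_new_tokens=8, do_sample=False)
print(tokenizer.decode(answer[0][answer_inputs["input_ids"].shape[1]:]))
```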
## 🧠 Latent Reasoning

Reasoning in latent spaces.
Title | Year | Venue | Paper | Code |
---|---|---|---|---|
TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers' Guidance | 2025 | arXiv preprint | [Paper] | - |
LightThinker: Thinking Step-by-Step Compression | 2025 | arXiv preprint | [Paper] | - |
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach | 2025 | arXiv preprint | [Paper] | [Code] |
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning | 2025 | arXiv preprint | [Paper] | - |
Compressed Chain of Thought: Efficient Reasoning through Dense Representations | 2024 | arXiv preprint | [Paper] | - |
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | 2024 | arXiv preprint | [Paper] | - |
Training Large Language Models to Reason in a Continuous Latent Space | 2024 | arXiv preprint | [Paper] | - |
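The common thread here is spending compute on hidden states rather than emitted tokens. Below is a minimal sketch of the continuous-thought idea: feed the last hidden state back in as the next input embedding for a few steps. With off-the-shelf GPT-2 weights (no latent-reasoning training) this is illustrative only.

```python
# Minimal sketch of continuous latent "thoughts": instead of decoding
# intermediate reasoning tokens, recycle the last hidden state as the
# next input embedding. Untrained-for-this GPT-2 weights make the output
# meaningless; the point is the mechanism.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Q: What is 2 + 3? A:", return_tensors="pt")
embeds = model.get_input_embeddings()(inputs["input_ids"])

with torch.no_grad():
    for _ in range(4):  # four latent "thought" steps, no tokens emitted
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        last_hidden = out.hidden_states[-1][:, -1:, :]  # (1, 1, d_model)
        embeds = torch.cat([embeds, last_hidden], dim=1)

    # Decode an answer token from the final latent state.
    logits = model(inputs_embeds=embeds).logits[:, -1, :]
    print(tokenizer.decode(logits.argmax(dim=-1)))
```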
## 📚 Supervised Fine-tuning

Direct optimization for efficiency through supervised fine-tuning.
Title | Year | Venue | Paper | Code |
---|---|---|---|---|
Self-Training Elicits Concise Reasoning in Large Language Models | 2025 | arXiv preprint | [Paper] | [Code] |
TokenSkip: Controllable Chain-of-Thought Compression in LLMs | 2025 | arXiv preprint | [Paper] | [Code] |
s1: Simple test-time scaling | 2025 | arXiv preprint | [Paper] | [Code] |
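A recurring recipe in this section is fine-tuning on the shortest correct self-generated solutions. The sketch below shows the data-filtering step only; `sample_solutions` and `is_correct` are hypothetical stubs for LLM sampling and answer checking.

```python
# Minimal sketch of the data side of concise-reasoning SFT: from several
# sampled solutions per problem, keep the shortest one that reaches the
# correct answer, then fine-tune on those pairs. Field names and the
# callbacks are illustrative.
def build_concise_sft_data(problems, sample_solutions, is_correct, k=8):
    dataset = []
    for problem in problems:
        candidates = sample_solutions(problem, k)  # k sampled CoTs
        correct = [c for c in candidates if is_correct(problem, c)]
        if correct:
            # Prefer the shortest correct rationale as the SFT target.
            dataset.append((problem, min(correct, key=len)))
    return dataset
```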
## 🎛️ Steering Vector

Steering model behavior by manipulating activation vectors.
Title | Year | Venue | Paper | Code |
---|---|---|---|---|
CoT-Valve: Length-Compressible Chain-of-Thought Tuning | 2025 | arXiv preprint | [Paper] | - |
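Steering-vector methods typically compute a direction in activation space (for example, concise minus verbose) and add it to hidden states at inference. The sketch below does this with a forward hook on GPT-2; the layer index, scale, and contrast prompts are arbitrary guesses rather than tuned values.

```python
# Minimal sketch of activation steering: build a "concise minus verbose"
# direction from mean hidden states and inject it during generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
layer = model.transformer.h[6]  # middle layer, chosen arbitrarily

def mean_hidden(text):
    ids = tokenizer(text, return_tensors="pt")
    # hidden_states[7] is the output of block index 6 (index 0 is the
    # embedding layer), matching the hooked block below.
    hs = model(**ids, output_hidden_states=True).hidden_states[7]
    return hs.mean(dim=1)  # average over sequence positions

with torch.no_grad():
    steer = mean_hidden("The answer is 5.") - mean_hidden(
        "Let us reason carefully, step by step, about what 2 + 3 is..."
    )

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple; the first element is the hidden state.
    return (output[0] + 4.0 * steer,) + output[1:]

handle = layer.register_forward_hook(add_steering)
ids = tokenizer("Q: What is 2 + 3? A:", return_tensors="pt")
print(tokenizer.decode(model.generate(**ids, max_new_tokens=20)[0]))
handle.remove()
```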
## 🎯 Reinforcement Learning

Training models to reason efficiently with reinforcement learning.
Title | Year | Venue | Paper | Code |
---|---|---|---|---|
Kimi k1.5: Scaling Reinforcement Learning with LLMs | 2025 | arXiv preprint | [Paper] | - |
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning | 2025 | arXiv preprint | [Paper] | [Code] |
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning | 2024 | arXiv preprint | [Paper] | [Code] |
Training Language Models to Reason Efficiently | 2024 | arXiv preprint | [Paper] | [Code] |
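These methods generally shape the reward so that correct-but-short chains score higher than correct-but-long ones. Below is a minimal sketch of such a length-penalized reward; the penalty form, budget, and `alpha` are illustrative, not taken from any specific paper above.

```python
# Minimal sketch of a length-aware reward for RL on reasoning traces:
# correct answers earn more when the chain of thought is shorter.
def length_penalized_reward(correct: bool, n_tokens: int,
                            budget: int = 1024, alpha: float = 0.2) -> float:
    if not correct:
        return 0.0  # no credit for fast-but-wrong answers
    # Scale the penalty by how much of the token budget was used.
    return 1.0 - alpha * min(n_tokens / budget, 1.0)

print(length_penalized_reward(True, 256))   # 0.95
print(length_penalized_reward(True, 1024))  # 0.8
print(length_penalized_reward(False, 64))   # 0.0
```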
## 📄 General Papers

Latest advances in efficient reasoning.
Title | Year | Venue | Paper | Code |
---|---|---|---|---|
Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking | 2025 | arXiv preprint | [Paper] | - |
Learning to Reason from Feedback at Test-Time | 2025 | arXiv preprint | [Paper] | [Code] |
S*: Test-Time Scaling for Code Generation | 2025 | arXiv preprint | [Paper] | [Code] |
Efficiently Serving LLM Reasoning Programs with Certaindex | 2024 | arXiv preprint | [Paper] | - |
## Contributing

We welcome contributions! Please feel free to submit a pull request to add more papers or improve the existing content. When contributing:
- Please ensure the paper is related to efficient LLM reasoning
- Follow the existing format
- Add a brief description if possible
- Include links to paper and code (if available)
## License

This project is licensed under the MIT License - see the LICENSE file for details.