This collection aims to present the ‘cherry on the cake’ of recent AI advancements in the realm of LLMs and RL.


Awesome-LLM-RL

Inspired by the awesome-embodied-vision

Papers

  • Ahmadian, A., Cremer, C., Gallé, M., Fadaee, M., Kreutzer, J., Pietquin, O., Üstün, A., & Hooker, S. (2024). Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs (No. arXiv:2402.14740). arXiv. https://doi.org/10.48550/arXiv.2402.14740

  • Guan, X., Zhang, L. L., Liu, Y., Shang, N., Sun, Y., Zhu, Y., Yang, F., & Yang, M. (2025). rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking (No. arXiv:2501.04519). arXiv. https://doi.org/10.48550/arXiv.2501.04519

  • Havrilla, A., Du, Y., Raparthy, S. C., Nalmpantis, C., Dwivedi-Yu, J., Hambro, E., Sukhbaatar, S., & Raileanu, R. (2024, June 13). Teaching Large Language Models to Reason with Reinforcement Learning. AI for Math Workshop @ ICML 2024. https://openreview.net/forum?id=mjqoceuMnI

  • Hu, J., Wu, X., Zhu, Z., Xianyu, Wang, W., Zhang, D., & Cao, Y. (2024). OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework (No. arXiv:2405.11143). arXiv. https://doi.org/10.48550/arXiv.2405.11143

  • Kumar, A., Zhuang, V., Agarwal, R., Su, Y., Co-Reyes, J. D., Singh, A., Baumli, K., Iqbal, S., Bishop, C., Roelofs, R., Zhang, L. M., McKinney, K., Shrivastava, D., Paduraru, C., Tucker, G., Precup, D., Behbahani, F., & Faust, A. (2024). Training Language Models to Self-Correct via Reinforcement Learning (No. arXiv:2409.12917). arXiv. https://doi.org/10.48550/arXiv.2409.12917

  • Lightman, H., Kosaraju, V., Burda, Y., Edwards, H., Baker, B., Lee, T., Leike, J., Schulman, J., Sutskever, I., & Cobbe, K. (2023, October 13). Let’s Verify Step by Step. The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=v8L0pN6EOi

  • Qu, Y., Zhang, T., Garg, N., & Kumar, A. (n.d.). Recursive Introspection: Teaching Language Model Agents How to Self-Improve.

  • Setlur, A., Garg, S., Geng, X., Garg, N., Smith, V., & Kumar, A. (2024). RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold (No. arXiv:2406.14532). arXiv. https://doi.org/10.48550/arXiv.2406.14532

  • Shao, Z., Wang, P., Zhu, Q., Xu, R., Song, J., Bi, X., Zhang, H., Zhang, M., Li, Y. K., Wu, Y., & Guo, D. (2024). DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (No. arXiv:2402.03300). arXiv. https://doi.org/10.48550/arXiv.2402.03300

  • Snell, C., Lee, J., Xu, K., & Kumar, A. (2024). Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (No. arXiv:2408.03314). arXiv. https://doi.org/10.48550/arXiv.2408.03314

  • Wang, P., Li, L., Shao, Z., Xu, R. X., Dai, D., Li, Y., Chen, D., Wu, Y., & Sui, Z. (2024). Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations (No. arXiv:2312.08935). arXiv. https://doi.org/10.48550/arXiv.2312.08935

  • Xi, Z., Yang, D., Huang, J., Tang, J., Li, G., Ding, Y., He, W., Hong, B., Do, S., Zhan, W., Wang, X., Zheng, R., Ji, T., Shi, X., Zhai, Y., Weng, R., Wang, J., Cai, X., Gui, T., … Jiang, Y.-G. (2024). Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision (No. arXiv:2411.16579). arXiv. https://doi.org/10.48550/arXiv.2411.16579

  • Xiang, V., Snell, C., Gandhi, K., Albalak, A., Singh, A., Blagden, C., Phung, D., Rafailov, R., Lile, N., Mahan, D., Castricato, L., Franken, J.-P., Haber, N., & Finn, C. (2025). Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought (No. arXiv:2501.04682). arXiv. https://doi.org/10.48550/arXiv.2501.04682

  • Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y., & Narasimhan, K. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models (No. arXiv:2305.10601). arXiv. https://doi.org/10.48550/arXiv.2305.10601

  • Zelikman, E., Wu, Y., Mu, J., & Goodman, N. D. (2022). STaR: Bootstrapping Reasoning With Reasoning (No. arXiv:2203.14465). arXiv. https://doi.org/10.48550/arXiv.2203.14465

  • Zeng, Z., Cheng, Q., Yin, Z., Wang, B., Li, S., Zhou, Y., Guo, Q., Huang, X., & Qiu, X. (2024). Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective (No. arXiv:2412.14135). arXiv. https://doi.org/10.48550/arXiv.2412.14135

  • Zhang, Z., Zheng, C., Wu, Y., Zhang, B., Lin, R., Yu, B., Liu, D., Zhou, J., & Lin, J. (2025). The Lessons of Developing Process Reward Models in Mathematical Reasoning (No. arXiv:2501.07301). arXiv. https://doi.org/10.48550/arXiv.2501.07301
