- I am a Ph.D. student at Gaoling School of AI, Renmin University of China, where I am fortunate to be advised by Xin Zhao.
- Since 2021, I have been a research student advised by Shaohan Huang and Furu Wei in the GenAI Group of Microsoft Research, with whom I have completed much of my representative work.
- I was previously a research assistant in the CoAI Group, Tsinghua University, where I was fortunate to be advised by Yuxian Gu and Minlie Huang. I also worked as a research engineer at BIGAI, where I was fortunate to collaborate with Xuekai Zhu.
Recent Focus:
My current research focuses on Reinforcement Learning for LLM Reasoning, especially exploration mechanisms.
Check out our work: Reasoning with Exploration: An Entropy Perspective (AAAI 2026), FlowRL, and STILL.
Feel free to reach out if you are interested in collaboration or discussions!
- Email: daixuancheng6@gmail.com
- Ph.D. Student in Artificial Intelligence, Gaoling School of AI, Renmin University of China (2025 – Present)
  - Advisor: Xin Zhao
- M.S. in Computer Science, School of Computer Science, Beijing University of Posts and Telecommunications (2020 – 2023)
  - Advisor: Haifeng Sun
- B.S. in Communication Engineering, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications (2016 – 2020)
I am dedicated to enhancing Large Language Models (LLMs) across their entire lifecycle, including:
- Reasoning and Reinforcement Learning: Reasoning with Exploration, FlowRL, STILL.
- Pre-training: Instruction Pre-Training, AdaptLLM, VL-Match.
- Domain Adaptation: AdaptLLM, AdaMLLM, SODA.
- Synthetic Data: Instruction Pre-Training, AdaptLLM, ToEdit.
- Retrieval Augmented Generation: UPRISE, MDR.
(Full list on Google Scholar)
- Reasoning with Exploration: An Entropy Perspective
Daixuan Cheng, Shaohan Huang, Xuekai Zhu, Bo Dai, Wayne Xin Zhao, Zhenliang Zhang, Furu Wei
(AAAI 2026 — Earliest Research on Exploration of RL in LLM reasoning, Relation between Entropy and Exploration, Proposed Entropy Advantage, Significant Pass@K Gain) pdf
- FlowRL: Matching Reward Distributions for LLM Reasoning
Xuekai Zhu, Daixuan Cheng, Dinghuai Zhang, Hengli Li, Kaiyan Zhang, Che Jiang, Youbang Sun, Ermo Hua, Yuxin Zuo, Xingtai Lv, Qizheng Zhang, Lin Chen, Fanghao Shao, Bo Xue, Yunchong Song, Zhenjie Yang, Ganqu Cui, Ning Ding, Jianfeng Gao, Xiaodong Liu, Bowen Zhou, Hongyuan Mei, Zhouhan Lin
(arXiv Preprint, 2025 — Exploration of RL in LLM reasoning, 🤗 #1 Paper of the Day, Recipe at VERL) pdf code
- Adapting Large Language Models via Reading Comprehension
Daixuan Cheng, Shaohan Huang, Furu Wei
(ICLR 2024 — Earliest Research on Domain LLMs, 500K+ Downloads on Hugging Face, #1 Trending of ALL Domain LLMs on Hugging Face, 🤗 #2 Paper of the Day) pdf code huggingface
- Instruction Pre-Training: Language Models are Supervised Multitask Learners
Daixuan Cheng, Yuxian Gu, Shaohan Huang, Junyu Bi, Minlie Huang, Furu Wei
(EMNLP 2024 (Main, Long Paper) — LLM pre-training, Recommended by Sebastian Raschka, 200K+ Downloads on Hugging Face, #2 Trending of ALL Hugging Face Datasets, 🤗 #2 Paper of the Day) pdf code
- UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation
Daixuan Cheng, Shaohan Huang, Junyu Bi, Yuefeng Zhan, Jianfeng Liu, Yujing Wang, Hao Sun, Furu Wei, Denvy Deng, Qi Zhang
(EMNLP 2023 (Main, Long Paper) — Early Research on RAG for LLMs, Top ML Papers of the Week (along with GPT-4)) pdf code
- On Domain-Adaptive Post-Training for Multimodal Large Language Models
Daixuan Cheng, Shaohan Huang, Ziyu Zhu, Xintong Zhang, Wayne Xin Zhao, Zhongzhi Luan, Bo Dai, Zhenliang Zhang
(EMNLP 2025 (Findings, Long Paper) — Earliest Research on Domain MLLMs) pdf code huggingface
- How to Synthesize Text Data without Model Collapse?
Xuekai Zhu, Daixuan Cheng, Hengli Li, Kaiyan Zhang, Ermo Hua, Xingtai Lv, Ning Ding, Zhouhan Lin, Zilong Zheng, Bowen Zhou
(ICML 2025 — Synthetic data for LLMs) pdf code
- VL-Match: Enhancing Vision-Language Pretraining with Token-Level and Instance-Level Matching
Junyu Bi, Daixuan Cheng, Ping Yao, Bochen Pang, Yuefeng Zhan, Chuanguang Yang, Yujing Wang, Hao Sun, Weiwei Deng, Qi Zhang
(ICCV 2023 — Pre-training of Vision-language Models) pdf
- Snapshot-guided domain adaptation for ELECTRA
Daixuan Cheng, Shaohan Huang, Jianfeng Liu, Yuefeng Zhan, Hao Sun, Furu Wei, Denvy Deng, Qi Zhang
(EMNLP 2022 (Findings, Short Paper) — Domain Adaptation of LM) pdf
- Outstanding Reviewer of EMNLP (Top 0.5%)
- 1st Place in the Ph.D. Entrance Exam (Preliminary) at GSAI, Renmin University of China
- National Scholarship for Master Students (Top 1%)
- 1st Prize in the National English Competition (Top 0.5%)
