DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Honghui Ding, Huazuo Gao, Hui Qu, Hui Li, Jianzhong Guo, Jiashi Li, Jingchang Chen, Jingyang Yuan, Jinhao Tu, Junjie Qiu, Junlong Li, J. L. Cai, Jiaqi Ni, Jian Liang, Jin Chen, Kai Dong, Kai Hu, Kaichao You, Kaige Gao, Kang Guan, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Liang Zhao, Litong Wang, Liyue Zhang, Lei Xu, Leyi Xia, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingxu Zhou, Meng Li, Miaojun Wang, Mingming Li, Ning Tian, Panpan Huang, Peng Zhang, Qiancheng Wang, Qinyu Chen, Qiushi Du, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, R. J. Chen, R. L. Jin, Ruyi Chen, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shengfeng Ye, Shiyu Wang, Shuiping Yu, Shunfeng Zhou, Shuting Pan, S. S. Li, Shuang Zhou, Shaoqing Wu, Tao Yun, Tian Pei, Tianyu Sun, T. Wang, Wangding Zeng, Wen Liu, Wenfeng Liang, Wenjun Gao, Wenqin Yu, Wentao Zhang, W. L. Xiao, Wei An, Xiaodong Liu, Xiaohan Wang, Xiaokang Chen, Xiaotao Nie, Xin Cheng, Xin Liu, Xin Xie, Xingchao Liu, Xinyu Yang, Xinyuan Li, Xuecheng Su, Xuheng Lin, X. Q. Li, Xiangyue Jin, Xiaojin Shen, Xiaosha Chen, Xiaowen Sun, Xiaoxiang Wang, Xinnan Song, Xinyi Zhou, Xianzu Wang, Xinxia Shan, Y. K. Li, Y. Q. Wang, Y. X. Wei, Yang Zhang, Yanhong Xu, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Wang, Yi Yu, Yichao Zhang, Yifan Shi, Yiliang Xiong, Ying He, Yishi Piao, Yisong Wang, Yixuan Tan, Yiyang Ma, Yiyuan Liu, Yongqiang Guo, Yuan Ou, Yuduan Wang, Yue Gong, Yuheng Zou, Yujia He, Yunfan Xiong, Yuxiang Luo, Yuxiang You, Yuxuan Liu, Yuyang Zhou, Y. X. Zhu, Yanping Huang, Yaohui Li, Yi Zheng, Yuchen Zhu, Yunxian Ma, Ying Tang, Yukun Zha, Yuting Yan, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhean Xu, Zhenda Xie, Zhengyan Zhang, Zhewen Hao, Zhicheng Ma, Zhigang Yan, Zhiyu Wu, Zihui Gu, Zijia Zhu, Zijun Liu, Zilin Li, Ziwei Xie, Ziyang Song, Zizheng Pan, Zhen Huang, Zhipeng Xu, Zhongyu Zhang and Zhen Zhang
Additional contact information
Daya Guo: DeepSeek-AI Team
Dejian Yang: DeepSeek-AI Team
Haowei Zhang: DeepSeek-AI Team
Junxiao Song: DeepSeek-AI Team
Peiyi Wang: DeepSeek-AI Team
Qihao Zhu: DeepSeek-AI Team
Runxin Xu: DeepSeek-AI Team
Ruoyu Zhang: DeepSeek-AI Team
Shirong Ma: DeepSeek-AI Team
Xiao Bi: DeepSeek-AI Team
Xiaokang Zhang: DeepSeek-AI Team
Xingkai Yu: DeepSeek-AI Team
Yu Wu: DeepSeek-AI Team
Z. F. Wu: DeepSeek-AI Team
Zhibin Gou: DeepSeek-AI Team
Zhihong Shao: DeepSeek-AI Team
Zhuoshu Li: DeepSeek-AI Team
Ziyi Gao: DeepSeek-AI Team
Aixin Liu: DeepSeek-AI Team
Bing Xue: DeepSeek-AI Team
Bingxuan Wang: DeepSeek-AI Team
Bochao Wu: DeepSeek-AI Team
Bei Feng: DeepSeek-AI Team
Chengda Lu: DeepSeek-AI Team
Chenggang Zhao: DeepSeek-AI Team
Chengqi Deng: DeepSeek-AI Team
Chong Ruan: DeepSeek-AI Team
Damai Dai: DeepSeek-AI Team
Deli Chen: DeepSeek-AI Team
Dongjie Ji: DeepSeek-AI Team
Erhang Li: DeepSeek-AI Team
Fangyun Lin: DeepSeek-AI Team
Fucong Dai: DeepSeek-AI Team
Fuli Luo: DeepSeek-AI Team
Guangbo Hao: DeepSeek-AI Team
Guanting Chen: DeepSeek-AI Team
Guowei Li: DeepSeek-AI Team
H. Zhang: DeepSeek-AI Team
Hanwei Xu: DeepSeek-AI Team
Honghui Ding: DeepSeek-AI Team
Huazuo Gao: DeepSeek-AI Team
Hui Qu: DeepSeek-AI Team
Hui Li: DeepSeek-AI Team
Jianzhong Guo: DeepSeek-AI Team
Jiashi Li: DeepSeek-AI Team
Jingchang Chen: DeepSeek-AI Team
Jingyang Yuan: DeepSeek-AI Team
Jinhao Tu: DeepSeek-AI Team
Junjie Qiu: DeepSeek-AI Team
Junlong Li: DeepSeek-AI Team
J. L. Cai: DeepSeek-AI Team
Jiaqi Ni: DeepSeek-AI Team
Jian Liang: DeepSeek-AI Team
Jin Chen: DeepSeek-AI Team
Kai Dong: DeepSeek-AI Team
Kai Hu: DeepSeek-AI Team
Kaichao You: DeepSeek-AI Team
Kaige Gao: DeepSeek-AI Team
Kang Guan: DeepSeek-AI Team
Kexin Huang: DeepSeek-AI Team
Kuai Yu: DeepSeek-AI Team
Lean Wang: DeepSeek-AI Team
Lecong Zhang: DeepSeek-AI Team
Liang Zhao: DeepSeek-AI Team
Litong Wang: DeepSeek-AI Team
Liyue Zhang: DeepSeek-AI Team
Lei Xu: DeepSeek-AI Team
Leyi Xia: DeepSeek-AI Team
Mingchuan Zhang: DeepSeek-AI Team
Minghua Zhang: DeepSeek-AI Team
Minghui Tang: DeepSeek-AI Team
Mingxu Zhou: DeepSeek-AI Team
Meng Li: DeepSeek-AI Team
Miaojun Wang: DeepSeek-AI Team
Mingming Li: DeepSeek-AI Team
Ning Tian: DeepSeek-AI Team
Panpan Huang: DeepSeek-AI Team
Peng Zhang: DeepSeek-AI Team
Qiancheng Wang: DeepSeek-AI Team
Qinyu Chen: DeepSeek-AI Team
Qiushi Du: DeepSeek-AI Team
Ruiqi Ge: DeepSeek-AI Team
Ruisong Zhang: DeepSeek-AI Team
Ruizhe Pan: DeepSeek-AI Team
Runji Wang: DeepSeek-AI Team
R. J. Chen: DeepSeek-AI Team
R. L. Jin: DeepSeek-AI Team
Ruyi Chen: DeepSeek-AI Team
Shanghao Lu: DeepSeek-AI Team
Shangyan Zhou: DeepSeek-AI Team
Shanhuang Chen: DeepSeek-AI Team
Shengfeng Ye: DeepSeek-AI Team
Shiyu Wang: DeepSeek-AI Team
Shuiping Yu: DeepSeek-AI Team
Shunfeng Zhou: DeepSeek-AI Team
Shuting Pan: DeepSeek-AI Team
S. S. Li: DeepSeek-AI Team
Shuang Zhou: DeepSeek-AI Team
Shaoqing Wu: DeepSeek-AI Team
Tao Yun: DeepSeek-AI Team
Tian Pei: DeepSeek-AI Team
Tianyu Sun: DeepSeek-AI Team
T. Wang: DeepSeek-AI Team
Wangding Zeng: DeepSeek-AI Team
Wen Liu: DeepSeek-AI Team
Wenfeng Liang: DeepSeek-AI Team
Wenjun Gao: DeepSeek-AI Team
Wenqin Yu: DeepSeek-AI Team
Wentao Zhang: DeepSeek-AI Team
W. L. Xiao: DeepSeek-AI Team
Wei An: DeepSeek-AI Team
Xiaodong Liu: DeepSeek-AI Team
Xiaohan Wang: DeepSeek-AI Team
Xiaokang Chen: DeepSeek-AI Team
Xiaotao Nie: DeepSeek-AI Team
Xin Cheng: DeepSeek-AI Team
Xin Liu: DeepSeek-AI Team
Xin Xie: DeepSeek-AI Team
Xingchao Liu: DeepSeek-AI Team
Xinyu Yang: DeepSeek-AI Team
Xinyuan Li: DeepSeek-AI Team
Xuecheng Su: DeepSeek-AI Team
Xuheng Lin: DeepSeek-AI Team
X. Q. Li: DeepSeek-AI Team
Xiangyue Jin: DeepSeek-AI Team
Xiaojin Shen: DeepSeek-AI Team
Xiaosha Chen: DeepSeek-AI Team
Xiaowen Sun: DeepSeek-AI Team
Xiaoxiang Wang: DeepSeek-AI Team
Xinnan Song: DeepSeek-AI Team
Xinyi Zhou: DeepSeek-AI Team
Xianzu Wang: DeepSeek-AI Team
Xinxia Shan: DeepSeek-AI Team
Y. K. Li: DeepSeek-AI Team
Y. Q. Wang: DeepSeek-AI Team
Y. X. Wei: DeepSeek-AI Team
Yang Zhang: DeepSeek-AI Team
Yanhong Xu: DeepSeek-AI Team
Yao Li: DeepSeek-AI Team
Yao Zhao: DeepSeek-AI Team
Yaofeng Sun: DeepSeek-AI Team
Yaohui Wang: DeepSeek-AI Team
Yi Yu: DeepSeek-AI Team
Yichao Zhang: DeepSeek-AI Team
Yifan Shi: DeepSeek-AI Team
Yiliang Xiong: DeepSeek-AI Team
Ying He: DeepSeek-AI Team
Yishi Piao: DeepSeek-AI Team
Yisong Wang: DeepSeek-AI Team
Yixuan Tan: DeepSeek-AI Team
Yiyang Ma: DeepSeek-AI Team
Yiyuan Liu: DeepSeek-AI Team
Yongqiang Guo: DeepSeek-AI Team
Yuan Ou: DeepSeek-AI Team
Yuduan Wang: DeepSeek-AI Team
Yue Gong: DeepSeek-AI Team
Yuheng Zou: DeepSeek-AI Team
Yujia He: DeepSeek-AI Team
Yunfan Xiong: DeepSeek-AI Team
Yuxiang Luo: DeepSeek-AI Team
Yuxiang You: DeepSeek-AI Team
Yuxuan Liu: DeepSeek-AI Team
Yuyang Zhou: DeepSeek-AI Team
Y. X. Zhu: DeepSeek-AI Team
Yanping Huang: DeepSeek-AI Team
Yaohui Li: DeepSeek-AI Team
Yi Zheng: DeepSeek-AI Team
Yuchen Zhu: DeepSeek-AI Team
Yunxian Ma: DeepSeek-AI Team
Ying Tang: DeepSeek-AI Team
Yukun Zha: DeepSeek-AI Team
Yuting Yan: DeepSeek-AI Team
Z. Z. Ren: DeepSeek-AI Team
Zehui Ren: DeepSeek-AI Team
Zhangli Sha: DeepSeek-AI Team
Zhe Fu: DeepSeek-AI Team
Zhean Xu: DeepSeek-AI Team
Zhenda Xie: DeepSeek-AI Team
Zhengyan Zhang: DeepSeek-AI Team
Zhewen Hao: DeepSeek-AI Team
Zhicheng Ma: DeepSeek-AI Team
Zhigang Yan: DeepSeek-AI Team
Zhiyu Wu: DeepSeek-AI Team
Zihui Gu: DeepSeek-AI Team
Zijia Zhu: DeepSeek-AI Team
Zijun Liu: DeepSeek-AI Team
Zilin Li: DeepSeek-AI Team
Ziwei Xie: DeepSeek-AI Team
Ziyang Song: DeepSeek-AI Team
Zizheng Pan: DeepSeek-AI Team
Zhen Huang: DeepSeek-AI Team
Zhipeng Xu: DeepSeek-AI Team
Zhongyu Zhang: DeepSeek-AI Team
Zhen Zhang: DeepSeek-AI Team

Nature, 2025, vol. 645, issue 8081, 633-638

Abstract: General reasoning represents a long-standing and formidable challenge in artificial intelligence (AI). Recent breakthroughs, exemplified by large language models (LLMs) [1,2] and chain-of-thought (CoT) prompting [3], have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent on extensive human-annotated demonstrations and the capabilities of models are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labelled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions and STEM fields, surpassing its counterparts trained through conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically used to guide and enhance the reasoning capabilities of smaller models.

Date: 2025

Downloads: (external link)
https://www.nature.com/articles/s41586-025-09422-z Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:645:y:2025:i:8081:d:10.1038_s41586-025-09422-z

Ordering information: This journal article can be ordered from
https://www.nature.com/

DOI: 10.1038/s41586-025-09422-z

Nature is currently edited by Magdalena Skipper

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-09-19
Handle: RePEc:nat:nature:v:645:y:2025:i:8081:d:10.1038_s41586-025-09422-z