NEWS  /  Brief News

DeepSeek-R1 Training Method Published in Nature

Sep 18, 2025, 3:27 a.m. ET

AsianFin -- The DeepSeek-AI team, led by Liang Wenfeng, has published in Nature the large-scale reinforcement-learning training method behind its open-source reasoning model DeepSeek-R1.

The study demonstrates that the reasoning ability of large language models (LLMs) can be enhanced through pure reinforcement learning, reducing the amount of human-annotated data needed to improve performance. The resulting model outperforms traditionally trained LLMs on tasks including mathematics, programming competitions, and graduate-level STEM questions.

DeepSeek-R1 also incorporates a human-supervised training phase to refine its reasoning processes, but Liang Wenfeng's team reported that the model develops its reasoning steps primarily through reinforcement learning rather than from human-provided examples, lowering training costs and complexity.

After being shown a small set of high-quality problem-solving examples, DeepSeek-R1 is given a template for generating its reasoning process and earns rewards for solving problems correctly, reinforcing successful reasoning strategies. The team suggested that future research could focus on optimizing the reward process to ensure more reliable reasoning and task outcomes.
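The feedback loop described above can be illustrated with a deliberately simplified sketch. The code below is not DeepSeek's actual pipeline (which applies large-scale policy-gradient methods to an LLM); it is a toy bandit-style example of the same idea: sample an answer, score it against a verifiable ground truth with a rule-based reward, and reinforce whatever earned the reward. The reward function, candidate set, and update rule are all illustrative assumptions.

```python
import math
import random

def reward(answer: int, ground_truth: int) -> float:
    """Rule-based reward: 1.0 for a verifiably correct answer, else 0.0."""
    return 1.0 if answer == ground_truth else 0.0

def train(ground_truth: int, candidates: list[int],
          steps: int = 2000, lr: float = 0.1, seed: int = 0) -> dict[int, float]:
    """REINFORCE-style updates over a discrete set of candidate answers.

    A stand-in for RL on an LLM: the 'policy' is a softmax over
    unnormalized preferences, one per candidate answer.
    """
    rng = random.Random(seed)
    prefs = {c: 0.0 for c in candidates}
    for _ in range(steps):
        # Softmax sampling over current preferences.
        mx = max(prefs.values())
        exps = {c: math.exp(p - mx) for c, p in prefs.items()}
        z = sum(exps.values())
        probs = {c: e / z for c, e in exps.items()}
        r, acc, choice = rng.random(), 0.0, candidates[-1]
        for c, p in probs.items():
            acc += p
            if r <= acc:
                choice = c
                break
        # Policy-gradient update with a constant baseline: raise the
        # preference of rewarded answers, lower it otherwise.
        adv = reward(choice, ground_truth) - 0.5
        for c in candidates:
            grad = (1.0 if c == choice else 0.0) - probs[c]
            prefs[c] += lr * adv * grad
    return prefs

prefs = train(ground_truth=42, candidates=[7, 42, 99])
best = max(prefs, key=prefs.get)
print(best)  # the correct answer comes to dominate the policy
```

Because the reward is checked mechanically rather than judged by humans, the loop scales without per-example human labeling, which is the property the study highlights.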

In benchmark evaluations, DeepSeek-R1-Zero and DeepSeek-R1 scored 77.9% and 79.8%, respectively, on mathematics tests and also performed strongly on programming competitions and graduate-level biology, physics, and chemistry problems.
