DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-AI

dc.contributor.author	DeepSeek-AI
dc.date.accessioned	2025-06-02T13:25:53Z
dc.date.available	2025-06-02T13:25:53Z
dc.date.issued	2025-01-22
dc.description	Proposes DeepSeek-R1, a model that strengthens the reasoning capability of LLMs through RL-based training, and demonstrates its empirical strength on a variety of benchmarks. ©2025 DeepSeek-AI. This work was authored by the DeepSeek-AI research team with contributions from 41 individuals. Because the author list is extensive, only the organizational author is listed in the metadata. The full list of authors is available at https://arxiv.org/abs/2501.12948
dc.description.abstract	We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, numerous powerful and intriguing reasoning behaviors emerge naturally in DeepSeek-R1-Zero. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.
dc.description.sponsorship	DeepSeek-AI
dc.identifier.uri	https://arxiv.org/abs/2501.12948
dc.identifier.uri	http://data.inu.ac.kr/handle/123456789/1958
dc.language.iso	en_US
dc.subject	DeepSeek-R1
dc.subject	RLHF
dc.subject	Reasoning
dc.subject	LLM
dc.subject	Reinforcement Learning
dc.title	DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
dc.type	Article

Files

Original bundle

Name: 2501.12948v1.pdf
Size: 1.25 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 97 B
Format: Item-specific license agreed to upon submission

Collections

Natural Language Processing