Scaling Laws for Neural Language Models

Kaplan, Jared; McCandlish, Sam; Henighan, Tom; Brown, Tom B; Chess, Benjamin; Child, Rewon; Gray, Scott; Radford, Alec; Wu, Jeffrey; Amodei, Dario

dc.contributor.author	Kaplan, Jared
dc.contributor.author	McCandlish, Sam
dc.contributor.author	Henighan, Tom
dc.contributor.author	Brown, Tom B
dc.contributor.author	Chess, Benjamin
dc.contributor.author	Child, Rewon
dc.contributor.author	Gray, Scott
dc.contributor.author	Radford, Alec
dc.contributor.author	Wu, Jeffrey
dc.contributor.author	Amodei, Dario
dc.date.accessioned	2025-06-02T13:13:50Z
dc.date.available	2025-06-02T13:13:50Z
dc.date.issued	2020-01-23
dc.description	This paper quantitatively analyzes how language model performance improves as a power law in the number of parameters, the dataset size, and the amount of compute. ©2020 OpenAI
dc.description.abstract	We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget. Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.
dc.description.sponsorship	OpenAI
dc.identifier.uri	https://arxiv.org/abs/2001.08361
dc.identifier.uri	http://data.inu.ac.kr/handle/123456789/1956
dc.language.iso	en_US
dc.publisher	arXiv
dc.subject	Scaling Laws
dc.subject	Language Models
dc.subject	Parameter Scaling
dc.subject	Deep Learning
dc.title	Scaling Laws for Neural Language Models
dc.type	Article
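
Illustrative note on the abstract: the power-law relationship it describes can be checked on measured losses with a straight-line fit in log-log space. The sketch below assumes the functional form L(x) = (x_c / x)^alpha, where x stands for model size, dataset size, or compute. The function name `fit_power_law` and all numeric values are hypothetical and synthetic, chosen only to show the procedure; they are not results from the paper.

```python
import math

def fit_power_law(xs, losses):
    """Fit L(x) = (x_c / x) ** alpha by least squares in log-log space.

    Returns (alpha, x_c). Assumes every x and every loss is positive.
    """
    log_x = [math.log(x) for x in xs]
    log_l = [math.log(l) for l in losses]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_l = sum(log_l) / n
    # Ordinary least-squares slope and intercept of log L against log x.
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxl = sum((u - mean_x) * (v - mean_l) for u, v in zip(log_x, log_l))
    slope = sxl / sxx
    intercept = mean_l - slope * mean_x
    # log L = -alpha * log x + alpha * log x_c
    alpha = -slope
    x_c = math.exp(intercept / alpha)
    return alpha, x_c

# Synthetic, made-up constants for illustration only (not taken from the paper).
TRUE_ALPHA = 0.08
TRUE_XC = 1.0e13
sizes = [10.0 ** k for k in range(5, 10)]
losses = [(TRUE_XC / n) ** TRUE_ALPHA for n in sizes]

alpha, x_c = fit_power_law(sizes, losses)
print(f"alpha = {alpha:.4f}, x_c = {x_c:.3e}")
```

On real training runs the points would be noisy, so the fitted exponent is only meaningful over the range where the log-log plot is close to a straight line.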

Files

Original bundle

Name: 2001.08361v1.pdf
Size: 2.38 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 97 B
Format: Item-specific license agreed to upon submission

Collections

Natural Language Processing