Scaling Laws for Neural Language Models

dc.contributor.author: Kaplan, Jared
dc.contributor.author: McCandlish, Sam
dc.contributor.author: Henighan, Tom
dc.contributor.author: Brown, Tom B
dc.contributor.author: Chess, Benjamin
dc.contributor.author: Child, Rewon
dc.contributor.author: Gray, Scott
dc.contributor.author: Radford, Alec
dc.contributor.author: Wu, Jeffrey
dc.contributor.author: Amodei, Dario
dc.date.accessioned: 2025-06-02T13:13:50Z
dc.date.available: 2025-06-02T13:13:50Z
dc.date.issued: 2020-01-23
dc.description: This paper quantitatively analyzes how language model performance improves as a power law in the number of parameters, dataset size, and amount of compute. ©2020 OpenAI
dc.description.abstract: We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget. Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.
dc.description.sponsorship: OpenAI
dc.identifier.uri: https://arxiv.org/abs/2001.08361
dc.identifier.uri: http://data.inu.ac.kr/handle/123456789/1956
dc.language.iso: en_US
dc.publisher: arXiv
dc.subject: Scaling Laws
dc.subject: Language Models
dc.subject: Parameter Scaling
dc.subject: Deep Learning
dc.title: Scaling Laws for Neural Language Models
dc.type: Article
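
Illustrative note on the abstract: the statement that loss "scales as a power-law with model size" can be read as a relation of the form L(N) = (N_c / N)^alpha, which becomes a straight line in log-log space. The sketch below is not taken from this record or from the authors' code. It generates noise-free synthetic losses and recovers the exponent by linear regression on the logarithms. The function names and the constants `true_n_c` and `true_alpha` are hypothetical placeholders chosen for the example.

```python
import numpy as np


def power_law_loss(n, n_c, alpha):
    """Assumed functional form: L(N) = (N_c / N) ** alpha."""
    return (n_c / n) ** alpha


def fit_power_law(n, loss):
    """Fit L(N) = (N_c / N) ** alpha by least squares in log-log space.

    Taking logs gives log L = alpha * log N_c - alpha * log N,
    so the slope is -alpha and the intercept is alpha * log N_c.
    """
    slope, intercept = np.polyfit(np.log(n), np.log(loss), 1)
    alpha = -slope
    n_c = np.exp(intercept / alpha)
    return n_c, alpha


# Hypothetical model sizes (parameter counts) and placeholder constants.
sizes = np.logspace(5, 9, 9)
true_n_c, true_alpha = 1.0e13, 0.08

# Synthetic, noise-free losses; real measurements would scatter around the line.
losses = power_law_loss(sizes, true_n_c, true_alpha)

fit_n_c, fit_alpha = fit_power_law(sizes, losses)
print(f"fitted alpha = {fit_alpha:.4f}, fitted N_c = {fit_n_c:.3e}")
```

With measured losses in place of the synthetic ones, the same fit gives an estimate of the exponent, provided the data actually follow a power law over the fitted range.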

Files

Original bundle
Name: 2001.08361v1.pdf
Size: 2.38 MB
Format: Adobe Portable Document Format