Table 2

Number of parameters, training data, training cost, and training time of several large LMs. Blank cells indicate that the data are not available. Sources are cited where the data are not obtained from the original work.

| Model | Year | Number of parameters | Training data | Training cost | Training time |
|---|---|---|---|---|---|
| BERT-Large | 2018 | 340M | 3.3B words | $7,000⁵ | 64 TPU chips, 4 days |
| XLNet-Large | 2019 | 340M | 32.9B tokens | $245,000⁵ | 512 TPU v3 chips, 5.5 days |
| GPT-2 | 2019 | 1.5B | 8 million web pages | $12,902-$43,008 [166] | 32 TPU v3 chips, 168 hours |
| Megatron-LM | 2019 | 8.3B | 174 GB of text data | | 512 GPUs, 2 days per epoch |
| T5 | 2019 | 11B | 745 GB of text data | Over $1.3 million [160] | |
| Turing-NLG | 2020 | 17B | | | |
| GPT-3 | 2020 | 175B | 570 GB of text data | Over $4.6 million⁶ | 1024 A100 GPUs, 34 days [120] |
| Megatron-Turing NLG | 2022 | 530B | 270B tokens | | 2K A100 GPUs, 3 months⁷ |
