Table 2 Table of the number of... | Emerald Publishing

Table 2

Number of parameters, training data, training cost, and training time of several large LMs. Blank cells indicate that the data are not available. Sources are cited where the data were not obtained from the original work.

Model	Year	Number of Parameters	Training data	Training cost	Training time
BERT-Large	2018	340M	3.3B words	$7,000⁵	64 TPU chips, 4 days
XLNet-Large	2019	340M	32.9B tokens	$245,000⁵	512 TPU v3 chips, 5.5 days
GPT-2	2019	1.5B	8 million web pages	$12,902-$43,008 [166]	32 TPU v3 chips, 168 hours
Megatron-LM	2019	8.3B	174 GB of text data		512 GPUs, 2 days per epoch
T5	2019	11B	745 GB of text data	Over $1.3 million [160]
Turing-NLG	2020	17B
GPT-3	2020	175B	570 GB of text data	Over $4.6 million⁶	1024 A100 GPUs, 34 days [120]
Megatron-Turing NLG	2022	530B	270B tokens		2K A100 GPUs, 3 months⁷
