Number of parameters, training data, training cost, and training time of several large LMs. Blank cells indicate that the data are not available. Sources are cited where the data were not obtained from the original work.
| Model | Year | Number of Parameters | Training data | Training cost | Training time |
|---|---|---|---|---|---|
| BERT-Large | 2018 | 340M | 3.3B words | $7,000⁵ | 64 TPU chips, 4 days |
| XLNet-Large | 2019 | 340M | 32.9B tokens | $245,000⁵ | 512 TPU v3 chips, 5.5 days |
| GPT-2 | 2019 | 1.5B | 8 million web pages | $12,902-$43,008 [166] | 32 TPU v3 chips, 168 hours |
| Megatron-LM | 2019 | 8.3B | 174 GB of text data | | 512 GPUs, 2 days per epoch |
| T5 | 2019 | 11B | 745 GB of text data | Over $1.3 million [160] | |
| Turing-NLG | 2020 | 17B | | | |
| GPT-3 | 2020 | 175B | 570 GB of text data | Over $4.6 million⁶ | 1024 A100 GPUs, 34 days [120] |
| Megatron-Turing NLG | 2022 | 530B | 270B tokens | | 2K A100 GPUs, 3 months⁷ |