Base LLMs adopted in this study
| | Qwen2.5-72B | Qwen2.5-MAX | Qwen3-235B-A22B | DeepSeek-V3 | DeepSeek-R1 | GPT-4o |
|---|---|---|---|---|---|---|
| Architecture | Dense | MoE | MoE | MoE | Dense | Dense, Multimodal |
| Total Params | 72B | 325B | 235B | 671B | 260B | ~1.8T (estimated) |
| Activated Params | 72B | 22B | 22B | 37B | 260B | ~1.8T (estimated) |
| Availability | Open Source | Open Source | Open Source | Open Source | Open Source | Closed Source |
| Model Category | General | General | Reasoning | General | Reasoning | General |
Note: MoE stands for mixture-of-experts.