Table 7

Performance results of six base models with different prompt strategies

Performance	Models
	GPT-4o base		Qwen2.5-72B base		Qwen2.5-max base		Qwen3-235B-A22B base		DeepSeek-V3 base		DeepSeek-R1 base
	SP	JP	SP	JP	SP	JP	SP	JP	SP	JP	SP	JP
L	11.78	6.60↓	42.56	29.98↓	59.58	36.53↓	77.21	45.87↓	36.20	22.91↓	186.43	122.84↓
JEM	0.220	0.355↑	0.184	0.291↑	0.206	0.340↑	0.241	0.397↑	0.220	0.362↑	0.262	0.433↑
F1(MNR)	0.705	0.709↑	0.645	0.686↑	0.651	0.682↑	0.635	0.651↑	0.774	0.790↑	0.753	0.762↑
F1(ENR)	0.300	0.412↑	0.304	0.418↑	0.356	0.445↑	0.410	0.507↑	0.337	0.385↑	0.386	0.462↑
F1(MNCX)	0.582	0.654↑	0.598	0.646↑	0.643	0.710↑	0.676	0.690↑	0.622	0.656↑	0.701	0.737↑
F1(ENCX)	0.476	0.712↑	0.515	0.718↑	0.560	0.771↑	0.640	0.802↑	0.591	0.671↑	0.720	0.790↑

Note(s): L stands for latency (time per query, in seconds). JEM stands for Joint Exact-Match. SP stands for Separate Prompt strategy. JP stands for Joint Prompt strategy.

Source(s): Authors’ own work
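
The arrows in Table 7 mark the direction of change from SP to JP. As a minimal sketch of how that change could be quantified, the snippet below computes the percent change of JP relative to SP for the latency and JEM rows. The values are copied from the table; the names `relative_change` and `summarize` are illustrative helpers, not part of the authors' method.

```python
# Values copied from Table 7 (L = latency in seconds per query, JEM = Joint Exact-Match).
# Helper names below are hypothetical and used only for illustration.
MODELS = [
    "GPT-4o", "Qwen2.5-72B", "Qwen2.5-max",
    "Qwen3-235B-A22B", "DeepSeek-V3", "DeepSeek-R1",
]
LATENCY = {
    "SP": [11.78, 42.56, 59.58, 77.21, 36.20, 186.43],
    "JP": [6.60, 29.98, 36.53, 45.87, 22.91, 122.84],
}
JEM = {
    "SP": [0.220, 0.184, 0.206, 0.241, 0.220, 0.262],
    "JP": [0.355, 0.291, 0.340, 0.397, 0.362, 0.433],
}


def relative_change(sp: float, jp: float) -> float:
    """Percent change of the JP value relative to the SP value."""
    return (jp - sp) / sp * 100.0


def summarize(metric: dict) -> dict:
    """Map each model to its SP-to-JP percent change, rounded to one decimal."""
    return {
        model: round(relative_change(sp, jp), 1)
        for model, sp, jp in zip(MODELS, metric["SP"], metric["JP"])
    }


if __name__ == "__main__":
    for name, metric in (("L", LATENCY), ("JEM", JEM)):
        print(name)
        for model, change in summarize(metric).items():
            print(f"  {model}: {change:+.1f}%")
```

A negative value for L means JP is faster than SP; a positive value for JEM means JP scores higher. The same helper applies to the F1 rows if their values are added in the same layout.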