Table 4 Wilcoxon T Tests on CLQA... | Emerald Publishing

Table 4

Wilcoxon T Tests on CLQA accuracy of 7 GPLLMs with and without CLKR in PCEQEs

No	GPLLM	CLKR	Average accuracy	Accuracy enhancement	z-statistic	p-value
1	Llama-2-70b	without	0.283	28.3%	4.197	0.000***
1	Llama-2-70b	with	0.363	28.3%	4.197	0.000***
2	text-davinci-003	without	0.329	44.9%	4.286	0.000***
2	text-davinci-003	with	0.476	44.9%	4.286	0.000***
3	GPT-3.5 Turbo	without	0.349	36.3%	4.287	0.000***
3	GPT-3.5 Turbo	with	0.476	36.3%	4.287	0.000***
4	GPT-4	without	0.528	25.4%	4.171	0.000***
4	GPT-4	with	0.663	25.4%	4.171	0.000***
5	ChatGLM2-6B	without	0.430	11.1%	3.729	0.000***
5	ChatGLM2-6B	with	0.478	11.1%	3.729	0.000***
6	ERNIE-Bot-turbo	without	0.419	10.2%	3.429	0.002***
6	ERNIE-Bot-turbo	with	0.462	10.2%	3.429	0.002***
7	ERNIE-Bot 4.0	without	0.755	9.9%	4.029	0.000***
7	ERNIE-Bot 4.0	with	0.830	9.9%	4.029	0.000***
Average accuracy of 7 GPLLMs		without	0.442	21.1%	NA	NA
Average accuracy of 7 GPLLMs		with	0.535	21.1%	NA	NA

Note(s): *** denotes statistical significance at a confidence level above 99% (p < 0.01)

Source(s): Authors’ own work
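
The "Accuracy enhancement" column appears to be the relative gain in average accuracy, (with − without) / without. The table does not state this definition, so it is an assumption here. The minimal sketch below recomputes that gain and the seven-model averages from the rounded values in Table 4. The names `ROWS`, `relative_enhancement`, and `mean` are illustrative and do not come from the source. Because the inputs are rounded to three decimals, the recomputed percentages can differ from the reported ones by a few tenths of a percentage point.

```python
# Values copied from Table 4:
# (model, average accuracy without CLKR, average accuracy with CLKR,
#  reported accuracy enhancement in percent)
ROWS = [
    ("Llama-2-70b",      0.283, 0.363, 28.3),
    ("text-davinci-003", 0.329, 0.476, 44.9),
    ("GPT-3.5 Turbo",    0.349, 0.476, 36.3),
    ("GPT-4",            0.528, 0.663, 25.4),
    ("ChatGLM2-6B",      0.430, 0.478, 11.1),
    ("ERNIE-Bot-turbo",  0.419, 0.462, 10.2),
    ("ERNIE-Bot 4.0",    0.755, 0.830, 9.9),
]


def relative_enhancement(without, with_clkr):
    """Relative accuracy gain in percent (assumed definition of the column)."""
    return (with_clkr - without) / without * 100.0


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Compare the recomputed gain with the reported one for each model.
for name, without, with_clkr, reported in ROWS:
    computed = relative_enhancement(without, with_clkr)
    print(f"{name:18s} computed={computed:5.1f}%  reported={reported:5.1f}%")

# Recompute the "Average accuracy of 7 GPLLMs" rows.
avg_without = mean([row[1] for row in ROWS])
avg_with = mean([row[2] for row in ROWS])
print(f"average without CLKR: {avg_without:.3f}")
print(f"average with CLKR:    {avg_with:.3f}")
print(f"average enhancement:  {relative_enhancement(avg_without, avg_with):.1f}%")
```

The z-statistics and p-values cannot be reproduced this way, because the Wilcoxon test needs the paired per-question accuracy scores, which the table does not report.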