Figure 9

The chart plots “Accuracy” on the vertical axis, ranging from 0.0 to 1.0 in steps of 0.2, and lists six models along the horizontal axis: “Llama-2-70B”, “GPT-3.5 Turbo”, “GPT-4”, “ChatGLM2-6B”, “ERNIE-Bot-turbo”, and “ERNIE-Bot 4.0”. The legend at the top identifies two bars: “Accuracy of GPLLMs on 100 open-ended questions” and “Accuracy of CLKR-empowered GPLLMs on 100 open-ended questions”. For each model, the bar for the plain GPLLM appears on the left with its numeric accuracy printed inside, and the bar for the CLKR-empowered GPLLM appears on the right with a higher value; arrows and percentage labels above each pair indicate the relative improvement. For Llama-2-70B, the original accuracy is “0.407” and the CLKR-empowered accuracy is “0.473”, corresponding to a “16.4 percent” gain. For GPT-3.5 Turbo, accuracy increases from “0.213” to “0.317”, an improvement of “48.4 percent”. For GPT-4, accuracy rises from “0.277” to “0.407”, a “47.0 percent” gain. For ChatGLM2-6B, accuracy improves from “0.210” to “0.277”, labeled “31.8 percent”. For ERNIE-Bot-turbo, accuracy rises from “0.467” to “0.510”, a “9.3 percent” increase. For ERNIE-Bot 4.0, accuracy goes from “0.483” to “0.527”, a “9.0 percent” gain. A caption beneath the axis describes these as “6 pairs of GPLLMs with and without CLKR”.

Performance comparison of GPLLMs with and without CLKR on 100 open-ended questions. Note: The API of the text-davinci-003 model was no longer accessible when the authors conducted the CLQA on the open-ended question set in Nov. 2024. Source(s): Authors’ own work
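
The percentage labels in Figure 9 are relative improvements, that is, the accuracy gain divided by the baseline accuracy. The following is a minimal Python sketch of that arithmetic, using the rounded accuracies printed on the bars. The `accuracies` dictionary and the `relative_gain` helper are illustrative names, not part of the authors' code. Because the inputs are rounded to three decimals, the computed values can differ from the labels in the figure by a few tenths of a percentage point; presumably the labels were derived from unrounded accuracies, although the source does not state this.

```python
# Accuracy values as printed on the bars in Figure 9 (rounded to three decimals).
# Each entry maps a model name to (accuracy without CLKR, accuracy with CLKR).
accuracies = {
    "Llama-2-70B": (0.407, 0.473),
    "GPT-3.5 Turbo": (0.213, 0.317),
    "GPT-4": (0.277, 0.407),
    "ChatGLM2-6B": (0.210, 0.277),
    "ERNIE-Bot-turbo": (0.467, 0.510),
    "ERNIE-Bot 4.0": (0.483, 0.527),
}


def relative_gain(baseline: float, improved: float) -> float:
    """Return the relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Hypothetical helper output: relative gain per model, in percent.
gains = {
    model: relative_gain(baseline, improved)
    for model, (baseline, improved) in accuracies.items()
}

for model, gain in gains.items():
    print(f"{model}: {gain:.1f}%")
```

Note that the relative gain is largest for the models with the lowest baseline accuracy (GPT-3.5 Turbo and GPT-4), even though the absolute gains across all six models are within roughly 0.04 to 0.13.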