Figure 9
Performance comparison of GPLLMs with and without CLKR on 100 open-ended questions. Note: The API of the text-davinci-003 model was no longer accessible when the authors conducted the CLQA on the open-ended question set in Nov. 2024. Source(s): Authors’ own work.

The chart plots “Accuracy” on the vertical axis, ranging from 0.0 to 1.0 in steps of 0.2, and lists six models along the horizontal axis: “Llama-2-70B”, “GPT-3.5 Turbo”, “GPT-4”, “ChatGLM2-6B”, “ERNIE-Bot-turbo”, and “ERNIE-Bot 4.0”. The legend at the top identifies two bars: “Accuracy of GPLLMs on 100 open-ended questions” and “Accuracy of CLKR-empowered GPLLMs on 100 open-ended questions”. For each model, the bar for the plain GPLLM appears on the left with its accuracy printed inside, and the bar for the CLKR-empowered GPLLM appears on the right with a higher value; arrows and percentage labels above each pair indicate the relative improvement.

For Llama-2-70B, accuracy rises from 0.407 to 0.473, labeled as a 16.4% gain. For GPT-3.5 Turbo, accuracy rises from 0.213 to 0.317, labeled 48.4%. For GPT-4, accuracy rises from 0.277 to 0.407, labeled 47.0%. For ChatGLM2-6B, accuracy rises from 0.210 to 0.277, labeled 31.8%. For ERNIE-Bot-turbo, accuracy rises from 0.467 to 0.510, labeled 9.3%. For ERNIE-Bot 4.0, accuracy rises from 0.483 to 0.527, labeled 9.0%. A caption beneath the axis describes these as “6 pairs of GPLLMs with and without CLKR”.
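The percentage labels in the figure are relative improvements, i.e. the accuracy gain divided by the accuracy without CLKR. As a minimal sketch, the Python snippet below recomputes them from the three-decimal accuracies printed in the bars. The names `ACCURACY`, `relative_gain_percent`, and `gains` are illustrative and do not come from the source, and the formula is an assumption about how the labels were derived.

```python
# Accuracy values as printed in Figure 9: (without CLKR, with CLKR).
ACCURACY = {
    "Llama-2-70B": (0.407, 0.473),
    "GPT-3.5 Turbo": (0.213, 0.317),
    "GPT-4": (0.277, 0.407),
    "ChatGLM2-6B": (0.210, 0.277),
    "ERNIE-Bot-turbo": (0.467, 0.510),
    "ERNIE-Bot 4.0": (0.483, 0.527),
}


def relative_gain_percent(base: float, boosted: float) -> float:
    """Relative improvement of `boosted` over `base`, in percent.

    Assumed formula; the source does not state how its labels were computed.
    """
    return (boosted - base) / base * 100.0


# Recompute the gain for every model pair from the rounded accuracies.
gains = {
    model: relative_gain_percent(base, boosted)
    for model, (base, boosted) in ACCURACY.items()
}

for model, gain in gains.items():
    print(f"{model}: {gain:.1f}%")
```

Recomputed this way, each gain falls within about half a percentage point of the label printed in the figure. The small differences would be consistent with the labels having been computed from unrounded accuracies, though the source does not say so.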

