Figure 6

The figure consists of seven subplots labeled (a) through (g), each showing paired distributions of model accuracy before and after integrating the construction law knowledge repository (CLKR). All panels share the vertical axis “Accuracy”, ranging from 0.2 to 1.0 in steps of 0.2. The legend explains that the violins and boxes depict the 25 to 75 percent range, with whiskers for “Min–Max”, a horizontal line for the median, diamonds for the mean, and dots for “Accuracy on each PCEQE test paper”. A dashed horizontal line marks the “Passing Line (Accuracy equals 0.6)”, extending to the right from the 0.6 mark on the vertical axis of each plot. In every panel, the baseline distribution is on the left and the CLKR-enhanced distribution is on the right, and an arrow labeled with the percentage improvement points from the baseline mean to the CLKR-enhanced mean.

Panel (a), “Accuracy of Llama-2-70b with and without CLKR”, shows a baseline mean of “0.283” and a CLKR-enhanced mean of “0.363”. The arrow is labeled “28.3 percent”, a modest improvement that remains below the 0.6 passing line.

Panel (b), “Accuracy of text-davinci-003 with and without CLKR”, shows the baseline mean “0.329” increasing to “0.476” with CLKR, annotated as “44.9 percent”.

Panel (c), “Accuracy of GPT-3.5 Turbo with and without CLKR”, presents a baseline mean of “0.349” rising to “0.476”, with a labeled improvement of “36.3 percent”.

Panel (d), “Accuracy of GPT-4 with and without CLKR”, shows the highest accuracies among panels (a) through (d): the baseline mean “0.528” is already near the passing line and the CLKR-enhanced mean “0.663” is clearly above it, corresponding to a “25.4 percent” gain.

Panel (e), “Accuracy of ChatGLM2-6B with and without CLKR”, shows a baseline mean of “0.430” and a CLKR-enhanced mean of “0.478”. The arrow is labeled “11.1 percent”, a modest improvement that still remains below the 0.6 passing line.

Panel (f), “Accuracy of ERNIE-Bot-turbo with and without CLKR”, displays baseline accuracy centered around “0.419” and CLKR-enhanced accuracy around “0.462”, a “10.2 percent” gain; both distributions lie below the passing threshold.

Panel (g), “Accuracy of ERNIE-Bot 4.0 with and without CLKR”, shows the strongest performance overall: the baseline mean is “0.755”, already above 0.6, while the CLKR-enhanced mean is “0.830”, an improvement of “9.9 percent” with tighter clustering of the blue points.

In each subplot, the distribution on the right is shifted upward in mean and median relative to the baseline and clusters more tightly, illustrating that incorporating CLKR consistently boosts model performance across the PCEQE test papers for all seven GPLLMs. Only GPT-4 and ERNIE-Bot 4.0 reach a mean above the 0.6 passing line with CLKR. Note: All numerical values are approximate.
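
The percentage labels on the arrows appear to be relative gains, that is, (mean with CLKR minus baseline mean) divided by the baseline mean. The following is a minimal sketch that recomputes them from the approximate means quoted in the description. The dictionary `means`, the function `relative_gain`, and the constant `PASSING_LINE` are illustrative names, not taken from the original work. Because the quoted means are rounded, the recomputed percentages can differ slightly from the labels in the figure.

```python
# Approximate means read from the figure description: (baseline, with CLKR).
# These are rounded values, so recomputed gains are only approximate.
means = {
    "Llama-2-70b": (0.283, 0.363),
    "text-davinci-003": (0.329, 0.476),
    "GPT-3.5 Turbo": (0.349, 0.476),
    "GPT-4": (0.528, 0.663),
    "ChatGLM2-6B": (0.430, 0.478),
    "ERNIE-Bot-turbo": (0.419, 0.462),
    "ERNIE-Bot 4.0": (0.755, 0.830),
}

PASSING_LINE = 0.6  # dashed line in every panel


def relative_gain(baseline: float, enhanced: float) -> float:
    """Relative improvement over the baseline, in percent."""
    return (enhanced - baseline) / baseline * 100


# Models whose CLKR-enhanced mean reaches the passing line.
passing = [model for model, (_, clkr) in means.items() if clkr >= PASSING_LINE]

for model, (base, clkr) in means.items():
    gain = relative_gain(base, clkr)
    status = "pass" if clkr >= PASSING_LINE else "below passing line"
    print(f"{model}: {base:.3f} -> {clkr:.3f} (+{gain:.1f}%), {status}")
```

Under this reading, the smaller relative gains for the models in panels (e) through (g) partly reflect their higher baselines: ERNIE-Bot 4.0 gains about 0.075 in absolute accuracy, close to the 0.080 gained by Llama-2-70b, yet its relative gain is roughly a third as large.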

Performance of original and CLKR-empowered GPLLMs in PCEQEs. Source(s): Authors’ own work