Figure 8
Performance comparison of original and CLKR-empowered GPLLMs across C1-C8. Source(s): Authors’ own work

The figure is spread across seven labeled panels (a)–(g), each comparing one GPLLM's accuracy in eight construction law knowledge areas (C1 to C8) without and with the construction law knowledge repository (CLKR). All panels share the vertical axis “Accuracy”, ranging from 0.0 to 1.0 in steps of 0.2, and a common legend: boxes indicate the 25 percent–75 percent range of accuracies across PCEQE test papers, thin vertical lines show “Min–Max”, diamonds mark “Average accuracy”, red dots show “Accuracy of GPLLMs on each PCEQE test paper” without CLKR, and blue dots show “Accuracy of CLKR-empowered GPLLMs on each PCEQE test paper”. A green dashed line at 0.6 represents the “Passing Line (Accuracy equals 0.6)”. In every panel the horizontal axis carries eight grouped boxplots labeled C1 through C8; red and blue point clouds cluster around the boxes, and arrows annotate the relative improvement in mean accuracy.

Panel (a), “Llama-2-70b”, shows relatively low accuracies in all eight areas, with every average below the passing line. In C1, mean accuracy rises from 0.287 without CLKR to 0.370 with CLKR, a 29.1 percent gain. C2 increases from 0.355 to 0.360 (1.4 percent), C3 from 0.221 to 0.325 (47.1 percent), C4 from 0.302 to 0.360 (19.0 percent), C5 from 0.303 to 0.386 (27.3 percent), C6 from 0.333 to 0.411 (23.3 percent), C7 from 0.324 to 0.379 (17.1 percent), and C8 from 0.314 to 0.405 (29.1 percent).

Panel (b), “text-davinci-003”, shows comparable baseline accuracies but larger relative gains. In C1, mean accuracy improves from 0.340 to 0.435 (28.1 percent), in C2 from 0.320 to 0.582 (82.3 percent), in C3 from 0.332 to 0.487 (46.5 percent), in C4 from 0.299 to 0.426 (42.4 percent), in C5 from 0.348 to 0.548 (49.8 percent), in C6 from 0.348 to 0.533 (53.0 percent), in C7 from 0.366 to 0.531 (45.2 percent), and in C8 from 0.332 to 0.463 (39.2 percent). Half of the CLKR-enhanced averages exceed 0.5, but all remain below the 0.6 passing line.

Panel (c), “GPT-3.5 Turbo”, presents moderate baseline performance that benefits noticeably from CLKR. In C1, the mean increases from 0.416 to 0.484 (16.3 percent), in C2 from 0.317 to 0.489 (54.2 percent), in C3 from 0.362 to 0.503 (38.9 percent), in C4 from 0.285 to 0.445 (56.1 percent), in C5 from 0.400 to 0.455 (13.6 percent), in C6 from 0.333 to 0.526 (57.8 percent), in C7 from 0.389 to 0.539 (38.5 percent), and in C8 from 0.345 to 0.414 (19.9 percent). Some enhanced averages approach the passing line, but all remain below 0.6.

Panel (d), “GPT-4”, shows strong performance. Baseline averages sit just below 0.6 in every C-area (0.481 to 0.590) and rise consistently with CLKR. In C1, accuracy grows from 0.573 to 0.657 (14.7 percent), in C2 from 0.523 to 0.742 (41.7 percent), in C3 from 0.549 to 0.689 (25.5 percent), in C4 from 0.481 to 0.615 (27.9 percent), in C5 from 0.539 to 0.719 (33.2 percent), in C6 from 0.546 to 0.685 (25.6 percent), in C7 from 0.488 to 0.678 (39.1 percent), and in C8 from 0.590 to 0.681 (15.5 percent). All CLKR-enhanced averages are above the passing line.

Panel (e), “ChatGLM2-6B”, shows moderate accuracies. In C1 the mean rises from 0.436 to 0.455 (4.2 percent), in C2 from 0.349 to 0.436 (25.0 percent), in C3 from 0.511 to 0.517 (1.3 percent), in C4 from 0.368 to 0.448 (17.0 percent), in C5 from 0.430 to 0.538 (20.1 percent), in C6 from 0.490 to 0.527 (7.5 percent), in C7 from 0.477 to 0.536 (12.5 percent), and in C8 from 0.430 to 0.513 (19.3 percent). All averages stay below 0.6 but trend upward with CLKR.

Panel (f), “ERNIE-Bot-turbo”, shows similar baseline accuracies and mostly modest gains. Average accuracy in C1 increases from 0.440 to 0.485 (10.3 percent), in C2 from 0.553 to 0.580 (4.9 percent), in C3 from 0.408 to 0.440 (7.9 percent), in C4 from 0.385 to 0.406 (5.4 percent), in C5 from 0.438 to 0.522 (19.3 percent), in C6 from 0.435 to 0.525 (20.7 percent), in C7 from 0.446 to 0.463 (1.4 percent), and in C8 from 0.452 to 0.511 (10.5 percent). Only C2 comes close to the 0.6 passing line.

Panel (g), “ERNIE-Bot 4.0”, shows consistently high accuracies, above the passing line in all C-areas both without and with CLKR. For C1 the mean improves from 0.770 to 0.849 (10.3 percent), for C2 from 0.729 to 0.843 (15.7 percent), for C3 from 0.761 to 0.862 (13.3 percent), for C4 from 0.737 to 0.778 (5.6 percent), for C5 from 0.715 to 0.828 (15.8 percent), for C6 from 0.771 to 0.819 (6.3 percent), for C7 from 0.747 to 0.820 (9.7 percent), and for C8 from 0.819 to 0.896 (9.4 percent). The blue point clouds cluster near the top of the chart, indicating high accuracy in every construction law area once CLKR is integrated.

Note: All the numerical data values are approximated.
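
The percentage gains quoted for each panel appear consistent with the usual relative-change calculation, (accuracy with CLKR minus accuracy without CLKR) divided by accuracy without CLKR. The following is a minimal Python sketch under that assumption, using the approximate panel (d) GPT-4 means given in the description. The names `relative_gain`, `summarize`, and `PASSING_LINE` are illustrative and do not come from the source, and treating an accuracy of exactly 0.6 as a pass is also an assumption.

```python
PASSING_LINE = 0.6  # passing threshold shown as the green dashed line

# Approximate mean accuracies for panel (d), GPT-4:
# area -> (without CLKR, with CLKR)
gpt4_means = {
    "C1": (0.573, 0.657),
    "C2": (0.523, 0.742),
    "C3": (0.549, 0.689),
    "C4": (0.481, 0.615),
    "C5": (0.539, 0.719),
    "C6": (0.546, 0.685),
    "C7": (0.488, 0.678),
    "C8": (0.590, 0.681),
}


def relative_gain(before: float, after: float) -> float:
    """Relative improvement in percent (assumed formula, not stated in the source)."""
    return (after - before) / before * 100.0


def summarize(means: dict) -> dict:
    """Per-area relative gain and pass/fail status against the passing line."""
    summary = {}
    for area, (before, after) in means.items():
        summary[area] = {
            "gain_pct": round(relative_gain(before, after), 1),
            # Assumption: an accuracy equal to the line counts as passing.
            "passes_before": before >= PASSING_LINE,
            "passes_after": after >= PASSING_LINE,
        }
    return summary


if __name__ == "__main__":
    for area, row in summarize(gpt4_means).items():
        print(area, row)
```

Because the figure's values are approximated, percentages recomputed this way can differ slightly from the annotated ones.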

