Figure 10
The power law distribution of CL knowledge-sourced documents for CLQA. Source(s): Authors’ own work

The figure contains two stacked plots that analyze how often individual construction law documents contribute knowledge chunks. Each plot is fitted with a power-law curve and annotated with statistics.

Panel (a), titled “Power law distribution of unranked frequency of knowledge chunk-sourced documents”, plots “Unranked frequency of documents” on the vertical axis (0 to 750) against “Knowledge chunk-sourced documents” on the horizontal axis. Orange vertical bars represent the unranked frequency, overlaid with a line labeled “Power curve fitting”. The curve is steep at the left, where a few documents have very high frequencies, then decays quickly toward zero as the document index increases, highlighting a long-tailed distribution. An inset bar chart zooms in on “The top 10 frequent documents (unranked)”, with counts labeled above each bar: 704, 469, 252, 228, 176, 133, 112, 111, 105, and 100. The document IDs along the bottom are CLD-260, CLD-380, CLD-347, CLD-007, CLD-108, CLD-003, CLD-015, CLD-339, CLD-149, and CLD-258. A table to the right titled “Power function” reports the fitted model $y = a \cdot x^{b}$ with parameters $a = 727.39 \pm 5.75$, $b = -0.85 \pm 0.01$, and $R^2 = 0.98$. The following equation appears below the table:

$$\text{Unranked frequency} = \sum_{i=1}^{n} C_i$$

Panel (b), titled “Power law distribution of ranked frequency of knowledge chunk-sourced documents”, uses the same horizontal axis, but the vertical axis is “Ranked frequency of documents” (0 to 450). Bars show ranked frequencies, overlaid with a power-law line labeled “Power curve fitting”. Again, the distribution is highly skewed, with a few documents dominating. The inset bar chart notes, “The top 10 frequent documents (ranked) provide 37.3 percent of the question-related knowledge”, and displays frequencies above each bar: 428, 298, 145, 138, 100, 82.8, 73.7, 68.7, 67, and 62.8, for the same leading document IDs. The adjacent table lists the ranked-frequency power-law parameters: $y = a \cdot x^{b}$ with $a = 444.75 \pm 3.96$, $b = -0.85 \pm 0.01$, and $R^2 = 0.98$. A formula at the bottom reads:

$$\text{Ranked frequency} = \sum_{i=1}^{n} C_i \cdot \frac{1}{R_i}$$
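The two frequency formulas and the fitted power function in the figure can be illustrated with a minimal Python sketch. The figure does not define $C_i$ or $R_i$, so the sketch assumes that $C_i = 1$ for each retrieved knowledge chunk and that $R_i$ is the retrieval rank of that chunk (starting at 1). The function names and the sample records are hypothetical and are not taken from the authors' code or data.

```python
from collections import defaultdict


def document_frequencies(retrievals):
    """Aggregate per-document frequencies from (doc_id, rank) pairs.

    Assumption: each pair is one retrieved knowledge chunk, so C_i = 1,
    and rank is the chunk's retrieval position R_i (1 = most relevant).
    """
    unranked = defaultdict(float)
    ranked = defaultdict(float)
    for doc_id, rank in retrievals:
        unranked[doc_id] += 1.0       # sum of C_i
        ranked[doc_id] += 1.0 / rank  # sum of C_i * (1 / R_i)
    return dict(unranked), dict(ranked)


def power_law(x, a, b):
    """Evaluate the fitted model y = a * x**b at document index x."""
    return a * x ** b


# Hypothetical retrieval records, for illustration only.
records = [
    ("CLD-260", 1), ("CLD-260", 2), ("CLD-260", 4),
    ("CLD-380", 1), ("CLD-380", 2),
]
unranked, ranked = document_frequencies(records)

# Fitted value for the first document index, using the unranked
# parameters reported in the figure (a = 727.39, b = -0.85).
first_fit = power_law(1, 727.39, -0.85)
```

Under these assumptions, a document retrieved at ranks 1, 2, and 4 has an unranked frequency of 3 and a ranked frequency of 1.75, so the ranked measure gives less weight to chunks that were retrieved in lower positions.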

