Building a construction law knowledge repository to enhance general-purpose large language model performance on domain question-answering: a case of China

The supplementary material for this article can be found online.

2025

Shenghua Zhou, Hongyu Wang, S. Thomas Ng, Dezhi Li, Shenming Xie, Kaiwen Chen and Wentao Wang

Figure 1

The illustration is a multi‑row flowchart describing a four‑stage process for creating and evaluating a construction law knowledge repository (C L K R) used with general‑purpose large language models (G P L L M s). Stage 1, on the top left, is titled “1. Collecting corpora and Recognizing candidate documents for C L K R”. Inside this stage are three sequential boxes connected by right‑pointing arrows. The first box is labeled “Gather corpora that contain construction laws” with a subtitle “Source: China Judgments Online”. The second box is labeled “Recognize document name entities in corpora”, with the line “Identifiers: Guillemets (left and right double angle brackets)”. The third box is a wider rectangle titled “Cleanse the document name entities” and internally split into three vertical panels labeled from left to right: “Merging identical entities”, “removing low‑frequency items”, and “removing non‑law documents”. Stage 2, on the top right, is titled “2. Identifying C L documents & Building the C L K R”. It contains three more boxes connected by arrows. The first box reads “Filter and align the candidate document entities” with a lower caption “Majority voting by 10 experts” above a row of stylized human icons. The next box is labeled “Clarify the structures of C L knowledge areas” with the note “Referring to a textbook and reviewed by experts”. The final box is titled “Categorize C L documents and collect the document contents” with the line “Collecting from Chinese Laws and Regulations Database”. Stage 3 in the middle row is titled “Incorporating C L K R into G P L L Ms for C L Q A” and contains three main boxes connected by right‑pointing arrows. The first box is labeled “Split C L documents into knowledge chunks”, and inside it two lines of text read “Chunk size equals 250 and Overlap equals 50” and “Chunk vectorization by most suitable embedding model of each L L M”. 
The second box is labeled “Retrieve question‑relevant knowledge chunks” and contains the line “Extracting 3 closest knowledge chunks (I) with minimum squared Euclidean distance (L subscript i squared):” followed by the formula “L subscript i squared equals open double mode V subscript i superscript knowledge minus V superscript question close double mode” and the selection rule “I equals arg min subscript 3 ({L subscript i squared} superscript N subscript i equals 1)”. The third box is titled “Input the combined question and retrieved knowledge into 7 selected G P L L M s” and includes a bulleted list labeled “Selecting G P L L Ms using three criteria:” with three bullets: “Inclusion of both open‑source and end‑source G P L L M s”, “Prioritization of G P L L Ms with superior performance”, and “Supporting automated batch Q A”. Stage 4 in the bottom row is titled “Validating the effectiveness of C L K R” and also contains three boxes joined by arrows. The first box reads “Devise a validation set for C L Q A” with a description: “Deciding question type and size by referring to existing literature in Table 2”. The second box is labeled “Compare performance differences between G P L L M s with and without C L K R” and states “Calculated by Accuracy and tested by Wilcoxon T Test”, followed by the formula “Accuracy equals (M superscript M S Q plus M superscript M M Q) over (1 times N superscript M S Q plus 2 times N superscript M M Q)”. The third box is titled “Evaluate individual C L document’s impact on performance enhancement” and explains “Evaluated by Unranked frequency and Ranked frequency” with two equations: “Unranked frequency equals sum from i equals 1 to n of C subscript i” and “Ranked frequency equals sum from i equals 1 to n of C subscript i times 1 over R subscript i”. Arrows connect all boxes from left to right and top to bottom, visually tracing the workflow from data collection through knowledge integration and quantitative validation.

The phases of building a CLKR to lift the CLQA performance of GPLLMs. Source(s): Authors’ own work
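
For readers who prefer conventional notation, the formulas spelled out in the Figure 1 description can be typeset roughly as follows. This is a transcription under assumptions: "open double mode ... close double mode" is read as norm bars, the outer square on the norm is inferred from the phrase "squared Euclidean distance", and the symbol meanings (M as marks obtained, N as number of questions, C_i and R_i as a per-retrieval count and similarity rank) are presumed rather than defined in the description.

```latex
\begin{align}
L_i^2 &= \left\lVert V_i^{\text{knowledge}} - V^{\text{question}} \right\rVert_2^2,
  \qquad i = 1, \dots, N \\
I &= \operatorname*{arg\,min}_{3} \left( \{ L_i^2 \}_{i=1}^{N} \right) \\
\text{Accuracy} &= \frac{M^{\text{MSQ}} + M^{\text{MMQ}}}
  {1 \times N^{\text{MSQ}} + 2 \times N^{\text{MMQ}}} \\
\text{Unranked frequency} &= \sum_{i=1}^{n} C_i \\
\text{Ranked frequency} &= \sum_{i=1}^{n} C_i \times \frac{1}{R_i}
\end{align}
```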

Figure 2

A diagram shows how construction law documents are selected and categorized into various knowledge areas and subareas.

The figure is a large multi‑panel diagram illustrating data‑driven recognition and expert‑based determination of construction law (C L) documents for the C L K R. The top band, labeled on the right “(a) Data‑driven recognition of candidate documents for C L K R”, shows a three‑step pipeline. The first box on the left reads “Collect corpora containing C L documents from China Judgements Online”, and beneath it, within a rounded rectangle, “374,992 written judgments” with a small screenshot icon. A right‑pointing arrow leads to the second box “Recognize document name entities by guillemets (left and right double angle brackets)” with a count “775,241 document name entities”. Another arrow leads to “Cleanse the identified document name entities with three criteria” with three stacked numbers: “7,954 non‑duplicate items”, “1,018 items with no less than 5 appearances”, and “702 candidate documents that end with 10 specific terms”. A downward label “Provide 702 candidate documents” points to the lower half of the figure. The bottom half, marked on the right as “(b) Expertise‑based determination of C L documents in C L K R”, is split into three main text boxes at the top and a large visualization underneath. From left to right, the boxes state: “Filter and align the 702 candidate documents”, “Clarify the 8 C L knowledge areas and 164 C L knowledge subareas”, and “Categorize 387 C L documents into 164 distinct C L K subareas (Table S 2)”. Below the first box is a grey panel headed “387 C L documents for C L K R” containing green rounded rectangles labeled “C L D‑001”, “C L D‑002”, “C L D‑003”, “C L D‑004”, “C L D‑005”, “C L D‑006”, followed by dotted ellipsis dots and ending with “C L D‑386” and “C L D‑387”. 
The center of the figure depicts “C L Knowledge” as a circle feeding eight colored second‑layer areas labeled “C 1: Basic Legal Knowledge for Construction”, “C 2: Construction Permits”, “C 3: Contracting and Subcontracting”, “C 4: Construction Project Contracts and Labor Contracts”, “C 5: Environment and Cultural Heritage Protection”, “C 6: Construction Safety”, “C 7: Construction Quality”, and “C 8: Dispute Resolution”. Each C‑area connects to a band of thinner third‑layer labels such as “C 1‑01 to C 1‑29”, “C 2‑01 to C 2‑14”, “C 3‑01 to C 3‑16”, “C 4‑01 to C 4‑23”, “C 5‑01 to C 5‑12”, “C 6‑01 to C 6‑25”, “C 7‑01 to C 7‑23”, and “C 8‑01 to C 8‑22”. From these subarea codes, many multicolored strands flow rightward into a tall rectangular block labeled on the side “387 C L documents in C L K R”, whose interior is filled with vertical green rectangles representing individual documents. At the bottom, a dashed “Legends” box explains icons: open rounded rectangles represent “Second‑layer C L K area”, narrow rounded rectangles indicate “Third‑layer C L K subarea”, and solid green bars denote “Construction law document”. To the right of the legend is a worked example titled “An example of C L knowledge in subarea C 3‑06”. It shows “C L K” leading to “C 3: Contracting and Subcontracting”, then to “C 3‑06: Statutory requirements for winning bids and handling complaints in bidding”, which in turn connects to three specific green document bars labeled: “C L D‑072: Opinions on Promoting the Sustainable and Healthy Development of the Construction Industry”, “C L D‑232: Tendering and Bidding Law of the People’s Republic of China”, and “C L D‑266: Regulations for the Implementation of Bidding and Tendering Law of the People’s Republic of China”. The entire diagram emphasizes the progression from hundreds of thousands of judgments to a curated set of 387 construction law documents structured into 8 knowledge areas and 164 subareas within the C L K R.

Building the CLKR by combining data-driven and expertise-based paradigm. Source(s): Authors’ own work
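
The data-driven half of Figure 2 (recognize names inside guillemets, merge identical names, keep those with at least 5 appearances, keep those ending in law-like terms) might be sketched as below. This is an illustration rather than the authors' code: the guillemets are assumed to be the Chinese marks 《 》, the English document names are dummy data, and `ASSUMED_LAW_SUFFIXES` is a hypothetical stand-in because the figure mentions "10 specific terms" without listing them.

```python
import re
from collections import Counter

# Hypothetical suffixes: the figure says candidates "end with 10 specific terms"
# but does not list them, so these English stand-ins are illustrative only.
ASSUMED_LAW_SUFFIXES = ("Law", "Regulations", "Measures", "Provisions")
MIN_APPEARANCES = 5  # "no less than 5 appearances" in Figure 2


def extract_document_names(judgment_text):
    """Return every name enclosed in guillemets, e.g. 《Construction Law》."""
    return [name.strip() for name in re.findall(r"《([^《》]+)》", judgment_text)]


def cleanse(names, min_count=MIN_APPEARANCES, suffixes=ASSUMED_LAW_SUFFIXES):
    """Merge identical names, drop low-frequency ones, keep law-like names."""
    counts = Counter(names)  # merging identical entities
    frequent = {n: c for n, c in counts.items() if c >= min_count}
    return {n: c for n, c in frequent.items() if n.endswith(suffixes)}


# Dummy judgments standing in for the written judgments in the corpus.
judgments = (
    ["Under 《Construction Law》 and 《Tendering and Bidding Law》 ..."] * 5
    + ["As agreed in 《Supplementary Agreement》 ..."] * 6
    + ["See 《Urban Planning Law》 ..."] * 2
)
names = [n for text in judgments for n in extract_document_names(text)]
candidates = cleanse(names)
print(candidates)
```

In this toy run the contract-style name is dropped by the suffix filter and the rarely cited law by the frequency filter, mirroring the narrowing from 7,954 non-duplicate items to 1,018 and then 702 candidates in the figure.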

Figure 3

A schematic shows how C L K R documents are chunked, embedded, matched to user questions, and fed into multiple G P L L M s.

The figure is a three‑section workflow diagram describing construction‑law‑aware question answering. The top section, labeled “(a) Split 387 documents into knowledge chunks”, begins on the left with a vertical stack of rounded rectangles labeled “C L D‑001”, “C L D‑002”, ellipsis dots, and “C L D‑387”, collectively marked “C L K R”. An arrow labeled “Split” points to a box containing items “Knowledge chunk 1”, “Knowledge chunk 2”, ellipsis, and “Knowledge chunk n”. A second arrow labeled “Embed chunks” leads to a box listing “Knowledge vector 1”, “Knowledge vector 2”, ellipsis, and “Knowledge vector n”. A third arrow leads to a magenta box titled “F A I S S‑formatted vector repository” that shows three example high‑dimensional vectors, such as “0.101, negative 0.002, ellipsis, negative 0.400”, “negative 0.003, negative 0.902, ellipsis, negative 0.007”, ellipsis, and “negative 0.803, 0.005, ellipsis, negative 0.243”, under the heading “Store the vectors”. The middle section is titled “(b) Retrieve question‑relevant knowledge chunks”. On the left, a user icon labeled “User” speaks “Questions” in a bubble, and an arrow labeled “Vectorize” points to a box listing “Question vector 1”, “Question vector 2”, up to “Question vector m”. A horizontal arrow labeled “Compare” connects this box to a box describing retrieved knowledge: rows such as “Relevant knowledge chunk 1‑1 to 1‑3”, “Relevant knowledge chunk 2‑1 to 2‑3”, ellipsis, and “Relevant knowledge chunk m‑1 to m‑3”. A side note states, “Retrieve 3 relevant knowledge chunks for each question”, with a loop arrow back up to the F A I S S vector repository, indicating a similarity search between question vectors and stored knowledge vectors. The bottom section is labeled “Combine question and knowledge” on the left and “(c) Input the combined question and retrieved knowledge into G P L L Ms” on the right. 
Two large ovals represent batched inputs: the top oval shows “Question 1” paired with “Relevant knowledge chunk 1‑1”, “Relevant knowledge chunk 1‑2”, and “Relevant knowledge chunk 1‑3”, while the bottom oval shows “Question m” with “Relevant knowledge chunk m‑1”, “Relevant knowledge chunk m‑2”, and “Relevant knowledge chunk m‑3”. Arrows labeled “Input” point from these ovals to a vertical list titled “General‑purpose large language model (G P L L M)”, which enumerates specific models with icons: “Llama‑2‑70 b”, “Text‑davinci‑003”, “G P T‑3.5 Turbo”, “G P T‑4”, “Chat G L M 2‑6 B”, “E R N I E‑Bot‑turbo”, and “E R N I E‑Bot 4.0”. On the far right, a bubble labeled “Answers” indicates the generated outputs.

The process of leveraging CLKR to empower GPLLM for CLQA. Source(s): Authors’ own work
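
A minimal sketch of the three steps in Figure 3 (split, retrieve, combine) is given below. It makes several assumptions that are not in the figure: chunk size and overlap are counted in characters, `toy_embed` is a placeholder for whichever embedding model each LLM actually uses, the prompt layout in `build_prompt` is invented for illustration, and a brute-force pure-Python search replaces the FAISS vector repository named in the figure.

```python
CHUNK_SIZE = 250  # from Figure 1; the unit (characters vs. tokens) is assumed
OVERLAP = 50
TOP_K = 3         # "Retrieve 3 relevant knowledge chunks for each question"


def split_into_chunks(text, size=CHUNK_SIZE, overlap=OVERLAP):
    """Slide a window of `size` characters, stepping by size - overlap."""
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def toy_embed(text, dim=64):
    """Placeholder embedding: normalized character counts in `dim` buckets."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve(question, chunks, k=TOP_K, embed=toy_embed):
    """Return the k chunks whose vectors are closest to the question vector."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: squared_euclidean(embed(c), q))
    return ranked[:k]


def build_prompt(question, knowledge_chunks):
    """Combine the question with its retrieved chunks (hypothetical layout)."""
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(knowledge_chunks, 1))
    return f"Reference knowledge:\n{context}\n\nQuestion: {question}"


document = "".join(chr(97 + i % 26) for i in range(600))  # dummy 600-character text
chunks = split_into_chunks(document)
top3 = retrieve("bidding complaints", chunks)
prompt = build_prompt("bidding complaints", top3)
print(len(chunks), [len(c) for c in chunks])
```

In a real pipeline the chunk vectors would be computed once and stored, as the figure's vector repository suggests, instead of being re-embedded for every question as this sketch does.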

Figure 4

A schematic summarizes tagged exam questions by test paper, question type, and C L K area.

The figure is divided into three dashed boxes labeled “Test paper tag”, “Question type tag”, and “C L K area tag”, showing how exam questions are categorized. On the left, under “Test paper tag”, the top half states “1100 questions from 11 test papers of first‑level P C E Q E” above a four-by-three grid of document icons captioned with years: “2014”, “2015”, “2016”, “2017”, “2018”, “2019”, “2020”, “2021”, “2022”, “2022 asterisk”, and “2023”. The bottom half reads “1040 questions from 13 test papers of second‑level P C E Q E” above another grid of document icons labeled “2014”, “2015”, “2016”, “2017”, “2018”, “2019”, “2020”, “2020 asterisk”, “2021”, “2021 asterisk”, “2022”, “2022 asterisk”, and “2023”. Arrows from both groups point to the central box. The middle “Question type tag” box contains four stacked cylindrical icons. For the first‑level questions, the upper cylinder is labeled “770 multiple‑choice single‑answer questions (M S Q s)” and the second cylinder “330 multiple‑choice multiple‑answer questions (M M Q s)”. For the second‑level questions, the third cylinder is labeled “780 M S Q s” and the bottom cylinder “260 M M Q s”. A right‑pointing arrow from this box leads to the third box. The rightmost “C L K area tag” box lists how questions are distributed across eight construction‑law knowledge areas: “457 C 1‑related questions”, “120 C 2‑related questions”, “304 C 3‑related questions”, “397 C 4‑related questions”, “130 C 5‑related questions”, “261 C 6‑related questions”, “244 C 7‑related questions”, and “227 C 8‑related questions”.

The CLQA dataset. Note: * indicates an extra PCEQE held that year. Source(s): Authors’ own work
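
The counts in Figure 4 can be cross-checked with a few lines of arithmetic, which also gives the denominator of the Accuracy formula from Figure 1. The variable names are illustrative, and the weighting (1 mark per MSQ, 2 marks per MMQ) is read off that formula.

```python
# Question counts as stated in the Figure 4 description.
first_level = {"MSQ": 770, "MMQ": 330}    # 11 test papers
second_level = {"MSQ": 780, "MMQ": 260}   # 13 test papers
area_counts = {"C1": 457, "C2": 120, "C3": 304, "C4": 397,
               "C5": 130, "C6": 261, "C7": 244, "C8": 227}

n_msq = first_level["MSQ"] + second_level["MSQ"]
n_mmq = first_level["MMQ"] + second_level["MMQ"]
total_questions = n_msq + n_mmq

# Denominator of the Accuracy formula: 1 mark per MSQ, 2 marks per MMQ.
max_marks = 1 * n_msq + 2 * n_mmq


def accuracy(marks_msq, marks_mmq, n_msq, n_mmq):
    """Accuracy = (M_MSQ + M_MMQ) / (1 * N_MSQ + 2 * N_MMQ)."""
    return (marks_msq + marks_mmq) / (1 * n_msq + 2 * n_mmq)


print(total_questions, sum(area_counts.values()), max_marks)
```

Both the question-type tags and the knowledge-area tags sum to the same total, so each question carries exactly one tag of each kind in this tally.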

Figure 5

A results diagram compares 7 original and C L K R‑empowered G P L L M s on 14,980 answers and summarizes performance gains.

The figure is a multi‑panel workflow summarizing evaluation outcomes for construction‑law question answering. On the left, a vertical box titled “7 G P L L M s” lists the models with icons: “Llama‑2‑70 b”, “Text‑davinci‑003”, “G P T‑3.5 Turbo”, “G P T‑4”, “Chat G L M 2‑6 B”, “E R N I E‑Bot‑turbo”, and “E R N I E‑Bot 4.0”. A double-headed arrow labeled “Integration” points downward to a box “Construction law knowledge repository (C L K R)” containing a miniature of the earlier eight‑area knowledge diagram (C 1 to C 8 of “387 C L documents in C L K R”). From the models, an arrow leads to a central box stating “14,980 answers to 2,140 questions from 7 original G P L L M s” with “8,404 marks for 14,980 answers” beneath. A double-headed arrow labeled “Comparison” points from a dashed box below back to this box. The dashed box reads “14,980 answers generated by 7 C L K R‑empowered G P L L Ms” and adds “Retrieved knowledge chunks and similarity ranks”. Below the comparison arrow, another box notes “10,202.5 marks for 14,980 answers”, indicating improved scoring when using C L K R. 
To the right, a large rectangular panel titled “Performance comparison of each test paper slash on M S Q s or M M Q s slash across 8 C L areas slash on open‑ended questions (Section 4.1)” contains bullet points: “C L K R enhances the C L Q A performance of 7 G P L L M s by an average of 21.1 percent, varying from 9.9 percent to 44.9 percent (Table 4).” “C L Q A performance on M S Q s and M M Q s improves by 14.9 percent and 38.3 percent (Table 5).” “C L K R enhances C L Q A performance from 14.5 percent to 28.2 percent across eight C L knowledge areas (Table 6).” “C L K R enhances 7 G P L L M s’ performance in 100 open‑ended questions by an average of 22.0 percent (Table S 4).” A lower right panel titled “Impact evaluation of individual C L document on performance enhancement (Section 4.2)” provides further bullets: “Top 10 (2.6 percent) documents offer 37.2 percent (unranked) and 37.3 percent (ranked) knowledge for C L Q A (Figure 9).” “210 documents retrieved less than 5 times offer 3.7 percent (unranked) and 3.5 percent (ranked) knowledge for C L Q A (Table S 4).”

The comparison results of GPLLMs’ CLQA accuracies with and without CLKR. Source(s): Authors’ own work
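
Two quantities summarized in Figure 5 can be illustrated with a short sketch. The first part pools the reported marks into an overall accuracy, assuming each of the 7 models answers all 2,140 questions with 1 mark per MSQ and 2 marks per MMQ; the pooled gain it yields need not match the 21.1 percent average in the figure exactly, since the averaging method is not stated here. The second part computes unranked and ranked frequency from a hypothetical retrieval log, under the assumed reading that C_i is 1 for each retrieved chunk and R_i is its similarity rank.

```python
from collections import defaultdict

# Part 1: pooled accuracy from the marks reported in Figure 5.
N_MODELS = 7
N_MSQ, N_MMQ = 770 + 780, 330 + 260             # question counts from Figure 4
MAX_MARKS = N_MODELS * (1 * N_MSQ + 2 * N_MMQ)  # assumes every model answers all questions

pooled_without = 8404 / MAX_MARKS     # original GPLLMs
pooled_with = 10202.5 / MAX_MARKS     # CLKR-empowered GPLLMs
pooled_gain = pooled_with / pooled_without - 1

# Part 2: unranked and ranked frequency per document.
# Hypothetical retrieval log: (document id, similarity rank 1..3) per retrieved chunk.
retrieval_log = [
    ("CLD-232", 1), ("CLD-266", 2), ("CLD-072", 3),
    ("CLD-232", 1), ("CLD-232", 2), ("CLD-266", 3),
]

unranked = defaultdict(float)
ranked = defaultdict(float)
for doc_id, rank in retrieval_log:
    c = 1.0                      # assumed reading: C_i = 1 for each retrieved chunk
    unranked[doc_id] += c        # sum of C_i
    ranked[doc_id] += c / rank   # sum of C_i * (1 / R_i)

print(f"pooled accuracy: {pooled_without:.3f} -> {pooled_with:.3f} (+{pooled_gain:.1%})")
for doc_id in sorted(unranked, key=unranked.get, reverse=True):
    print(doc_id, unranked[doc_id], round(ranked[doc_id], 3))
```

The ranked variant down-weights documents that are retrieved often but rarely as the closest match, which is why the figure reports both shares side by side.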

Figure 6

Seven box‑and‑violin plots compare the accuracy of seven models with and without C L K R across P C E Q E test papers.

The figure consists of seven subplots labeled (a) through (g), each showing paired distributions of model accuracy before and after integrating the construction law knowledge repository (C L K R). All panels share the vertical axis “Accuracy”, ranging from 0.2 to 1.0 with an interval of 0.2, and a legend indicating violins and boxes depict the 25 percent–75 percent range of baseline performance with whiskers for “Min–Max”, a horizontal line for the median, diamonds for the mean, and dots for “Accuracy on each P C E Q E test paper”. A dashed horizontal line marks the “Passing Line (Accuracy equals 0.6)”, which runs right from the marking 0.6 on the vertical axis of each plot. Panel (a), titled “Accuracy of Llama‑2‑70 b with and without C L K R”, shows a distribution for “Llama‑2‑70 b” on the left with a mean of “0.283” and a distribution for “Llama‑2‑70 b with C L K R” on the right with a mean of “0.363”. An arrow labeled “28.3 percent” points from the baseline mean to the C L K R‑enhanced mean, indicating modest improvement that remains below the 0.6 passing line. Panel (b), “Accuracy of text‑davinci‑003 with and without C L K R”, similarly shows the baseline mean “0.329” increasing to “0.476” with C L K R, annotated as “44.9 percent”. Panel (c), “Accuracy of G P T‑3.5 Turbo with and without C L K R”, presents a baseline mean of “0.349” rising to “0.476”, with a labeled improvement of “36.3 percent”. Panel (d), “Accuracy of G P T‑4 with and without C L K R”, shows higher accuracies than panels (a) to (c), with the baseline mean “0.528” already near the passing line and the C L K R‑empowered mean “0.663” clearly above it, corresponding to a “25.4 percent” gain. Panel (e), titled “Accuracy of Chat G L M 2‑6 B with and without C L K R”, shows a distribution for “Chat G L M 2-6 B” on the left with a mean of “0.430” and a distribution for “Chat G L M 2‑6 B with C L K R” on the right with a mean of “0.478”. 
An arrow labeled “11.1 percent” points from the baseline mean to the C L K R‑enhanced mean, indicating modest improvement that still remains below the 0.6 passing line. Panel (f), “Accuracy of E R N I E‑Bot‑turbo with and without C L K R”, displays baseline accuracy centered around “0.419” and C L K R‑empowered accuracy around “0.462”, with a “10.2 percent” gain; both distributions also lie below the passing threshold. Panel (g), “Accuracy of E R N I E‑Bot 4.0 with and without C L K R”, shows the strongest performance: the baseline mean is “0.755”, already above 0.6, while the C L K R‑enhanced mean is “0.830”, with an improvement of “9.9 percent” and tighter clustering of blue points. In each subplot, the distributions on the right are not only shifted upward in mean and median but also cluster more tightly above the baseline, illustrating that incorporating C L K R consistently boosts model performance across all P C E Q E test papers for each of the seven G P L L Ms. Note: All the numerical data values are approximated.

Performance of original and CLKR-empowered GPLLMs in PCEQEs. Source(s): Authors’ own work
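The percentage labels in Figure 6 appear to be relative gains over the baseline mean. The following is a minimal sketch of that arithmetic, assuming the gain is defined as (enhanced − baseline) / baseline; the function names are illustrative and not taken from the paper. Because the means quoted in the figure description are rounded, the recomputed values can differ from the printed labels by a few tenths of a percentage point.

```python
PASSING_LINE = 0.6  # passing threshold shown as the dashed line in the figure

def relative_gain(baseline, enhanced):
    """Relative improvement of the enhanced mean over the baseline mean, in percent."""
    return (enhanced - baseline) / baseline * 100

def crosses_passing_line(baseline, enhanced, threshold=PASSING_LINE):
    """True when the baseline mean is below the threshold and the enhanced mean reaches it."""
    return baseline < threshold <= enhanced

# Rounded mean accuracies (without CLKR, with CLKR) as quoted for Figure 6.
means = {
    "Llama-2-70b": (0.283, 0.363),
    "text-davinci-003": (0.329, 0.476),
    "GPT-3.5 Turbo": (0.349, 0.476),
    "GPT-4": (0.528, 0.663),
    "ChatGLM2-6B": (0.430, 0.478),
    "ERNIE-Bot-turbo": (0.419, 0.462),
    "ERNIE-Bot 4.0": (0.755, 0.830),
}

for model, (base, enh) in means.items():
    gain = relative_gain(base, enh)
    flag = "crosses passing line" if crosses_passing_line(base, enh) else ""
    print(f"{model:18s} {base:.3f} -> {enh:.3f}  gain {gain:5.1f}%  {flag}")
```

With these inputs, only GPT-4 moves from below to above the 0.6 line; ERNIE-Bot 4.0 is already above it without CLKR.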

Figure 7

Seven panels of box‑and‑scatter plots compare accuracy of seven G P L L Ms on M S Q s and M M Q s with and without C L K R.

The figure is composed of subplots (a)–(g), each showing how integrating the construction law knowledge repository (C L K R) affects accuracy on multiple‑choice single‑answer questions (M S Q s) and multiple‑choice multiple‑answer questions (M M Q s) for a given G P L L M. All panels share the vertical axis “Accuracy”, ranging from 0.0 to 1.0 with an interval of 0.2, and a legend indicating boxes for the 25 percent–75 percent range of baseline performance: red boxes for “without C L K R” and blue boxes for “with C L K R”, whiskers for “Min–Max”, a horizontal line for the median, diamonds for the mean, and dots for “Accuracy on each P C E Q E test paper”. A dashed horizontal line marks the “Passing Line (Accuracy equals 0.6)”, which runs right from the marking 0.6 on the vertical axis of each plot. Panels (a)–(g) cover “Llama‑2‑70 b”, “text‑davinci‑003”, “G P T‑3.5 Turbo”, “G P T‑4”, “Chat G L M 2‑6 B”, “E R N I E‑Bot‑turbo”, and “E R N I E‑Bot 4.0”, respectively. In each, two pairs of boxplots appear along the horizontal axis labeled “without C L K R” and “with C L K R” under “M S Q s” and “M M Q s”. For Llama‑2‑70 b, the mean M S Q accuracy increases from “0.413” to “0.475” (a “15.0 percent” gain), while M M Q accuracy rises from “0.120” to “0.214” (a “78.7 percent” gain), though both remain below the 0.6 passing line. For text‑davinci‑003, M S Q mean accuracy improves from “0.404” to “0.567” (“40.4 percent”), and M M Q accuracy from “0.220” to “0.353” (“59.2 percent”). G P T‑3.5 Turbo shows M S Q accuracy increasing from “0.452” to “0.547” (“21.1 percent”) and M M Q accuracy from “0.205” to “0.381” (“86.2 percent”). G P T‑4 exhibits high M S Q scores, with the mean rising from “0.645” to “0.743” (“15.2 percent”), surpassing the passing line; its M M Q accuracy grows from “0.374” to “0.557”, a “49.0 percent” gain that moves the distribution close to the 0.6 threshold. 
For Chat G L M 2‑6 B, M S Q accuracy slightly increases from “0.538” to “0.604”, labeled “12.4 percent” improvement in the figure, while M M Q accuracy increases from “0.293” to “0.311” (“6.2 percent”); the M S Q mean sits roughly at the passing line and the M M Q mean well below it. E R N I E‑Bot‑turbo’s M S Q mean accuracy improves from “0.680” to “0.731” (“7.5 percent”), remaining above 0.6, and M M Q accuracy rises from “0.071” to “0.103” (“44.6 percent”) though still low in absolute terms. E R N I E‑Bot 4.0 exhibits the highest M S Q scores, with the mean rising from “0.853” to “0.909” (“6.6 percent”), surpassing the passing line; its M M Q accuracy grows from “0.626” to “0.724”, a “15.5 percent” gain, also surpassing the passing 0.6 threshold. Note: All the numerical data values are approximated.

Performance comparison of original and CLKR-empowered GPLLMs in MSQs and MMQs. Source(s): Authors’ own work

Figure 8

A seven-panel figure shows L L M accuracy across eight construction-law areas, with improvements after adding the C L K R.

The figure is spread across seven labeled panels (a)–(g), each showing how a G P L L M performs in eight construction law knowledge areas after integrating the construction law knowledge repository (C L K R). All panels share the vertical axis “Accuracy”, ranging from 0.0 to 1.0 with an interval of 0.2, and a common legend: boxes indicate the 25 percent–75 percent range for accuracies across P C E Q E test papers, thin vertical lines show “Min–Max”, diamonds mark “Average accuracy”, red dots show “Accuracy of G P L L Ms on each P C E Q E test paper” without C L K R, and blue dots show “Accuracy of C L K R‑empowered G P L L M s on each P C E Q E test paper”. A green dashed line at 0.6 represents the “Passing Line (Accuracy equals 0.6)”. Panel (a), “Llama‑2‑70 b”, contains eight grouped boxplots labeled C 1 through C 8 along the horizontal axis. For each C‑area, red and blue point clouds cluster around the boxes, with arrows annotating relative improvements. It shows relatively low accuracies in all eight areas, with averages below the passing line. In C 1, mean accuracy rises from “0.287” without C L K R to “0.370” with C L K R, a “29.1 percent” gain. C 2 increases from “0.355” to “0.360” (“1.4 percent”), C 3 from “0.221” to “0.325” (“47.1 percent”), C 4 from “0.302” to “0.360” (“19.0 percent”), C 5 from “0.303” to “0.386” (“27.3 percent”), C 6 from “0.333” to “0.411” (“23.3 percent”), C 7 from “0.324” to “0.379” (“17.1 percent”), and C 8 from “0.314” to “0.405” (“29.1 percent”). Panel (b), “text‑davinci‑003”, shows higher starting accuracies and larger relative gains. In C 1, mean accuracy improves from “0.340” to “0.435” (“28.1 percent”), in C 2 from “0.320” to “0.582” (“82.3 percent”), in C 3 from “0.332” to “0.487” (“46.5 percent”), and in C 4 from “0.299” to “0.426” (“42.4 percent”). 
For C 5, the average rises from “0.348” to “0.548” (“49.8 percent”), for C 6 from “0.348” to “0.533” (“53.0 percent”), for C 7 from “0.366” to “0.531” (“45.2 percent”), and for C 8 from “0.332” to “0.463” (“39.2 percent”). Most C‑area averages with C L K R approach or exceed 0.5, though still near or below the 0.6 threshold. Panel (c), “G P T‑3.5 Turbo”, presents moderate baseline performance that benefits noticeably from C L K R. In C 1, the mean increases from “0.416” to “0.484” (“16.3 percent”), in C 2 from “0.317” to “0.489” (“54.2 percent”), in C 3 from “0.362” to “0.503” (“38.9 percent”), and in C 4 from “0.285” to “0.445” (“56.1 percent”). C 5 improves from “0.400” to “0.455” (“13.6 percent”), C 6 from “0.333” to “0.526” (“57.8 percent”), C 7 from “0.389” to “0.539” (“38.5 percent”), and C 8 from “0.345” to “0.414” (“19.9 percent”). While some enhanced averages approach the passing line, all remain below 0.6. Panel (d), “G P T‑4”, shows the strongest performance among panels (a) to (d). Baseline averages are near, though below, 0.6 in most C‑areas and consistently rise with C L K R. In C 1, accuracy grows from “0.573” to “0.657” (“14.7 percent”), in C 2 from “0.523” to “0.742” (“41.7 percent”), in C 3 from “0.549” to “0.689” (“25.5 percent”), and in C 4 from “0.481” to “0.615” (“27.9 percent”). For C 5, the mean increases from “0.539” to “0.719” (“33.2 percent”), for C 6 from “0.546” to “0.685” (“25.6 percent”), for C 7 from “0.488” to “0.678” (“39.1 percent”), and for C 8 from “0.590” to “0.681” (“15.5 percent”), with all C L K R‑enhanced averages above the passing line. 
Panel (e), “Chat G L M 2‑6 B”, shows moderate accuracies: in C 1 the mean rises from “0.436” to “0.455” (“4.2 percent” improvement), in C 2 from “0.349” to “0.436” (“25.0 percent”), in C 3 from “0.511” to “0.517” (“1.3 percent”), in C 4 from “0.368” to “0.448” (“17.0 percent”), in C 5 from “0.430” to “0.538” (“20.1 percent”), in C 6 from “0.490” to “0.527” (“7.5 percent”), in C 7 from “0.477” to “0.536” (“12.5 percent”), and in C 8 from “0.430” to “0.513” (“19.3 percent”). Most averages stay below 0.6 but trend upward with C L K R. Panel (f), “E R N I E‑Bot‑turbo”, exhibits higher but still mixed performance: average accuracy in C 1 increases from “0.440” to “0.485” (“10.3 percent”), in C 2 from “0.553” to “0.580” (“4.9 percent”), in C 3 from “0.408” to “0.440” (“7.9 percent”), in C 4 from “0.385” to “0.406” (“5.4 percent”), in C 5 from “0.438” to “0.522” (“19.3 percent”), in C 6 from “0.435” to “0.525” (“20.7 percent”), in C 7 from “0.446” to “0.463” (“1.4 percent”), and in C 8 from “0.452” to “0.511” (“10.5 percent”). Only a few C‑areas approach the 0.6 passing line. Panel (g), “E R N I E‑Bot 4.0”, shows consistently high accuracies above the passing line for all C‑areas. For C 1 the mean improves from “0.770” to “0.849” (“10.3 percent”), for C 2 from “0.729” to “0.843” (“15.7 percent”), for C 3 from “0.761” to “0.862” (“13.3 percent”), for C 4 from “0.737” to “0.778” (“5.6 percent”), for C 5 from “0.715” to “0.828” (“15.8 percent”), for C 6 from “0.771” to “0.819” (“6.3 percent”), for C7 from “0.747” to “0.820” (“9.7 percent”), and for C 8 from “0.819” to “0.896” (“9.4 percent”). The blue point clouds cluster near the top of the chart, indicating robust, high‑accuracy performance in every construction law area once C L K R is integrated. Note: All the numerical data values are approximated.

Performance comparison of original and CLKR-empowered GPLLMs across C1-C8. Source(s): Authors’ own work

Figure 9

A grouped bar chart compares the accuracy of six G P L L M s with and without C L K R on 100 open‑ended questions.

The chart plots “Accuracy” on the vertical axis, ranging from 0.0 to 1.0 with an interval of 0.2, and lists six models along the horizontal axis: “Llama‑2‑70 B”, “G P T‑3.5 Turbo”, “G P T‑4”, “Chat G L M 2‑6 B”, “E R N I E‑Bot‑turbo”, and “E R N I E‑Bot 4.0”. The legend at the top explains two bars for “Accuracy of G P L L M s on 100 open‑ended questions” and “Accuracy of C L K R‑empowered G P L L M s on 100 open‑ended questions”. For each model, a bar for accuracy of G P L L M s appears on the left with a numeric accuracy printed inside, and a bar for accuracy of C L K R‑empowered G P L L M s on the right with a higher value; arrows and percentage labels above indicate the relative improvement. For Llama‑2‑70 B, the original accuracy is “0.407” and the C L K R‑empowered accuracy is “0.473”, corresponding to a “16.4 percent” gain. For G P T‑3.5 Turbo, accuracy increases from “0.213” to “0.317”, an improvement of “48.4 percent”. For G P T‑4, the bars rise from “0.277” to “0.407” with a “47.0 percent” gain. For Chat G L M 2‑6 B, accuracy improves from “0.210” to “0.277”, labeled “31.8 percent”. For E R N I E‑Bot‑turbo, accuracy rises from “0.467” to “0.510”, giving a “9.3 percent” increase. For E R N I E‑Bot 4.0, the values go from “0.483” to “0.527”, a “9.0 percent” gain. A caption beneath the axis notes these as “6 pairs of G P L L M s with and without C L K R”.

Performance comparison of GPLLMs with and without CLKR in 100 open-ended questions. Note: The API of the text-davinci-003 model was no longer accessible when the authors conducted the CLQA on the open-ended question set in November 2024. Source(s): Authors’ own work

Figure 10

Two line‑and‑bar charts show power‑law distributions for unranked and ranked frequencies of knowledge chunks.

The figure contains two stacked plots that analyze how often individual construction law documents contribute knowledge chunks, each fitted with a power‑law curve and annotated with statistics. Panel (a), titled “Power law distribution of unranked frequency of knowledge chunk‑sourced documents”, plots “Unranked frequency of documents” on the vertical axis from 0 to “750” and “Knowledge chunk‑sourced documents” along the horizontal axis. Orange vertical bars represent the unranked frequency with a line labeled “Power curve fitting” overlaying them. The curve is steep at the left, where a few documents have very high frequencies, then quickly decays toward near zero as the document index increases, highlighting a long‑tailed distribution. An inset bar chart zooms in on “The top 10 frequent documents (unranked)”, showing counts labeled above each bar: “704”, “469”, “252”, “228”, “176”, “133”, “112”, “111”, “105”, and “100”, and document I D s along the bottom such as “C L D‑260, C L D‑380, C L D‑347, C L D‑007, C L D‑108, C L D‑003, C L D‑015, C L D‑339, C L D‑149, C L D‑258”. A table to the right titled “Power function” reports the fitted model “y equals a asterisk x to the b power” with parameters “a: 727.39 plus or minus 5.75”, “b: negative 0.85 plus or minus 0.01”, and “R‑Square: 0.98”. The equation “Unranked frequency equals the sum from i equals 1 to n of C subscript i” appears below the table. Panel (b), titled “Power law distribution of ranked frequency of knowledge chunk‑sourced documents”, uses the same horizontal axis, but the vertical axis is “Ranked frequency of documents”, ranging from 0 to 450. Bars show ranked frequencies with a power‑law line labeled “Power curve fitting”. Again, the distribution is highly skewed, with a few documents dominating. 
The inset bar chart notes, “The top 10 frequent documents (ranked) provide 37.3 percent of the question‑related knowledge”, and displays frequencies above each bar: “428”, “298”, “145”, “138”, “100”, “82.8”, “73.7”, “68.7”, “67”, and “62.8”, for the same leading document I D s. The adjacent table lists the ranked‑frequency power‑law parameters: “y equals a asterisk x to the b power”, with “a: 444.75 plus or minus 3.96”, “b: negative 0.85 plus or minus 0.01”, and “R‑Square: 0.98”. A formula at the bottom reads, “Ranked frequency equals the sum from i equals 1 to n of C subscript i times 1 over R subscript i”.

The power law distribution of CL knowledge-sourced documents for CLQA. Source(s): Authors’ own work
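The two frequency formulas in Figure 10 can be illustrated with a short sketch. It assumes that each retrieved knowledge chunk contributes C_i = 1 to its source document and that R_i is the chunk's similarity rank for the question (1 = most similar); the paper's exact definitions may differ. The retrieval log below is toy data, and `power_law` simply evaluates the fitted form y = a · x^b with the unranked parameters reported in panel (a).

```python
from collections import defaultdict

def document_frequencies(retrievals):
    """Aggregate per-document frequencies from (doc_id, rank) retrieval events.

    Unranked frequency: sum of C_i          (each event counts 1, an assumption).
    Ranked frequency:   sum of C_i * 1/R_i  (events weighted by inverse rank).
    """
    unranked = defaultdict(float)
    ranked = defaultdict(float)
    for doc_id, rank in retrievals:
        unranked[doc_id] += 1.0
        ranked[doc_id] += 1.0 / rank
    return dict(unranked), dict(ranked)

def power_law(x, a, b):
    """Evaluate the fitted power function y = a * x**b at document index x."""
    return a * x ** b

# Toy retrieval log: (source document, similarity rank of the retrieved chunk).
retrievals = [
    ("CLD-260", 1), ("CLD-380", 2), ("CLD-260", 3),
    ("CLD-260", 2), ("CLD-347", 1), ("CLD-380", 3),
]
unranked, ranked = document_frequencies(retrievals)

for doc_id in sorted(unranked, key=unranked.get, reverse=True):
    print(f"{doc_id}: unranked={unranked[doc_id]:.0f}, ranked={ranked[doc_id]:.3f}")

# Fitted curve from panel (a): a = 727.39, b = -0.85.
print(round(power_law(1, 727.39, -0.85), 2), round(power_law(2, 727.39, -0.85), 1))
```

Weighting by 1/R_i lowers the contribution of chunks retrieved at lower similarity ranks, which is why the ranked counts in panel (b) are smaller than the unranked counts in panel (a) for the same documents.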

Figure 11

A two‑panel interface shows how the “Smart C L Q A” system uses C L K R knowledge chunks to answer an exam question.

The figure contains two annotated screenshots labeled “(a) A C L Q A example in the deployed prototype” and “(b) Updating C L documents in the C L K R”. In panel (a), at the very top, a green‑bordered text box shows an English multiple‑choice item: “According to the ‘Unified Standard for Construction Quality Acceptance of Building Engineering’, who is responsible for accepting the inspection lot for the energy‑saving work of main structures, as well as for accepting the concealed work?” followed by options “(A) Supervision Engineer (B) Project Manager (C) Quality Engineer (D) Chief Supervision Engineer” and a label “Translation of the question”. A green annotation on the left labels this region “The Question”. Beneath, on the right, a smaller box indicates the original Chinese exam item with the tag “Q 0 1 6 0 (sourced from 2022 asterisk first‑level P C E Q E)”. Below it is a dialogue bubble containing the Chinese version of the question and, highlighted in red, the system’s final output, “Answer: D. Chief Supervision Engineer”, called out by a red label on the left, “Answer generated by G P L L M s”. On the left side of the screenshot is the Smart C L Q A sidebar headed “Dialogue” with version text “Current version v 0.2.10”. Within the panel are controls: “Manage knowledge repository”, “Current session: default”, “Please select a dialogue mode: Knowledge repository chat”, “Please select a L L M model: chatglm2‑6 b (Running)”, “Please select a prompt template: default”, a “Temperature: 0.80” slider, and “Historical dialogue rounds: 3”. Color‑coded boxes label “Enable the C L K R”, “Select the G P L L Ms for C L Q A”, and at the bottom “3 knowledge chunks related to the question”, with a drop‑down “Knowledge repository configuration” set to “Construction law knowledge”. On the right, three blue‑outlined rectangles stacked vertically present the retrieved supporting texts. 
Each shows a document title link and a mix of Chinese characters with an English summary caption. “Knowledge Chunk 1” is described as “A 250‑token Knowledge Chunk from ‘C L D‑225: Standards for Quality Acceptance of Energy‑Efficient Building Construction’”. “Knowledge Chunk 2” reads “A 250‑token Knowledge Chunk from ‘C L D‑151: Unified Standard for Construction Quality Acceptance of Building Engineering’”. “Knowledge Chunk 3” reads “Another 250‑token C L knowledge chunk from ‘C L D‑151: Unified Standard for Construction Quality Acceptance of Building Engineering’”. Blue arrows from the sidebar emphasize that these three chunks are retrieved as evidence for the question. Panel (b) shows the same “Smart C L Q A” interface focused on repository management. A central grey button labeled “Manage knowledge repository” is highlighted with a turquoise callout “Manage the knowledge repository”. On the right, a form headed “Please select or create a knowledge repository” with a drop‑down “Construction law knowledge repository (C L K R) [alias c l g b e‑v m‑1.5]” allows users to “Upload knowledge files” via a magenta‑outlined area reading “Drag and drop a document here” and “Browse”, annotated “Add up‑to‑date documents to the C L K R”. Below, a table lists existing documents, including an entry “C L D‑108: Standards for Identifying Bad Behaviors of All Parties in the National Construction Market”, which is highlighted in yellow with a note “Delete the C L D‑108: Standards for Identifying Bad Behaviors of All Parties in the National Construction Market”. Buttons at the bottom read “Download selected”, “Re‑add to vectorstore”, and a green button “Delete from knowledge repository”, called out as “Delete out‑of‑date documents from the C L K R”.

The CLQA prototype and the CLKR update. Note: Codes and specifications for deploying the prototype are available in supplemental materials. Source(s): Authors’ own work
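The retrieval step shown in panel (a), where three 250-token knowledge chunks are returned for a question, can be sketched as follows. This is an illustrative toy, not the prototype's code: it substitutes bag-of-words count vectors for the embedding models listed in Table 3, and the helper names (`chunk_tokens`, `retrieve`) are hypothetical.

```python
import math
from collections import Counter

def chunk_tokens(tokens, size=250):
    """Split a token list into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question_tokens, chunks, k=3):
    """Return the top-k chunks as (rank, chunk_index, similarity) tuples."""
    query = Counter(question_tokens)
    scored = [(cosine(query, Counter(chunk)), idx) for idx, chunk in enumerate(chunks)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest similarity first, stable on ties
    return [(rank, idx, score) for rank, (score, idx) in enumerate(scored[:k], start=1)]

# Toy chunks standing in for 250-token passages of CL documents.
chunks = [
    ["inspection", "lot", "acceptance", "supervision", "engineer"],
    ["concrete", "curing", "time"],
    ["energy", "saving", "acceptance", "inspection"],
    ["bid", "bond"],
]
question = "who accepts the inspection lot".split()

for rank, idx, score in retrieve(question, chunks):
    print(rank, idx, round(score, 3))
```

The retrieved chunks and their ranks would then be placed in the prompt ahead of the question; in a real deployment the count vectors would be replaced by dense embeddings stored in a vector index.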

Table 1

Existing construction-related QA studies

No	References	QA targeted specific areas	QA-used models	Training-free	Knowledge scope for QA	QA performance test dataset: question source	QA performance test dataset: number of questions
1	Chou et al. (2024)	Risk management in river dredging projects	A BERT-based deep learning model	×	Dredging risk knowledge collected by interviews	Listed by experienced dredging personnel	16
2	Kim et al. (2024)	Construction market knowledge in overseas projects	A BERT-based deep learning model	×	3 versions of a FIDIC standard contract written in English, Korean, and Indonesian	The FIDIC documents	80
3	Xue et al. (2024)	Building codes	A BERT-based deep learning model	×	2 Chapters of the IBC 2015	Manually generated for model testing	175
4	Lee et al. (2023)	Steel manufacturer equipment procurement	A machine learning model combining KG and QA	×	An equipment procurement document from a steel-making company	Generated questions based on relevant arbitration and clause settings	45
5	Tian et al. (2023)	Construction safety hazard	A BERT + BiGRU + Self-Attention-based deep learning model	×	6,325 safety hazard texts	Dedicated questions for model application	25
6	Wang and El-Gohary (2023)	Construction safety hazard	A CNN-based deep learning model	×	20 OSHA sections related to fall protection	Manually developed for model testing	671
7	Xu et al. (2023)	Coal mine construction safety	A BERT-BiLSTM-CRF-based deep learning model	×	43 sections of 80 papers from coal mine construction safety management standard specifications	Example questions used to validate the semantic query and entity information modules	Unspecified
8	Sun et al. (2020)	Construction document information transmission mining	A TF-IDF-based machine learning model	×	A monthly construction report containing 1734 words	Posed by three construction managers	5
9	Zhong et al. (2020)	Construction procedural constraint	A BiLSTM- + CRF-based deep learning model	×	14 types of national standards of CACQ in China	Sentences labeled by experts	400
10	Rajpurkar et al. (2016)	Multiple domains including building regulation domain	A logistic regression-based machine learning model	×	536 Wikipedia articles	Contributed by 5 civil engineers	Unspecified

Note(s): BERT: Bidirectional Encoder Representations from Transformers; BiGRU: Bidirectional Gated Recurrent Unit; BIM: Building Information Modeling; BiLSTM: Bidirectional Long Short-Term Memory; CACQ: Code for Acceptance of Construction Quality; CRF: Conditional Random Fields; FIDIC: International Federation of Consulting Engineers; IBC: International Building Code; IE: Information Extraction; KG: Knowledge Graph; NHC: National Hurricane Center; NLG: Natural Language Generation; NLP: Natural Language Processing; NLU: Natural Language Understanding; OSHA: Occupational Safety and Health Administration; TF-IDF: Term Frequency-Inverse Document Frequency

Source(s): Authors’ own work

Table 2

Examples of recent studies on GPLLMs

No	References	GPLLMs	Specific domain	Test datasets: question sources	Test datasets: question types	Test datasets: number of questions
1	Alan et al. (2024)	GPT-3.5 turbo	Islam understanding	Designed by experts	Open-ended	3 (mentioned by the author)
2	Hou and Zhang (2024)	GPT-3.5 and GPT-4.0	Dietary supplement	Information on the MSKCC website	Closed-ended (MSQs and True/False)	2000
3	Mansurova et al. (2024)	Llama-2-7b and Llama-2-13b	General	TriviaQA open-domain dataset	Closed-ended (Filling in the blank)	500
4	Rasool et al. (2024)	GPT-3.5-turbo and GPT-4	Healthcare	CogTale dataset	Closed-ended (MMQs, MSQs, True/False, and number extraction)	337
5	Rizzo et al. (2024)	GPT-3.5 turbo and GPT-4	Orthopaedics	OITE in 2020, 2021, and 2022	Closed-ended (MSQs)	207
6	Sahin et al. (2024)	GPT-4	Neurosurgery	The latest six written TNSPBE	Closed-ended (MSQs)	523
7	Schoch et al. (2024)	GPT-3.5 and GPT-4	Urology	A test book published by the FEBU association	Closed-ended (MSQs)	Around 600
8	Su et al. (2024)	GPT-4	Nursing	Taiwan’s 2022 Nursing Licensing Exam	Closed-ended (MSQs)	400
9	Tsoutsanis and Tsoutsanis (2024)	Llama-2, Google Bard, Bing Chat, and GPT-3.5	Clinical help	Commercial question banks (i.e. Qbank) for the MSRA exam	Closed-ended (MSQs)	100
10	Antaki et al. (2023)	GPT-3.5 turbo and GPT-4	Ophthalmology	Basic and Clinical Science Course Self-Assessment Program and an online question bank (i.e. OphthoQuestions)	Closed-ended (MSQs)	520
11	Choi et al. (2023)	ChatGPT	Laws	Exams for law school courses at the University of Minnesota	Closed-ended (MSQs) and open-ended (essay writing)	107
12	Gencer and Aydin (2023)	GPT-3.5 and GPT-4	Thoracic surgery	Turkish-language thoracic surgery exam questions	Closed-ended (MSQs)	105
13	Gilson et al. (2023)	InstructGPT, GPT-3.5, and ChatGPT	Medicine	A question bank for medical students and the NBME	Closed-ended (MSQs)	220
14	Oh et al. (2023)	GPT-3.5 and GPT-4	Surgery	The KGSBE in 2020, 2021, and 2022	Closed-ended (MSQs)	280
15	Pursnani et al. (2023)	GPT-3.5-Legacy, GPT-3.5-Turbo, and GPT-4	Engineering fundamental knowledge	An unpublished practice exam	Closed-ended (MSQs, MMQs, and filling in the blank)	134
16	Rosól et al. (2023)	GPT-3.5 and GPT-4	Medicine	3 versions of PMFE	Closed-ended (MSQs)	600
17	Saad et al. (2023)	GPT-4	Orthopedics	Mock FRCS Orth Part A	Closed-ended (MSQs)	240

Note(s): CogTale: Cognitive Treatments Article Library and Evaluation; FEBU: Fellow of the European Board of Urology; FRCS Orth: Orthopedic fellow of the Royal College of Surgeons; iDISK: International Dietary Supplement Knowledgebase; KGSBE: Korean General Surgery Board Exams; MSKCC: Memorial Sloan Kettering Cancer Center; MSRA: Multi-Specialty Recruitment Assessment; NBME: National Board of Medical Examiners; OITE: Orthopedic In-Training Examination; PMFE: Polish Medical Final Examination; TNSPBE: Turkish Neurosurgical Society Proficiency Board Exams

Source(s): Authors’ own work

Table 3

GPLLMs selected for integration with CLKR

No	Contributors	GPLLMs	Parameters	Best in processing	Open-/Closed-source	Corresponding embedding models
1	Meta	Llama-2-70b	70 billion	English	Open-source	all-mpnet-base-v2
2	OpenAI	text-davinci-003	Unknown	English	Closed-source	text-embedding-ada-002
3	OpenAI	GPT-3.5 Turbo	20 billion	English	Closed-source	text-embedding-ada-002
4	OpenAI	GPT-4	1.8 trillion	English	Closed-source	text-embedding-ada-002
5	Tsinghua University	ChatGLM2-6B	6 billion	Chinese	Open-source	text2vec-large-chinese
6	Baidu	ERNIE-Bot-turbo	13 billion	Chinese	Closed-source	ERNIE Embedding-V1
7	Baidu	ERNIE-Bot 4.0	>1 trillion	Chinese	Closed-source	ERNIE Embedding-V1

Source(s): Authors’ own work

Table 4

Wilcoxon T Tests on CLQA accuracy of 7 GPLLMs with and without CLKR in PCEQEs

No	GPLLM	CLKR	Average accuracy	Accuracy enhancement	z-statistic	p-value
1	Llama-2-70b	without	0.283	28.3%	4.197	0.000***
1	Llama-2-70b	with	0.363	28.3%	4.197	0.000***
2	text-davinci-003	without	0.329	44.9%	4.286	0.000***
2	text-davinci-003	with	0.476	44.9%	4.286	0.000***
3	GPT-3.5 Turbo	without	0.349	36.3%	4.287	0.000***
3	GPT-3.5 Turbo	with	0.476	36.3%	4.287	0.000***
4	GPT-4	without	0.528	25.4%	4.171	0.000***
4	GPT-4	with	0.663	25.4%	4.171	0.000***
5	ChatGLM2-6B	without	0.430	11.1%	3.729	0.000***
5	ChatGLM2-6B	with	0.478	11.1%	3.729	0.000***
6	ERNIE-Bot-turbo	without	0.419	10.2%	3.429	0.002***
6	ERNIE-Bot-turbo	with	0.462	10.2%	3.429	0.002***
7	ERNIE-Bot 4.0	without	0.755	9.9%	4.029	0.000***
7	ERNIE-Bot 4.0	with	0.830	9.9%	4.029	0.000***
Average accuracy of 7 GPLLMs		without	0.442	21.1%	NA	NA
Average accuracy of 7 GPLLMs		with	0.535	21.1%	NA	NA

Note(s): *** denote confidence levels above 99%

Source(s): Authors’ own work
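
The "Accuracy enhancement" column in Table 4 appears to report a relative gain rather than an absolute difference. The sketch below is not taken from the paper: it assumes enhancement = (with - without) / without and recomputes that quantity from the rounded average accuracies in the table. The function names are illustrative. Because the inputs are rounded to three decimals, the recomputed values differ from the reported ones by up to roughly 0.2 percentage points.

```python
# Average accuracies (without CLKR, with CLKR) and the reported enhancement
# in percent, copied from Table 4.
TABLE_4 = {
    "Llama-2-70b": (0.283, 0.363, 28.3),
    "text-davinci-003": (0.329, 0.476, 44.9),
    "GPT-3.5 Turbo": (0.349, 0.476, 36.3),
    "GPT-4": (0.528, 0.663, 25.4),
    "ChatGLM2-6B": (0.430, 0.478, 11.1),
    "ERNIE-Bot-turbo": (0.419, 0.462, 10.2),
    "ERNIE-Bot 4.0": (0.755, 0.830, 9.9),
    "Average of 7 GPLLMs": (0.442, 0.535, 21.1),
}


def relative_enhancement(without: float, with_clkr: float) -> float:
    """Relative accuracy gain in percent (assumed definition, see lead-in)."""
    return (with_clkr - without) / without * 100.0


def max_deviation(rows: dict) -> float:
    """Largest gap, in percentage points, between recomputed and reported values."""
    return max(
        abs(relative_enhancement(without, with_clkr) - reported)
        for without, with_clkr, reported in rows.values()
    )


if __name__ == "__main__":
    for name, (without, with_clkr, reported) in TABLE_4.items():
        computed = relative_enhancement(without, with_clkr)
        print(f"{name}: computed {computed:.1f}%, reported {reported}%")
```

Under this reading, the smaller enhancement figures for the Chinese-language models reflect smaller gains relative to their own baselines, not lower final accuracy.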

Table 5

Wilcoxon T Tests on CLQA accuracy of GPLLMs with and without CLKR in MSQs and MMQs

Question type	CLKR	Average accuracy	Accuracy enhancement	z-statistic
MSQs	without	0.569	14.9%	9.451
MSQs	with	0.654	14.9%	9.451
MMQs	without	0.273	38.3%	9.360
MMQs	with	0.378	38.3%	9.360

Source(s): Authors’ own work

Table 6

Wilcoxon T Tests on CLQA accuracy of GPLLMs with and without CLKR across C1-C8

Knowledge domain	CLKR	Average accuracy	Accuracy enhancement	z-statistic
C1	without	0.466	14.5%	6.672
C1	with	0.534	14.5%	6.672
C2	without	0.449	28.2%	5.896
C2	with	0.576	28.2%	5.896
C3	without	0.449	21.6%	6.825
C3	with	0.546	21.6%	6.825
C4	without	0.408	21.1%	7.086
C4	with	0.494	21.1%	7.086
C5	without	0.458	24.5%	5.966
C5	with	0.571	24.5%	5.966
C6	without	0.465	23.6%	7.427
C6	with	0.575	23.6%	7.427
C7	without	0.462	21.6%	7.334
C7	with	0.562	21.6%	7.334
C8	without	0.470	17.9%	5.970
C8	with	0.555	17.9%	5.970

Source(s): Authors’ own work

Alan, A.Y., Karaarslan and Aydin (2024), “A RAG-based question answering system proposal for understanding Islam: MufassirQAS LLM”, arXiv.

Alhyari, O.H. and Ani, A.R.A. (2022), “Is the engineering and construction contract legally less competitive than the red book in civil law countries?”, Journal of Legal Affairs and Dispute Resolution in Engineering and Construction, 06522001, doi: https://doi.org/10.1061/(asce)la.1943-4170.0000543

Antaki, Touma, Milad, El-Khoury and Duval (2023), “Evaluating the performance of ChatGPT in ophthalmology: an analysis of its successes and shortcomings”, Ophthalmology Science, 100324, doi: https://doi.org/10.1016/j.xops.2023.100324

Badyal, D.K., Jain, Lata and Sharma (2023), “Triple Cs of scenario-based multiple-choice question: concept, construction, and corroboration”, National Journal of Pharmacology and Therapeutics, pp. 117-122, doi: https://doi.org/10.4103/njpt.njpt_47_23

Choi, J.H., Hickman, K.E., Monahan, A.B. and Schwarcz, D.B. (2023), “ChatGPT goes to law school”, SSRN Electronic Journal, doi: https://doi.org/10.2139/ssrn.4335905

Chou, J.-S., Chong, P.-L. and Liu, C.-Y. (2024), “Deep learning-based chatbot by natural language processing for supportive risk management in river dredging projects”, Engineering Applications of Artificial Intelligence, Vol. 131, 107744, doi: https://doi.org/10.1016/j.engappai.2023.107744

deepset-ai (2024), “Haystack”, available at: https://github.com/deepset-ai/haystack

(2023), “A statistical analysis of lawyer and grassroots legal service work in 2022”, available at: https://www.moj.gov.cn/pub/sfbgw/zwxxgk/fdzdgknr/fdzdgknrtjxx/202306/t20230614_480740.html

Domengo (2024), “PythonProject”, available at: https://github.com/Domengo/pythonProject/blob/master/llm-chat/langchain_gemini_qa.py

Eleliemy and Ciorba, F.M. (2021), “A distributed chunk calculation approach for self-scheduling of parallel applications on distributed-memory systems”, Journal of Computational Science, 101284, doi: https://doi.org/10.1016/j.jocs.2020.101284

Frieder, Pinchetti, Chevalier, Griffiths, R.-R., Salvatori, Lukasiewicz, Petersen, P.C. and Berner (2023), “Mathematical capabilities of ChatGPT”, arXiv, doi: https://doi.org/10.48550/arXiv.2301.13867

Gao, Xiong, Gao, Jia, Pan, Dai, Sun, Wang and Wang (2023), “Retrieval-augmented generation for large language models: a survey”, arXiv, doi: https://doi.org/10.48550/arXiv.2312.10997

Gencer and Aydin (2023), “Can ChatGPT pass the thoracic surgery exam?”, The American Journal of the Medical Sciences, Vol. 366, pp. 291-295, doi: https://doi.org/10.1016/j.amjms.2023.08.001

Ghimire, Kim and Acharya (2024), “Opportunities and challenges of generative AI in construction industry: focusing on adoption of text-based models”, Buildings, p. 220, doi: https://doi.org/10.3390/buildings14010220

Gilson, Safranek, C.W., Huang, Socrates, Chi, Taylor, R.A. and Chartash (2023), “How does ChatGPT perform on the United States medical licensing examination? The implications of large language models for medical education and knowledge assessment”, JMIR Medical Education, e45312, doi: https://doi.org/10.2196/45312

Hansen (2013), “Building Codes and Regulations”, in The Canadian Encyclopedia, available at: https://www.thecanadianencyclopedia.ca/en/article/building-codes-and-regulations (accessed 16 December 2013).

Harvel, Haiek, F.B., Ankolekar and Brunner, D.J. (2024), “Can LLMs answer investment banking questions? Using domain-tuned functions to improve LLM performance on knowledge-intensive analytical tasks”, Proceedings of the AAAI Symposium Series, pp. 125-133, doi: https://doi.org/10.1609/aaaiss.v3i1.31191

Hou and Zhang (2024), “Enhancing dietary supplement question answer via retrieval-augmented generation (RAG) with LLM”, medRxiv, doi: https://doi.org/10.1101/2024.09.11.24313513

Hwang and Mattila, A.S. (2022), “The effect of smart shopper self-perceptions on word-of-mouth behaviors in the loyalty reward program context”, Journal of Hospitality & Tourism Research, pp. 243-266, doi: https://doi.org/10.1177/1096348020985212

infiniflow (2024), “Ragflow”, available at: https://github.com/infiniflow/ragflow

Khademi Adel, Modir and Ravanshadnia (2022), “An analytical review of construction law research”, Engineering Construction and Architectural Management, pp. 1931-1945, doi: https://doi.org/10.1108/ecam-05-2020-0306

Khan (2024), “Microsoft’s Copilot embraces the power of OpenAI’s new GPT-4o”, available at: https://www.cnet.com/tech/services-and-software/microsoft-copilot-embraces-the-power-of-openais-new-gpt-4-o/

Kim, Chung and Chi (2024), “Cross-lingual information retrieval from multilingual construction documents using pretrained language models”, Journal of Construction Engineering and Management, Vol. 150, 04024041, doi: https://doi.org/10.1061/jcemd4.coeng-14273

Kortemeyer (2023), “Could an artificial-intelligence agent pass an introductory physics course?”, Physical Review Physics Education Research, 010132, doi: https://doi.org/10.1103/physrevphyseducres.19.010132

Langchain-ai (2024), “Langchain”, available at: https://github.com/langchain-ai/langchain

Lee, S.-H., Choi, S.-W. and Lee, E.-B. (2023), “A question-answering model based on knowledge graphs for the general provisions of equipment purchase orders for steel plants maintenance”, Electronics, p. 2504, doi: https://doi.org/10.3390/electronics12112504

and Zeng (2021), Construction Engineering Regulations, Nanjing University Press, Nanjing.

Liu, J.Y. and Low, S.P. (2011), “Work–family conflicts experienced by project managers in the Chinese construction industry”, International Journal of Project Management, pp. 117-128, doi: https://doi.org/10.1016/j.ijproman.2010.01.012

Liu, Mao and Yuan (2021), “Risk perception and coping behavior of construction workers on occupational health risks—A case study of Nanjing, China”, International Journal of Environmental Research and Public Health, p. 7040, doi: https://doi.org/10.3390/ijerph18137040

Liu, Meng and Yang (2024), “LLM technologies and information search”, Journal of Economy and Technology, pp. 269-277, doi: https://doi.org/10.1016/j.ject.2024.08.007

Tian, Zhang, Zhao, Zhang, Feng, Wang and (2024, In press), “Evaluation of large language models (LLMs) on the mastery of knowledge and skills in the heating, ventilation and air conditioning (HVAC) industry”, Energy and Built Environment, doi: https://doi.org/10.1016/j.enbenv.2024.03.010

Mansurova and Nugumanova (2024), “QA-RAG: exploring LLM reliance on external knowledge”, Big Data and Cognitive Computing, p. 115, doi: https://doi.org/10.3390/bdcc8090115

Martinez-Gil (2023), “A survey on legal question–answering systems”, Computer Science Review, 100552, doi: https://doi.org/10.1016/j.cosrev.2023.100552

meta-llama (2024), “Llama2”, available at: https://huggingface.co/docs/transformers/main/model_doc/llama2

neuml (2024), “Txtai”, available at: https://github.com/neuml/txtai

NPC (2024), “National legal database”, available at: https://flk.npc.gov.cn/index.html

Oeding, J.F., A.Z., Mazzucco, M.C., Taylor, S.A., Dines, D.M., Warren, R.F., Gulotta, L.V., Dines, J.S. and Kunze, K.N. (2024), “ChatGPT-4 performs clinical information retrieval tasks using consistently more trustworthy resources than does Google search for queries concerning the Latarjet procedure”, Arthroscopy: The Journal of Arthroscopic & Related Surgery, pp. 588-597, doi: https://doi.org/10.1016/j.arthro.2024.05.025

Oh, Choi, G.-S. and Lee, W.Y. (2023), “ChatGPT goes to the operating room: evaluating GPT-4 performance and its potential in surgical education and training in the era of large language models”, Annals of Surgical Treatment and Research, Vol. 104, p. 269, doi: https://doi.org/10.4174/astr.2023.104.5.269

pathwaycom (2024), “llm-app”, available at: https://github.com/pathwaycom/llm-app

Pei, Liu, Chen and Zhang (2024), “Application of large language models and assessment of their ship-handling theory knowledge and skills for connected maritime autonomous surface ships”, Mathematics, p. 2381, doi: https://doi.org/10.3390/math12152381

Petrus (2024), “Top 10 RAG frameworks Github Repos 2024”, available at: https://sebastian-petrus.medium.com/top-10-rag-frameworks-github-repos-2024-12b2a81f4a49

Pursnani, Sermet, Kurt and Demir (2023), “Performance of ChatGPT on the US fundamentals of engineering exam: comprehensive assessment of proficiency and potential implications for professional environmental engineering practice”, Computers and Education: Artificial Intelligence, 100183, doi: https://doi.org/10.1016/j.caeai.2023.100183

Rajpurkar, Zhang, Lopyrev and Liang (2016), “SQuAD: 100,000+ questions for machine comprehension of text”, arXiv, doi: https://doi.org/10.48550/arXiv.1606.05250

Rasool, Kurniawan, Balugo, Barnett, Vasa, Chesser, Hampstead, B.M., Belleville, Mouzakis and Bahar-Fuchs (2024), “Evaluating LLMs on document-based QA: exact answer selection and numerical extraction using CogTale dataset”, Natural Language Processing Journal, 100083, doi: https://doi.org/10.1016/j.nlp.2024.100083

Rizzo, M.G., Cai and Constantinescu (2024), “The performance of ChatGPT on orthopaedic in-service training exams: a comparative study of the GPT-3.5 turbo and GPT-4 models in orthopaedic education”, Journal of Orthopaedics, doi: https://doi.org/10.1016/j.jor.2023.11.056

Rosól, Gąsior, J.S., Łaba, Korzeniewski and Młyńczak (2023), “Evaluation of the performance of GPT-3.5 and GPT-4 on the Polish medical final examination”, Scientific Reports, 20512, doi: https://doi.org/10.1038/s41598-023-46995-z

RUC-NLPIR (2024), “FlashRAG”, available at: https://github.com/RUC-NLPIR/FlashRAG

Saad, Iyengar, K.P., Kurisunkal and Botchu (2023), “Assessing ChatGPT's ability to pass the FRCS orthopaedic part A exam: a critical analysis”, The Surgeon, pp. 263-266, doi: https://doi.org/10.1016/j.surge.2023.07.001

Sahin, M.C., Sozer, Kuzucu, Turkmen, Sahin, M.B., Sozer, Tufek, O.Y., Nernekli, Emmez and Celtikci (2024), “Beyond human in neurosurgical exams: ChatGPT's success in the Turkish neurosurgical society proficiency board exams”, Computers in Biology and Medicine, Vol. 169, 107807, doi: https://doi.org/10.1016/j.compbiomed.2023.107807

Saka, Taiwo, Saka, Salami, B.A., Ajayi, Akande and Kazemi (2024), “GPT models in construction industry: opportunities, limitations, and a use case validation”, Developments in the Built Environment, 100300, doi: https://doi.org/10.1016/j.dibe.2023.100300

Sasazawa, Yokote, Imaichi and Sogawa (2023), “Text retrieval with multi-stage re-ranking models”, arXiv, doi: https://doi.org/10.48550/arXiv.2311.07994

Schoch, Schmelz, H.U., Borgmann and Nestler (2024), “A0179 - performance of ChatGPT on the fellow of the European Board of Urology (FEBU) exams: a comparative analysis”, European Urology, pp. S923-S924, doi: https://doi.org/10.1016/s0302-2838(24)00759-0

Shi, Wang, Ren, ValizadehAslani, Zhang and Liang (2023), “Fine-tuning BERT for automatic ADME semantic labeling in FDA drug labeling to enhance product-specific guidance assessment”, Journal of Biomedical Informatics, Vol. 138, 104285, doi: https://doi.org/10.1016/j.jbi.2023.104285

SPC (2023), “The total number of cases accepted by courts nationwide exceeded 33 million in 2022”, available at: https://www.chinacourt.org/article/detail/2023/03/id/7178559.shtml

M.-C., Lin, L.-E., Lin, L.-H. and Chen, Y.-C. (2024), “Assessing question characteristic influences on ChatGPT’s performance and response-explanation consistency: insights from Taiwan’s nursing licensing exam”, International Journal of Nursing Studies, Vol. 153, 104717, doi: https://doi.org/10.1016/j.ijnurstu.2024.104717

Sun, Lei, Cao, Zhong, Wei and Yang (2020), “Text visualization for construction document information management”, Automation in Construction, Vol. 111, 103048, doi: https://doi.org/10.1016/j.autcon.2019.103048

Tao, Feng and Zhang (2022), “Reducing construction dust pollution by planning construction site layout”, Buildings, p. 531, doi: https://doi.org/10.3390/buildings12050531

Tian, Ren, Zhang, Han and Shen (2023), “Intelligent question answering method for construction safety hazard knowledge based on deep semantic mining”, Automation in Construction, Vol. 145, 104670, doi: https://doi.org/10.1016/j.autcon.2022.104670

truefoundry (2024), “Cognita”, available at: https://github.com/truefoundry/cognita

Tsoutsanis and Tsoutsanis (2024), “Evaluation of large language model performance on the multi-specialty recruitment assessment (MSRA) exam”, Computers in Biology and Medicine, Vol. 168, 107794, doi: https://doi.org/10.1016/j.compbiomed.2023.107794

Wang and El-Gohary (2023), “Deep learning-based relation extraction and knowledge graph-based representation of construction safety requirements”, Automation in Construction, Vol. 147, 104696, doi: https://doi.org/10.1016/j.autcon.2022.104696

Wang, Xiong, Liu, Zhang and (2024), “A self-adaptive ensemble for user interest drift learning”, Neurocomputing, Vol. 577, 127308, doi: https://doi.org/10.1016/j.neucom.2024.127308

Liang, Guo, Meng, Zhou and Zhang (2023), “Entity recognition in the field of coal mine construction safety based on a pre-training language model”, Engineering Construction and Architectural Management, pp. 2590-2613, doi: https://doi.org/10.1108/ecam-05-2023-0512

Xue, Zhang and Chen (2024), “Question-answering framework for building codes using fine-tuned and distilled pre-trained transformer models”, Automation in Construction, Vol. 168, 105730, doi: https://doi.org/10.1016/j.autcon.2024.105730

Liu, Lin, Zhao and Che (2024), “Semantic similarity matching for patent documents using ensemble BERT-related model and novel text processing method”, arXiv, doi: https://doi.org/10.48550/arXiv.2401.06782

Zailani, Moda, Ibrahim and Abubakar (2023), “Improving the antecedents of non-compliance to safety regulations toward an optimized self-regulated construction environment in Nigeria”, International Journal of Occupational Safety and Ergonomics, pp. 1212-1219, doi: https://doi.org/10.1080/10803548.2022.2115657

Zheng, Chiang, W.-L., Sheng, Zhuang, Lin and Xing (2023), “Judging LLM-as-a-judge with MT-bench and chatbot arena”, Advances in Neural Information Processing Systems, pp. 46595-46623.

Zhong, Xing, Luo, Zhou, Rose and Fang (2020), “Deep learning-based extraction of construction procedural constraints from construction regulations”, Advanced Engineering Informatics, 101003, doi: https://doi.org/10.1016/j.aei.2019.101003

Zhou, Gao, Tang and Wang (2023), “Intelligent detection on construction project contract missing clauses based on deep learning and NLP”, Engineering Construction and Architectural Management, pp. 1546-1580, doi: https://doi.org/10.1108/ecam-02-2023-0172

Zhu, Wang, Okumura and Yang (2024), “MRHF: multi-stage retrieval and hierarchical fusion for textbook question answering”, International Conference on Multimedia Modeling, pp. 111, doi: https://doi.org/10.1007/978-3-031-53308-2_8