This study aims to systematically investigate how different prompt engineering strategies influence the performance of ChatGPT in answering undergraduate-level civil engineering examination questions. Specifically, it seeks to determine whether alternative prompting techniques can improve model performance in cases where initial responses are entirely incorrect, and to examine how these effects vary across courses, question types and repeated model executions.
A total of 295 examination questions from 25 undergraduate civil engineering courses were posed to ChatGPT-4o using an original zero-shot chain-of-thought prompt. Forty-nine questions that yielded completely incorrect responses were subsequently re-evaluated using role-based, self-consistency and tree-of-thought prompting strategies, each applied three times. All responses were assessed using a rubric-based scoring system supported by expert judgement, enabling comparative analysis across courses, question formats and prompt types.
The findings demonstrate that prompt engineering can improve ChatGPT's performance, although the effect is highly context dependent. Among the evaluated strategies, tree-of-thought prompting produced the highest average performance improvement, followed by role-based and self-consistency prompts. However, no performance change was observed in over half of the analysed questions, particularly those requiring visual interpretation, graphical representation or spatial reasoning, indicating persistent limitations of the model in visually intensive and spatially demanding civil engineering tasks.
This study provides a comprehensive domain-specific evaluation of prompt engineering in civil engineering education by combining authentic examination data, rubric-based expert assessment, repeated prompt applications and analysis of initially incorrect responses. Unlike prior studies focusing primarily on overall accuracy rates, it offers a nuanced analysis of failure cases, question formats and stability across repeated prompts, contributing original empirical evidence on both the potential and limitations of LLMs in engineering education.
