Quotation error refers to an inconsistency between cited information and its original source. Such errors have several negative consequences: they misrepresent the original research, undermine the academic community’s collective understanding of the issues involved and weaken the accuracy and fairness of citation-based academic evaluation. Existing studies have shown that quotation errors are prevalent in the academic community, and manual verification is both labor-intensive and inefficient. This paper therefore proposes the task of “automated detection of quotation errors.”
Adopting a large language model (LLM)-based approach, this paper builds on existing research to improve detection performance in three respects: first, fine-tuning LLMs to detect quotation errors; second, incorporating full-text data from the cited literature into data set construction; and third, comparing three full-text integration methods to identify the best scheme for building such data sets. The paper then uses the TokenSHAP tool to conduct an interpretability analysis of the model’s predictions.
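As a rough illustration of the interpretability step, the sketch below computes Shapley values over the tokens of an input, which is the general idea behind Shapley-based token attribution. It does not call the TokenSHAP tool or reproduce its API: the function names, the toy scoring function and the example tokens are all assumptions made for illustration, and the exact enumeration used here is only practical for very short inputs (a real model would be scored on sampled token subsets instead).

```python
from itertools import combinations
from math import factorial


def shapley_token_values(tokens, value_fn):
    """Exact Shapley value for each token position.

    value_fn takes the list of kept tokens (in original order) and returns a
    score. Hypothetical helper for illustration, not part of any library.
    """
    n = len(tokens)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [tokens[j] for j in subset]
                with_i = [tokens[j] for j in sorted(subset + (i,))]
                values[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return values


def toy_score(kept_tokens):
    # Assumed stand-in for a model's confidence that a quotation is wrong:
    # here the score depends only on whether the negation survives.
    return 1.0 if "not" in kept_tokens else 0.0


tokens = ["drug", "did", "not", "help"]  # invented example input
attributions = shapley_token_values(tokens, toy_score)
for token, value in zip(tokens, attributions):
    print(f"{token}: {value:.3f}")
```

In this toy setting the whole score is attributed to the negation token, which is the kind of signal one would look for when checking whether a detector attends to the words that change a claim's meaning.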
Fine-tuning LLMs improved performance in detecting quotation errors. Among the methods compared for incorporating full-text information, the one based on the source abstract yielded the best performance.
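To make the abstract-based variant concrete, the following minimal sketch shows one way a training instance might be assembled by pairing a citing sentence with the abstract of the cited source. Everything in it is an assumption for illustration: the field names, the prompt wording, the label vocabulary and the example texts are invented here and are not taken from the paper's data set.

```python
import json


def build_record(citing_sentence, source_abstract, label):
    """Assemble one prompt/completion pair (hypothetical layout)."""
    prompt = (
        "Decide whether the citing sentence is consistent with the source.\n"
        f"Citing sentence: {citing_sentence}\n"
        f"Source abstract: {source_abstract}\n"
        "Answer:"
    )
    return {"prompt": prompt, "completion": label}


# Invented example texts; "error" is an assumed label name.
record = build_record(
    "The cited trial reports that the treatment halved mortality.",
    "We found no significant effect of the treatment on mortality.",
    "error",
)

# One JSON object per line is a common on-disk layout for fine-tuning data.
line = json.dumps(record)
print(line)
```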
This paper applies LLM fine-tuning to the task of automated detection of quotation errors and conducts an interpretability analysis of the model’s outputs.
