The purpose of this study is to investigate the accuracy and creativity of ChatGPT in the domain of quantitative aptitude.
ChatGPT 3.5 was used to generate multiple-choice quantitative aptitude questions. A total of 1,100 questions were generated through ChatGPT prompts across 11 different areas of quantitative aptitude, and these questions form the dataset. Human specialists assessed the accuracy and creativity of the questions. Every question was evaluated and assigned one of six distinct grades to indicate its level of accuracy. Likewise, each question was assigned a grade reflecting its originality. We then formulated hypotheses about the accuracy and creativity of ChatGPT’s responses and evaluated them using statistical methods, such as the one-tailed test.
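As an illustration of how a one-tailed hypothesis about accuracy could be checked, the following is a minimal sketch using an exact binomial test. It is not necessarily the test applied in this study: the function name `lower_tail_p_value`, the null proportion of 0.5, the significance level, and the counts are all assumptions chosen for demonstration.

```python
from math import comb


def lower_tail_p_value(successes: int, trials: int, p0: float) -> float:
    """Exact one-tailed (lower-tail) binomial p-value.

    Returns P(X <= successes) for X ~ Binomial(trials, p0), i.e. the
    probability of seeing this few accurate questions or fewer if the
    true accuracy rate were p0.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(successes + 1)
    )


# Hypothetical counts for illustration only; these are NOT results from the study.
accurate, total = 40, 100

# H0: accuracy rate is 0.5; H1: accuracy rate is below 0.5 (one-tailed).
p_value = lower_tail_p_value(accurate, total, p0=0.5)
alpha = 0.05  # assumed significance level

print(f"p-value = {p_value:.4f}; reject H0: {p_value < alpha}")
```

A small p-value here would support the one-sided alternative that fewer than half of the generated questions are accurate. The same pattern could be applied per topic or to the originality grades, provided each question is first reduced to a binary outcome.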
Our study indicates that ChatGPT exhibits a moderate degree of accuracy when solving quantitative aptitude questions. For instance, when prompted to generate 10 questions on a specific quantitative aptitude topic, ChatGPT is unlikely to produce more than five questions that are accurate in both solution and explanation, and it seldom generates more than three original questions. The study also compares ChatGPT’s accuracy in answering quantitative aptitude questions with its accuracy in answering medical science questions, and finds that it is less accurate on the former. Moreover, its limited creativity poses a significant problem for its use as a tool for producing a wide range of quantitative aptitude questions.
The study covers a set of topics that encompasses approximately 50% of the topics in quantitative aptitude. In addition, the reliance on human judgement to verify the correctness of ChatGPT’s output may undermine the study’s accuracy.
Our study shows that ChatGPT demonstrates poor originality and quantitative correctness, thereby limiting its teaching value. This is particularly worrying for students, as ChatGPT does not assist in assessing an answer, making human verification necessary.
Our research will be valuable for individuals residing in countries such as India who are actively preparing for competitive examinations to secure employment in diverse government and private enterprises and are utilising the ChatGPT platform for this purpose.
