Purpose

Artificial intelligence (AI) has the potential to radically transform the accountancy profession, starting with routine transactions and functions and moving towards more strategic, leadership and governance responsibilities. The purpose of this paper is to evaluate the extent to which various Large Language Models (LLMs) can replicate the performance of accounting students completing a professional exit-level examination. By directly comparing the outputs of multiple AI models to those of students pursuing the Chartered Accountant (South Africa) [CA(SA)] designation, the study assesses whether AI has the capacity to support, or potentially replace, accounting practitioners. In doing so, it clarifies both the opportunities and limitations of AI for the profession and for accounting education.

Design/methodology/approach

Exit-level exam questions taken by students pursuing the CA(SA) designation were given to ChatGPT, Claude, CoPilot, Grok and Gemini. A zero-shot prompt method was adopted. Each model was given the scenario and required tasks to complete, with mark allocations, as real-world students were. Basic descriptive statistics and visual representations were used to analyse the data.

Findings

The results show that AI models are not equally capable of dealing with financial accounting questions. AI models’ primary strength is in dealing with journal entries and basic calculation questions. They struggle to critically evaluate partial and complete solutions to identify errors and correct them. These tasks require judgement and the ability to distinguish fact from fiction. This is where humans significantly outperform AI and where the accountancy profession should focus its student education. Of the AI models, Gemini performed the best overall but did not outperform the student average.

Practical implications

The study provides a useful baseline for future studies to monitor the progress of AI models in completing technical accounting exams and providing support to accounting professionals.

Originality/value

The study contributes to the literature by focusing on a renowned global chartered accountant designation from a developing economy and comparing different AI models’ performance to that of exit-level university students. The study also provides objective evidence that speaks to different AI models’ ability to support accounting practitioners in real-world settings or their ability to replace accounting consultants.

Artificial intelligence (AI), in the form of large language models (LLMs), is deceptively and “remarkably human-like” (Roberts et al., 2024) with the ability to adapt to its environment (Staszkiewicz et al., 2024). This makes it easy to forget that we are interacting with machines and software [1]. Part of the problem is that we project meaning onto LLMs’ output text, which we interpret as reflecting someone’s feelings, experiences, knowledge, desires and intent. But LLMs are not beings that engage with us to exchange information and meaning (Roberts et al., 2024). LLMs are probabilistic language systems that dispassionately combine words to create the illusion of conversation based on the probability of those words being in similar text in their training data (see Gendron et al., 2024).

Despite the fallacy of human-likeness imagined while interacting with LLMs, many people, including accountants, are thrilled at the prospect of AI’s possibilities, with mixed feelings about the potential threats (see Roberts et al., 2024; Gendron et al., 2024). What is clear is that AI and LLMs are constantly improving with no clear limit in sight, and the distinct possibility of currently unknown emergent capabilities (Roberts et al., 2024). Historically, technological innovations have had a limited impact on jobs that require high levels of education and intelligence. Instead, manual labour and routine tasks were most susceptible to replacement by new technologies. Now, AI and LLMs have challenged this notion because of their ability to provide the illusion of human conversation and intelligence whilst also analysing unprecedented volumes of textual data to yield astonishing results (Dong et al., 2024). For instance, AI models have identified potential new antibiotics to treat bacterial infections after going through millions of compounds (see Roberts et al., 2024).

In a similar vein, AI has the potential to radically transform the accountancy profession by not simply taking over routine transactions and functions, but by providing accounting advice and consulting services that require judgment to deal with complex transactions and events (Leitner-Hanetseder et al., 2021). Accordingly, the profession needs to remain vigilant for both the risks posed and opportunities presented by AI (Eisikovits et al., 2025). Students need to be prepared for the jobs that will be required of them. At the same time, policies, guidelines and regulations need to be developed to manage AI’s use where trust, accuracy and privacy are key pillars. The public will also need to be educated about how to use accounting advice provided by AI responsibly.

At present, much of the accounting AI research has focused on integrating data analytics, blockchain and emerging technologies into accounting curricula; understanding how AI can be leveraged in audits; assisting with sustainability data management and reporting systems; and unpacking the risks posed by AI (Tharapos, 2022; De Villiers, 2021; Lodhia et al., 2025; Roberts et al., 2024; Gendron et al., 2024; Arise and Moloi, 2025; Boateng and Boateng, 2025). Roberts et al. (2024), in particular, warn about the risk of deskilling because of young accountants not learning necessary skills or experienced accountants losing critical skills as tasks are outsourced to AI. More work is required to understand this balancing act and to prepare accounting information systems for new technologies (Imjai et al., 2025).

Some researchers have already begun evaluating AI’s accounting capabilities by asking LLMs to complete CPA, CMA, CIA and EA exams (Wood et al., 2023; Eulerich et al., 2024; Katz, 2024), accounting and auditing questions in North Macedonia (Atanasovski et al., 2023) and Portuguese Chartered Accounting exams (Pinto et al., 2024). However, most of these studies focus on the USA context and examine only one LLM. This paper takes a step further by analysing the ability of five different LLMs (ChatGPT, Claude, CoPilot, Grok and Gemini) to provide accounting advice and consulting services in comparison to final-year Chartered Accountant (South Africa) [CA(SA)] students. The central aim of this paper is to assess the current capabilities of AI models to solve accounting problems where judgment and application are required by benchmarking their performance against exit-level students pursuing the CA(SA) designation. In doing so, the study aims to provide evidence of whether AI, in its present form available to the public, is capable of supporting or replacing professional accountants. It also identifies key implications for practice and education.

The scope of this research deliberately focuses on examining how contemporary AI models (refer to Section 2.3) perform in the highly technical and judgment-intensive context of professional accounting, as proxied by performance in exit-level university accounting examinations. While AI has been applied across a range of business functions, from predictive analytics to process automation (Akerkar, 2019; Enholm et al., 2022; Ribeiro et al., 2021; Zong and Guan, 2025), this study narrows its focus to the potential of AI to support or replace accounting practitioners. On the one hand, this framing is important because accounting as a profession is subject to regulations and standards and carries high public interest. On the other hand, the profession demands high degrees of professional scepticism, judgement and ethical reasoning (South African Institute of Chartered Accountants [SAICA]; International Federation of Accountants; Institute of Chartered Accountants in England and Wales). Accordingly, advances in AI have different implications for the accountancy profession compared to more routine business applications. By situating the study within this context, the research directly addresses how developments in AI intersect with professional education, practitioner roles, and the future trajectory of the accounting profession.

The paper makes several important and timely contributions. Firstly, the study provides a baseline with which future iterations of LLMs can be compared to objectively assess their progress. Secondly, the paper highlights different LLMs’ advantages and weaknesses to assist or replace accountants in theoretical, numerical and integrated accounting queries. Thirdly, by comparing LLMs’ marks to those of real exit-level students at a SAICA[2]-accredited university, the paper provides a sense of how close LLMs are to imitating the abilities of newly qualified trainee CA(SA)s.

Finally, the study offers accounting practitioners an evidence-based assessment of where AI tools currently excel and where they fall short. This reiterates that while AI can support efficiency in routine accounting functions (for example, preparing journal entries, basic reconciliations and simple calculations), it lacks the higher-order judgment and integrated thinking required in tasks such as interpreting incomplete data sets, critically evaluating complex problems or applying nuanced professional scepticism. Practitioners should view AI as an assistive technology rather than a substitute. Its greatest potential lies in freeing up human capacity to focus on more complex tasks, such as advisory and assurance roles. The findings also carry important implications for accounting education. By illustrating that students outperform AI in judgment-based questions, the study underscores the need for curricula to emphasise critical reasoning, integrated thinking, ethical judgment and problem-solving. Embedding AI literacy into professional education programmes will also become essential, not only to prepare graduates to use these tools responsibly but to help them understand the limitations of AI-generated outputs. Taken together, these insights reinforce the dual responsibility of educators and practitioners to shape future accountants and the profession to leverage AI effectively while safeguarding the distinctive human competencies that underpin professional trust and accountability (Handoyo, 2024; Ballantine et al., 2024; Holmes and Douglass, 2022).

The remainder of this paper is structured as follows. Section 2 provides an overview of the prior literature that investigates AI in an accounting context. Section 3 details the methodology, while the results and conclusions are presented in Sections 4 and 5, respectively.

To begin, a bibliometric-style literature review was carried out using the Scopus Database (methodology as per Ecim and Maroun, 2023). This database was selected because of the quality of its filtering criteria and the fact that it includes journals of good standing with robust peer-review processes in place (Dumay et al., 2016; Rinaldi et al., 2018).

A search was performed for academic articles featuring terms related to the use of AI in technical aspects of accounting practice [3] in their titles, keywords or abstracts. The subjects were filtered and limited to incorporate AI for accounting purposes in the following areas: business, finance, accounting, assurance, economics, risk, governance, ethics, policy, sustainability, capitals, strategy and management. The start date for the search was 1986, being the earliest date available for research published on this topic. All papers from 1 January 1986 to 1 June 2025 were considered, yielding 90 documents. This illustrates that a relatively limited body of academic research has been conducted to date [4].

Each document was assessed to confirm its relevance for the current study (Dumay et al., 2016). The authors read the title, abstract and contents of each document (as per Dumay et al., 2016). The papers were screened to ensure that they did not deal only with AI and accounting in general. Rather, the papers should examine how these tools can be leveraged to respond to and address various aspects of accountancy and financial reporting resulting from the increased adoption of AI tools.

As shown in Figure 1, research up to and including 2018 was sporadic, with only a few relevant papers. From 2019 onwards, research was produced every year.

Figure 1.
A line chart shows the number of publications by year from 1986 to 2025, with very low counts until 2018, a sharp rise after 2021, a peak in 2024, and a decline in 2025. The line chart plots the number of publications on the vertical axis against years from 1986 to 2025 on the horizontal axis. The series starts at 1 in 1986, rises to 2 in 1995, then remains at 1 in 1996, 2005, and 2007. It increases to 2 in 2011 and returns to 1 in 2015 and 2018. A clear rise begins in 2019 with 5 publications, followed by 4 in 2020 and 5 in 2021. The count increases to 8 in 2022 and 14 in 2023, peaks at 32 in 2024, then decreases to 12 in 2025.

Number of academic sources focusing on AI and accounting

Source: Authors’ own work
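
As a quick consistency check, the yearly counts described for Figure 1 can be tallied against the 90 documents returned by the search. The sketch below is illustrative only: the dictionary simply transcribes the counts from the figure description, and years without a listed count are assumed to have none. Note that the search window closed on 1 June 2025, so the 2025 figure covers only part of the year.

```python
# Publication counts per year, transcribed from the Figure 1 description.
# Assumption: years that are not listed had no relevant publications.
publications_by_year = {
    1986: 1, 1995: 2, 1996: 1, 2005: 1, 2007: 1, 2011: 2, 2015: 1, 2018: 1,
    2019: 5, 2020: 4, 2021: 5, 2022: 8, 2023: 14, 2024: 32, 2025: 12,
}

# Total across all years; this should match the 90 documents in the search.
total = sum(publications_by_year.values())

# Documents published from 2019 onwards, when annual output became continuous.
since_2019 = sum(n for year, n in publications_by_year.items() if year >= 2019)
share_since_2019 = since_2019 / total

print(total)                       # 90
print(since_2019)                  # 80
print(f"{share_since_2019:.1%}")   # 88.9%
```

On these figures, roughly nine in ten of the reviewed documents were published from 2019 onwards.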

A seminal 1986 paper noted that assurance and financial accounting needed to adopt methods being developed in the disciplines of human information processing and AI. However, these changes in subject matter and technology would require significant investments in research and dramatic changes to the accounting curriculum (Elliott, 1986). In 1995, two papers addressed the need for integrating AI into accounting curricula, citing the increased use of technology in accounting fields such as assurance, financial and management accounting, taxation and by the government (Baldwin-Morgan, 1995; White, 1995). Nevertheless, accounting students received little to no exposure to AI during their studies, with tertiary institutions encouraged to broadly adopt AI into the curriculum by way of innovative teaching methods to prepare students for technology-driven workplaces (Baldwin-Morgan, 1995).

Thirty years later, the drive to incorporate more technology and AI into the curriculum continues but with more attention as AI becomes commonplace in people’s daily lives (Abdo-Salloum and Al-Mousawi, 2025). It is axiomatic that accounting and the related assurance need to integrate technology as part of a more holistic approach to risk management, strategic thinking, operations and performance evaluation (Maroun et al., 2023). Yet the fact that researchers are still debating the usefulness and the incorporation of AI into accounting curricula points to a continued gap in the implementation and use of AI. Put differently, in the evolving landscape of AI, there needs to be a similar growth in the adoption and use of AI at a technical level and to assess whether the AI is capable of producing the correct outcomes (Handoyo, 2024).

A more recent body of research focuses on evaluating the ability of AI to respond to technical accounting questions and comparing these responses to student results or model answers (Albuquerque and Gomes Dos Santos, 2024; Atanasovski et al., 2023; Wood et al., 2023). In particular, studies have noted how, post the COVID-19 pandemic, there is a distinct shift from traditional educational paradigms to a technology-oriented approach (Al Shehab, 2022; Myeza et al., 2023; Handoyo, 2024). This perspective examines both tertiary institutions that integrate AI into their curricula and students who leverage AI as part of their studies and assessments. However, the AI assessments are often limited to focusing on a single tool, most commonly ChatGPT (Wood et al., 2023; Albuquerque and Gomes Dos Santos, 2024; Atanasovski et al., 2023), rather than assessing a broader spectrum of AI-related tools.

The notable increase in research from 2023 can be attributed to three factors. Firstly, the COVID-19 pandemic reiterated the importance of incorporating technology into an organisation’s investment decisions, operations, strategy and performance evaluation (Adams and Abhayawansa, 2022). This filters down to the technical level where students enter the workforce. Secondly, various international accounting, assurance and corporate governance frameworks have incorporated how organisations ought to deal with and integrate technology into their business models, emphasising digital reporting and data standardisation, especially insofar as sustainability reporting is concerned (ISSB, 2021; European Commission, 2024; Maroun, 2022). Thirdly, there was a widespread increase in public awareness of AI and LLMs following the launch of ChatGPT (GPT-3.5) by OpenAI in November 2022 (Assidi et al., 2025; Eisikovits et al., 2025; Wood et al., 2023; Atanasovski et al., 2023). This launch was followed by early experimentation with AI in various contexts, including accountancy, and subsequent research on the topic.

Figure 2 illustrates that accountancy-related AI research is dominated by USA authors (26% of publications). Un-tabulated results show that research focuses predominantly on developed economies (67%) rather than developing economies (33%). This implies that there is a primarily developed economy research perspective on AI and accounting. Organisations operating in developed territories will often have more sophisticated technology and accounting and management infrastructure to collect, analyse and report on data used for financial reporting purposes. Similarly, the corresponding tertiary education institutions and their students have more funding to develop, implement and use AI in and out of the classroom. Although China (7%) also features prominently, its economic environment more closely aligns with the features of developed economies.

Figure 2.
A horizontal bar chart compares the number of publications by country, with the United States leading by a wide margin. The horizontal bar chart presents the number of publications on the horizontal axis and countries on the vertical axis. The United States records the highest value with 23 publications. China and Australia each have 6 publications. The United Arab Emirates and Indonesia each show 5 publications. The United Kingdom, Spain, and Bahrain each have 4 publications. Portugal, Lebanon, Iraq, and Germany each record 3 publications. The chart shows a strong concentration of publications in the United States compared with other countries.

Research per jurisdiction

Source: Authors’ own work
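
The percentages quoted for the USA (26%) and China (7%) can be reproduced from the counts described for Figure 2. The sketch below is a minimal check, assuming that the shares are expressed relative to the 90 documents returned by the search; that denominator is an inference and is not stated explicitly alongside the figure. The country counts need not sum to 90, since only the most frequent jurisdictions are shown.

```python
# Assumption: shares are computed against the 90 documents from the search.
TOTAL_DOCUMENTS = 90

# Publication counts per country, transcribed from the Figure 2 description.
publications_by_country = {
    "United States": 23,
    "China": 6,
    "Australia": 6,
    "United Arab Emirates": 5,
    "Indonesia": 5,
    "United Kingdom": 4,
    "Spain": 4,
    "Bahrain": 4,
    "Portugal": 3,
    "Lebanon": 3,
    "Iraq": 3,
    "Germany": 3,
}


def share(country: str) -> float:
    """Return a country's publications as a fraction of all documents."""
    return publications_by_country[country] / TOTAL_DOCUMENTS


for country in ("United States", "China"):
    print(f"{country}: {share(country):.0%}")
# United States: 26%
# China: 7%
```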

The low contribution of research from developing economies is concerning because AI is intended to help organisations and tertiary institutions tackle pressing social and environmental challenges that are often significant in developing economies.

Using a bibliometric analysis [5] and VosViewer software (Van Eck and Waltman, 2017), the core academic papers were consolidated to visualise the main themes covered by the sources under review (Zupic and Čater, 2015). The size of each node indicates its prominence in prior research. Distances between the nodes capture the interconnections among them, with shorter distances indicating interconnected topics/themes/keywords (Bellucci et al., 2020; Caputo et al., 2021). Refer to Figure 3.

Figure 3.
A network map shows artificial intelligence and accounting education as central themes linked to skills, automation, chatgpt, auditing, and higher education topics. The network map presents keyword relationships centred on artificial intelligence and accounting education, shown as the largest connected nodes. These core topics link to accounting, auditing, accounting information systems, sustainable development, and technology acceptance. Skill related terms cluster around automation, critical thinking, communication, adaptability, and artificial intelligence literacy. Education focused links include higher education, assessment, performance, business students, expert systems, chat bots, chatgpt, and academic dishonesty. Additional connected themes include data analytics, blockchain technology, accounting curriculum, machine learning, deep learning, and professional accounting, indicating an interdisciplinary research landscape.

Network visualisation of responsible investment components

Source: Authors’ own work

The components have been interpretively labelled and analysed by the researchers using their judgement. Refer to Table 1 for more details.

Table 1.

Cluster analysis

| Component | Cluster colour | Key concepts | Relevance to practitioners | Relevance to students | Example of literature |
|---|---|---|---|---|---|
| Digital infrastructure and AI readiness | Yellow | This cluster links AI adoption to broader accounting information systems, sustainability and education. It considers whether systems are adaptive enough to integrate AI and whether users are ready to accept it | Ensure that the organisation’s systems support AI use | Understand systems thinking and audit trails in AI-driven contexts | Assidi et al. (2025); Anomah et al. (2024) |
| Data analytics and curriculum integration | Red | There is an evolving need to integrate emerging technologies into accounting curricula. The research in this cluster emphasises the importance of upskilling future accountants in tools and methods necessary to engage with AI-driven data environments | Hiring graduates who can handle complex data tasks | Must be exposed to analytics and digital technologies early at tertiary institutions to remain relevant | Coyne et al. (2016); Polimeni and Burke (2021) |
| AI technologies and professional adaptation | Orange | This cluster illustrates how core AI technologies, such as machine learning and automation, are reshaping accounting tasks. Repetitive tasks are being handled by AI while accountants and auditors focus on judgment-based roles requiring human monitoring and decision-making | Adapt workflows to incorporate AI tools while ensuring accountability | Develop skills in critical thinking and adaptability to complement automation | Arise and Moloi (2025); Eisikovits et al. (2025) |
| Student engagement and learning transformation | Green | This cluster focuses on how AI tools, most commonly ChatGPT, are affecting student performance, assessment methods and academic integrity. It underscores the transformation in education delivery and post-implementation reviews of accounting programs | Engage with academia to guide practical AI applications | Learn how to ethically and effectively use AI as a support tool rather than a shortcut | Baldwin-Morgan (1995); Katz (2024); Wood et al. (2023) |
| Evolving competencies for the “AI-Accountant” | Blue | The focus of this cluster is on the underlying skill sets required to operationalise AI, such as communication, AI literacy and critical thinking. These complement technical knowledge to form a more holistic accounting professional who applies integrated thinking | Assess hiring and training priorities, policies and evaluations | Transition to an integrated approach to technical and professional competencies | Mcguigan et al. (2021); Imjai et al. (2025) |
| AI tools in teaching and learning environments | Purple | This cluster highlights the growing use of tools such as ChatGPT and chatbots in accounting education. It explores the benefits and challenges of using generative AI to assist in the learning process | Be aware of how such tools shape the skills of new graduates | Make use of AI for personalised learning but, importantly, evaluate the outputs | Baldwin-Morgan (1995); Handoyo (2024); Pinto et al. (2024) |
| Educator roles in AI transformation | Brown | This smaller cluster shows the pivotal role educators play in shaping accounting graduates’ preparedness for AI-integrated workplaces. It also assesses the perceived quality and integrity of accounting outputs from AI | Collaborate with educators to ensure alignment of learning outcomes with industry needs | Rely on educators to frame AI in accounting not just as a tool, but as part of continued professional development | Holmes and Douglass (2022) |
Source(s): Authors’ own work

While prior studies have begun to explore the integration of AI into financial reporting, auditing, and management accounting, there are some gaps that remain, which the current study begins to address. To illustrate, Kokina and Davenport (2017) and Appelbaum et al. (2021) highlight AI’s potential to automate routine accounting and assurance tasks but stop short of empirically testing performance in real-world settings. Similarly, other studies consider AI applications in auditing, but largely from a conceptual or exploratory standpoint rather than through systematic evaluation (for example, Issa et al., 2016; Lu, 2019). Existing studies also tend to focus on single models or hypothetical use cases (Albuquerque and Gomes Dos Santos, 2024; Eulerich et al., 2024), leaving unanswered questions about the relative performance and use cases of different LLMs. There is also a limited body of research that has benchmarked AI tools against professional standards of competence, such as exit-level accounting examinations, which serve as rigorous and standardised tests of both technical proficiency and professional judgement (Pinto et al., 2024). This study addresses these gaps by directly comparing the outputs of multiple LLMs with those of exit-level accounting students, offering empirical evidence of AI’s current strengths and weaknesses. In doing so, it contributes to a more grounded and empirical understanding of the extent to which AI can support or replace professional accountants, while also clarifying the implications for accounting education and training.

The next section outlines how various AI tools can be utilised as part of a technical accounting assessment.

When discussing AI, it is essential to distinguish between automation and generative AI. The former does not incorporate elements of creativity or unpredictability, nor does it aim to imitate a conversation with another human (see Roberts et al., 2024; Holmes and Douglass, 2022). Automations will accurately and faithfully carry out the tasks assigned to them in line with their programming (Elnakeeb and Elawadly, 2025). Good programming leads to consistent and reliable results, a vital trait in the accountancy profession. Generative AI, on the other hand, explicitly incorporates elements of creativity, innovation and unpredictability (Holmes and Douglass, 2022).

These abilities offer the potential to think “out of the box”, leading to new and novel solutions at unprecedented rates. AI may offer non-obvious solutions to age-old problems or incorporate significant amounts and types of data to provide solutions that surpass a human’s capabilities. However, creativity also poses the biggest challenge to its adoption by the accounting profession (Eisikovits et al., 2025; Elnakeeb and Elawadly, 2025). Enabling a system to be creative and find novel solutions also introduces the ability for it to be wrong – sometimes referred to as hallucinations (Emsley, 2023).
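
The contrast between automation and generative AI can be made concrete with a deliberately simplified sketch. It is not a description of how any LLM is implemented: the rule-based function stands in for an automation, while the sampling function is only a toy stand-in for probabilistic text generation. The candidate answers, their probabilities and the asset figures are all invented for illustration.

```python
import random


def straight_line_depreciation(cost: float, residual: float, life_years: int) -> float:
    """Rule-based automation: the same inputs always give the same output."""
    return (cost - residual) / life_years


def sample_answer(weighted_answers: dict[str, float], rng: random.Random) -> str:
    """Toy stand-in for generative output: draw one answer by probability."""
    answers = list(weighted_answers)
    weights = list(weighted_answers.values())
    return rng.choices(answers, weights=weights, k=1)[0]


# Hypothetical candidate responses and probabilities (not from the study).
candidates = {
    "Depreciation is 18 000 per year": 0.7,  # correct for the inputs below
    "Depreciation is 20 000 per year": 0.2,  # ignores the residual value
    "Depreciation is 9 000 per year": 0.1,   # uses the wrong useful life
}

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Ask each "system" the same question 200 times and collect distinct results.
automated = {straight_line_depreciation(100_000, 10_000, 5) for _ in range(200)}
generated = {sample_answer(candidates, rng) for _ in range(200)}

print(automated)       # a single value: {18000.0}
print(len(generated))  # number of distinct answers the sampler produced
```

The automation returns one answer every time, whereas the sampler can return plausible-looking but incorrect answers on some runs, which is the property that makes unchecked reliance risky in an accounting setting.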

LLMs cannot distinguish between truth and falsehood and lack the capacity to care about truth criteria. Accordingly, the data they are fed during training and the user ratings of previous outputs can have significant implications for LLMs’ accuracy (Roberts et al., 2024; Gendron et al., 2024). Dahl et al. (2024) suggest that as much as 58% of ChatGPT 4’s textual output could be a hallucination. In addition to hallucinations, LLMs tend towards sycophancy: if the public wanted to rely on LLMs for accounting advice, users with incorrect accounting opinions may have those views reinforced as the model seeks to please its “master” (see also Roberts et al., 2024).

Humans also make mistakes; the adage “to err is human” comes to mind. However, humans are typically conscious of their limitations and communicate their uncertainty with tentative phrases such as “it could be”, “I think that” and “it might” (see Roberts et al., 2024). LLMs do not. Worse, LLMs produce confident and well-articulated responses that do not inform users about potential inaccuracies or uncertainties. This lack of transparency poses a significant risk, making it currently inappropriate to rely solely on AI-generated responses. Nonetheless, as with all tools, AI can serve as a valuable aid provided it is used with an awareness of its limitations and appropriate controls to address them.

The accountancy profession undertakes a wide variety of tasks from routine bookkeeping, financial monitoring and management to complex strategy setting and deal negotiations. There are also those whose primary function is providing accounting advice and solutions to client organisations. Each function requires slightly different skills and competencies. For example, routine and repetitive financial accounting tasks require no “creativity”, as there is an objective and correct response that is consistently required. If AI models use their creativity in these scenarios, it may result in inappropriate outputs. For accounting advice, strategy, and deal negotiations, AI’s creativity may be useful in providing possible solutions, but these will still require significant human scrutiny before being adopted, especially from an ethical perspective, where biases and blatant hallucinations are key risks (Roberts et al., 2024). To gauge the current ability of AI to assist accounting professionals, its ability to respond to technical questions must be assessed as a starting point and monitored closely as it evolves.

For the purposes of this research, five AI tools were utilised, namely ChatGPT, Claude, Copilot, Grok and Gemini. Prior research has predominantly focused on ChatGPT, overlooking the various methods employed in developing other models that may yield different outcomes. This is particularly relevant in the accounting context because different LLMs vary considerably in their underlying mechanics, training data and performance in different technical tasks. This affects their validity, accuracy and completeness for professional use. While some models, such as ChatGPT, have gained widespread adoption among accounting professionals, other emerging tools (such as Claude, Gemini, CoPilot and Grok) are increasingly accessible and may offer distinct strengths for specific accounting tasks. Consequently, research is required to specifically evaluate a variety of different AI models to determine whether their performance is equal in all tasks or which tasks each is most adequately equipped to support (Assidi et al., 2025).

By systematically comparing multiple models, this study provides practitioners and educators with objective evidence about which AI tools are most capable of supporting a range of routine and judgement-based accounting tasks. Accordingly, this paper provides an important contribution by exploring the differences in the suite of LLM tools available to accounting professionals. Each is briefly discussed below in the context of the accounting profession.

2.3.1 ChatGPT (OpenAI) [6].

ChatGPT boasts a versatile functionality that may make it useful in, for example, summarising financial reports, analysing and applying International Financial Reporting Standards (IFRS) and other accounting standards and automating repetitive tasks such as interpreting data and formulas. It can also analyse uploaded documents, making it ideal for reviewing financial statements or analysing accounting transactions. It has a broad general adoption, but may be subject to “hallucinations”, being verbose and misunderstanding calculation-based questions (Zhao and Wang, 2024).

2.3.2 Claude (Anthropic) [7].

Claude may be particularly useful for accountants dealing with long and complex documents such as annual reports, assurance statements or Environmental, Social and Governance (ESG) disclosures. The tool may be useful for summarising and extracting insights from regulatory filings or corporate reports. Claude incorporates a human and professional tone and may be appropriate for conducting accounting-related analysis and drafting responses. The tool may, however, be overly cautious when preparing responses and require multiple prompts (Zacher and Kuppannagari, 2024).

2.3.3 Copilot (Microsoft/GitHub) [8].

Copilot may be useful for accountants who also engage in programming, data analytics or process automation. This LLM is integrated into coding environments such as Excel VBA, Python and SQL. The tool can assist with report generation, developing accounting models and streamlining data cleaning. Copilot boosts efficiency in finance functions by adopting low-code or custom technology solutions for internal controls, dashboards or audit analytics (Vasarhelyi et al., 2023).

2.3.4 Grok (xAI) [9].

Grok, developed by xAI and integrated with X (formerly Twitter), offers accounting professionals quick, real-time insights on financial trends, economic indicators and regulatory updates as shared on social media. While still early in development compared to peers, Grok’s strength lies in contextualising fast-moving financial discussions and enabling real-time monitoring of market sentiment and corporate news. It is also able to analyse and interpret questions.

2.3.5 Gemini (Google DeepMind) [10].

Gemini can assist accounting professionals by automating data analysis and generating insights from complex financial data sets. It can also assist in drafting reports, summarising regulatory updates and responding to client queries using natural language processing. Additionally, Gemini can support audit and assurance work by identifying anomalies, generating documentation and enhancing decision-making through predictive analytics.

The context of the CA(SA) route is discussed next to provide background of how these tools can respond to technical accounting questions in this area.

The accounting profession in South Africa is well-established and globally respected, particularly the CA(SA) designation, which is regulated by SAICA. The CA(SA) designation is recognised as the Number 1 chartered accountancy designation in the world [12]. Other key regulatory bodies include the Independent Regulatory Board for Auditors (IRBA), which governs Registered Auditors who sign off on audited financial statements. SAICA holds mutual recognition agreements with professional bodies in, for example, the UK, Canada and Australia.

SAICA prescribes stringent educational and training requirements, including accredited university programmes, a competency-based curriculum, professional examinations and a structured training contract. The financial accounting component assesses students’ ability to apply International Financial Reporting Standards (ensuring this paper’s results are relevant to a global audience). To qualify as a CA(SA), students must complete a formal tertiary education, mandatory on-the-job training and pass two professional board exams. This begins with obtaining a SAICA-accredited undergraduate degree, followed by a postgraduate qualification. The level of difficulty and integration increases as students progress in the degree. Thereafter, candidates enter a three-year training contract (also known as articles) with a SAICA-accredited training office. These offices are typically audit firms, banks, or specialised consulting agencies. Candidates receive extensive theoretical and practical experience in accounting, auditing, taxation and financial management. There are two board examinations that must be completed after university. The first focuses on technical competencies and proficiencies, while the second is designed to evaluate not only technical knowledge but also professional judgement, ethics and the ability to integrate diverse areas of accounting and business practice. Upon successful completion of all requirements, candidates are eligible to register as CAs(SA) and become members of SAICA.

South Africa has historically maintained high professional and educational standards, with its CA(SA) designation being internationally recognised and widely respected. At the same time, the country’s profession faces contextual challenges and opportunities. Although South Africa boasts the highest literacy rate in Africa, there are infrastructure and socio-economic barriers to high-quality, dynamic education that cannot be ignored (Patel and Ragolane, 2024). The CA(SA) designation attracts students from a number of backgrounds, across a range of geographical locations where numerous universities endorsed by SAICA have unequal access to educational resources, modern infrastructure and highly proficient and motivated staff members. All of these impact the potential to integrate AI training and advancement meaningfully into the accounting curricula.

Regulatory frameworks increasingly emphasise accountability, assurance and integrated thinking, reflecting South Africa’s pioneering role in corporate governance practices by way of King IV (IODSA, 2016). In addition, the country is increasingly recognised as an advocate of strong integrated reporting and thinking practices (Ecim, 2024). This creates a heightened need for accountants who can apply both technical competence and broader professional judgement and critical thinking. Nevertheless, the fact that South Africa is a developing economy presents challenges from a financial, technological and resource perspective. Despite the South African accounting sector’s adoption of automation tools, particularly in accounting and assurance, this is still in its early or emerging stages of use. Adoption has been shaped by factors such as data privacy concerns, cost barriers and uneven access to technological infrastructure across firms (Patel and Ragolane, 2024).

As a result, using South Africa as a case study jurisdiction offers two advantages. Firstly, it allows for the benchmarking of AI performance against the CA(SA) examination, providing a meaningful and internationally relevant test of competence. The exam reflects global technical standards and an emphasis on ethics and integrated thinking. Secondly, by focusing on a developing economy, this study adds to the global literature on AI in accounting, which has been dominated by research from developed economies. This provides a more inclusive understanding of how AI adoption may be operationalised in diverse professional and educational contexts, where even small- and medium-sized listed enterprises may struggle to deal with the complexities of accounting (see Brookes et al., 2025).

Exit-level exam questions taken by students pursuing the CA(SA) designation were given to ChatGPT, Claude, CoPilot, Grok and Gemini. These exam questions test university-leaving students’ technical competence over a wide variety of topics, levels of complexity and integration that assess a student’s readiness for practical settings (see Section 2.4). Accordingly, these represent realistic assessments of AI models’ ability to support and/or replace chartered accountants in practice.

Five questions testing different types of queries were selected from the 2023 and 2024 4th-year financial accounting exams for exit-level students. This allowed a comparison of the AI models’ marks to the average mark of 4th-year CA(SA) students [13]. Fourth-year university-leaving students were selected for the study as they have received 4 years of technical training, reaching the peak of their technical competence on the road to becoming a CA(SA). Although these potential CA(SA)s will gain greater practical skills and experience during the training process, they will not be in a better position to answer technical questions than they are when completing their final year of study. The selected questions tested journal entries; theory discussion to apply accounting to a specific transaction; critically evaluating a given accounting treatment; calculations to prepare financial statements; and calculations to prepare financial statements where a partial solution was already provided.

The five questions were specifically selected to reflect the breadth of competencies that exit-level CA(SA) candidates are expected to demonstrate. The questions also assess a wide spectrum of accounting tasks from routine transactions to critical thinking. These demonstrate AI’s ability to handle various types of scenarios. Journal entry preparation tests the ability to translate transactions into the language of accounting, a foundational skill for any professional accountant. The theory-based discussion question assesses conceptual understanding and the application of accounting standards to novel or transaction-specific scenarios, mirroring the interpretive demands of professional practice. The critical evaluation question was chosen because it requires higher-order judgement, professional scepticism and the ability to identify and correct errors. These are skills that are critical to both assurance and advisory roles. The two calculation-based financial statement questions capture the technical and numerical proficiency required of accountants, with one question deliberately structured around a partial solution to test the ability to work with incomplete or potentially flawed information. Taken together, these five questions represent a balanced and comprehensive sample of the technical, interpretive and judgemental skills that define professional competence in financial accounting. They provide a robust basis for comparing AI models to human students in terms of readiness to perform as professional accountants.

The AI models were presented with scenarios and required tasks to complete, along with mark allocations, one at a time. Each model was asked to “prepare a full solution to the exam question provided”. This ensured that AI’s ability to address each component of technical accounting could be assessed individually. Additionally, the paid versions [14] of the AI models were used as these represent the respective models’ best capabilities available to the public. It also avoids the limitations imposed on free versions of the models.

A zero-shot prompt method was used to effectively simulate the capabilities of off-the-shelf AI models in providing accounting support to accountants or clients, as is currently available to the public. Accordingly, no course notes, prior examples or additional domain-specific training were provided as these would not be readily available to people seeking accounting support. People seeking accounting support would not necessarily have the expertise to review model outputs and train them to provide improved solutions. Additionally, individuals in practice who aim to use AI to replace an accounting practitioner or consultant may not possess the necessary expertise to craft highly refined prompts or provide detailed training to the AI. Consequently, evaluating LLMs under these conditions provides insights into their potential to support or replace professional accountants and consultants in their current form. Using a zero-shot approach also ensures a consistent baseline across all models, allowing for a fair comparative assessment of their inherent capabilities. It also ensures that differences in performance are attributable to the models themselves rather than variations in prior training or input material. Training AI on discipline-specific material and crafting specific prompts for different types of accounting tasks are significant and distinct research objectives, which differ from the current study. As such, they are noted as limitations and highlighted as areas for future research in Section 5.
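The structure of such a zero-shot prompt can be illustrated with a minimal sketch. It assumes that each question consists of a scenario plus a list of required tasks with mark allocations, and that the only instruction is the one quoted above. The names `ExamQuestion` and `build_zero_shot_prompt` are hypothetical and are not taken from the study; no course notes, worked examples or model-specific tuning are included, in keeping with the zero-shot design.

```python
from dataclasses import dataclass

# The single instruction quoted in the methodology; everything else is illustrative.
INSTRUCTION = "Prepare a full solution to the exam question provided."


@dataclass
class ExamQuestion:
    """Hypothetical container for one exam question."""
    scenario: str
    required: list[tuple[str, int]]  # (task description, marks available)


def build_zero_shot_prompt(question: ExamQuestion) -> str:
    """Assemble a prompt with no examples, notes or prior training material."""
    lines = [INSTRUCTION, "", "Scenario:", question.scenario.strip(), "", "Required:"]
    for number, (task, marks) in enumerate(question.required, start=1):
        lines.append(f"{number}. {task} ({marks} marks)")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder content only; not an actual exam question.
    demo = ExamQuestion(
        scenario="<scenario text as given to students>",
        required=[("<required task as given to students>", 12)],
    )
    print(build_zero_shot_prompt(demo))
```

Because the same function would be applied to every model, any difference in output is attributable to the model rather than to the prompt, which is the consistency argument made above.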

The solutions were given to a co-author [15] who had marked the students’ scripts for the respective questions. This adds consistency in the marking, enhancing the study’s credibility when comparing the AI models’ and students’ marks. To ensure validity and reliability, each model’s solutions were then independently marked by an experienced CA(SA) with over 10 years’ teaching experience at the university level, and who regularly marks SAICA’s professional board exams. This marker was not informed of the research objectives nor given details that the attempt was from AI models to ensure that the marker was objective when evaluating the attempts. The respective marks were compared and no material differences were noted. Basic descriptive statistics and visual representations were used to analyse the data.

The findings reveal that AI performance across different models is not uniform. Each model’s strengths and weaknesses are evident when examined by question type. Accordingly, the findings begin with an overall discussion and are then structured by exam question type, enabling a focused discussion of the models’ capabilities and limitations.

None of the AI models could outperform the student average overall on a total basis, but Gemini was closest, scoring 3 percentage points lower. While Gemini performed best, it also provided the most unnecessarily voluminous solutions. For example, it provided 10 pages of text for an 11-mark journal entry question. This may be because of the use of this platform to generate reports. CoPilot performed worst, scoring only 17%, 37 percentage points lower than the student average. In all questions, save that to critique a given accounting treatment, at least one AI model outperformed the student average (refer to Table 2).

Table 2.

Results

| Type of question | Available marks | ChatGPT (%) | Claude (%) | CoPilot (%) | Grok (%) | Gemini (%) | Student average (%) | Student pass rate (%) |
|---|---|---|---|---|---|---|---|---|
| Journal entries | 12 | 41 | 100 | 41 | 86 | 75 | 63 | 80 |
| Calculations for financial statement preparation | 21 | 33 | 55 | 0 | 60 | 38 | 52 | 66 |
| Calculations for financial statement preparation with partial solution given | 32 | 19 | 0 | 19 | 19 | 63 | 47 | 53 |
| Theory application | 25 | 42 | 42 | 26 | 50 | 60 | 52 | 60 |
| Critique the given accounting treatment | 18 | 6 | 0 | 6 | 14 | 17 | 65 | 81 |
| Total | 108 | 27 | 31 | 17 | 40 | 51 | 54 (a) | |
| Deviation from the student average on a total basis | | −27 | −22 | −37 | −14 | −3 | | |

Note(s):

Italic figures represent which model and the student average scored best for each type of question. Underlined figures represent the type of question in which each model and the student average performed best.

(a) A final mark of 50% or more is required to pass at the authors’ home institution.
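The relationship between the per-question percentages and the overall totals in Table 2 can be checked with a short calculation. The sketch below assumes that each total is a mark-weighted average of the question percentages; this is an assumption about how the totals were derived, not a description of the authors' procedure. Because the published percentages are rounded, the recomputed totals may differ from the published ones by up to about one percentage point. The function names are illustrative.

```python
# Available marks per question, in the row order of Table 2.
MARKS = [12, 21, 32, 25, 18]

# Percentages per question type, transcribed from Table 2.
SCORES = {
    "ChatGPT": [41, 33, 19, 42, 6],
    "Claude": [100, 55, 0, 42, 0],
    "CoPilot": [41, 0, 19, 26, 6],
    "Grok": [86, 60, 19, 50, 14],
    "Gemini": [75, 38, 63, 60, 17],
    "Student average": [63, 52, 47, 52, 65],
}

# Published totals from Table 2, used only for comparison.
PUBLISHED_TOTALS = {
    "ChatGPT": 27, "Claude": 31, "CoPilot": 17,
    "Grok": 40, "Gemini": 51, "Student average": 54,
}


def weighted_total(percentages, marks):
    """Mark-weighted overall percentage across all questions."""
    earned = sum(p / 100 * m for p, m in zip(percentages, marks))
    return 100 * earned / sum(marks)


def questions_passed(percentages, pass_mark=50):
    """Number of questions with a score at or above the pass mark."""
    return sum(p >= pass_mark for p in percentages)


if __name__ == "__main__":
    for name, percentages in SCORES.items():
        total = weighted_total(percentages, MARKS)
        passed = questions_passed(percentages)
        print(f"{name}: recomputed {total:.1f}%, "
              f"published {PUBLISHED_TOTALS[name]}%, questions passed: {passed}")
```

The weighting matters: the 32-mark partial-solution question carries almost three times the weight of the 12-mark journal entry question, so a nil score there affects the total far more than a strong journal entry score can offset.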

In terms of journal entries, all models, except ChatGPT and CoPilot, outperformed the student average. However, all models failed the question requiring a critique of a given accounting treatment by a significant margin (see Table 2). Claude earned no marks for two questions while CoPilot scored nil for one question. For both Claude and CoPilot, one instance was due to an inability to correctly read the journal entry information given in the scenario. This was unexpected and had no clear cause, as the two models did not struggle to read the same question. Claude’s second nil score arose because it provided the process that should be followed and key points to consider as opposed to providing a completed solution. Although this may aid accounting practitioners, the guidance provided was generic and of little more use than guidance that can be found with a simple Google search. The responses provided by Claude are indicative of its cautious nature, where additional prompts may be required to gain a complete solution (Zacher and Kuppannagari, 2024).

Overall, the AI models were inconsistent in their performance across different types of questions, in how well they comprehended the scenarios given and in limiting their responses to only what was asked of them. In some instances, pertinent points remained undiscussed. This was particularly the case for the discussion question posed to the AI models. Their answers centred around information necessary to answer the question without discussing why certain information should not have been considered. This emphasises the need for accountants and other users to use AI as a tool, rather than to replace the work currently being performed by CAs(SA) (Handoyo, 2024). Additionally, there were instances of repetition and the inclusion of superfluous guidance that was not directly related to the specific context of the questions posed. The inclusion of, at times, unnecessary detail is likely to be confusing or frustrating to users looking to optimise their workload. These results support Roberts et al. (2024) who emphasise that LLMs are not people conversing with us. They are probabilistic algorithms that are indifferently spitting out text that is likely to be relevant based on their training data.

The results also indicate significant differences in the ability of various AI models to complete financial accounting exam questions. AI models are certainly capable and even outperform humans in some types of questions, but none have come close to humans’ average ability to critique accounting treatments. It may be because, as Gendron et al. (2024) and Roberts et al. (2024) highlight, LLMs cannot distinguish fact from fiction. Accordingly, when given information of which some is correct and some is incorrect, they struggle. This is an important and useful finding as it provides clear evidence of where AI is ill-suited to supporting financial accountants.

Overall, the comparative results reflect that while certain AI models can match or surpass students on specific tasks, their aggregate performance remains below that of humans. The substantial variation observed across models suggests that no single platform currently offers reliable end-to-end capabilities across core financial accounting topics. The results emphasise the need for careful human supervision when integrating AI into accounting workflows.

The findings also demonstrate that current AI models’ strengths are primarily procedural in nature, while their weaknesses emerge when interpretation, judgement, selectivity and professional scepticism are required. Because AI outputs must be scrutinised and iteratively engaged by an accounting expert to obtain valid, accurate and complete solutions, users without financial accounting proficiency would not be able to rely on AI platforms as a direct substitute for professional accounting expertise.

Each type of question posed to the LLMs requires slightly different skills. Journal entries primarily rely on calculations and an understanding of accounting’s debit-credit system. Claude, Grok and Gemini scored well on journal entries, all outperforming the student average. Interestingly, Claude achieved 100%, which indicates this is a particular strength. This may be due to its conservatism and programming orientation (see Section 2.3). It also suggests that when provided with accurate information and a task to complete from scratch, AI can perform effectively. Although this was the case, it should be noted that in some cases, Grok and Claude provided more than one alternative to answer the question. As is the case when marking student attempts, the first answer was marked. The choice provided by the platforms indicates the essential role of humans and that AI remains a tool and not a replacement for accounting advice and consulting services.

ChatGPT and Copilot performed the worst by a large margin on the journal entry questions. The major contributor to this poor performance was the incorrect application of the principles outlined in the IFRS governing the assessed transaction. The LLMs appear to have hallucinated IFRS principles, yet, as discussed by Roberts et al. (2024), answered in a confident and convincing manner with no hint of uncertainty (see also Zhao and Wang, 2024). Where the AI platforms excelled was in the use of the correct account names in the journals. This is an area where many students struggle. Often, calculations are prepared by students without an understanding of the relevant accounts affected, resulting in incomplete solutions. Consequently, this is an example of where AI can support accountants in practice.

In summary, the journal entry results reveal that when tasks are well-defined with clear syntactic rules, several AI models can outperform the student average. This finding is expected. Nevertheless, inconsistencies in applying IFRS principles and occasional hallucinations mean that accuracy cannot be assumed, even in seemingly straightforward tasks. Accordingly, their high performance on journal entry questions should not be misconstrued as reflecting accounting understanding and reasoning comparable to that of humans (see Gendron et al., 2024; Roberts et al., 2024). These findings point to the continued importance of professional scepticism and human supervision in financial accounting practice.

Straight calculation questions (other than journal entries) focus on financial and simple mathematics, coupled with the measurement and presentation requirements of different accounting standards. Most marks are awarded for calculations, with a few allocated to correctly using calculated answers in the presentation of financial statements. Unfortunately, CoPilot could not correctly read this question, despite having no problems with the other questions. All other models encountered no problems reading the question. This highlights that we still do not fully understand how AI models process information and that they are susceptible to unknown errors, which can be frustrating and worrying.

Although the average student mark was only 52%, 66% of the students in the class passed the question on calculating figures and presenting them on the face of financial statements. Claude and Grok performed well, scoring 55% and 60%, respectively, outperforming the student average. This emphasises the need for professional and technical training to focus on critical thinking and judgement, areas of great importance to the CA(SA) professional that remain underdeveloped in AI models (Du Toit et al., 2024b; Du Toit et al., 2024a; Jackling and De Lange, 2014). While Claude and Grok performed well in the calculation questions, the other models struggled, scoring in the 30s. This question represents a common task that CA(SA)s are required to complete. Common issues identified during the marking process included a lack of adjustments made to numbers provided in the trial balance for narrative discussions; calculations made were not carried through to the requested financial statement; the inclusion of foreign elements in the statements; and the inclusion of amounts in the statements that were not substantiated with workings or did not agree with the workings provided (more evidence of hallucinations). This was especially apparent in ChatGPT’s solution, where amounts were included in the statement with minimal calculations or discussion to support provided figures. A crucial role of the CA(SA) is the interpretation and analysis of financial information. Providing a solution to a client without supporting calculations, reasons or rationale would not meet the expectations of clients requesting accounting advice or consulting services.

The next question also required calculations and presenting financial statements but provided students with some calculations already performed (partial solution). This seemed to confuse the models, with only Gemini passing with 63% while all other models failed by large margins. Students also struggled most with this question, with an average mark of 47% but an overall pass rate of 53%, indicating that the student population diverged in their ability to address this question.

Major concerns with AI models’ attempts were the use of incorrect formats for the statements (CoPilot); a lack of workings to support numbers provided in the statement (ChatGPT and CoPilot); and issues with the AI models reading all the information in the various formats provided (including amounts in tables and explanatory notes provided in narrative form). Linking and referencing numerical workings to a final answer is a problem that is not unique to AI models; many students lost marks for not including workings in the final answer or performing workings without indicating what they relate to. However, the issue is more pronounced in an AI environment.

Overall, the calculation-based questions reflect the rapid decline in AI models’ performance when shifting from mechanical tasks to the integration of multiple topics that require a holistic conceptual understanding of IFRS. While a few models handled numerical tasks competently, others produced unsupported figures or misinterpreted scenario details, reinforcing the need for expert human involvement. These patterns indicate that AI may assist with routine tasks that require surface-level pattern recognition but remain unreliable in tasks requiring integrated thinking and reasoning across multiple data points.

Theory questions assess candidates’ ability to integrate and apply accounting principles as opposed to merely recalling rules and patterns. Accordingly, they are particularly effective at distinguishing candidates with a comprehensive conceptual understanding of accounting principles from those with mechanical competence based on rote learning or calculation-driven approaches to scoring marks.

The student average was 52% with 60% of the class passing the question. In comparison, only Grok and Gemini passed out of all AI models. Gemini outperformed the student average by 8 percentage points but included significant extraneous detail in comparison to other AI models and students. Gemini’s primary mistake was the same as the students’. It overlooked the need to account for a financing component. The financing component was not explicitly stated in the scenario nor highlighted in the task requirement. Consequently, identifying the financing component and accounting for it correctly required a comprehensive understanding of the relevant IFRS principles coupled with a critical analysis of the information provided.

Both Grok and Gemini were able to apply most of the information from the scenario to the relevant IFRS requirements and score the core marks available. This type of scenario-specific application was notably lacking in the solutions provided by the other AI platforms, illuminating more strengths and weaknesses of the respective models.

The results suggest that only some models can connect scenario information to the relevant IFRS concepts. However, even the stronger AI performers occasionally overlooked implicit elements of the scenario, demonstrating the current limits of their probabilistic text generation to replicate humans’ understanding and professional reasoning. Further prompts and training of the models can improve the outcomes. But users with limited financial accounting knowledge would struggle to identify errors in AI models or to guide them appropriately. Consequently, in its current form, AI cannot substitute for accounting professionals.

The results indicate that AI can complement professional judgement and decision-making when it is used as a type of cognitive sounding board. Doing so facilitates reflective evaluation of judgement-based decisions under conditions of uncertainty. This iterative process can highlight deficiencies in an accountant’s reasoning, inconsistent application of principles and misalignment with prevailing industry practices because of the depth and breadth of information available to AI. Using AI as a complementary tool in this way can improve accountants’ decision-making and facilitate improved financial reporting. An additional benefit is that using AI as a sounding board helps prevent the evaluation-related anxiety accountants experience when voicing their opinions on the application of IFRS principles to complex scenarios.

The final question required a critique of a given accounting treatment. The student average was highest for this question (65%) with a pass rate of 81%. However, this was the most poorly answered question across all LLMs. Not one model was able to score over 17%. This suggests that AI struggles to differentiate between what given information is correct and incorrect. This is a challenging yet vitally important competency for accountants, particularly in light of the global accounting and auditing scandals that have occurred over the past decade. For the profession, this competency should be nurtured in students, both because it is necessary and because it appears to be where AI is currently ill-equipped to outperform humans.

In addition to issues with identifying errors, the AI platforms appeared to struggle with the limitations imposed on the questions. The question instructions specified the financial year for which a solution was required. Three of the AI platforms criticised the accounting treatment relating to the incorrect financial year. Another issue with answering in line with the scope of the question was the inclusion of discussions of correct accounting treatments. This was beyond the scope of what the question asked. When students answer in a similar fashion, it often indicates a lack of understanding of what is required or that the student is applying a “shotgun” approach. The latter involves presenting as much information as possible in the hope that, by virtue of the volume of information, some of it is correct.

The critique question exposes significant limitations of current AI models, including their inability to distinguish correct from incorrect treatments and to remain within the defined scope of the task. AI tools are not yet equipped to perform critical assessment tasks central to professional accounting practice. Unfortunately, this is where AI assistance would be most useful, yet it is also where AI is currently at its weakest. This represents a crucial boundary for the responsible use of AI in the accounting profession.

In summary, ChatGPT did not pass any individual question and failed overall with only 27%. It struggled across the board and appeared ill-suited to providing financial accounting support and advice. It was very inconsistent, sometimes presenting the correct conclusions but with incorrect reasons and calculations. It also did not seem to comprehend what the exam required of it, providing either too much or too little information. Claude looked promising, achieving 100% for journal entries but nil for two other questions. Had it earned at least some part marks on those questions, it would likely have passed the assessment overall.

CoPilot, save where it scored nil, was always outperformed by another AI model, with an overall score of 17%. More research is required to understand its substantial underperformance relative to other models. Grok’s favourable performance was surprising. X (formerly Twitter), which is used as training data for Grok, is not known for fact-checking stories and comments before allowing them to go public (see, for example, Vosoughi et al., 2018), and this was expected to negatively impact Grok’s performance. Nevertheless, Grok came in second place amongst the AI models with an overall mark of 40%. Whilst this is still a failing mark, Grok scored the highest mark for the calculation and presentation question, even outperforming the student average. Grok passed three of the five questions.

Gemini scored the highest mark for two questions and was the only model to pass overall (51%). The primary concern with its responses was their volume, which always seemed excessive relative to the complexity of the task, mark allocation and what both students and other AI models provided. At present, the use of multiple AI tools as a “completeness check” may be best for users seeking financial accounting assistance and advice. This leverages the strongest aspect of each AI tool as part of a balanced approach to optimising day-to-day activities. Over time, users will also gauge which models are best suited to their specific tasks and preferences.
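The overall marks reported in this summary can be laid out side by side. The following is a minimal sketch rather than part of the study's method: the 50% pass mark is an assumption (the threshold is not stated in this section), Claude is omitted because its overall mark is not reported here, and the names `overall_marks`, `PASS_MARK` and `rank_models` are illustrative only.

```python
# Overall marks (%) as reported in the summary above. Claude's overall
# mark is not stated in this section, so it is left out rather than guessed.
overall_marks = {"ChatGPT": 27, "CoPilot": 17, "Grok": 40, "Gemini": 51}

# ASSUMPTION: a 50% pass mark; the section does not state the threshold.
PASS_MARK = 50


def rank_models(marks, pass_mark=PASS_MARK):
    """Return (model, mark, passed) tuples sorted from highest to lowest mark."""
    ordered = sorted(marks.items(), key=lambda item: item[1], reverse=True)
    return [(name, mark, mark >= pass_mark) for name, mark in ordered]


ranking = rank_models(overall_marks)
for name, mark, passed in ranking:
    print(f"{name:8s} {mark:3d}%  {'pass' if passed else 'fail'}")
```

Under the assumed threshold, the ordering matches the narrative: Gemini is the only model that passes and Grok is second.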

LLMs currently appear unable to replace accounting consultants who provide advice and related services. However, LLMs are valuable tools that can assist accounting practitioners with their duties (see Roberts et al., 2024; Gendron et al., 2024). Their primary strength lies in handling journal entries, basic calculations and presentation queries that follow surface-level patterns and are syntactic and formulaic. AI models’ performance declines rapidly once professional judgement, selectivity and scepticism are required. AI models struggle to critically evaluate partial and complete solutions, identify errors and state how to correct them. This is especially true when questions are integrated or when the model must simultaneously interpret the information in the question, what is required of it and the relevant IFRS requirements. These areas require judgement and the ability to distinguish fact from fiction. Additionally, practitioners must pay careful attention to which LLM they select to provide assistance. The results indicate that LLMs are not all equal, with CoPilot performing the worst.

For practitioners, the results suggest that while LLMs can support efficiency in routine accounting tasks such as journal entries and straightforward calculations, they remain unreliable for tasks requiring professional judgement, scepticism or nuanced interpretation. Because LLMs are currently ill-suited to completing partially completed tasks, AI may struggle to identify errors in the application of IFRS principles by management teams (a key audit consideration). Consequently, practitioners should approach AI as a tool that complements, rather than replaces, human expertise. Practitioners should implement safeguards when using AI in assurance, reporting or advisory contexts. For students, the findings indicate that developing higher-order skills, including integrated thinking, critical analysis, ethical reasoning and the ability to identify incomplete and incorrect applications of IFRS, is vital to retain humans’ competitive advantage over AI. This underscores the importance of cultivating professional judgement and of approaching AI outputs with a questioning mind, rather than relying solely on them in their studies or future practice.

For educators, the study highlights the need to reinforce the teaching of judgement-intensive competencies that AI cannot currently replicate. Educators must embed AI literacy within curricula so that graduates can both leverage these tools effectively and recognise their limitations. By articulating these targeted implications, the study contributes to shaping how the accounting profession adapts to technological change while safeguarding the distinctive human skills that underpin trust, accountability and professional identity. Accounting education must prioritise the development of higher-order skills such as critical thinking, ethical reasoning and professional scepticism. These are soft skills that accounting professionals are required to leverage as part of their everyday tasks (Du Toit et al., 2024b; Du Toit et al., 2024a; Jackling and De Lange, 2014). Put differently, an integrated thinking logic should be embedded in the modern accountancy practitioner, who takes a holistic approach to risk management, strategy, execution and performance evaluation. In addition, a broader view of both the financial and extra-financial considerations, including the social and environmental perspectives, is required when assessing an organisation and its activities (Maroun et al., 2023). These are skills that AI cannot yet replicate. Nevertheless, the pace of AI integration into everyday tasks and the rapid improvements in the technological sphere necessitate that AI be embraced in learning and teaching practices (Ballantine et al., 2024).

Accounting education needs to move beyond the technical and financial focus and incorporate broader elements of integrated thinking and technological developments (Ballantine et al., 2024). The growing use of AI tools in practice suggests that students must also be equipped with AI literacy, enabling them both to use these technologies responsibly and to evaluate their outputs critically. A more balanced perspective also identifies potential risks, including the fact that an over-reliance on AI could impair students’ critical thinking, undermine their conceptual understanding or dilute their ability to exercise professional judgement if education is not carefully adapted.

The challenge for educators is threefold. Firstly, they must integrate AI into teaching in ways that enhance learning, skills development and efficiencies in the workplace. Secondly, educators must proceed with caution. If tasks where AI excels are outsourced to AI, this may lead to a gradual de-emphasis of, and de-skilling in, essential areas (for example, journal entries and financial statement presentation) (see Imjai et al., 2025). Finally, educators must ensure that students continue to develop the distinctly human capabilities that underpin trust in the profession and differentiate them from what AI can do better.

Although improvements are required for AI to replace CA(SA)s, the role of AI in the profession and how students are educated on this cannot be ignored (Holmes and Douglass, 2022). It is up to educators to rise to the challenge to actively incorporate AI into educational curricula to assist in the development of CA(SA)s who have the ability to change and adapt to their environment as AI does (Staszkiewicz et al., 2024). Accounting curricula should encourage lifelong learning, viewing AI and technology as tools in the modern work environment rather than as a threat to livelihoods (Handoyo, 2024). Access to AI infrastructure, tools and training thereon becomes the primary responsibility of educational institutions, such as universities, due to the unequal access and exposure of the diverse South African student population (Motala, 2011).

In closing, the value of AI to accounting practitioners is rooted in its ability to enhance efficiency, improve accuracy and access insights and knowledge from global and complex data sets and sources (see Lodhia et al., 2025). Nevertheless, realising this value requires a balanced approach that integrates technological advancement with professional judgement, ethical responsibility and regulatory compliance (Roberts et al., 2024; Gendron et al., 2024). As the accounting profession continues to evolve, those who thoughtfully embrace AI as a complementary tool rather than a replacement will be best positioned to lead in a future defined by both innovation and integrity. As accounting professionals adopt AI, it is equally important that AI be introduced into accounting curricula so that students can leverage it in their studies, be better prepared to use it in the workplace and, more importantly, understand the associated risks and opportunities.

As with all papers, this paper has limitations. A zero-shot approach was adopted. With specialised training, LLMs may be able to perform more effectively. Future research should carry out a similar investigation using LLM assistants specifically trained in financial accounting. Also, to maintain objectivity, exam questions were posed to the LLMs verbatim. Future research should investigate LLMs’ performance when tailored prompts are used instead of those given to students (see De Kok, 2025).

[1.]

AI is defined as information systems with the capacity to adapt to their operating environment where they have insufficient knowledge and resources (Staszkiewicz et al., 2024). Examples of such capacities include reasoning, learning and problem-solving. AI can process large amounts of data, recognise patterns, understand language and make predictions. Within the scope of AI, LLMs are machine learning models trained on extensive volumes of text to generate human-like responses, enabling them to engage in natural language conversations and complete tasks such as answering exam-style questions. Automation refers to the use of technology to perform routine, rules-based processes with limited or no human intervention. In an accounting context, automation encompasses functions such as data entry, reconciliations and transaction processing. These definitions provide a base for evaluating how emerging AI tools may augment or challenge traditional roles within the accountancy profession (Roberts et al., 2024).

[2.]

SAICA is a globally recognised and renowned professional body.

[3.]

The following specific search protocols were used: “artificial intelligence” or “AI” or “ChatGPT” or “Copilot” or “Grok” or “Claude” or “GEMINI” and “accounting” or “financial reporting” or “accountancy” and “education”.
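The protocol above does not say how the “or” and “and” operators are grouped. The following is a minimal sketch of one plausible reading, assuming that each run of “or” terms forms a group and that the groups are joined with “and”. The function names `or_group` and `build_query` are illustrative, and the resulting string is generic Boolean syntax, not the exact query submitted to the database.

```python
# Term groups copied from the search protocol in this note.
ai_terms = ["artificial intelligence", "AI", "ChatGPT", "Copilot",
            "Grok", "Claude", "GEMINI"]
accounting_terms = ["accounting", "financial reporting", "accountancy"]
education_terms = ["education"]


def or_group(terms):
    """Quote each term and join the group with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*groups):
    """Join the OR-groups with AND.

    ASSUMPTION: the protocol means
    (AI terms) AND (accounting terms) AND (education terms).
    """
    return " AND ".join(or_group(group) for group in groups)


query = build_query(ai_terms, accounting_terms, education_terms)
print(query)
```

Under this reading, a record must match at least one term from each of the three groups.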

[4.]

The research assumes that researchers flag the key themes of their papers in the titles, keywords and abstracts, but the risk that relevant papers have been excluded cannot be reduced to zero and, by relying on the Scopus Database, other relevant articles could be overlooked. This is not considered a material issue for the current study because the intention is not to provide a comprehensive count of the number of papers but a sense of the direction being taken by the research and key themes emerging.

[5.]

A bibliometric analysis provides an overview of the relationship, volume and impact of the research through various techniques, including frequency analysis, citation analysis, and authorship and country affiliation analysis. Bibliometric tools, including citation, co-citation, bibliographic-coupling and keyword co-occurrence analyses, were applied to the refined set of 90 academic sources.

[13.]

The AI models’ marks were compared to the average marks of the 330 and 262 fourth-year CA(SA) students who sat the exams in 2023 and 2024, respectively.

[14.]

All queries were run on 29 May 2025.

[15.]

The co-author is a qualified CA(SA) with a master’s degree. She is the convenor of the Financial Accounting IV program and has over 5 years’ experience at a prominent South African university.

Abdo-Salloum, A.M. and Al-Mousawi, H.Y. (2025), “Accounting students’ technology readiness, perceptions, and digital competence toward artificial intelligence adoption in accounting curricula”, Journal of Accounting Education, Vol. 70, p. 100951.
Adams, C.A. and Abhayawansa, S. (2022), “Connecting the COVID-19 pandemic, environmental, social and governance (ESG) investing and calls for ‘harmonisation’ of sustainability reporting”, Critical Perspectives on Accounting, Vol. 82, p. 102309.
Akerkar, R. (2019), Artificial Intelligence for Business, Springer.
Al Shehab, N.A. (2022), “Under the COVID pandemic: is it the springtime for forensic accounting field to blossom?”, Artificial Intelligence and COVID Effect on Accounting, Springer.
Albuquerque, F. and Gomes dos Santos, P. (2024), “Can ChatGPT be a certified accountant? Assessing the responses of ChatGPT for the professional access exam in Portugal”, Administrative Sciences, Vol. 14 No. 7, p. 152.
Anomah, S., Ayeboafo, B., Owusu, A. and Aduamoah, M. (2024), “Adapting to AI: exploring the implications of AI integration in shaping the accounting and auditing profession for developing economies”, EDPACS, Vol. 69 No. 11, pp. 28-52.
Appelbaum, D., Showalter, D.S., Sun, T. and Vasarhelyi, M.A. (2021), “A framework for auditor data literacy: a normative position”, Accounting Horizons, Vol. 35 No. 2, pp. 5-25.
Arise, O.A. and Moloi, T. (2025), “Towards a future-ready accounting education: Opportunities and challenges of integrating artificial intelligence into contemporary accounting curricula”, International Conference of Accounting and Business, Springer, pp. 993-1015.
Assidi, S., Omran, M., Rana, T. and Borgi, H. (2025), “The role of AI adoption in transforming the accounting profession: a diffusion of innovations theory approach”, Journal of Accounting and Organizational Change, Vol. 21 No. 5.
Atanasovski, A., Tocev, T., Dionisijev, I., Minovski, Z. and Jovevski, D. (2023), Evaluating the Performance of ChatGPT in Accounting and Auditing Exams: An Experimental Study in North Macedonia, Faculty of Economics-Skopje, Ss. Cyril and Methodius University in Skopje, Skopje.
Baldwin-Morgan, A.A. (1995), “Integrating artificial intelligence into the accounting curriculum”, Accounting Education, Vol. 4 No. 3, pp. 217-229.
Ballantine, J., Boyce, G. and Stoner, G. (2024), “A critical review of AI in accounting education: Threat and opportunity”, Critical Perspectives on Accounting, Vol. 99, p. 102711.
Bellucci, M., Marzi, G., Orlando, B. and Ciampi, F. (2020), “Journal of intellectual capital: a review of emerging themes and future trends”, Journal of Intellectual Capital, Vol. 22 No. 4, pp. 744-767.
Boateng, S.L. and Boateng, R. (2025), AI and Society: Navigating Policy, Ethics, and Innovation in a Transforming World, CRC Press.
Brookes, L., Burnham, K., Van Zijl, W. and Maroun, W. (2025), “Financial reporting challenges of small- and medium-capitalisation JSE-listed companies”, South African Journal of Business Management, Vol. 56 No. 1.
Caputo, A., Pizzi, S., Pellegrini, M. and Dabić, M. (2021), “Digitalization and business models: Where are we going? A science map of the field”, Journal of Business Research, Vol. 123, pp. 489-501.
Coyne, J.G., Coyne, E.M. and Walker, K.B. (2016), “A model to update accounting curricula for emerging technologies”, Journal of Emerging Technologies in Accounting, Vol. 13 No. 1, pp. 161-169.
Dahl, M., Magesh, V., Suzgun, M. and Ho, D.E. (2024), “Large legal fictions: Profiling legal hallucinations in large language models”, Journal of Legal Analysis, Vol. 16 No. 1, pp. 64-93.
De Kok, T. (2025), “ChatGPT for textual analysis? How to use generative LLMs in accounting research”, Management Science, Vol. 71 No. 9.
De Villiers, R. (2021), “Seven principles to ensure future-ready accounting graduates–a model for future research and practice”, Meditari Accountancy Research, Vol. 29 No. 6, pp. 1354-1380.
Dong, M.M., Stratopoulos, T.C. and Wang, V.X. (2024), “A scoping review of ChatGPT research in accounting and finance”, International Journal of Accounting Information Systems, Vol. 55, p. 100715.
Du Toit, E., Marx, B. and Smith, R. (2024a), “A model to develop integrated thinking skills of prospective professional accountants”, Advances in Accounting Education: Teaching and Curriculum Innovations, Emerald Publishing Limited.
Du Toit, E., Marx, B. and Smith, R.J. (2024b), “Barriers to the development of integrated thinking skills of prospective chartered accountants”, South African Journal of Economic and Management Sciences, Vol. 27 No. 1, p. 5325.
Dumay, J., Bernardi, C., Guthrie, J. and Demartini, P. (2016), “Integrated reporting: a structured literature review”, Accounting Forum, Vol. 40 No. 3, pp. 166-185.
Ecim, D. and Maroun, W. (2023), “A review of integrated thinking research in developed and developing economies”, Journal of Accounting in Emerging Economies, Vol. 13 No. 3, pp. 589-612.
Ecim, D. (2024), “Components of integrated thinking: Evidence from South African listed companies”, South African Journal of Business Management, Vol. 55 No. 1, p. 16.
Eisikovits, N., Johnson, W.C. and Markelevich, A. (2025), “Should accountants be afraid of AI? Risks and opportunities of incorporating artificial intelligence into accounting and auditing”, Accounting Horizons, Vol. 39 No. 2, pp. 117-123.
Elliott, R.K. (1986), “Auditing in the 1990s: Implications for education and research”, California Management Review, Vol. 28 No. 4, pp. 89-97.
Elnakeeb, S. and Elawadly, H.S.H. (2025), “Automation and artificial intelligence in accounting: a comprehensive bibliometric analysis and future trends”, Journal of Financial Reporting and Accounting.
Emsley, R. (2023), “ChatGPT: these are not hallucinations–they’re fabrications and falsifications”, Schizophrenia, Vol. 9 No. 1, p. 52.
Enholm, I.M., Papagiannidis, E., Mikalef, P. and Krogstie, J. (2022), “Artificial intelligence and business value: a literature review”, Information Systems Frontiers, Vol. 24 No. 5, pp. 1709-1734.
Eulerich, M., Sanatizadeh, A., Vakilzadeh, H. and Wood, D.A. (2024), “Is it all hype? ChatGPT’s performance and disruptive potential in the accounting and auditing industries”, Review of Accounting Studies, Vol. 29 No. 3, pp. 2318-2349.
European Commission (2024), “Corporate sustainability reporting” (accessed 1 February 2024).
Gendron, Y., Andrew, J., Cooper, C. and Tregidga, H. (2024), On the Juggernaut of Artificial Intelligence in Organizations, Research and Society, Elsevier.
Handoyo, S. (2024), “Evolving paradigms in accounting education: a bibliometric study on the impact of information technology”, The International Journal of Management Education, Vol. 22 No. 3, p. 100998.
Holmes, A.F. and Douglass, A. (2022), “Artificial intelligence: Reshaping the accounting profession and the disruption to accounting education”, Journal of Emerging Technologies in Accounting, Vol. 19 No. 1, pp. 53-68.
Imjai, N., Yordudom, T., Yaacob, Z., Saad, N.H.M. and Aujirapongpan, S. (2025), “Impact of AI literacy and adaptability on financial analyst skills among prospective Thai accountants: the role of critical thinking”, Technological Forecasting and Social Change, Vol. 210, p. 123889.
IODSA (2016), King IV Report on Corporate Governance in South Africa, Lexis Nexis South Africa, Johannesburg, South Africa.
Issa, H., Sun, T. and Vasarhelyi, M.A. (2016), “Research ideas for artificial intelligence in auditing: the formalization of audit and workforce supplementation”, Journal of Emerging Technologies in Accounting, Vol. 13 No. 2, pp. 1-20.
ISSB (2021), “General requirements for disclosure of sustainability-related financial information prototype”.
Jackling, B. and De Lange, P. (2014), “Do accounting graduates’ skills meet the expectations of employers? A matter of convergence or divergence”, The Interface of Accounting Education and Professional Training, Routledge.
Katz, J.E. (2024), “Artificial intelligence in education as a threat to the democratic imagination”, Journal of Artificial Intelligence for Sustainable Development, Vol. 1 No. 1, pp. 1-6.
Kokina, J. and Davenport, T.H. (2017), “The emergence of artificial intelligence: How automation is changing auditing”, Journal of Emerging Technologies in Accounting, Vol. 14 No. 1, pp. 115-122.
Leitner-Hanetseder, S., Lehner, O.M., Eisl, C. and Forstenlechner, C. (2021), “A profession in transition: actors, tasks and roles in AI-based accounting”, Journal of Applied Accounting Research, Vol. 22 No. 3, pp. 539-556.
Lodhia, S., Farooq, M.B., Sharma, U. and Zaman, R. (2025), “Digital technologies and sustainability accounting, reporting and assurance: framework and research opportunities”, Meditari Accountancy Research, Vol. 33 No. 2, pp. 417-441.
Lu, Y. (2019), “Artificial intelligence: a survey on evolution, models, applications and future trends”, Journal of Management Analytics, Vol. 6 No. 1, pp. 1-29.
McGuigan, N., Haustein, E., Kern, T. and Lorson, P. (2021), “Thinking through the integration of corporate reporting: Exploring the interplay between integrative and integrated thinking”, Meditari Accountancy Research, Vol. 29 No. 4, pp. 775-804.
Maroun, W. (2022), “Corporate governance and the use of external assurance for integrated reports”, Corporate Governance: An International Review, Vol. 30 No. 5, pp. 584-607.
Maroun, W., Ecim, D. and Cerbone, D. (2023), “Refining integrated thinking”, Sustainability Accounting, Management and Policy Journal, Vol. 14 No. 7, pp. 1-25.
Motala, S. (2011), “Educational access in South Africa”, Journal of Educational Studies, Vol. 2011, pp. 84-103.
Myeza, L., Ecim, D. and Maroun, W. (2023), “The role of integrated thinking in corporate governance during the COVID-19 crisis: perspectives from South Africa”, Journal of Public Budgeting, Accounting and Financial Management, Vol. 35 No. 6, pp. 52-77.
Patel, S. and Ragolane, M. (2024), “The implementation of artificial intelligence in South African higher education institutions: Opportunities and challenges”, Technium Education and Humanities, Vol. 9, pp. 51-65.
Pinto, A.S., Abreu, A., Costa, E. and Paiva, J. (2024), “AI in accounting: Can AI models like ChatGPT and Gemini successfully pass the Portuguese chartered accountant exam?”, International Conference in Information Technology and Education, Springer, pp. 429-438.
Polimeni, R.S. and Burke, J.A. (2021), “Integrating emerging accounting digital technologies and analytics into an undergraduate accounting curriculum–a case study”, Journal of Emerging Technologies in Accounting, Vol. 18 No. 1, pp. 159-173.
Ribeiro, J., Lima, R., Eckhardt, T. and Paiva, S. (2021), “Robotic process automation and artificial intelligence in industry 4.0–a literature review”, Procedia Computer Science, Vol. 181, pp. 51-58.
Rinaldi, L., Unerman, J. and De Villiers, C. (2018), “Evaluating the integrated reporting journey: insights, gaps and agendas for future research”, Accounting, Auditing and Accountability Journal, Vol. 31 No. 5, pp. 1294-1318.
Roberts, J., Baker, M. and Andrew, J. (2024), “Artificial intelligence and qualitative research: the promise and perils of large language model (LLM) ‘assistance’”, Critical Perspectives on Accounting, Vol. 99, p. 102722.
Staszkiewicz, P., Horobiowski, J., Szelągowska, A. and Strzelecka, A.M. (2024), “Artificial intelligence legal personality and accountability: auditors’ accounts of capabilities and challenges for instrument boundary”, Meditari Accountancy Research, Vol. 32 No. 7, pp. 120-146.
Tharapos, M. (2022), “Opportunity in an uncertain future: Reconceptualising accounting education for the post-COVID-19 world”, Accounting Education, Vol. 31 No. 6, pp. 640-651.
Van Eck, N.J. and Waltman, L. (2017), “Citation-based clustering of publications using CitNetExplorer and VOSviewer”, Scientometrics, Vol. 111 No. 2, pp. 1053-1070.
Vasarhelyi, M.A., Moffitt, K.C., Stewart, T. and Sunderland, D. (2023), “Large language models: an emerging technology in accounting”, Journal of Emerging Technologies in Accounting, Vol. 20 No. 2, pp. 1-10.
Vosoughi, S., Roy, D. and Aral, S. (2018), “The spread of true and false news online”, Science, Vol. 359 No. 6380, pp. 1146-1151.
White, C.E. Jr (1995), “An analysis of the need for ES and AI in accounting education”, Accounting Education, Vol. 4 No. 3, pp. 259-269.
Wood, D.A., Achhpilia, M.P., Adams, M.T., Aghazadeh, S., Akinyele, K., Akpan, M., Allee, K.D., Allen, A.M., Almer, E.D., Ames, D. and Arity, V. (2023), “The ChatGPT artificial intelligence chatbot: How well does it answer accounting assessment questions?”, Issues in Accounting Education, Vol. 38 No. 4, pp. 81-108.
Zacher, W. and Kuppannagari, S. (2024), “Can LLMs pass the CPA exam? Evaluating large language model performance on the certified public accountant test”, SSRN Electronic Journal.
Zhao, J. and Wang, X. (2024), “Unleashing efficiency and insights: Exploring the potential applications and challenges of ChatGPT in accounting”, Journal of Corporate Accounting and Finance, Vol. 35 No. 1, pp. 269-276.
Zong, Z. and Guan, Y. (2025), “AI-driven intelligent data analytics and predictive analysis in industry 4.0: Transforming knowledge, innovation, and efficiency”, Journal of the Knowledge Economy, Vol. 16 No. 1, pp. 864-903.
Zupic, I. and Čater, T. (2015), “Bibliometric methods in management and organization”, Organizational Research Methods, Vol. 18 No. 3, pp. 429-472.
Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors.
