The study aims to investigate how three artificial intelligence (AI) tools – ChatGPT, Gamma and Autopilot – can assist university students in improving their academic writing tasks.
This one-group pretest/posttest study examined the impact of three AI tools, ChatGPT, Gamma and Autopilot, on students’ academic performance. Students used one tool for assignments, supported by pre- and posttests, surveys and reflections. Quantitative data was analyzed using descriptive statistics, paired t-tests, ANOVA and correlations. Qualitative data was collected from reflections and group discussions. This study was guided by the Technology Acceptance Model and Constructivist Learning Theory, with reflexive notes included to minimize researcher bias.
The results showed that AI tools enhanced ease of use among students and reduced the time required to complete tasks, with ChatGPT scoring the highest in terms of usability. Gamma proved more accurate due to its customizable output formats, whereas Autopilot was less flexible and simpler. Weaknesses included difficulties in adherence to references and limited support for the Arabic language, especially with ChatGPT. Students appreciated the time-saving and improved presentation qualities of the tools, but noted a need for more precise references and better support for non-English languages. These statistical findings were reinforced by qualitative reflections, which provided insights into how students justified their tool choices and what they found weak about them, particularly regarding citations and support for Arabic.
Because this study is exploratory and conducted at one university, it cannot be generalized to the larger population. Participants were not randomly assigned to conditions, and the absence of a control group makes it difficult to isolate the specific effects of AI tools. Future research could expand on these findings by including a more representative sample and a control group to better inform educators and researchers about the impact of AI tool use.
AI tools like Autopilot, Gamma and ChatGPT are valuable resources that increase productivity and simplify the research process. Responsible use of these tools is essential, however, to ensure that academic standards and citation practices are properly maintained. Following these standards enables students to benefit from the time-saving features of AI tools without compromising the quality or integrity of their work.
This study explores the potential of AI tools to support academic work, an under-researched area. It offers new perspectives on how AI can enhance academic support by examining the tools’ real-world applications in higher education. The results highlight AI’s potential to raise the effectiveness and efficiency of educational outcomes.
1. Introduction
Artificial intelligence (AI) increasingly impacts our lives, transforming how we work, learn and communicate (Long and Magerko, 2020). AI and its tools are among the most significant outcomes of the Fourth Industrial Revolution (Sahai and Rath, 2021). This initiative, launched in Germany, aims to base industry on technology while reducing manual labor, restricting the human role to monitoring and auditing. As of 2024, AI chatbots like GPT-4o, Gemini 1.5 Pro and Claude 3, harnessing deep learning algorithms, are at the forefront of technological advancements (Kevian et al., 2024).
This technological revolution, represented by AI, has spread widely and plays a crucial role in all sectors of life, particularly in education. The ability to complete complex tasks in seconds and provide answers in a language that is both understandable and relatable makes AI tools a significant innovation in processing (Lund and Wang, 2023). This integration is beneficial due to its essential characteristics, including speed, efficiency in saving time and money and accuracy in content presentation, which significantly impacts university students’ learning (Slimi, 2022).
As AI becomes increasingly prevalent in the learning environment, researchers and students must guide its growth, control its development, minimize bias and adhere to strict ethical standards. It is essential to remember that critical thinking and human creativity remain vital and represent a fundamental responsibility of researchers (Bolaños et al., 2024). Recently, the need to align AI with diverse human values has emerged, emphasizing fairness, accountability and transparency (FAccT) as the foundation for developing ethical AI. Addressing bias in AI does not rely only on technical solutions but instead requires a comprehensive reassessment of the social and technical systems within which these technologies are developed and implemented (Mergen et al., 2025).
Many definitions of AI have emerged, with one of the most notable being from Sheikh et al. (2023): “The ability of a machine to perform and act like humans or engage in actions requiring intelligence through systems that use technologies capable of generating content, providing recommendations, or making decisions at varying levels of self-control.” A subset of AI consists of algorithms capable of generating new and diverse content, such as images and text (Cao et al., 2023). Currently, generative AI tools are gaining popularity, particularly for automated text generation (ATG), a trend driven mainly by the release of ChatGPT by OpenAI in November 2022. The release of ChatGPT for public use was a significant milestone: it not only showcased the theoretically game-changing potential of AI, and especially that of large language models (LLMs), to a broader audience but also attracted widespread public attention, with the tool reaching over one million users within one week of its release (Wu et al., 2023).
These systems have seen widespread adoption across various disciplines, and their prevalence has increased significantly in recent times. In education, AI has become common in supporting teachers and students with numerous tasks such as lesson preparation and individual and group assignments, and its impact on higher education continues to grow (Delcker et al., 2024). Moreover, free AI-based programs and systems that assist in content creation or decision-making have proliferated, paralleling human intelligence. For example, ChatGPT is increasingly recognized as a powerful writing assistant in the scientific community with the potential to transform academic and scholarly publishing (Haque et al., 2022).
Despite the capacity of AI to perform tasks that involve creativity, innovation and enhanced quality, thereby increasing both capabilities and potential, there remains a degree of ambiguity regarding high-level expectations, which are sometimes unrealistic. As AI technology advances, tools like ChatGPT will likely become more sophisticated and reliable. However, it is essential to maintain a balanced approach by supplementing AI with human expertise and judgment. The goal should be to integrate the strengths of both technology and human insight to create engaging, practical learning experiences that empower students with the necessary skills to thrive in an increasingly digital world (Sykes, 2024).
This study will be based on the selection of ChatGPT, Gamma and Autopilot for the following reasons:
The ease of use and application of these tools by the student sample, particularly concerning the required activation steps, and the availability of the tools in Arabic, which is less common than many other tools primarily operating in English.
These tools are free for students to access and use. The AI tools in Microsoft Office are provided to university students at no charge, and Gamma is also free to use, particularly for creating presentations with a limited number of slides, making it highly accessible.
These tools offer strong security and privacy, particularly those integrated into Microsoft Office. Students can access and use them without entering login credentials or providing personal details, such as their name and age. This structure helps ensure a high level of data protection and confidentiality.
The study results may apply to similar AI tools that are free to use, include an Arabic language interface and provide privacy and security.
This research aims to demonstrate that AI performance expectations often exceed the actual outcomes, particularly concerning student skills, academic achievements in completing assignments and improved academic writing. These tools must adapt to various academic citation patterns, support underrepresented languages such as Arabic and provide more personalized learning experiences.
Achieving this vision requires addressing ongoing challenges, such as concerns about academic integrity, unequal access to digital resources and linguistic inclusion. AI adoption should be approached with careful planning and ethical foresight to avoid the risk of deepening existing educational inequities rather than addressing them. Therefore, it should be viewed as a transformative process guided by inclusive, ethical and future-oriented educational frameworks.
2. Study problem
This study examines the impact of AI tools such as ChatGPT, Gamma and Autopilot on the quality of student work. Many students use these tools to support research, writing and time management. However, it remains unclear whether these tools enhance the quality of academic work. Specifically, the study explores how these AI tools affect assignments in terms of usability, accuracy and citation adherence. It also examines how the tools manage citations and whether students can locate relevant and specific sources, a skill crucial for assignments requiring exact references. By examining both the pros and cons of using AI tools in academic tasks, the study aims to demonstrate how they may be effectively integrated into student work. It also seeks to identify areas where AI tools require improvement to support diverse languages and meet academic standards, thereby enhancing the quality of student academic performance.
3. Research objectives
To examine the impact of AI tools like Gamma, ChatGPT and Autopilot on student performance in academic assignments.
To evaluate student perceptions of the effectiveness of these tools in improving their academic output and productivity.
4. Research questions
What is the impact of using AI tools, such as Gamma, ChatGPT and Autopilot, on student performance in academic assignments?
What are students’ perceptions and feedback on the use of AI tools, Gamma, ChatGPT and Autopilot in academic tasks?
5. Review of literature
Interest in integrating AI technologies into higher education contexts continues to grow (Crompton and Burke, 2023). Further study and experimentation are necessary, particularly regarding the effectiveness of specific AI tools, such as Gamma, ChatGPT and Autopilot, in academic projects, as well as their influence on academic writing standards for university students.
AI tools have become increasingly valuable for academic writing, as they can assist with various aspects of the writing process. Recent international studies indicate that AI technology can enhance productivity by generating ideas, correcting grammatical errors and organizing references. This improves writing flow and enhances coherence (Kasneci et al., 2023; Zawacki-Richter et al., 2024). These technologies are particularly advantageous for students who speak English as a second language, as they help overcome linguistic and temporal barriers (Jiao and Lin, 2024). Holmes and Tuomi (2023) emphasized that AI can support students’ writing by providing immediate feedback and language guidance; however, it should not replace human instruction and critical thinking. Dawson and Luckin (2024) found that students in Australia and the UK view generative AI as a valuable tool for enhancing and improving the structural quality of academic writing; however, concerns remain about potential overreliance on such technologies, which may impede students’ independent thinking.
Recent studies in North America have corroborated these concerns, noting that students may not fully grasp the principles of authorship, citation practices and academic integrity if they rely too heavily on AI technology (Cotton et al., 2024; Zhang and Yueh, 2024). These studies suggest that AI can enhance writing and learning outcomes when used appropriately, while preserving key aspects of human creativity, feedback and ethical responsibility.
Tran (2023) demonstrated the influence of AI-powered tools on education, particularly in enhancing English academic writing skills. The study surveyed 5 teachers and 60 students in Hanoi, to investigate how these tools affect teaching and learning. The results indicated positive perceptions among both teachers and students, with AI tools significantly enhancing coherence, lexical diversity, grammatical accuracy and overall writing quality. Moreover, the integration of AI-powered writing tools proved highly beneficial, offering valuable insights that strengthen academic writing pedagogy and preparation for standardized English assessments.
Further research is needed to explore other specific tools, such as Autopilot, ChatGPT and Gamma, in the context of academic writing tasks. Although previous studies have examined the use of AI tools in academic writing, including ChatGPT, more comprehensive research is still required. Such studies offer valuable insights into leveraging AI in higher education to enhance academic standards and promote student learning.
6. Theoretical framework
The research model is based on the Technology Acceptance Model (TAM), which identifies perceived ease of use and perceived usefulness as the two crucial factors that determine technology adoption (Jiang, 2025). These constructs are closely related to usability, accuracy and time saving, which are the focus of this study. In addition, the Constructivist Learning Theory (CLT) provides a perspective on how students actively construct meaning and knowledge when interacting with AI-generated products, particularly in Arabic-language courses (Wang, 2025). These frameworks help explain why students prefer tools that are simple to use and save time, even when accuracy or citation adherence is weaker.
7. Study design
7.1 Research methodology
This study followed a one-group pretest and posttest design to examine how three AI tools, ChatGPT, Gamma and Autopilot, could enhance students’ academic performance. The research was conducted at a private, medium-sized university in Qatar that offers bilingual programs. Arabic-medium courses are a central part of the curriculum, providing meaningful context to explore how students engage with AI tools in their native language.
A total of 92 undergraduate students were included in the study. The sample size was determined by the availability of eligible participants during the recruitment period. Although relatively small, this sample was sufficient for the exploratory purpose of the research. Participants represented the Colleges of Law, Education and Business and were enrolled in two required university courses, “University Success” and “Leadership, Innovation, and Entrepreneurship,” offered in Arabic during the Fall 2024 semester. The study compared students’ perceptions and performance before and after using the AI tools. Each group of students used one of the tools for selected course assignments. Data was collected through pre- and posttests, surveys and short written reflections.
The study was guided by TAM and CLT. TAM helped explain students’ willingness to adopt AI based on perceived ease of use and usefulness, while CLT provided a framework for understanding how students constructed knowledge through interaction with AI in their learning process. Because the researcher was familiar with AI technologies, reflective notes were kept throughout the study to reduce potential bias in interpreting the data.
Quantitative data was analyzed using descriptive statistics, paired-sample t-tests, ANOVA and Pearson correlations. Qualitative reflections and group discussions were analyzed thematically to identify recurring ideas and variations in students’ experiences. Coding was done inductively and refined through several rounds of review to ensure consistency and depth in the interpretation.
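The quantitative steps named above (descriptive statistics, paired-sample t-test, Pearson correlation) can be sketched in a few lines of Python with SciPy. The scores below are hypothetical placeholders for illustration only; the study’s actual data are not reproduced here.

```python
import numpy as np
from scipy import stats

# Hypothetical pre/post assignment scores for eight students
# (illustrative only -- not the study's real data).
pre  = np.array([62, 70, 58, 75, 66, 71, 64, 69])
post = np.array([68, 74, 60, 80, 70, 73, 65, 75])

# Descriptive statistics
mean_pre, sd_pre = pre.mean(), pre.std(ddof=1)

# Paired-sample t-test: did scores change after using the tool?
t, p = stats.ttest_rel(pre, post)

# Pearson correlation between pre and post scores
r, _ = stats.pearsonr(pre, post)
```

With paired designs like this one, `ttest_rel` is the appropriate test because each student serves as their own control; an independent-samples test would discard that pairing.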
The study’s main limitation lies in its focus on one institution and a relatively small sample drawn from Arabic-medium courses, which limits the generalizability of the findings. Nevertheless, the results provide valuable insight into how AI tools can be integrated into similar higher education contexts.
7.2 Context of the study
The study’s setting is important, as all courses were conducted in Arabic. This linguistic context also affected the results, because most AI tools are more effective in English and have limitations in processing Arabic input. Examining these variations was a primary focus of the research, as language support significantly affects a tool’s usability and accuracy in an academic context. In addition, the study was conducted across several colleges, which ensured a diverse range of opinions, thereby enhancing the applicability of the findings to other fields.
7.3 Limitations of self-report
One major weakness of this methodology is that it relies on self-reported information gathered through surveys and reflections. Social desirability, recall errors and subjectivity are some of the biases that affect self-reports. Consequently, the ease of use or accuracy of students’ perceptions might not always align with objective measures. To address this, the survey data was supplemented with assignment assessments and statistical tests; however, the possibility of self-report bias is also acknowledged as a limitation.
7.4 Sample
The study involved 92 students from three colleges – Law, Education and Business – to ensure a diverse range of perspectives. The participants included both male and female students (see Table 1). All students were enrolled in the required university courses “University Success” and “Leadership, Innovation and Entrepreneurship,” which were conducted in Arabic during the Spring 2024 academic semester. A larger sample was not possible due to the limited number of students in the targeted classrooms.
Demographic information
| Demographic variable | Categories | Frequency (n) | % |
|---|---|---|---|
| Age | 20–24 years | 65 | 70.65 |
| | 25–29 years | 10 | 10.86 |
| | 30–34 years | 7 | 7.60 |
| | 35+ years | 10 | 10.86 |
| Gender | Male | 46 | 50.00 |
| | Female | 46 | 50.00 |
| | Other | 0 | 0.00 |
7.5 Study tools
The study used three AI tools – Gamma, ChatGPT and Autopilot – which were selected for their user-friendliness, Arabic availability and free access. These tools were specifically selected to provide the students with straightforward and practical resources for their assignments. Gamma was included because of its ease of use and because it is a program students can access for free. ChatGPT and Autopilot were selected for their content-generation capabilities, popularity in academic environments and privacy features that do not require any personal information or login details. By incorporating these tools, the study aims to evaluate AI resources that are accessible, secure and supportive of Arabic language users.
7.6 Data collection
7.6.1 Students’ survey.
A combination of closed-ended and open-ended questions was used to gather data. Using a three-point Likert scale, the survey was developed based on insights from the literature, including studies by Chan and Hu (2023) and Almogren et al. (2024). In addition, the validity was evaluated by specialists in the field, who recommended modifying specific sentences or phrases, deleting others and rearranging some items. The surveys were administered online to ensure accessibility and convenience for participants. Participants were provided with a clear explanation of the study’s purpose, and participation was voluntary. Informed consent was obtained from all participants before they began the test and the survey, ensuring that ethical standards were followed throughout the research process. The integration of both quantitative and qualitative approaches provided a well-rounded understanding of the AI tool experience and actionable insights into how the use of AI tools can be improved. Most respondents agreed that the items were appropriate for the scale, providing a reasonable approximation of face validity; thus, the scale’s content validity was achieved. The survey was designed to assess:
time and effort savings;
perceptions of accuracy, multimedia use and citation adherence;
language support, particularly for Arabic; and
challenges faced and the overall effectiveness of the tools.
7.6.2 Assignments test.
Students used the selected AI tools to complete the writing assignments. The impact of the tools was measured through qualitative and quantitative evaluation of the final outputs.
The researcher prepared the assignment rubrics. After preparing the test rubric, a group of specialists in the field shared their views on the accuracy and clarity of the proposed test and rubric. The data was collected from multi-section courses with a combined enrollment of over 226 students, namely the “University Success” and “Leadership, Innovation and Entrepreneurship” courses. The test was administered in the classroom, and the student survey was hosted on Microsoft Forms. Participants were given a consent form that ensured confidentiality, voluntary participation and the right to withdraw from the study at any time without consequence, as well as information regarding data safety, storage and contact with the researchers. All participants were assured of anonymity and confidentiality of the information collected.
8. Data analysis
In this study, data was collected from three primary sources: test results from student assignments, a table analyzing the performance of AI tools and a survey containing both closed and open-ended questions. The data was analyzed using a mix of quantitative and qualitative methods to evaluate the effectiveness of AI tools (Gamma, ChatGPT and Autopilot) in enhancing students’ academic performance.
8.1 Test results analysis
The first step in the analysis was to examine the test results from the students’ assignments. The students were assessed on their ability to complete research tasks, write academic content and follow proper citation practices. Descriptive statistics were used to summarize the performance of students using different AI tools. The average test scores for students who used each AI tool were compared to determine any significant differences in performance.
To ensure the reliability and credibility of the test and table analysis, the researcher used established methods from the literature and conducted Cronbach’s alpha reliability test, with results indicating acceptable internal consistency. A one-way ANOVA test was conducted to assess if there were statistically significant differences in the mean scores.
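As an illustration, the reliability check and group comparison described here can be sketched as follows. The Likert matrix and per-tool score lists are hypothetical stand-ins, since the study’s raw data are not reproduced in this article.

```python
import numpy as np
from scipy import stats

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents x items rating matrix."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical three-point Likert responses (rows = students, cols = items)
likert = np.array([[3, 3, 2], [2, 2, 2], [3, 2, 3],
                   [1, 1, 2], [2, 3, 2], [3, 3, 3]])
alpha = cronbach_alpha(likert)  # ~0.75 for this toy matrix

# One-way ANOVA on hypothetical mean assignment scores per tool group
gamma     = [78, 82, 75, 80, 77]
chatgpt   = [74, 79, 72, 76, 75]
autopilot = [70, 73, 68, 72, 71]
f_stat, p_value = stats.f_oneway(gamma, chatgpt, autopilot)
```

A conventional rule of thumb treats alpha of roughly 0.7 or above as acceptable internal consistency, which is the threshold the reported reliability result appears to meet.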
The data analysis focused on three elements:
Ease of use: After the intervention, more students rated ease of use higher, with a shift toward “two marks.” The mean remained 30.66, though the increased standard deviation (from 13.05 to 19.39) indicates more varied responses. A strong correlation (0.943) indicated consistent trends in ratings (see Table 2).
Accuracy: Accuracy ratings shifted toward “five marks,” but the mean remained the same. The negative correlation (−0.686) indicated mixed results, with some students rating accuracy lower postintervention (see Table 2).
Citation adherence: Citation adherence improved, with more students rating it “three” and “four marks.” The median increased from 31 to 38, but the weak correlation (0.165) suggests inconsistent results (see Table 2).
Performance metrics and statistical analysis of AI tools in academic assignments
| Variables | n = 92(%) | Change (%) | Mean | SD | Median | Correlation | ||||
|---|---|---|---|---|---|---|---|---|---|---|
| Before | After | Before | After | Before | After | Before | After | |||
| Ease of use (user interaction, accessibility, and ease of setup) | 17 (18.4) | 18 (19.5) | −0.058 | 30.66 | 30.66 | 13.05 | 19.39 | 32 | 21 | 0.943 |
| 43 (46.7) | 21 (22.8) | 0.511 | ||||||||
| 32 (34.7) | 53 (57.6) | −0.656 | ||||||||
| Accuracy (precision of results, consistency, relevance to the main topic and multimedia compatibility with content) | 28 (30.4) | 21 (22.8) | 0.25 | 30.66 | 30.66 | 7.37 | 17.61 | 28 | 21 | −0.686 |
| 39 (42.3) | 20 (24.7) | 0.487 | ||||||||
| 25 (24.1) | 51 (55.4) | −1.04 | ||||||||
| Citation adherence (correctness of citation style, completeness of citations, placement and formatting, and use of reliable and relevant sources) | 26 (28.2) | 12 (13.0) | 0.538 | 30.66 | 30.66 | 4.50 | 16.28 | 31 | 38 | 0.165 |
| 35 (38.0) | 42 (45.6) | −0.2 | ||||||||
| 31 (33.6) | 38 (41.3) | −0.225 | ||||||||
Overall, the intervention improved ease of use and citation adherence but had mixed effects on accuracy, with some variability in the results.
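As a cross-check, the dispersion and correlation figures for the accuracy row can be recomputed directly from the frequency counts reported in Table 2; small rounding differences against the published values are expected.

```python
import numpy as np
from scipy import stats

# Frequency counts from the "Accuracy" row of Table 2
# (students per rating band, before vs after the intervention)
before = np.array([28, 39, 25])
after  = np.array([21, 20, 51])

sd_before = before.std(ddof=1)        # ~7.37, as reported
sd_after  = after.std(ddof=1)         # ~17.6, as reported
r, _ = stats.pearsonr(before, after)  # ~-0.69, close to the reported -0.686
```

The negative sign of `r` reflects the post-intervention shift of students out of the lower rating bands and into the highest one, which is what the text describes as mixed accuracy results.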
To ensure a reliable analysis, the researcher developed the evaluation framework shown in Table 3, drawing on established methodologies from prior studies. Notably, Amarathunga (2024) used similar tables to compare AI tools in educational contexts, and the table design was further inspired by Crompton and Burke (2023), whose work provided comprehensive evaluations of AI’s impact on academic performance. These sources played a critical role in shaping tables that accurately reflect both the effectiveness of the tools and students’ perceptions.
Analysis of AI tools’ performance based on various aspects observed during the students’ experiment
| Aspects | GAMMA AI | ChatGPT | Autopilot |
|---|---|---|---|
| Elements | Highlighted key points; improved research and critical thinking | Highlighted key points; enhanced research and thinking | Focused on the topic, improved cognitive skills |
| Sources and references | As shown by the search terms, the citation format is absent | Shown by search terms; no proper citation | Shown by search terms; no citation style used |
| Overall format | Used modern formats with visuals that matched the content | Used some modern formats suited to content | Content only; no consistent formatting |
| Language handling | Supported English, French, German, Arabic, and more | Supported several languages; display issues with Arabic | Supported English, French, German, Arabic, and more |
| Interactivity and speed | Fast and responsive | Quick interaction and response | Not applicable |
| File format handling | Supports PDF, PPT, Word, and more | Works in PPT; easy to manage and convert | Only supports PPT; easy to manage and convert |
| Sharing capabilities | Share via email, WhatsApp and more | Share via email | Share via email |
| Free usage | Unlimited use; more than ten slides requires an upgrade | Approximately five free tries, then paid | Unlimited use via chat box |
| Data/stats handling | Converts data to visuals or pages | Uses search-based data to adjust content | Offers basic stats and source links |
| Updates and development | Provides feedback and personalized updates | No feedback-based updates or AI integration | Limited updates; lacks AI integration |
| Security and privacy | Protected via registration and account controls | Secured via Microsoft Office registration | Secured via Microsoft Office registration |
Table 3 outlines the exclusive features of Gamma AI, including its extensive formatting support, multimedia support, language adaptability, data visualization and responsiveness to user input. It can also support maximum file compatibility and sophisticated sharing capabilities, which make it an extremely flexible tool for educational and presentation environments. However, its use without payment is limited by the number of available slides.
In comparison, ChatGPT is better at content creation, language support and interactivity. While it is less developed in visual presentation, citation management and embedding user feedback, it is an excellent tool for generating content-heavy material. It is also behind Gamma AI in terms of file format transformation and sharing functionality.
Autopilot, by contrast, is the simplest of the three. It prioritizes ease of use, with minimal formatting, no interactivity and no support for citation or updating. However, it provides unlimited, unrestricted use and handles multilingual input well, appealing to users who prefer a simple, no-frills interface.
8.2 Survey data analysis
The survey investigated students’ opinions on the AI tools they are using. Most students found these tools helpful. ChatGPT and Gamma were especially popular because they are simple to use and help improve their thinking skills. However, students did express some concerns. They mentioned that although these tools save time, they are sometimes inaccurate with citations and may not always encourage the development of original ideas. The biggest issue students faced was that ChatGPT and Autopilot could not be changed to suit their needs. This caused frustration because the tools could not follow specific academic citation guidelines required by their educational institutions.
8.2.1 AI tools, accuracy of results and their suitability for academic assignments.
The study focused on how well the different AI tools performed. Gamma stood out because it allowed users to choose from brief, medium and detailed output options, which made its results more precise and accurate. ChatGPT and Autopilot lacked these options, so their results were less accurate and less customizable. Of the 92 students surveyed, 50 (approximately 54%) believed the results were accurate and suitable for academic work, whereas 42 (approximately 46%) disagreed (see Figure 1). The study also highlighted that these AI tools have accuracy limitations, particularly when applied to Arabic; they perform better in English because they are better optimized for that language.
Figure 1. AI tools, accuracy of results and their suitability for academic assignments (pie chart: Yes = 50, No = 42, Total = 92)
8.2.2 AI tools in saving time and effort in preparing academic assignments.
The study focused on how AI tools can help university students save time and effort. Tools like ChatGPT and Autopilot were found to be particularly effective at reducing the time and effort students need to invest, although accessing all their features often requires a paid subscription. Gamma, on the other hand, can be used freely, but only up to ten slides, after which extra charges apply. Most students found these tools helpful: 78 students (84.78%) agreed they improved their study experience, while 14 (15.21%) did not (see Figure 2). Students mainly liked these tools because they made research easier and faster, saving them time.
Figure 2. AI tools in saving time and effort in preparing academic assignments (bar chart: Yes = 78, 84.78%; No = 14, 15.21%; Total = 92)
8.2.3 AI tools and facing challenges in understanding results or instructions.
During testing of the AI tools, students encountered several difficulties. ChatGPT and Autopilot were slow to load, which made them confusing and difficult to use. Language issues, especially with Arabic, and problems with the presentation format were also reported. Autopilot offered simple explanations, but they lacked detail. Gamma, by contrast, was easier to use and more effective, although it was limited to a fixed number of slides. Among the group, 57 students (approximately 62%) understood the results and instructions, while 35 (about 38%) faced challenges, primarily due to loading delays and difficulty following English instructions (see Figure 3).
Figure 3. AI tools and facing challenges in understanding results or instructions (bar chart: Yes = 57, 61.95%; No = 35, 38.04%; Total = 92)
8.2.4 AI tools support different languages.
The study revealed issues with Arabic language handling when using ChatGPT. It struggled with using words correctly and understanding them accurately. On the other hand, both Gamma and Autopilot performed better in Arabic, but they still did not provide adequate support with vocabulary building. Among the students involved, 47 (51.08%) found the language tools helpful, while 45 (48.91%) did not. This suggests a need to improve Arabic language support (see Figure 4).
Figure 4. AI tools support different languages (Yes = 47, 51.08%; No = 45, 48.91%; Total = 92)
8.2.5 AI tools and citation accuracy in academic integrity.
The study examined the extent to which AI tools meet academic writing standards, with a focus on citation styles such as APA and MLA. It showed that choosing the right keywords for citing sources can improve the results. Still, these tools often fail to meet all academic citation needs fully. Students had different opinions. Many pointed out that the tools did not help them use and integrate sources and references correctly. They often found that important details, such as up-to-date and complete publication information, were missing (see Figure 5).
Figure 5. AI tools and citation accuracy in academic integrity (bar graph: Yes = 46; No = 46; Total = 92)
8.2.6 AI tools and the addition of images and multimedia.
AI tools help enhance presentations by incorporating images and videos, making them more visually appealing and engaging. Students often use AI to ensure their presentations include attractive visuals. Tools like Gamma offer various templates for creating presentations and webpages, and ChatGPT allows users to integrate images and templates into their work. Autopilot differs: it does not support images and primarily provides text-based assistance. According to the findings, 66.30% of students appreciated the AI tools’ multimedia features, while 33.69% did not (see Figure 6).
Figure 6. AI tools and the addition of images and multimedia (bar chart: Yes = 61, 66.30%; No = 31, 33.69%; Total = 92)
8.2.7 Evaluation of AI tools and their future use in higher education, specifically academic writing.
The study explored how students felt about using AI tools in their education. Many students reported being happy with these tools. Approximately 33.12% of students gave them a top score of 5, and 17.48% scored them a 4. In addition, 23% of students rated the tools as just average, while 11.04% rated them below average (see Figure 7).
Figure 7. Evaluation of AI tools in higher education, specifically academic writing (bar chart of ratings 1–5 with a cumulative-percentage line)
8.2.8 AI tools and future use in higher education generally and academic writing specifically.
These results suggest that AI tools could become increasingly important in education, but only if clear guidelines are established to guide their use. Instructors also need to focus on helping students enhance their academic abilities. Moreover, 79.34% of students want to continue using AI tools, while 20.65% do not (see Figure 8).
Figure 8. AI tools and future use in higher education generally and academic writing specifically (bar graph: Yes = 73, 79.34%; No = 19, 20.65%)
8.2.9 Survey qualitative analysis.
Ninety-two student comments on AI tools for academic writing revealed both positive and negative views. Most students appreciated AI’s time-saving benefit and its ability to improve grammar and spelling, with one student commenting, “AI helps save time and speed up tasks.” However, there were reservations, particularly regarding accuracy in Arabic. One student commented, “AI is a double-edged sword,” highlighting both its usefulness and its risks.
Students also proposed enhancements, including better alignment with academic standards, improved language handling and enhanced citation support. “AI ought to give academic writing standards,” one student said. Though students appreciate the utility of AI, they insisted that it should serve to augment, rather than substitute for independent academic ability. As one student noted, “It can be developed and controlled by the university,” while another emphasized the need for controlled, ethical use of AI tools in academic settings.
To evaluate the impact of AI tools on student performance, a paired-samples t-test was conducted comparing pretest and posttest scores on the three rubric metrics: Ease of Use, Accuracy and Citation Adherence. In addition, Cohen’s d was calculated to measure the magnitude of the changes. The following table reports both the statistical and the practical significance of the observed differences.
Table 4 indicates higher posttest performance on every metric, demonstrating the practical benefits of using AI tools. The most evident improvement relates to ease of use, with a large effect size (Cohen’s d = 0.82): students found assignments markedly easier to complete. Accuracy and citation adherence also improved, although to a lesser extent. The paired-samples analysis shows that AI tools can enhance student performance, particularly by simplifying tasks and supporting fluency in students’ work. At the same time, these findings highlight the importance of integrating AI in a way that not only simplifies the process but also fosters essential academic skills.
Paired-samples t-test results for performance metrics (pretest vs posttest, N = 92)
| Metric | Pretest mean (SD) | Posttest mean (SD) | t(91) | p | Cohen’s d |
|---|---|---|---|---|---|
| Ease of use | 1.16 (0.71) | 1.38 (0.79) | 7.82 | < 0.001 | 0.82 |
| Accuracy | 0.97 (0.76) | 1.33 (0.82) | 2.72 | 0.008 | 0.28 |
| Citation adherence | 1.05 (0.79) | 1.28 (0.68) | 2.31 | 0.023 | 0.24 |
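The paired-samples computation reported in Table 4 can be sketched in a few lines. This is a minimal illustration with made-up scores (not the study’s raw data), assuming the standard paired t statistic and the paired-design form of Cohen’s d (mean of the differences divided by their standard deviation):

```python
import math
import statistics

def paired_t_and_d(pre, post):
    """Return (t statistic, Cohen's d) for paired pre/post scores."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample SD (ddof = 1)
    t = mean_diff / (sd_diff / math.sqrt(n))   # t with n - 1 degrees of freedom
    d = mean_diff / sd_diff                    # Cohen's d for paired designs
    return t, d

# Toy data: small, fairly uniform gains produce a large t and d.
pre = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0]
post = [1.3, 1.4, 1.1, 1.2, 1.6, 1.2]
t, d = paired_t_and_d(pre, post)
```

The p-value would then be read from the t distribution with n − 1 degrees of freedom (91 in the study).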
To examine the relationship between pre- and postscores for each metric, Pearson correlation coefficients were calculated. These coefficients indicate whether students’ rank order on each metric remained stable after the intervention, as well as the direction and significance of the relationship. The results show a strong positive correlation for ease of use, meaning that students who ranked high before the intervention maintained their high rankings after using the AI tools. By contrast, accuracy shows a negative correlation: students who scored high on accuracy before the intervention tended to score lower afterward, while students with low pre-intervention accuracy gained higher accuracy after using AI. Finally, citation adherence shows a low, nonsignificant correlation, indicating that the AI tools had a limited and uneven influence on citation performance, improving the citation skills of only some students.
Moreover, it is clear how crucial it is to provide tailored instructional support. While AI has the potential to make academic resources more accessible to everyone, its impact is not the same for all. It can only be truly beneficial when it is thoughtfully woven into the learning experience. These insights echo the calls from experts for a deeper understanding of AI and for designing AI-enhanced teaching methods that meet the diverse needs of all learners (see Table 5).
Pearson correlations between pre- and posttest scores for each metric (N = 92)
| Metric | r (pre, post) | Significance |
|---|---|---|
| Ease of use | 0.943 | p < 0.001 |
| Accuracy | −0.27 | p < 0.01 |
| Citation adherence | 0.165 | ns (p = 0.12) |
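The pre/post correlations in Table 5 use the standard Pearson formula. A compact, stdlib-only sketch (with illustrative data, not the study’s scores) is:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# A perfectly preserved rank order yields r = 1.0; a fully reversed one, r = -1.0.
stable = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
reversed_order = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
```

This is why the high r for ease of use (0.943) signals a stable rank order, while the negative r for accuracy (−0.27) signals a partial reversal between pre- and posttest rankings.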
Table 6 reports the intervention’s effectiveness for each AI tool, along with one-way ANOVA results. Post hoc Tukey tests indicate that ChatGPT was the easiest to use, followed by Autopilot and, lastly, Gamma. In terms of means, ChatGPT was rated easiest to use (approximately 1.6 out of 2), while Autopilot averaged approximately 1.4 and Gamma 1.0. The difference is most evident for ease of use. For accuracy and citation adherence, the differences between tools are smaller; ChatGPT nonetheless scored highest, followed by Autopilot and Gamma. These findings underscore the importance of interface design and user experience in enhancing the effectiveness of AI tools in education. While all three tools support academic tasks, ChatGPT stands out for ease of use, suggesting that future AI adoption should focus on tools that align with how students naturally interact with and process information.
Posttest performance by AI tool used, with ANOVA comparisons (N = 92)
| Metric | ChatGPT (n ≈ 30) | Gamma (n ≈ 31) | Autopilot (n ≈ 31) | ANOVAF(2,89) | p | Partial η2 |
|---|---|---|---|---|---|---|
| Ease of use | 1.6 (0.6) | 1.0 (0.7) | 1.4 (0.8) | 4.67 | 0.012 | 0.1 |
| Accuracy | 1.5 (0.7) | 1.2 (0.8) | 1.3 (0.9) | 5.23 | 0.007 | 0.11 |
| Citation adherence | 1.3 (0.7) | 1.2 (0.6) | 1.3 (0.7) | 1.2 | 0.307 | 0.03 |
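The between-tool comparison in Table 6 rests on a one-way ANOVA. The F statistic and the η² effect size can be sketched as follows (toy data, not the study’s scores):

```python
def one_way_anova(groups):
    """Return (F statistic, eta squared) for a list of per-group score lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-groups and within-groups sums of squares
    ss_between = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)  # proportion of variance explained
    return f_stat, eta_sq

# Three toy "tool" groups with different means
f_stat, eta_sq = one_way_anova([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
```

With three groups of roughly 30 students each, the resulting F would be compared against F(2, 89), matching the degrees of freedom reported in Table 6.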
9. Results and discussion
The study’s conclusions offer a thorough examination of how university students use AI tools, highlighting several important trends and insights (Table 7). First, the use of AI tools varied by age. Younger students are more likely to use these technologies, consistent with broader trends among university students. This suggests that younger age groups are generally more familiar with digital tools, likely due to their earlier exposure to technology.
Summary of key findings
| Aspect | ChatGPT | Gamma | Autopilot |
|---|---|---|---|
| Ease of use | Highest rated; intuitive and user-friendly | More complex; lowest ease of use | Moderate; simple but limited in scope |
| Accuracy | Mixed results; sometimes less precise | Strongest in accuracy; multiple output options | Moderate; less precise than Gamma |
| Citation adherence | Weak; frequent APA-style errors | Limited; incomplete citations | Weak; minimal adherence |
| Arabic language handling | Struggled significantly with Arabic | Better performance with Arabic | Moderate handling of Arabic |
| Time/effort savings | Significant time savings reported | Effective but restricted by the free version limits | Significant time savings |
| Multimedia integration | Supports images/templates | Strong multimedia features | Very limited (primarily text-based) |
| Overall student view | Easy to use but unreliable for citations | Accurate and effective but less flexible to use | Simple and useful but less flexible |
The study also revealed the extent of students’ familiarity with AI tools. The fact that more than half of the participants (52.17%) reported prior experience using AI tools suggests that the rapid adoption of technology has fostered increasing familiarity. However, a sizable percentage (47.82%) indicated that they had never used AI tools. This could be due to several factors, such as a lack of knowledge, the abundance of available applications or subscription fees for premium versions.
Alongside these quantitative findings, student reflections and focus group discussions offered a deeper understanding of how students viewed and used each AI tool, helping to explain the patterns observed in the data. For example, students’ comments about ChatGPT’s simplicity and Gamma’s flexibility mirrored the observed differences in ease-of-use and accuracy scores. Likewise, comments on citation adherence and Arabic-language performance provided context for the statistical results, presenting a more comprehensive view of students’ experiences.
For instance, several students described ChatGPT as straightforward and easy to navigate, which supported the higher ease-of-use scores. Others noted that Gamma’s detailed output options made it more accurate but more challenging to master, aligning with the variation observed in the accuracy data. These examples show how qualitative themes supported the quantitative patterns.
When the efficacy of the individual tools was evaluated, Gamma was notable for its ability to enhance the accuracy of research outputs. By providing a range of output options, including brief, medium and detailed formats, Gamma gave students the flexibility they needed to ensure accuracy in their work. However, while both ChatGPT and Autopilot were praised for producing relevant results, their accuracy and personalization features were limited, which may have restricted their effectiveness for some students. This interpretation was reinforced by students’ qualitative feedback: several noted that ChatGPT occasionally generated vague results and that Autopilot lacked customization. Such comments helped explain the quantitative variance observed across tools.
Regarding the overall accuracy of AI tools, 54.34% of students indicated that the results were appropriate for academic writing, particularly in terms of the content presentation and idea organization. This finding suggests that, although AI tools are helpful, their ability to meet the rigorous demands of academic writing still requires improvement.
The study also showed that students perceived AI tools as saving time and effort and improving efficiency. These advantages were recognized by 84.78% of students, suggesting that AI tools substantially enhanced productivity. Although most students (61.95%) found the tools easy to use, many still encountered difficulties when attempting to use them to their full potential.
Qualitative responses confirmed this point, with many students reporting that AI tools “saved time in drafting” but required “extra effort to refine outputs” to meet academic standards.
The difficulties AI tools encounter when processing Arabic were another important topic covered in the study. There was a notable linguistic gap, as nearly half of the students (48.91%) expressed dissatisfaction with the tools’ capacity to support Arabic. Furthermore, AI tools often struggled to follow scholarly writing and citation guidelines, mainly due to inaccurate keyword use and weak source relevance, which reduced their effectiveness for assignments requiring precise citations.
Qualitative reflections further confirmed the language findings, as students frequently noted grammatical and semantic errors in Arabic output. Although the tools were functional and time-saving, their linguistic accuracy remained limited.
On a positive note, 66.30% of students acknowledged the benefits of using images and graphics in academic work, indicating that AI tools can successfully integrate multimedia elements. This feature was considered particularly helpful for presentation-based tasks.
Looking ahead, students recognized how AI tools could support the definition of research topics and assist in conducting early-stage studies. This suggests that, by helping students focus their research efforts and locate trustworthy sources, AI tools may become increasingly valuable during the initial phases of academic work.
Overall, integrating quantitative and qualitative findings provided a comprehensive understanding of students’ experiences. This approach allowed the results to capture not only measurable outcomes but also the underlying perceptions that shape them. The joint interpretation of both forms of evidence serves as the foundation for the following discussion, which links the study’s findings to relevant theoretical frameworks and current research in educational technology.
10. Limitations of the study
Because this study is exploratory and conducted at one university, it cannot be generalized to the larger population. Participants were not randomly assigned to conditions, and the absence of a control group makes it difficult to isolate the specific effects of AI tools. Future research could expand on these findings by including a more representative sample and a control group to better inform educators and researchers about the impact of AI tool use.
11. Conclusion
The present study examined the impact of three AI applications – ChatGPT, Gamma and Autopilot – on the performance of university students in Arabic-taught courses. Through quantitative and qualitative analyses, the study identified the strengths and weaknesses of these tools in aiding academic writing and assignments.
The results revealed that AI tools significantly improved simplicity and time efficiency. ChatGPT was the most convenient. Accuracy was Gamma’s primary advantage, particularly because it offered customizable output formats, whereas Autopilot was valued for its simplicity but remained limited in scope. Nevertheless, all the tools encountered difficulties with adherence to citation standards and Arabic language accuracy, with ChatGPT presenting the most significant challenges in these areas.
The findings align with the TAM and CLT, suggesting that usability and efficiency remain key factors in influencing students’ adoption and perception of AI tools.
These findings indicate that although AI tools enhance efficiency and accessibility, they cannot yet guarantee academic reliability. Such tools should supplement rather than replace academic skills, particularly critical thinking, referencing and writing in non-English languages.
Taken together, the qualitative and quantitative results demonstrated that AI tools improve usability and efficiency but still struggle with accurate citation and Arabic-language handling. Students’ reflections revealed that convenience was often prioritized over accuracy, underscoring that AI must not be used as a substitute for academic skills. Educators should offer guidelines on critical AI use, developers should enhance language and citation capabilities, and students should evaluate outputs reflectively.
Overall, the findings emphasize that AI enhances productivity but does not yet ensure complete academic reliability, particularly in multilingual contexts.
12. Practical implications
The paper identifies meaningful implications, although these are limited by the relatively small sample size. Despite this constraint, the study contributes to the growing body of research on AI in higher education by addressing key aspects such as usability, accuracy, citation integrity and language support. It partially bridges the gap between theory and practice by providing insights that inform both pedagogical and technological development.
For practice, the findings encourage educators to foster critical and reflective engagement with AI tools, developers to enhance linguistic and ethical standards, and students to evaluate AI outputs independently and critically. While its broader economic and societal implications remain limited, the study advances discussions on responsible AI integration, digital literacy and academic integrity, aligning consistently with its conclusions.
Ethics statement
The participants volunteered to take part in the study, and informed consent was obtained. Anonymity was maintained throughout the research process. Students were informed about the study’s purpose and assured of confidentiality.
Summary of contributions
This research makes three main contributions. First, it presents empirical findings on the impact of AI tools on students’ performance in Arabic-based higher education, addressing a critical gap in the existing, predominantly English-focused literature. Second, it analyzes three popular AI tools – ChatGPT, Gamma and Autopilot – and highlights their trade-offs in usability, accuracy, citation compliance and language support. Third, it combines quantitative and qualitative data to provide a comprehensive picture of how AI tools are perceived and used in real classroom settings. Collectively, these contributions advance research on the opportunities and challenges of AI in education, offering practical guidance to educators, students and developers seeking to implement AI responsibly in academic environments.

