Research Article| April 02 2026

Instructor-guided AI agents for Newtonian misconceptions in Moodle: a pilot study

Rabih Elias Kahaleh

ORCID: 0009-0000-2418-2222

Department of Computer Science, Faculty of Arts and Science, University of Balamand, Tripoli, Lebanon

Department of Experimental Sciences, Autonomous University of Barcelona, Bellaterra, Spain

Rabih Elias Kahaleh can be contacted at: rabih.kahaleh@balamand.edu.lb

Publisher: Emerald Publishing

Received: August 22 2025

Revision Received: November 19 2025

Accepted: January 15 2026

Online ISSN: 3049-5474

© 2026 Rabih Elias Kahaleh. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at https://creativecommons.org/licenses/by/4.0/legalcode.

Artificial Intelligence in Education (2026) 2 (1): 90–108.

https://doi.org/10.1108/AIIE-08-2025-0213

Purpose

This pilot study explores the feasibility and user acceptance of a persistent AI agent architecture embedded in Moodle that generates misconception-specific feedback, which teachers moderate before release.

Design/methodology/approach

A four-phase pilot with 100 Lebanese Grade 12 students and six physics instructors used misconception-tagged Force Concept Inventory items; an OpenAI Agent Mode plugin automatically created personalized HTML pages, which teachers reviewed before release.

Findings

The system produced 224 pages in an average of 1.8 minutes with 99.2 percent uptime, and 83 percent were approved unchanged or after minor edits. Student surveys (N = 93) showed high acceptance across all Technology‑Acceptance‑Model constructs (means = 3.9–4.3 / 5, Cohen’s d > 1), and teachers reported a 60–73 percent reduction in feedback time compared with manual preparation. Qualitative feedback highlighted diagnostic precision and the value of multimedia, while moderation logs identified curriculum alignment and scientific accuracy as priority refinement areas.

Research limitations/implications

This exploratory pilot focused on feasibility and user acceptance rather than learning outcomes. Future controlled studies should measure conceptual change and long-term retention to validate pedagogical effectiveness.

Practical implications

The framework offers institutions a replicable model for scaling personalized feedback while maintaining human oversight, with documented 60–73% time savings for instructors and high student acceptance rates.

Originality/value

This study offers the first empirical demonstration of a persistent multi-tool AI agent inside an LMS delivering instructor-verified feedback at scale; the framework is readily transferable to other STEM contexts.

1. Introduction

Artificial intelligence (AI) is increasingly recognized as a transformative force in educational technology, particularly in addressing persistent challenges in science education (Ali et al., 2025; Marín et al., 2025). The widespread adoption of AI and rising public interest in it, especially since the emergence of large-scale generative models in late 2022, have positioned it as a flexible, learning-driven technology capable of revolutionizing various educational domains, including teaching preparation, assessment design, grading, and student learning (Wang et al., 2024a, b; U.S. Department of Education, 2025; Kabilan et al., 2025). Educators, policymakers, and students are generally optimistic about AI's potential to deliver personalized education (Ayeni et al., 2024). In physics education, student misconceptions represent one of the most enduring pedagogical challenges (Halloun and Hestenes, 1985; Hestenes et al., 1992; Polverini et al., 2025; Resbiantoro and Setiani, 2022). Research consistently demonstrates that traditional instruction often fails to dislodge deeply held alternative conceptions about fundamental topics. These misconceptions, especially in Newtonian mechanics, can persist even after formal instruction and significantly impede student understanding of more advanced physics concepts. For example, a common misconception is the belief that a constant force is required to keep an object in motion, which directly contradicts Newton's first law (Brown and Clement, 1987). The Force Concept Inventory (FCI) is a widely used diagnostic tool that specifically targets these deep-seated misconceptions in Newtonian mechanics. Without personalized feedback, students may not recognize their mistakes or know how to improve, yet providing such feedback at scale in large-enrolment courses is logistically and intellectually challenging for instructors.
Existing physics-specific digital tools often serve as mathematical solution verifiers rather than providing qualitative conceptual reasoning feedback. This pilot study explores two main research questions to investigate the feasibility and perceived effectiveness of AI-powered agent-based feedback in physics education:

RQ1. How feasible is it to implement a persistent AI agent system within a standard LMS for generating misconception-specific feedback?

RQ1.1. Does the AI agent system operate reliably and efficiently in generating feedback pages based on student quiz submissions?

RQ1.2. Can the AI-generated feedback be moderated effectively by instructors to ensure pedagogical alignment and content accuracy?

RQ2. How do students and instructors perceive the usefulness and clarity of AI-generated, misconception-specific feedback?

RQ2.1. Do students find the AI-generated feedback clear, engaging, and helpful in understanding their misconceptions?

1.1 Related work

1.1.1 AI applications in physics education and misconception remediation

In physics education, student misconceptions represent one of the most persistent pedagogical challenges, with research showing that traditional instruction often fails to dislodge deeply held alternative conceptions (Halloun and Hestenes, 1985). These misconceptions, particularly in Newtonian mechanics, can block students from advancing to more complex concepts. Consequently, recent work has begun to investigate whether large language models (LLMs) can serve as more responsive tools for helping learners confront and revise such misconceptions. Before the emergence of machine-learning methods, instructional systems relied on narrower techniques such as error detection and rule-based hint generation, yet these approaches lacked the capacity to reveal students' underlying conceptual models.

Against this backdrop, a growing body of research has examined how effectively LLMs reason about core physics concepts. Earlier research has documented persistent limitations in LLM reasoning on foundational physics concepts, noting issues such as conceptual superficiality and difficulty interpreting visual or diagram-based representations (Aldazharova et al., 2024; Polverini et al., 2025). Building on these findings, recent controlled evaluations show that even when LLMs generate correct numerical responses, they often struggle to articulate underlying assumptions or support genuine conceptual change, largely due to insufficient epistemic grounding (Dunlap et al., 2025). Emerging classroom-based studies using chat interfaces appear more promising, demonstrating the potential for misconception-specific dialogue and more responsive forms of interaction (Kahaleh and Lopez, 2025; Polverini et al., 2025; Weijers et al., 2025). Nevertheless, despite this promising development, the more recent findings continue to highlight gaps in contextual alignment, depth of explanation, and the integration of visual or interactive representations, all of which are critical for addressing alternative conceptions in physics. Together, this trajectory of research underscores the continuing need to refine automated feedback systems and to clarify the specific points at which current LLM-based approaches remain insufficient.

These discipline-specific concerns mirror broader trends observed across other domains of learning. In mathematics, biology, and the social sciences, AI-driven feedback systems have increasingly been used to surface misconceptions by diagnosing inaccurate prior knowledge, generating targeted prompts, and scaffolding revision of faulty mental models (Ali et al., 2025; Moghaddam et al., 2024). However, meta-analytic reviews consistently indicate that such systems are most effective when embedded within structured pedagogical frameworks, supported by teacher mediation, or paired with explicit conceptual scaffolding rather than presented as generic text corrections (Chen and Wan, 2025). This broader evidence further reinforces the necessity of domain-specific design principles and aligns with our decision to incorporate misconception tagging and instructor moderation into our framework.

Beyond system design and instructional structure, recent reviews emphasize that AI literacy is becoming a prerequisite for the productive use of AI-generated feedback. Without explicit guidance, students tend to treat AI responses as authoritative rather than as provisional claims to be interrogated (Sabatini et al., 2023). In STEM contexts specifically, structured tasks and teacher mediation have been shown to be critical in helping learners question, justify, and revise their initial conceptions when interacting with generative AI (He et al., 2025). This evidence provides further justification for our instructor-moderation layer: the human-in-the-loop functions not merely as a quality-assurance mechanism but also as a pedagogical scaffold that models how scientific claims are evaluated.

1.1.2 Automated feedback systems: rule-based ITS → MLM → LLM

Research on automated feedback spans decades. Rule-based intelligent tutoring systems (ITS) demonstrated substantial learning gains (Kulik and Fletcher, 2016). Building on this early foundation, machine-learning-model (MLM) approaches improved automation, for example through error-classification models in programming (Keuning et al., 2018) and automated writing feedback using NLP (Shi and Aryadoust, 2024). While MLM approaches surpassed purely rule-based checkers by recognizing error types statistically, they typically targeted surface-level correctness rather than deep misconceptions; in physics, many models were constrained to numerical accuracy or answer classification rather than students' causal reasoning.

More recently, this line of work has shifted toward systems capable of engaging with conceptual understanding rather than surface features. Recent studies show that LLMs extend the paradigm to conceptual domains. Guo et al. (2024) demonstrate a multi-agent “AutoFeedback” pipeline that curbs common LLM flaws (e.g. over-praise, spurious inference) and yields more accurate, pedagogically sound science feedback. Likewise, GPT-4 can grade and explain high-school physics work at near-human levels, with most messages acceptable to veteran instructors (Chen and Wan, 2025). Where MLM pipelines were confined to classification and pattern-based hints, agentic LLMs integrate reasoning, tool use, and multimedia retrieval, enabling adaptive, multimodal feedback at scale. These developments raise a key practical question: what distinct advantages do agentic LLMs offer over MLM pipelines inside real classrooms?

1.2 Agentic AI affordances and human-in-the-loop governance

Agentic methods such as Toolformer and ReAct formalize how LLMs interleave reasoning with actions (API calls, search, computation), enabling planning, tool use, memory, and self-critique (Schick et al., 2023; Yao et al., 2023; survey: Wang et al., 2024a, b). Extending these capabilities into domain-specific contexts, physics education studies show that dialogic agents, even imperfect ones, can stimulate conceptual gains (Weijers et al., 2025). However, these benefits are consistently tempered by concerns about trust, alignment, and pedagogical appropriateness. Experiences with Jill Watson and AIED reviews emphasize human-in-the-loop moderation to ensure accuracy, tone, and curricular fit (Goel and Polepeddi, 2018; Polverini et al., 2025; Luckin et al., 2016). Our study follows this guidance by routing all agent-generated pages through an instructor moderation dashboard before release, aligning with conceptual change principles (Posner et al., 1982) and reducing the risk of unverified claims. Yet despite these safeguards, a key implementation challenge persists: embedding these agentic capabilities into everyday LMS workflows rather than confining them to isolated pilots.

1.3 LMS-embedded, misconception-triggered agents (gap and contribution)

Although AI features are appearing in commercial LMSs (e.g. content curation, personalization), misconception-driven, agentic feedback embedded directly in assessment workflows remains rare. Prior LMS chatbots largely offered generic or administrative support rather than item-level, misconception-specific remediation. Our contribution is a persistent, tool-using agent in Moodle that is triggered by misconception tags and governed by instructor moderation, combining automation speed with expert pedagogical control. In parallel, the Technology Acceptance Model (TAM) has strong support in LMS/EdTech meta-reviews (King and He, 2006; Šumak et al., 2011; Murillo et al., 2021), and recent work extends TAM to AI-specific factors such as trust, transparency, and risk (Ali et al., 2025; Barakat et al., 2025; Mustofa et al., 2025). Complementing this adoption lens, conceptual change theory (Posner et al., 1982; Strike and Posner, 1982) stresses that misconceptions are displaced only when new explanations are intelligible, plausible, and fruitful. This dual lens of acceptance and pedagogy sets up our theoretical framework for evaluating an LMS-embedded, misconception-triggered, instructor-moderated agent.

2. Theoretical framework

2.1 Technology acceptance in educational contexts

Because the feedback system introduces a novel AI layer into a familiar LMS workflow, student acceptance is a prerequisite for meaningful engagement. The Technology Acceptance Model (TAM) provides a widely validated foundation for understanding how learners form attitudes toward educational technologies. Meta-analytic findings consistently show that perceived usefulness (PU) and perceived ease of use (PEOU) are the strongest predictors of intention to use and sustained engagement (Fearnley and Amora, 2020; King and He, 2006; Šumak et al., 2011). Research on LMS adoption reinforces this pattern: the clarity of information, quality of system design, and perceived responsiveness strongly shape users' sense of usefulness and their willingness to engage with functionalities such as feedback tools, analytics panels, and interactive modules (Fathema et al., 2015; Fearnley and Amora, 2020). In AI-supported learning environments, perceptions of trust, transparency, and risk also influence engagement decisions. Studies show that learners' acceptance of generative systems depends not only on core TAM constructs but also on whether AI-generated explanations appear credible, accurate, and contextually aligned with course content (Barakat et al., 2025; Mustofa et al., 2025). In this study, PU reflects the clarity, relevance, and helpfulness of misconception-specific feedback, while PEOU reflects the seamlessness of accessing and understanding AI-generated explanations within Moodle. TAM therefore provides the theoretical foundation for understanding how students evaluate the instructional value and usability of an AI-driven feedback workflow.

2.2 Conceptual change and feedback design

While TAM explains whether learners will use the system, conceptual change theory explains how they learn from it. Newtonian mechanics is known to involve deeply held intuitive conceptions about motion and force that often persist despite formal instruction (Halloun and Hestenes, 1985; Kortemeyer, 2023). The classic conceptual change model posits that learners revise prior conceptions only when new ideas are intelligible, plausible, and fruitful, and when they become dissatisfied with the explanatory power of their existing mental models (Posner et al., 1982; Strike and Posner, 1982). Recent work in physics education underscores the importance of explicitly naming the misconception, contrasting it with scientific reasoning, and supporting restructuring through multimodal representations. For instance, analyses of student reasoning show that learners often rely on implicit heuristics that must be surfaced and challenged for conceptual change to occur (Kortemeyer, 2023). Studies on representation use indicate that sustained conceptual development requires opportunities to coordinate diagrams, simulations, and verbal explanations rather than relying on a single modality. Feedback research supports a similar position: effective feedback provides information about why an answer is incorrect and what conceptual shift is needed (Hattie and Timperley, 2007). The ICAP framework further argues that constructive and interactive engagement—e.g. reflection prompts, simulation manipulation—produces deeper conceptual reorganization than passive reading (Chi and Wylie, 2014). In the context of physics learning, combining misconception tagging, contrastive Newtonian explanations, interactive simulations, and reflective prompts provides the conditions needed for students to confront their existing beliefs and reorganize their underlying causal models of motion and force.

2.3 Linking agentic affordances to acceptance and learning

Advances in agentic large language models have significantly expanded the possibilities for automated feedback in education. Unlike earlier machine-learning approaches that focused on surface-level error detection (Keuning et al., 2018; Shi and Aryadoust, 2024), agentic LLMs can interleave reasoning with external actions—such as querying tools, retrieving resources, and revising outputs—thereby generating richer and more adaptive explanations. Techniques such as Toolformer and ReAct formalize how models synthesize planning, tool use, and self-correction to produce coherent, context-aware outputs (Schick et al., 2023; Yao et al., 2023). Recent syntheses further highlight that agentic architectures incorporate elements of memory, iterative refinement, and multimodal retrieval, enabling more authoritative and context-aligned instructional messages (Wang et al., 2024a, b). These affordances are particularly relevant for physics education, where learners often benefit from explanations that integrate text, diagrams, and interactive simulations. Agentic LLMs are capable of assembling such multimodal content dynamically, allowing feedback to be tailored to the misconception triggered by each student response and delivered in real time. However, empirical work in physics education cautions that even advanced LLMs may misinterpret diagrams or apply Newtonian principles inconsistently (Kortemeyer, 2023), and dialogic studies show that students benefit from AI-mediated interactions only when they actively interrogate the explanations provided (Weijers et al., 2025). These findings underscore the need for structured oversight when deploying agentic systems in conceptually demanding scientific domains.

Within broader AIED discourse, recent reviews emphasize that the pedagogical value of agentic systems depends not only on their generative capabilities but also on their alignment with instructional goals and learners' epistemic resources (Holmes et al., 2022; Ali et al., 2025). High-quality feedback that is precise, conceptually clear, and supported by relevant representations has been shown to strengthen learners' perceived usefulness of educational technologies. Likewise, seamless LMS-embedded workflows improve perceived ease of use by reducing cognitive overhead and reinforcing continuity with existing learning practices. Together, the affordances of agentic LLMs and the design of LMS-native workflows form a bridge between pedagogical mechanisms of conceptual change and learner acceptance constructs. Agentic models enable the generation of targeted, multimodal feedback, while their integration within a familiar LMS context supports students' evaluative judgements about the system's usefulness, usability, and trustworthiness.

2.4 Human-in-the-loop moderation as an epistemic and pedagogical safeguard

Across AI-in-education literature, instructor oversight is identified as a central component of reliable feedback systems. Work on hybrid AI teaching assistants (e.g. Jill Watson) demonstrates that teacher moderation is essential for maintaining accuracy, filtering inappropriate responses, and ensuring alignment with curricular objectives (Goel and Polepeddi, 2018). Broader AIED syntheses similarly argue that human–AI partnerships produce more trustworthy and instructionally coherent guidance than fully automated systems (Luckin et al., 2016; Polverini et al., 2025). Research on AI literacy adds a complementary perspective: students often struggle to evaluate the correctness of AI explanations and may place undue trust in outputs that appear authoritative or fluent (Chee et al., 2025). In scientific contexts, this tendency can be particularly problematic because students may unintentionally reinforce misconceptions when AI-generated material is not epistemically vetted. Accordingly, instructor moderation in the present system functions as both a quality-control mechanism and a pedagogical scaffold. Teachers review each AI-generated feedback page, ensuring accuracy, conceptual alignment, and appropriate tone before students receive it. The moderation step therefore models disciplinary evaluation practices, strengthens epistemic trust, and mitigates the risks associated with automated conceptual explanations.

3. Methodology

3.1 Context and participants

This exploratory study was conducted over five months (January–May 2025) at two private co-educational secondary schools in northern Lebanon. The project progressed from plugin development and sandbox testing, through instructor piloting and refinement, to full student deployment in late spring. Schools were chosen for their established Moodle infrastructure, alignment with the physics curriculum, and willingness to engage in AI-focused educational innovation. A total of 100 Grade 12 general-science students (aged 16–17; 50 per school) took part. All were familiar with Moodle, yet none had previously used AI-generated feedback tools. Their average grade in the preceding physics term was 78.4% (SD = 6.3). Six physics instructors (three per school, 8–18 years' experience) contributed to quiz design, misconception-tag validation, and moderation of AI-generated feedback. Informed consent was obtained from all participants in accordance with institutional policies.

3.2 Pilot study design rationale

This exploratory study employed a single-group, post-intervention design appropriate for testing technical feasibility and initial user acceptance. Measuring learning outcomes would require controlled experimental conditions, so this pilot deliberately excluded learning outcome assessments and focused on demonstrating system functionality, generating student feedback, and identifying implementation challenges. We prioritized TAM-based acceptance measures because adoption is a necessary precondition for effectiveness. If students reject the system, or if instructors perceive it as burdensome or untrustworthy, then no amount of pedagogical sophistication can lead to consistent exposure or deep engagement, making conceptual change impossible to measure reliably. For this reason, the present pilot treated feasibility and acceptance as gating conditions before progressing toward subsequent phases where conceptual change will be examined under controlled conditions.

We considered several alternative evaluation strategies for this first deployment. For example, a heuristic evaluation or expert usability walkthrough could have focused primarily on interface issues and interaction flow, while a laboratory-style usability study might have measured task completion times and error rates under controlled conditions. We also examined broader acceptance frameworks such as TAM2, TAM3 and UTAUT as candidates for our survey design. However, TAM was ultimately selected because it is both parsimonious and well established in LMS-based research, allowing us to capture perceived usefulness and perceived ease of use with a brief, classroom-compatible instrument. In a school setting where teachers and students had limited time available, this design offered a pragmatic balance between methodological rigour and minimal disruption to ongoing instruction.

3.3 Data collection and statistical analysis

A TAM-based survey (Venkatesh and Davis, 2000) measured seven constructs using 5-point Likert scales: clarity, multimedia engagement, relevance, usefulness, cognitive challenge, affective engagement, and actionability (see Supplementary Material Appendix B). System performance metrics and instructor moderation data were automatically logged. Descriptive statistics included means, standard deviations, and 95% confidence intervals (Field, 2018). Internal consistency was assessed with Cronbach's alpha (Nunnally and Bernstein, 1994). Construct validity was examined through Pearson inter-construct correlations (Hair et al., 2019). Normality was tested with Shapiro-Wilk tests (Tabachnick and Fidell, 2019). Cohen's d effect sizes compared each construct mean with the neutral scale midpoint (3.0), with bootstrap confidence intervals (10,000 resamples, BCa method) for robust estimation (Efron and Tibshirani, 1993). Non-parametric tests (Kruskal-Wallis) with Bonferroni corrections addressed non-normal distributions. Qualitative analysis followed inductive content analysis with inter-rater reliability (Cohen's κ > 0.80; Landis and Koch, 1977). All analyses were run in SPSS 29.0 with α = 0.05.
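
The effect-size and reliability calculations described above can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study: for simplicity it uses a percentile bootstrap rather than the BCa method named in the text, and the function names and default arguments are assumptions introduced here for illustration.

```python
import random
import statistics


def cohens_d_vs_midpoint(scores, midpoint=3.0):
    """One-sample Cohen's d: (mean - midpoint) / sample standard deviation."""
    return (statistics.mean(scores) - midpoint) / statistics.stdev(scores)


def bootstrap_ci(scores, stat, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap interval (a simplification of the BCa method)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores)
    estimates = []
    for _ in range(n_resamples):
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        if len(set(sample)) > 1:  # skip resamples with zero variance
            estimates.append(stat(sample))
    estimates.sort()
    lo_idx = int((1 - level) / 2 * len(estimates))
    hi_idx = int((1 + level) / 2 * len(estimates)) - 1
    return estimates[lo_idx], estimates[hi_idx]


def cronbach_alpha(item_columns):
    """Cronbach's alpha from one list of respondent scores per item."""
    k = len(item_columns)
    totals = [sum(row) for row in zip(*item_columns)]
    item_variance = sum(statistics.variance(col) for col in item_columns)
    return k / (k - 1) * (1 - item_variance / statistics.variance(totals))
```

Calling cohens_d_vs_midpoint on the Likert responses for one construct gives the distance of the mean from the neutral midpoint in standard-deviation units, and bootstrap_ci(scores, cohens_d_vs_midpoint) gives an approximate interval around it.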

3.4 Phase 1: quiz design and misconception mapping

Seven multiple-choice questions were adapted from the Force Concept Inventory (FCI), targeting common Newtonian mechanics misconceptions (Wells et al., 2019; Yang et al., 2020). Items covered Newton's laws of motion, force-motion relationships, net force and acceleration, and gravitation. Every incorrect choice was tagged with a validated misconception code derived from physics education research. For example, in the item “A hockey puck slides across a frictionless surface …”, selecting option A activates the misconception tag M01: 'Motion requires continuous force'. The full mapping of quiz questions to misconception codes is provided in Supplementary Material Appendix D. These mappings allowed the system to detect the exact nature of student misunderstandings and trigger the AI agent to generate personalized conceptual feedback. An example of the JSON tagging structure: { "quiz_id": 7, "question_id": 1, "choice_letter": "A", "misconception_code": "M01", "misconception_text": "Constant motion requires continuous force" }
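
As an illustration of how such a tag map might be queried, the following Python sketch indexes records of the shape shown above and returns the misconception codes triggered by one quiz attempt. Only the M01 record comes from the text; the function names and the dictionary shape used for a student's answers are hypothetical.

```python
import json

# Tag records in the JSON shape described above (only M01 is from the article).
TAG_MAP_JSON = """
[
  {"quiz_id": 7, "question_id": 1, "choice_letter": "A",
   "misconception_code": "M01",
   "misconception_text": "Constant motion requires continuous force"}
]
"""


def build_index(tag_records):
    """Index records by (quiz_id, question_id, choice_letter) for direct lookup."""
    return {
        (r["quiz_id"], r["question_id"], r["choice_letter"]): r
        for r in tag_records
    }


def detect_misconceptions(quiz_id, answers, index):
    """Return unique misconception codes for one attempt.

    answers maps question_id to the chosen letter. Correct choices have no
    entry in the tag map, so they contribute nothing.
    """
    codes = []
    for question_id, choice in sorted(answers.items()):
        record = index.get((quiz_id, question_id, choice))
        if record and record["misconception_code"] not in codes:
            codes.append(record["misconception_code"])
    return codes


index = build_index(json.loads(TAG_MAP_JSON))
triggered = detect_misconceptions(7, {1: "A", 2: "C"}, index)
```

Because the map contains only incorrect choices, a fully correct attempt yields an empty list and no feedback page would be requested.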

Validation: The six physics instructors conducted collaborative review sessions to ensure clarity, accuracy, and alignment of quiz items and misconception tags through structured discussion and consensus.

3.5 Phase 2: AI agent plugin development

To streamline misconception-targeted feedback, we created a custom Moodle “AI-Feedback” plugin that links directly to OpenAI agent mode. Its workflow keeps all key steps but avoids manual bottlenecks:

Automatic parsing and tagging: When a student submits a multiple-choice quiz, the plugin instantly captures the attempt, checks every selected answer, and looks up any wrong choice in an instructor-built JSON map (choice → misconception code). This lets the system identify the exact concept the student misunderstands. The agent was given a fixed system prompt (see Supplementary Material Appendix C) that specifies the HTML structure, simulation requirements, tone, and cognitive-conflict strategy.
Agent invocation (persistent thread): For each unique misconception code, the plugin starts an Agent-Mode thread. The agent searches trusted sources, pulls simulations or videos, and drafts a response, all without further prompts. Persistent threads mean each agent can fetch, refine, and package content in several steps before returning. To achieve this, the plugin uses OpenAI's Assistants API to dispatch a persistent AI agent configured with "model": "gpt-4o" for high-speed, accurate outputs and "tools": ["browser", "code_interpreter"] for multimedia search and logic processing. A custom PHP function (AgentDispatch()) triggers a Python script (agent_runner.py) located in /local/ai_feedback/python/, which:
- Creates the Assistant via the OpenAI Assistants API
- Initiates a new thread per misconception trigger
- Sends the full instructional prompt (see Appendix C)
- Executes the run and polls results until completion
- Saves the output as an HTML file named studentid_miscode_timestamp.html

This integration required advanced development expertise in API orchestration, Python scripting, and Moodle plugin architecture. The system's ability to autonomously generate structured HTML feedback at scale is a result of tightly coupled backend engineering with OpenAI's tool-based agent infrastructure.
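
The polling and file-naming steps of the dispatch sequence can be sketched in Python as follows. This is not the study's agent_runner.py and it makes no OpenAI calls: get_status is a hypothetical callable standing in for whatever retrieves the run state, and the terminal status names and the timestamp format are assumptions.

```python
import time
from datetime import datetime, timezone

# Assumed set of states after which polling should stop.
TERMINAL_STATES = {"completed", "failed", "cancelled", "expired"}


def output_filename(student_id, misconception_code, when):
    """Build a name in the studentid_miscode_timestamp.html pattern.

    The compact UTC timestamp format is an assumption; the article does not
    specify one.
    """
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    return f"{student_id}_{misconception_code}_{stamp}.html"


def poll_until_done(get_status, interval_s=2.0, timeout_s=300.0, sleep=time.sleep):
    """Call get_status() until it reports a terminal state or the timeout passes."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"run still '{status}' after {timeout_s} s")
        sleep(interval_s)
        waited += interval_s


# Simulated run that finishes on the third status check.
states = iter(["queued", "in_progress", "completed"])
final_state = poll_until_done(lambda: next(states), sleep=lambda s: None)
name = output_filename(42, "M01", datetime(2025, 5, 1, 9, 30, tzinfo=timezone.utc))
```

Injecting the status callable and the sleep function keeps the loop testable without network access; a bounded timeout matters here because a stalled run would otherwise leave a feedback page that never reaches the moderation queue.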

Feedback composition: The plugin then wraps the agent's output into an HTML page that includes
- a short, plain-language explanation,
- an embedded PhET-style simulation link,
- one YouTube video,
- a reflection prompt,
- one or two follow-up questions, and
- APA-formatted sources.
These elements collectively help students understand the cause of their error and how to address it.
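
A minimal Python sketch of how the six elements listed above could be assembled into one HTML fragment is given here. The function name, markup, and CSS class are hypothetical and do not reproduce the plugin's actual template; the URLs passed in the example are placeholders.

```python
from html import escape


def compose_feedback_page(explanation, simulation_url, video_url,
                          reflection_prompt, follow_up_questions, sources):
    """Assemble the six feedback elements into a single HTML fragment.

    All text and URLs are escaped so agent output cannot inject markup.
    """
    questions = "".join(f"<li>{escape(q)}</li>" for q in follow_up_questions)
    references = "".join(f"<li>{escape(s)}</li>" for s in sources)
    return "\n".join([
        '<article class="ai-feedback">',
        f"<p>{escape(explanation)}</p>",
        f'<p><a href="{escape(simulation_url)}">Open the simulation</a></p>',
        f'<p><a href="{escape(video_url)}">Watch the video</a></p>',
        f"<p><em>{escape(reflection_prompt)}</em></p>",
        f"<ol>{questions}</ol>",
        f"<ul>{references}</ul>",
        "</article>",
    ])


page = compose_feedback_page(
    "If the net force F < 0, the puck slows down.",
    "https://example.org/sim",    # placeholder URL
    "https://example.org/video",  # placeholder URL
    "What keeps the puck moving once it has been released?",
    ["Question one?", "Question two?"],
    ["Placeholder source entry"],
)
```

Escaping every field before insertion is a deliberate choice in this sketch, since the content originates from a generative model and is later edited inline by instructors.
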
Teacher moderation dashboard: All pages arrive in a Moodle dashboard marked “pending.” Instructors can preview, edit inline, approve, or reject each page. Approved pages are released immediately to students, and an email notification is logged. Rejected pages are archived with a reason code for later agent fine-tuning.

All steps, including answer checking, agent execution, feedback composition, and teacher review, take place inside Moodle. Feedback is therefore shared with students only after an instructor has reviewed and approved it.
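
The release rule described above, in which a page stays hidden until an instructor approves it, can be expressed as a small state model. The Python sketch below is an illustration only; the class, method, and reason-code names are hypothetical and are not taken from the plugin.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class FeedbackPage:
    student_id: int
    misconception_code: str
    html: str
    status: Status = Status.PENDING
    reason_code: Optional[str] = None

    def _require_pending(self):
        # Only pending pages may be edited, approved, or rejected.
        if self.status is not Status.PENDING:
            raise ValueError(f"page already {self.status.value}")

    def edit(self, new_html):
        self._require_pending()
        self.html = new_html

    def approve(self):
        self._require_pending()
        self.status = Status.APPROVED

    def reject(self, reason_code):
        # Rejected pages keep a reason code for later agent refinement.
        self._require_pending()
        self.status = Status.REJECTED
        self.reason_code = reason_code


def visible_to_student(page):
    """A page is released only once an instructor has approved it."""
    return page.status is Status.APPROVED
```

Making approval the only path to visibility means that a page which is never reviewed simply stays pending, which matches the instructor-gated release described in the text.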

Supplementary Material Appendix A provides a detailed implementation guide for the Moodle plugin developed in this study. It includes directory mapping, hooks, agent trigger pseudocode, instructor moderation dashboard, and security notes to support adaptation by other educational institutions.

3.6 Phase 3: quiz administration, AI-generated feedback workflow, and evaluation procedures

Students completed the seven-item Newtonian mechanics quiz during scheduled 30-minute class sessions in both participating schools. Upon submission, the Moodle plugin automatically parsed the students' responses, identified misconception tags associated with incorrect answer choices, and invoked the AI agent to generate individualized HTML feedback pages tailored to each detected misconception. All generated pages were initially placed in a pending state.

Physics instructors then engaged in a structured human-in-the-loop moderation workflow through a dedicated dashboard embedded in Moodle. The interface allowed instructors to preview each feedback page, verify its scientific accuracy and curricular alignment, and either approve the page, apply minor edits, regenerate it through the agent, or reject it when necessary. Once a page was approved, the system automatically delivered the moderated feedback to students by email and made it accessible through their Moodle accounts.

Following the receipt of moderated feedback, students were invited to complete an evaluation instrument designed to assess clarity, relevance, usefulness, and overall experience. The survey included seven 5-point Likert items, a 0–10 global clarity rating, comparative preference questions (AI-generated vs. teacher vs. textbook feedback), and two open-ended prompts to elicit qualitative reflections. A total of ninety-three out of one hundred students responded (93% response rate).

Quantitative data was analyzed using descriptive statistics (means, medians, standard deviations, confidence intervals), while qualitative data underwent inductive content analysis. Two independent researchers coded responses and resolved discrepancies through negotiated agreement to ensure reliability and thematic coherence. The combined procedures provided a mixed-methods perspective on the pedagogical value, accuracy, and learner reception of the AI-generated and instructor-moderated feedback cycle.
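The descriptive statistics named above can be computed with the Python standard library, as in the sketch below. The ratings are illustrative placeholders, not study data, and the confidence interval uses a normal approximation, which is an assumption; the text does not state which interval method was used.

```python
from math import sqrt
from statistics import NormalDist, mean, median, stdev


def describe(ratings, confidence=0.95):
    """Mean, median, sample SD, and a normal-approximation confidence interval."""
    n = len(ratings)
    m = mean(ratings)
    sd = stdev(ratings)  # sample standard deviation (n - 1 in the denominator)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sd / sqrt(n)
    return {
        "n": n,
        "mean": m,
        "median": median(ratings),
        "sd": sd,
        "ci": (m - half_width, m + half_width),
    }


# Illustrative 5-point Likert ratings only; these are NOT the study's responses.
illustrative_ratings = [5, 4, 4, 3, 5, 4, 4, 5, 3, 4]
summary = describe(illustrative_ratings)
```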

4. Results

This section presents the empirical findings from our exploratory pilot study, organized to directly address each research question through systematic analysis of system performance, technical feasibility, and user perceptions.

4.1 System performance and technical feasibility (RQ1)

4.1.1 RQ1.1: System reliability and efficiency

The AI agent demonstrated robust technical performance throughout the study period. System Performance Metrics revealed:

Generation Speed: Average feedback creation time of 1.8 min per student (SD = 0.4)
System Reliability: 99.2% uptime with zero technical failures or data loss incidents
Processing Capacity: Successfully handled concurrent quiz submissions from 100 students across two schools
Content Output: Generated 224 personalized HTML feedback pages with consistent quality structure

4.1.1.1 Distribution of generated feedback

Students received an average of 2.24 individualized feedback pages each (SD = 1.07), reflecting the multi-concept nature of physics misconceptions. This distribution demonstrates the system's ability to automatically identify and address multiple conceptual errors per student, as shown in Table 1.

4.1.2 RQ1.2: Effectiveness of instructor moderation process

The human-in-the-loop moderation workflow proved both necessary and manageable for ensuring pedagogical quality. Across the 224 feedback pages generated, instructors recorded the outcomes summarized in Table 2. Most pages (83%) were either approved without edits or required only minor modifications, demonstrating that the system generally produced usable content. The remaining 17% were rejected, most often due to curriculum misalignment (47.4%) or scientific inaccuracies (26.3%), with tone and multimedia issues appearing less frequently. A closer look at rejection patterns highlighted areas for targeted improvement. As one instructor noted, “Some outputs were excellent. Others needed trimming or better-aligned examples. Having a clear moderation interface made the review process efficient.” Statistical analysis reinforced these patterns: a Kruskal–Wallis test revealed significant differences in review time across outcome categories (χ² = 23.47, p < 0.001). Post-hoc comparisons confirmed that rejected content required substantially more review time than approved content (5.8 vs 2.1 min, p < 0.001), suggesting that quality issues were readily identifiable during moderation.

4.2 User perceptions and acceptance (RQ2)

To address RQ2, this subsection focuses on student perceptions as captured through a structured Likert-scale survey based on the Technology Acceptance Model (TAM) and its extended constructs. A total of 93 students responded to the TAM-based items, rating various dimensions of their experience with the AI-generated feedback system. Table 3 summarizes the descriptive statistics across seven key constructs.

These effect sizes (all d > 1.0) indicate strong practical significance in students' positive perceptions across all measured dimensions. Students reported consistently high ratings across all constructs, with means ranging from 3.88 to 4.27, indicating favourable perceptions of the AI feedback system. Because several constructs cluster near the upper end of a 5-point scale, future studies might employ a 7-point instrument to reduce potential ceiling effects and capture finer distinctions in student perceptions. Notably, Multimedia Engagement received the highest mean score (M = 4.27, SD = 0.57), reflecting strong appreciation for the visual and interactive elements embedded in the feedback pages. This construct also demonstrated high internal consistency (Cronbach's α = 0.81) and a large effect size (Cohen's d = 2.23), underscoring its robust impact on student engagement. This was followed by Actionability (M = 4.17, α = 0.85, d = 1.83), suggesting students found the feedback not only informative but also instructive in guiding next learning steps. Constructs related to core TAM components also received high evaluations. Clarity of Explanations (M = 4.13, α = 0.84, d = 1.79) confirms that the explanations generated by the AI were generally comprehensible. Similarly, Relevance to Misconceptions (M = 4.05, α = 0.86, d = 1.48) and Usefulness for Understanding (M = 4.04, α = 0.83, d = 1.76) indicate that the feedback was perceived as directly addressing student errors and supporting conceptual learning. The relatively lower—but still positive—score for Cognitive Challenge (M = 3.88, SD = 0.82, α = 0.79, d = 1.07) suggests that while students appreciated the system's clarity, some may have desired deeper or more complex prompts to stimulate higher-order thinking. The wider standard deviation on this item suggests more divergent opinions regarding the depth of cognitive prompting provided. 
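The text does not say how Cohen's d was computed. The two constructs reported with both a mean and a standard deviation are consistent with a one-sample d measured against the neutral scale midpoint of 3; the sketch below checks this. The midpoint reference is an inference from the reported numbers, not a stated method.

```python
def cohens_d_one_sample(mean, sd, reference=3.0):
    """One-sample Cohen's d: distance of the mean from a reference value, in SD units."""
    return (mean - reference) / sd


# Values reported in the text for the two constructs that include an SD.
d_multimedia = cohens_d_one_sample(4.27, 0.57)  # Multimedia Engagement
d_cognitive = cohens_d_one_sample(3.88, 0.82)   # Cognitive Challenge
```

Rounded to two decimals these give 2.23 and 1.07, matching the reported effect sizes for those two constructs.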
Affective Engagement also received a strong mean rating (M = 4.10, α = 0.82, d = 1.69), indicating that the tone and motivational features of the AI feedback resonated with students emotionally. All constructs exhibited narrow confidence intervals and medians of 4.0, further reinforcing the consistency of student responses. These findings collectively demonstrate high levels of student acceptance and perceived value of the AI-generated, misconception-specific feedback system. Finally, a chi-square goodness-of-fit test confirmed that the observed distribution of feedback preferences differed significantly from an equal distribution (χ² = 13.42, df = 2, p = 0.0012), indicating a preference for AI-generated feedback over traditional teacher explanations or textbook review. The corresponding effect size was medium (Cramér's V = 0.27), so the preference for AI feedback is not only statistically significant but also practically meaningful.
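The goodness-of-fit result can be reproduced with a few lines of Python. The counts below (47, 27, 19) are back-calculated from the preference percentages given in the Conclusion (50.5%, 29% and 20.4% of 93 respondents); the raw counts are not stated in the text, so they are an assumption. The effect size follows one common convention for goodness-of-fit tests, V = sqrt(χ² / (N(k − 1))).

```python
from math import exp, sqrt


def chi_square_gof(observed):
    """Chi-square statistic against an equal expected count in every category."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Assumed counts: AI-generated, teacher, textbook (reconstructed from percentages).
observed = [47, 27, 19]
n = sum(observed)
df = len(observed) - 1

chi2 = chi_square_gof(observed)
# For df = 2 the chi-square survival function has the closed form exp(-x / 2).
p_value = exp(-chi2 / 2)
cramers_v = sqrt(chi2 / (n * df))
```

With these counts the sketch returns χ² ≈ 13.42, p ≈ 0.0012 and V ≈ 0.27, in line with the reported values. The closed-form p-value only holds for two degrees of freedom; other cases need a statistics library.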

4.2.1 RQ2.2: instructor perceptions through moderation behaviour

Instructor acceptance was gauged from two sources: (1) log data captured by the Moodle moderation dashboard (approve/edit/reject counts and timestamps) and (2) post-study, semi-structured interviews with all six instructors; no formal survey was administered.

Pedagogical Approval Patterns: Dashboard logs show that 75% of AI-generated pages were approved without changes, and a further 8% needed only minor edits—so 83% met instructors' standards with minimal intervention.

4.2.1.1 Workflow-integration themes (interviews)

Efficiency Gains: “The moderation process was more efficient than creating individualized feedback from scratch” (5/6 instructors)
Quality Consistency: “AI-generated explanations followed a consistent pedagogical structure” (4/6 instructors)
Curriculum Alignment Needs: “The system needs better understanding of local curriculum requirements” (6/6 instructors)

Comparative Time Analysis: Instructors estimated that traditional individualized feedback creation would require 8–12 min per student, compared to the observed 3.2 min average for AI feedback moderation—representing a 60–73% time reduction.
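The 60–73% range follows directly from the two ends of the instructors' estimate, as the short check below shows. The function name is hypothetical; the inputs are the figures quoted above.

```python
def time_reduction(manual_minutes, moderated_minutes):
    """Fraction of time saved when moderation replaces manual feedback writing."""
    return 1 - moderated_minutes / manual_minutes


MODERATION_MINUTES = 3.2  # observed average per student

low = time_reduction(8, MODERATION_MINUTES)    # against the 8-minute estimate
high = time_reduction(12, MODERATION_MINUTES)  # against the 12-minute estimate
```

Note that the manual baseline is an instructor estimate rather than a measured value, so the range is only as reliable as that estimate.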

4.3 Qualitative insights: student open-ended responses

Thematic analysis of 93 open-ended survey responses revealed convergent themes supporting the quantitative findings, with an inter-rater reliability of Cohen's κ = 0.82.

4.3.1 Major themes identified

Diagnostic Precision: mentioned by 62 students (≈67%)
- “I finally saw that I kept mixing up acceleration with net force—your diagram pointed it out”
- “The first quiz question exposed exactly where my thinking was wrong, so I knew what to fix”
Multimodal Learning Value: mentioned by 55 students (≈59%)
- “The combo of the force-arrow video and the interactive sim made everything clear.”
- “Reading it was okay, but once I dragged the arrows in the sim, the law really made sense”
Metacognitive Activation: mentioned by 40 students (≈43%)
- “The little self-check after each section made me stop and think instead of just scrolling”
Areas for Improvement: mentioned by 17 students (≈18%)
- “Videos buffer forever on my data plan; can we get a low-bandwidth version?”
- “Bandwidth issue in loading videos”
- “One of the force-diagram links is broken; it jumps to a 404 page.”

5. Discussion

This pilot study provides empirical evidence for the technical feasibility and educational potential of integrating AI agents into an instructor-moderated feedback system designed to address persistent misconceptions in Newtonian mechanics in physics education.

Interpreting these results against RQ1–RQ2 suggests that feasibility and user acceptance were not accidental outcomes but driven by specific pedagogical mechanisms. In particular, the high ratings for clarity, actionability, and multimedia engagement indicate that students found the feedback both intelligible and helpful for addressing their own misconceptions—thus supporting RQ2. Furthermore, the fact that instructors approved 83% of pages with no or minor edits provides empirical support for RQ1.2 by demonstrating that a moderation-in-the-loop model is not only technically feasible, but also sustainable in real secondary classroom workflows. In other words, the system did not merely work “technically”—it worked “instructionally” in a manner acceptable to teachers and usable by students.

This interpretation aligns with recent physics education studies showing that LLM explanations can be persuasive but still fragile when handling conceptual depth and representations (Polverini et al., 2025; Dunlap et al., 2025; Aldazharova et al., 2024). Unlike those studies, however, our pilot embedded AI directly inside the LMS workflow and required instructors to explicitly moderate each feedback page before students received it. This design decision mitigates some of the fragility reported in prior studies by combining scalable generation with local pedagogical oversight. Therefore, our contribution is not just another study of LLM feedback quality on its own. Instead, we show a working model that delivers automatically generated conceptual feedback in a way that teachers trust and that fits into real school systems. This is different from earlier studies done outside LMS environments, where feedback was judged without any teacher review or control. Together, these results directly answer RQ1 (feasibility and moderation) and RQ2 (student perceptions).

Guided by the Technology Acceptance Model (TAM), the findings highlight strong student receptivity to the AI-generated feedback. Students appear more open to using AI tools in physics class when the feedback is clear, personalized, and helpful for their learning. This hybrid approach may help address the long-standing challenge of giving feedback at scale, because it combines the speed of AI with the judgement and experience of instructors. Instructors, for their part, reported that the dashboard made their work easier, and most AI-generated feedback pages required no revisions or only minor ones. These patterns show that instructor moderation plays a key role in ensuring that AI-generated feedback is accurate and aligned with local standards and classroom expectations, which reinforces the idea that keeping instructors in the loop is essential for maintaining quality. At the same time, the findings suggest clear ways to improve the system, such as refining prompts, connecting feedback more closely to the curriculum, and making the review process easier for teachers.

This pilot prioritized acceptance over learning outcomes for a methodological reason. Measuring conceptual change before establishing stable acceptance would risk a confound: null gains could be caused either by insufficient pedagogical quality or by simple non-adoption. This staged logic also situates our future learning-outcome studies within an ICAP framing (Chi and Wylie, 2014): the multimodal elements of the feedback (simulations, reflective prompts, follow-up questions) are designed to move learners from passive reception toward constructive engagement. Thus, TAM establishes that students will attend, Posner explains the dissatisfaction mechanism needed for conceptual change, and ICAP predicts which feedback features are likely to produce durable conceptual restructuring. Beyond Newtonian mechanics, future research will evaluate whether this agent-mediated feedback model transfers to other STEM topics (e.g. electricity, thermodynamics, chemical bonding, and algebraic reasoning) and to alternative instructional models such as inquiry-based laboratory settings.

5.1 Bias and generalizability considerations

Because this pilot was conducted in two Grade 12 classrooms within a single national curriculum context, the results should not be interpreted as universally generalizable. The observed acceptance may partly reflect the novelty of AI tools and existing trust in the technology ecosystem within these schools. In addition, the misconception taxonomy was aligned to the Lebanese national Newtonian mechanics curriculum, which may differ from representational conventions used in IB, A-level, NGSS, or AP Physics. Future research should examine how acceptance and instructional value vary across different curricular systems and sociotechnical contexts. A further limitation concerns potential biases in the underlying language model. Because the agents draw on a general-purpose LLM trained on large-scale internet corpora, the feedback may inadvertently reflect skewed topic coverage, cultural assumptions or stereotyped examples that were not systematically audited in this pilot. We did not analyze differential effects across subgroups (e.g. gender, prior achievement or language background), so questions of fairness and equity in how misconceptions are diagnosed and explained remain open and should be examined in future work.

5.2 Instructor moderation insights

Instructor reflections also revealed that their role shifted from generating feedback to curating and validating it. Teachers reported that the most time-consuming moderation tasks were not fixing physics explanations, but adjusting tone, simplifying language, and replacing examples with culturally familiar analogies. This suggests that the pedagogical bottleneck shifts from conceptual correctness to contextual relevance — an important distinction for future scaling. Contrary to common concerns that AI might “bypass” teacher knowledge or reduce teacher agency, instructors consistently indicated the opposite: they felt empowered because the system allowed them to focus effort where their pedagogical expertise mattered most (e.g. nuance, emphasis, variation), rather than typing out full explanations from scratch.

5.3 AI literacy and critical interpretation

A further implication of these findings is that AI literacy cannot be treated as a peripheral digital skill: students require explicit structures to learn how to interrogate AI explanations, not simply consume them. Recent work in K–12 contexts shows that students often accept AI-generated statements as epistemically authoritative unless they are taught how to identify when an explanation contradicts disciplinary principles (Casal-Otero et al., 2023; Long et al., 2026). Within misconception-rich domains like Newtonian mechanics, preparing students to critically question AI outputs may be as important as the feedback itself. For this reason, future iterations of the system will incorporate reflective prompts that guide students to articulate why an explanation is correct — rather than merely reading and accepting AI feedback. Moreover, AI literacy must also include an ethical dimension. Students need support to recognize that generative AI may produce misleading or biased explanations, especially when their own prior conceptions align with incorrect outputs. Preparing students to question AI-based claims—and to seek evidence rather than accept statements at face value—aligns with the emerging view that AI education must cultivate epistemic vigilance, not just tool-use competence (Long et al., 2026). Embedding such reflective routines in school practice may help students become alert to both their own misconceptions and to the limitations of generative AI, positioning AI as a partner for reasoning rather than an unquestioned source of truth.

5.4 Ethical and equity considerations

Despite its promise, the framework presents several ethical and equity-related considerations. First, requiring instructor review before student access provides a safeguard against inaccuracies and inappropriate content, which is essential in educational settings where trust and accuracy are paramount. This approach directly addresses concerns raised in the literature regarding the limitations of autonomous AI in high-stakes learning environments. In addition, because the study ran in January–May 2025, before GPT Agent Mode became available to Plus users on 26 July 2025, we operated under the ChatGPT Pro tier (USD 200 per month). That plan allowed 400 agent-mode requests, whereas the later Plus rollout offers only 40 requests; thus, cost and quota constraints could hinder replication in resource-constrained settings. Second, the technical requirements of the system, such as Moodle integration, server-side plugin installation, and stable internet connectivity, may pose barriers in under-resourced educational contexts. Equitable deployment will require adaptations for low-bandwidth environments and potentially the development of cloud-hosted variants to reduce local infrastructure demands. Third, the cultural relevance of multimedia content generated by the AI agents must be considered. Although the agent dynamically retrieves and synthesizes materials, it may inadvertently surface content that lacks contextual alignment with local curricula or student experiences. Embedding cultural responsiveness through curated resource libraries, localized content repositories, or teacher-provided examples could strengthen engagement and inclusivity.

6. Ethics statement

The study was approved by the Research Ethics Committee at the Lady of Balamand High School for the two participating sections (Approval ID: EDU-AI-2025–04). All participants provided informed consent. Student responses were anonymized, and no personally identifiable information was collected.

7. Key contributions

This study contributes several innovations to the field of AI in education:

Novel Technical Integration: To our knowledge, this is the first deployment of persistent, tool-enabled AI agents within a mainstream LMS (Moodle) for real-time, misconception-specific feedback generation.
Transparent Misconception Mapping: The use of a JSON-based schema creates auditable, explicit links between student errors and targeted feedback resources, supporting interpretability and reusability.
Validated Human–AI Collaboration: The instructor moderation interface demonstrates how AI and human educators can synergistically co-create pedagogically sound, personalized content.
Strong User Acceptance: The TAM-based evaluation confirms high student and instructor receptivity, which is a critical factor for sustainable adoption and institutional integration.

8. Practical implications

AI-generated feedback reduced instructor time and increased personalization simultaneously—two outcomes that are normally in tension in classroom practice. In this study, instructors shifted from producing full explanations to refining AI output, enabling them to redirect effort toward nuance and contextualization rather than manual text generation. Students benefited from faster feedback that was specific to their misconception rather than generic remediation advice. Thus, the practical implication is not only efficiency, but a reallocation of teacher effort toward higher-order pedagogical judgement.

These findings indicate that AI agents can reduce rather than increase teacher workload at scale, provided that appropriate moderation controls are incorporated into the workflow. The observed 60–73% reduction in teacher feedback time suggests that human moderation can operate as a refinement layer rather than a bottleneck. For institutions, such systems could realistically be deployed in existing LMS environments to support misconception-focused feedback in high-enrolment courses. However, sustainable adoption will require (a) prompt templates tuned to local curriculum, (b) curation of vetted multimedia sources, and (c) governance structures that maintain teacher control without requiring teachers to author every feedback page from scratch.

9. Limitations and Future work

This exploratory pilot prioritized technical validation and user acceptance over measuring actual learning gains, so no pre-/post-tests were administered. Conducted in just two similar private schools in northern Lebanon, its context limits generalizability—future studies should span diverse settings, student profiles, and longer timelines to control for novelty effects. A 7% non-response rate and possible self-selection bias may have slightly inflated survey scores. Student preferences for AI vs. teacher vs. textbook feedback were drawn from experience rather than direct comparison; randomized trials across multiple feedback types are needed. These focused boundaries establish a clear roadmap for controlled effectiveness studies and broader deployments.

10. Future research phases

Phase roadmap

Phase 2 – Effectiveness: Run controlled pre/post FCI studies in multiple schools.
Phase 3 – Adoption: Track year-long usage patterns and teacher workload.
Phase 4 – Comparison: Contrast AI, teacher-written, and textbook feedback across more physics topics.

Key directions

Quantify learning gains with mixed-methods pre/post designs.
Extend the framework to other STEM areas.
Pilot live conversational agents for real-time remediation.
Curate a localized library of evaluated simulations, videos, and readings.
Study long-term sustainability (cost, maintenance, motivation).

11. Conclusion

This exploratory study introduces a new way to use AI agents to provide students with personalized feedback in Newtonian mechanics, especially when they hold common misconceptions. The system demonstrated technical feasibility while maintaining instructor oversight. The high ratings for both perceived usefulness and perceived ease of use indicate that students found the technology helpful and easy to use, which points to potential acceptance and adoption. Students' preference for AI-generated feedback over traditional methods (50.5% vs. 29.0% for teacher explanations and 20.4% for textbooks) suggests significant educational value when AI is thoughtfully integrated with instructor expertise, although these results are based on what students reported preferring, not on a strict comparison study. The key innovation lies in the hybrid approach: using AI's capabilities for scalable, personalized content generation while preserving instructors' essential role in ensuring pedagogical appropriateness and data accuracy. This represents a significant advancement toward addressing student misconceptions with adaptive learning experiences. The setup with persistent AI agents gives a clear example of how future feedback and tutoring tools can work while keeping instructors involved. Unlike traditional e-learning tools or basic chatbot systems, our pilot study brings together persistent AI agents, misconception tagging, and real-time multimedia generation, all within the LMS. This combination makes it possible to deliver personalized, flexible feedback that adapts to online resources and is guided by the instructor's professional judgement. Moreover, the framework is flexible enough to be used across different subjects and learning environments. While it was designed for physics, it could also be applied to areas like chemistry, biology, or even the humanities, in any subject where students often struggle with core concepts.
The system's strong technical performance, positive feedback from students, and smooth collaboration between AI and instructors provide a starting point for future research focused on measuring real learning gains. Although this pilot mainly tested whether the system works and how students respond to it—rather than performing a comparative study on learning gains—it lays the groundwork for future studies. The use of persistent AI agents, misconception tagging, and teacher moderation creates a model that others can follow to deliver personalized feedback in STEM education. This study also supports the idea that AI in education should be culturally responsive and instructor-guided, showing that it is possible to use advanced tools without taking away the instructor's role in shaping learning.

AI-usage disclosure

“No generative AI was used in the conception or writing of this article; only standard spelling- and grammar-checking tools were employed.”

The supplementary material for this article can be found online.

References

Aldazharova, Issayeva, Maxutov and Balta (2024), “Assessing AI's problem solving in physics: analysing reasoning, false positives and negatives through the force concept inventory”, Contemporary Educational Technology, p. 538, doi: https://doi.org/10.30935/cedtech/15592

Ali, Warraich, N.F. and Butt (2025), “Acceptance and use of artificial intelligence and AI-based applications in education: a meta-analysis and future direction”, Information Development, pp. 859–874, doi: https://doi.org/10.1177/02666669241257206

Ayeni, O.O., Al Hamad, N.M., Chisom, O.N., Osawaru and Adewusi, O.E. (2024), “AI in education: a review of personalised learning and educational technology”, GSC Advanced Research and Reviews, pp. 261–271

Barakat, Salim, N.A. and Sallam (2025), “University educators' perspectives on ChatGPT: a technology acceptance model-based study”, Open Praxis, pp. 129–144, doi: https://doi.org/10.55982/openpraxis.17.1.718

Brown, D.E. and Clement (1987), “Misconceptions concerning Newton's law of action and reaction: the underestimated importance of the third law”, Proceedings of the Second International Seminar on Misconceptions and Educational Strategies in Science and Mechanics

Casal-Otero, Catala, Fernández-Morante, Taboada, Cebreiro and Barro (2023), “AI literacy in K-12: a systematic literature review”, International Journal of STEM Education, doi: https://doi.org/10.1186/s40594-023-00418-7

Chee, Ahn and Lee (2025), “A competency framework for AI literacy: variations by different learner groups and an implied learning pathway”, British Journal of Educational Technology, pp. 2146–2182, doi: https://doi.org/10.1111/bjet.13556

Chen and Wan (2025), “Grading explanations of problem-solving process and generating feedback using large language models at human-level accuracy”, Physical Review Physics Education Research, 010126, doi: https://doi.org/10.1103/physrevphyseducres.21.010126

Chi, M.T.H. and Wylie (2014), “The ICAP framework: linking cognitive engagement to active learning outcomes”, Educational Psychologist, pp. 219–243, doi: https://doi.org/10.1080/00461520.2014.965823

Dunlap, J.C., Sissons and Widenhorn (2025), “Descending an inclined plane with a large language model”, Physical Review Physics Education Research, 010153, doi: https://doi.org/10.1103/physrevphyseducres.21.010153

Efron and Tibshirani, R.J. (1993), An Introduction to the Bootstrap, Chapman & Hall, New York, NY

Fathema, Shannon and Ross (2015), “Expanding the technology acceptance model (TAM) to examine faculty use of learning management systems (LMSs) in higher education institutions”, Journal of Online Learning and Teaching, pp. 210–233

Fearnley, M.R. and Amora, J.T. (2020), “Learning management system adoption in higher education using the extended technology acceptance model”, IAFOR Journal of Education, pp. 106

Field (2018), Discovering Statistics Using IBM SPSS Statistics, 5th ed., Sage Publications, London

Goel, A.K. and Polepeddi (2018), “Jill Watson: a virtual teaching assistant for online education”, in Learning Engineering for Online Education, Routledge, pp. 120–143

Guo, Latif, Zhou, Huang and Zhai (2024), “Using generative AI and multi-agents to provide automatic feedback”, arXiv preprint arXiv:2411.07407

Hair, J.F., Black, W.C., Babin, B.J. and Anderson, R.E. (2019), Multivariate Data Analysis, 8th ed., Cengage Learning, Andover

Halloun, I.A. and Hestenes (1985), “The initial knowledge state of college physics students”, American Journal of Physics, pp. 1043–1055, doi: https://doi.org/10.1119/1.14030

Hattie and Timperley (2007), “The power of feedback”, Review of Educational Research, pp. 112

Wang, Chen and Zhou (2025), “Integrating generative AI into STEM education: enhancing conceptual understanding, addressing misconceptions, and assessing student acceptance”, Discover Education, doi: https://doi.org/10.1186/s43031-025-00125-z

Hestenes, Wells and Swackhamer (1992), “Force concept inventory”, The Physics Teacher, pp. 141–158, doi: https://doi.org/10.1119/1.2343497

Holmes and Tuomi (2022), “State of the art and practice in AI in education”, European Journal of Education, pp. 542–570, doi: https://doi.org/10.1111/ejed.12533

Kabilan, M.K., Ahmad, Hashim, Mahbob and Lee, C.K. (2025), “Artificial intelligence (AI) as a new trajectory in education”, Journal of Educational Research and Development, pp. 150–170

Kahaleh and Lopez (2025), “Evaluating large language models in high school physics education: addressing misconceptions and fostering conceptual understanding”, Physics Education, 025013, doi: https://doi.org/10.1088/1361-6552/adb235

Keuning, Jeuring and Heeren (2018), “A systematic literature review of automated feedback generation for programming exercises”, ACM Transactions on Computing Education, pp. 3:1–3:43, doi: https://doi.org/10.1145/3231711

King, W.R. and (2006), “A meta-analysis of the technology acceptance model”, Information and Management, pp. 740–755, doi: https://doi.org/10.1016/j.im.2006.05.003

Kortemeyer

(

2023

), “

Could an artificial-intelligence agent pass an introductory physics course?

”,

Physical Review Physics Education Research

, Vol.

No.

Google Scholar

Kulik

J.A.

and

Fletcher

J.D.

(

2016

), “

Effectiveness of intelligent tutoring systems: a meta-analytic review

”,

Review of Educational Research

, Vol.

No.

, pp.

, doi:

https://doi.org/10.3102/0034654315581420

Google Scholar

Crossref

Landis

J.R.

and

Koch

G.G.

(

1977

), “

The measurement of observer agreement for categorical data

”,

Biometrics

, Vol.

No.

, pp.

159

174

, doi:

https://doi.org/10.2307/2529310

Google Scholar

Crossref

PubMed

Long

D.Y.

Wang

Md Rashid

and

X.T.

(

2026

), “

Artificial intelligence in higher education: a systematic review of its impact on student engagement and the mediating role of teaching methods

”,

Frontiers in Education

, Vol.

Google Scholar

Luckin

Holmes

Griffiths

and

Forcier

(

2016

Intelligence Unleashed: An Argument for AI in Education

Pearson Education

London

Google Scholar

Moghaddam

Yaghoubzadeh

and

Ananiadou

(

2024

), “

Artificial intelligence applications in education: natural language processing in detecting misconceptions

”,

Education and Information Technologies

, Vol.

No.

, doi:

https://doi.org/10.1007/s10639-024-12919-1

Google Scholar

Murillo

G.G.

Novoa-Hernández

and

Serrano Rodríguez

(

2021

), “

Technology acceptance model and Moodle: a systematic mapping study

”,

Information Development

, Vol.

No.

, pp.

617

632

, doi:

https://doi.org/10.1177/0266666920959367

Google Scholar

Crossref

Mustofa

R.H.

Kuncoro

T.G.

Atmono

Hermawan

H.D.

and

Sukirman

(

2025

), “

Extending the technology acceptance model: the role of subjective norms, ethics, and trust in AI tool adoption among students

”,

Computers and Education: Artificial Intelligence

, Vol.

, 100379, doi:

https://doi.org/10.1016/j.caeai.2025.100379

Google Scholar

Nunnally

J.C.

and

Bernstein

I.H.

(

1994

Psychometric Theory

, (3rd ed.) ,

McGraw-Hill

New York, NY

Google Scholar

Polverini

Melin

Önerud

and

Gregorcic

(

2025

), “

Performance of ChatGPT on tasks involving physics visual representations: the case of the Brief Electricity and Magnetism Assessment

”,

Physical Review Physics Education Research

, Vol.

No.

, 010154, doi:

https://doi.org/10.1103/PhysRevPhysEducRes.21.010154

Google Scholar

Posner

G.J.

Strike

K.A.

Hewson

P.W.

and

Gertzog

W.A.

(

1982

), “

Accommodation of a scientific conception: toward a theory of conceptual change

”,

Science Education

, Vol.

No.

, pp.

211

227

, doi:

https://doi.org/10.1002/sce.3730660207

Google Scholar

Crossref

Reina Marín

Cruz Caro

Maicelo Rubio

Y.D.C.

Alva Tuesta

J.N.

Sánchez Bardales

Carrasco Rituay

A.M.

and

Chávez Santos

(

2025

), “Artificial intelligence as a teaching tool in university education”,

Frontiers in Education

Vol.

, 1578451, doi:

https://doi.org/10.3389/feduc.2025.1578451

Google Scholar

Crossref

Resbiantoro

and

Setiani

(

2022

), “

A review of misconception in physics: the diagnosis, causes and remediation

”,

Journal of Turkish Science Education

, Vol.

No.

, pp.

403

427

Google Scholar

Sabatini

Graesser

A.C.

Hollander

and

O'Reilly

(

2023

), “

A framework of literacy development and how AI can transform theory and practice

”,

British Journal of Educational Technology

, Vol.

No.

, pp.

1174

1203

, doi:

https://doi.org/10.1111/bjet.13342

Google Scholar

Crossref

Schick

Dessì

Raileanu

Lomeli

Zettlemoyer

Cancedda

and

Scialom

(

2023

), “

Toolformer: language models can teach themselves to use tools

”,

arXiv preprint

arXiv:

2302.04761

Google Scholar

Shi

and

Aryadoust

(

2024

), “

A systematic review of AI-based automated written feedback research

”,

ReCALL

, Vol.

No.

, pp.

187

209

Google Scholar

Crossref

Strike

K.A.

and

Posner

G.J.

(

1982

), “

Conceptual change and science teaching

”,

European Journal of Science Education

, Vol.

No.

, pp.

231

240

, doi:

https://doi.org/10.1080/0140528820040302

Google Scholar

Crossref

Šumak

Heričko

and

Pušnik

(

2011

), “

A meta-analysis of e-learning technology acceptance: the role of user types and e-learning technology types

”,

Computers in Human Behavior

, Vol.

No.

, pp.

2067

2077

, doi:

https://doi.org/10.1016/j.chb.2011.08.005

Google Scholar

Crossref

Tabachnick

B.G.

and

Fidell

L.S.

(

2019

Using Multivariate Statistics

, (7th ed.) ,

Pearson

Boston, MA

Google Scholar

U.S. Department of Education

(

2025

Artificial Intelligence and the Future of Teaching and Learning

U.S. Department of Education

Venkatesh

and

Davis

F.D.

(

2000

), “

A theoretical extension of the Technology Acceptance Model: four longitudinal field studies

”,

Management Science

, Vol.

No.

, pp.

186

204

, doi:

https://doi.org/10.1287/mnsc.46.2.186.11926

Google Scholar

Crossref

Wang

Zhu

Wang

Tran

and

(

2024a

), “

Artificial intelligence in education: a systematic literature review

”,

Expert Systems with Applications

, Vol.

252

, 124167, doi:

https://doi.org/10.1016/j.eswa.2024.124167

Google Scholar

Wang

Feng

Zhang

Yang

Zhang

Chen

Tang

Chen

Lin

Zhao

W.X.

Wei

and

Wen

(

2024b

), “

A survey on large language model based autonomous agents

”,

Frontiers of Computer Science

, Vol.

No.

, 186345, doi:

https://doi.org/10.1007/s11704-024-40231-1

Google Scholar

Weijers

Melin

Önerud

and

Gregorcic

(

2025

), “

From intuition to understanding: using ai peers to overcome physics misconceptions

”,

arXiv preprint

arXiv:

2504.00408

Google Scholar

Wells

Henderson

Stewart

Yang

and

Traxler

(

2019

), “

Exploring the structure of misconceptions in the Force Concept Inventory with modified module analysis

”,

Physical Review Physics Education Research

, Vol.

No.

, 020122, doi:

https://doi.org/10.1103/PhysRevPhysEducRes.15.020122

Google Scholar

Yang

Wells

Henderson

Christman

Stewart

and

Stewart

(

2020

), “

Extending modified module analysis to include correct responses: analysis of the Force Concept Inventory

”,

Physical Review Physics Education Research

, Vol.

No.

, 010124, doi:

https://doi.org/10.1103/PhysRevPhysEducRes.16.010124

Google Scholar

Yao

Zhao

Shafran

Narasimhan

and

Cao

(

2023

), “

ReAct: synergizing reasoning and acting in language models

”,

International Conference on Learning Representations (ICLR)

, doi:

https://doi.org/10.48550/arXiv.2210.03629

Google Scholar

Number of feedback pages	Students (n)	Percentage
1 page	23	23.0%
2 pages	31	31.0%
3 pages	28	28.0%
4+ pages	18	18.0%
Total	100	100%

Moderation outcome	Count	%	Avg review time	Most common reason (if rejected)
Approved (no edits)	168	75.0%	2.1 min	–
Edited (minor)	18	8.0%	4.7 min	–
Rejected	38	17.0%	5.8 min	Curriculum misalignment (47.4%), Scientific inaccuracies (26.3%), Tone issues, Multimedia
Total	224	100%	3.2 min (avg)	–
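
The percentages in the moderation table follow directly from the raw counts. As a minimal sketch (the variable and function names are illustrative and not part of the study's plugin), the outcome shares and the combined approval rate can be recomputed as follows:

```python
# Moderation counts copied from the table above.
MODERATION_COUNTS = {
    "Approved (no edits)": 168,
    "Edited (minor)": 18,
    "Rejected": 38,
}


def outcome_shares(counts):
    """Return each outcome's share of all generated pages, as a percentage."""
    total = sum(counts.values())
    return {outcome: 100.0 * n / total for outcome, n in counts.items()}


shares = outcome_shares(MODERATION_COUNTS)
# Pages approved either unchanged or after minor edits.
approved_overall = shares["Approved (no edits)"] + shares["Edited (minor)"]

for outcome, pct in shares.items():
    print(f"{outcome}: {pct:.1f}%")
print(f"Approved unchanged or after minor edits: {approved_overall:.1f}%")
```

Rounded to one decimal place, the shares come out at 75.0%, 8.0% and 17.0%, and the combined approval rate is about 83%.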

Construct	Mean (1–5)	SD	Median	95% CI	Cronbach's α	Effect size (Cohen's d)
Clarity of Explanations	4.13	0.63	4.00	[4.00, 4.26]	0.84	1.79
Multimedia Engagement	4.27	0.57	4.00	[4.15, 4.39]	0.81	2.23
Relevance to Misconceptions	4.05	0.71	4.00	[3.91, 4.19]	0.86	1.48
Usefulness for Understanding	4.04	0.59	4.00	[3.92, 4.16]	0.83	1.76
Cognitive Challenge	3.88	0.82	4.00	[3.71, 4.05]	0.79	1.07
Affective Engagement	4.10	0.65	4.00	[3.97, 4.23]	0.82	1.69
Actionability	4.17	0.64	4.00	[4.04, 4.30]	0.85	1.83
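
The effect sizes and confidence intervals in the table above are consistent with a one-sample comparison against the neutral scale midpoint. The sketch below is a reconstruction under stated assumptions, not the study's analysis script: it assumes a reference value of 3 (the midpoint of the 1–5 scale), N = 93 survey respondents, and a normal-approximation interval (z = 1.96). The function names are hypothetical.

```python
import math

N_RESPONDENTS = 93      # assumption: survey sample size reported for the study
SCALE_MIDPOINT = 3.0    # assumption: neutral point of the 1-5 scale
Z_95 = 1.96             # normal approximation for a 95% interval


def cohens_d_vs_midpoint(mean, sd, midpoint=SCALE_MIDPOINT):
    """One-sample Cohen's d: distance of the mean from the midpoint in SD units."""
    return (mean - midpoint) / sd


def ci95(mean, sd, n=N_RESPONDENTS):
    """Approximate 95% confidence interval for the mean."""
    margin = Z_95 * sd / math.sqrt(n)
    return mean - margin, mean + margin


# (mean, SD) pairs copied from two rows of the table above.
constructs = {
    "Clarity of Explanations": (4.13, 0.63),
    "Cognitive Challenge": (3.88, 0.82),
}

for name, (mean, sd) in constructs.items():
    d = cohens_d_vs_midpoint(mean, sd)
    low, high = ci95(mean, sd)
    print(f"{name}: d = {d:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

For these two rows the sketch reproduces the tabulated values (d = 1.79 with [4.00, 4.26], and d = 1.07 with [3.71, 4.05]), which supports, but does not confirm, the midpoint assumption.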
