Skip to Main Content
Purpose

This paper aims to introduce a neuro-symbolic affect-aware learning agent designed to optimize learner engagement and knowledge retention in virtual learning environments (VLEs).

Design/methodology/approach

The proposed system integrates deep neural networks for multimodal emotion recognition (facial, textual and auditory inputs) with a rule-based symbolic reasoning engine that adapts instructional delivery based on detected affective states. Emotion detection was achieved using a hybrid pipeline comprising a ResNet-50 model (trained on AffectNet for facial cues), fine-tuned BERT (on GoEmotions for textual cues) and wav2vec2.0 (on IEMOCAP for speech signals). To evaluate pedagogical effectiveness, a controlled experiment was conducted with 80 participants divided into three groups: a control group, a neural-only agent group and the proposed neuro-symbolic agent group. Learner engagement was quantified using the User Engagement Scale (UES), and learning outcomes were measured using normalized pre-test/post-test gain scores.

Findings

Results indicate that the neuro-symbolic agent outperformed the baseline by 16.8% in engagement and 21.3% in learning gain, demonstrating the benefits of emotionally adaptive and context-aware instruction.

Research limitations/implications

The study was conducted with a limited sample size (80 participants) and focused on short-term engagement and learning outcomes. Further research is required to assess long-term effectiveness and generalizability across diverse educational contexts.

Social implications

The proposed framework highlights the potential of affect-aware, neuro-symbolic systems to enhance learner engagement, promote self-regulated learning and support personalized instruction in VLEs, contributing to more empathetic and human-centered digital education.

Originality/value

This work presents a novel integration of multimodal emotion recognition with symbolic reasoning for real-time, pedagogically adaptive learning, offering a transparent, interpretable and emotionally responsive approach to VLEs.

Virtual learning environments (VLEs) have become central to modern education, offering learners and instructors benefits such as flexibility, accessibility and scalable content delivery (Johnson et al., 2020). Their adoption has accelerated due to global disruptions like the COVID-19 pandemic, positioning VLEs as key platforms for both formal and informal learning (Karakose, 2021). From learning management systems to intelligent tutoring systems and massive open online courses (MOOCs), these platforms have made education available to learners across subjects and age groups like never before (Tilak and Kumar, 2022).

Despite these advances, most VLEs struggle to deliver truly personalized and emotionally responsive learning experiences (Berezi, 2025). Traditional systems typically rely on static learner profiles tracking only demographics, activity history or performance metrics (Engelbrecht and Oates, 2022). While this allows for basic adaptation, such as adjusting content difficulty, it overlooks the affective dimension of learning, which strongly influences attention, motivation and retention (Lantz-Wagner, 2022). Research in educational psychology shows that emotions like frustration, boredom, curiosity or anxiety significantly shape engagement, persistence and problem-solving behavior (Ismail and Aljabr, 2025). Yet, most current VLEs remain affectively unaware, unable to detect learners’ emotional states or respond with pedagogically meaningful interventions (Evain et al., 2021).

Addressing this gap requires systems that perceive learners’ emotions in real time and adjust instruction dynamically. Such systems must go beyond surface-level profiling to understand the cognitive-affective interplay and intervene with strategies that are both pedagogically sound and emotionally intelligent. Integrating affective awareness into VLEs is therefore not merely a technical enhancement it is essential for richer engagement, reduced dropout and more human-centered learning (Filatro and Cavalcanti, 2024).

Recent advances in artificial intelligence, particularly affective computing and educational data mining, make this possible by decoding emotional cues from facial expressions, voice, physiological signals and text (Vistorte et al., 2024). However, most affect-sensitive systems rely heavily on deep learning perception alone, which, while effective at detecting emotions, often lacks transparency, reasoning capability and pedagogical soundness (Gil and Selman, 2019). This has sparked interest in neuro-symbolic AI, which combines the perceptual strengths of neural networks with the logical reasoning of symbolic systems (Liang et al., 2025).

This work introduces a decision-level neuro-symbolic fusion framework for affect-aware personalization in VLEs. In this framework, neural modules detect affective states such as frustration or engagement from multimodal inputs, while symbolic modules apply pedagogical rules and reasoning to determine the most appropriate instructional intervention. The system is designed to not only perceive learners’ emotions but also explain and justify the instructional decisions, creating a transparent and empathetic learning experience (Mileo, 2025).

The novelty of this approach is twofold. First, it introduces a confidence-aware fusion controller that dynamically regulates the interaction between neural predictions and symbolic reasoning, activating symbolic inference only when necessary to balance interpretability and robustness. Second, it establishes a direct mapping between learners’ affective states and instructional strategies, operationalizing principles from adaptive learning, motivation and engagement theory. Unlike traditional hybrid systems that statically combine components, this framework transforms raw emotional signals into actionable, pedagogically grounded interventions.

In summary, this research advances both conceptual understanding and practical implementation: it moves beyond system integration to demonstrate how neuro-symbolic reasoning can be applied for real-time, emotionally adaptive learning. By combining perception, reasoning and pedagogy, the proposed framework represents a step toward VLEs that are truly responsive, interpretable and learner-centered.

Despite progress in VLEs and affect-sensitive systems, existing platforms largely remain unable to perceive and respond to learners’ emotions in a transparent, pedagogically grounded way. This gap limits engagement, learning outcomes and the potential for truly personalized digital education. To address this, our work proposes a decision-level neuro-symbolic framework that integrates real-time affect detection with symbolic reasoning to select context-aware instructional strategies. This approach not only enhances learner engagement and personalization but also provides interpretable, theory-informed interventions, advancing both the conceptual understanding and practical implementation of emotionally adaptive learning systems.

Integration of affective computing into educational technology has been gaining more and more attention in recent years due to its potential to improve learners’ motivation, engagement and well-being. With growing adoption of VLEs, researchers have emphasized that students require systems that are able to comprehend and engage with their emotions. Liu and Ardakani (2022) propose an affective e-learning system model assisted by machine learning that incorporates emotional data within learning content adaptation. The work reflects how emotion-aware systems can significantly enhance learning performance and student satisfaction using context-related feedback as well as interaction styles.

Machine learning techniques deep learning, specifically are the backbone of most affective computing applications. These models enable emotion recognition from various data sources, from physiological signals to text. Mai et al. (2021) developed a home-built EEG device to detect emotions and showed that neural signals were successfully classified through supervised learning methods. Similarly, Kratzwald et al. (2018) use deep learning architectures for text-based emotion recognition for decision support systems, revealing the potential of affect inference from language in intelligent systems. Their research reveals the ability of neural models to recognize implicit emotional cues embedded in learner communication.

In learning environments, affective computing extends beyond detection to the goal of enhancing human–computer interaction. Mutawa and Sruthi (2024) offer a machine learning model to predict learner emotion and satisfaction in online learning and show how adaptive responses to emotional difference can lead to more satisfactory learning. Shifa et al. (2025) emphasize the significance of emotionally intelligent AI systems in aiding student well-being, especially in emotionally sophisticated digital classroom environments. Their study points out that overlooking emotional signals in VLEs can lead to suboptimal participation and learning inefficiencies.

Yet, although these systems perform well in emotion detection, most use solely black-box neural frameworks that have limited explainability. The reason for this non-transparency lies in being a hindrance to educational implementation, wherein teachers and students might need to obtain intelligible explanations of system actions. Artanto and Arifin (2023) demonstrate how deep learning can detect gesture and emotional evaluations and recognize affective states but also admit that such systems tend to lack reasoning ability for explainable intervention.

The emerging neuro-symbolic AI research offers a promising solution to this challenge by merging the perceptual power of neural networks and explainable symbolic reasoning. Singh (2024) explores affective computing with neural networks and recommends hybrid systems that move beyond perception to pedagogically guided decision-making. Kumar (2023) further describes the use of neuro-symbolic approaches in adaptive mental health care, making analogies between cognitive modeling of psychiatry and affect-driven adaptation in learning technology. These remarks further affirm the importance of fusing cognitive science with computational logic for the enhancement of user-focused AI systems.

Lu et al. (2025) provide a concrete illustration of explainable neuro-symbolic integration in the health setting, where recommendations for diagnosis benefit from the neural accuracy as well as symbolic clarity. Although their work is in health care, the methodological extensions translate directly to education where explainable personalization from affective signals is equally crucial. This suggests that neuro-symbolic frameworks can potentially provide an actionable pathway toward the construction of intelligent tutoring systems that are affect-sensible and pedagogically understandable.

Recent research highlights the critical role of cognitive, motivational and affective processes in digital and AI-enhanced learning environments. Studies on self-regulated learning (SRL) show that learners’ ability to plan, monitor and adjust their learning strategies is enhanced when AI provides timely feedback and adaptive guidance (Wei, 2023; Lee et al., 2023). Meanwhile, motivation and engagement have been shown to influence persistence and performance in AI-supported learning, with factors such as self-efficacy, resilience and feedback accuracy shaping learners’ willingness to interact with digital systems (Shi and Zhang, 2025; Eisbach et al., 2023). From a cognitive load perspective, research indicates that AI systems must balance task complexity and learner support to avoid overload while promoting goal-setting and effective learning strategies (Zhang et al., 2026; Rind, 2026). Furthermore, the integration of emotional and social learning into AI-driven platforms enhances learners’ socio-emotional development, engagement and adaptability, particularly in primary education contexts (Ofori et al., 2024; Akintayo et al., 2024). Together, these studies emphasize that affect-aware, adaptive systems must simultaneously consider learners’ cognitive capacities, emotional states and motivational drivers to deliver truly personalized, effective learning experiences. This theoretical grounding directly informs the design of our neuro-symbolic framework, which integrates real-time affect detection with pedagogically grounded reasoning to dynamically adapt instructional strategies in VLEs.

Overall, the literature supports the viability of affective computing to enhance virtual learning but reveals two key limitations:

  1. current systems often lack effective affective responsiveness tuned to learners’ needs; and

  2. the black-box nature of deep learning models diminishes trust and pedagogical integrity.

Neuro-symbolic reasoning addresses both issues by intertwining affective perception with explainable, goal-directed adaptation. By dynamically mapping learners’ emotional states to pedagogically grounded interventions, the proposed framework enhances engagement, promotes self-regulated learning and improves learning outcomes, such as task persistence, knowledge retention and learner satisfaction. Moreover, ethical and practical considerations such as transparency, fairness and respect for learner privacy are central to the system design, ensuring that adaptive interventions are not only effective but also responsible and teacher-friendly. This work builds upon existing research by presenting a new neuro-symbolic approach to emotion-aware, real-time personalization within VLEs, with the goal of facilitating learner-focused, transparent and empathetic digital learning experiences.

This section outlines the design and implementation of the proposed neuro-symbolic framework for affective-aware personalization in VLEs. The system architecture integrates neural emotion recognition and symbolic reasoning mechanisms to tailor instructional content to learners’ affective states. The methodology encompasses the neural subsystem for affect detection, a symbolic reasoning engine for pedagogical decision-making, the fusion mechanism that unifies both components and the evaluation strategy within a VLE simulation.

The emotion detection module was trained using three benchmark multimodal data sets such as AffectNet (Mollahosseini et al., 2017), GoEmotions (Demszky et al., 2020) and IEMOCAP (Busso et al., 2008). These data sets were chosen for their coverage of visual, textual and auditory affective expressions, respectively.

AffectNet contains over 1 million facial images labeled across eight basic emotions. For this study, a subset of 100,000 samples was used. GoEmotions, developed by Google Research, provides 58,000 English Reddit comments annotated with 27 emotion categories. Multi-label instances were flattened by selecting the dominant label using confidence scores. IEMOCAP includes approximately 12 h of audio-visual data from 10 actors engaged in scripted and improvised scenarios, annotated across key emotions including happiness, anger, sadness and neutrality.

The data sets were partitioned into 70% training, 15% validation and 15% test splits, ensuring subject-level disjoint sets to prevent data leakage and overfitting. All data preprocessing was performed in compliance with data set licenses and ethical use standards. Also, a simulated symbolic knowledge base was designed to complement the neural emotion outputs with contextual reasoning. This rule base consisted of 142 expert-verified affective rules, curated by two domain experts in educational psychology and affective computing. Rules were encoded in a forward-chaining format. For example:

emotion(frustration),activity(difficulty_high) => suggestion(provide_hint)

emotion(boredom), activity(difficulty_low) => suggestion(increase_challenge)

emotion(confusion), repetition_count > 2 => suggestion(rephrase_instruction)

The symbolic rules were invoked during inference to augment the neural outputs with pedagogical intent. An adaptive fusion strategy was used to weigh symbolic and neural confidence distributions.

To ensure robustness across diverse learner profiles, we used a fairness-aware loss function, extending the standard cross-entropy loss with reweighing techniques inspired by Cui et al. (2019). Specifically, we used a class-balanced variant of Focal Loss defined as:

where C is the number of emotion classes, pi is the predicted probability, αi is an inverse frequency weight to counter class imbalance, and γ=2 is the focusing parameter.

The proposed architecture consists of four major components: an affective state recognition module, a learner model and symbolic knowledge base, a symbolic reasoning engine and a personalization module responsible for adaptive content delivery. As illustrated in Figure 1, the architecture accepts multimodal learner input (facial expression, textual interaction and vocal tone), processes this data through a neural emotion classification pipeline and passes the inferred emotional state to a rule-based reasoning engine. Based on the inference outcomes, the personalization module dynamically adjusts content modality, complexity or pacing.

Figure 1.
A flowchart shows multimodal emotion detection and personalisation system architecture.The flowchart shows a system architecture starting with an input module that receives multimodal input including face, text and voice. These inputs are processed through neural emotion detection components including face using ResNet 50, text using BERT and voice using wav2vec 2.0. The outputs are combined through multimodal fusion to form an affective state vector. This is passed to a symbolic reasoning engine using rule-based inference in Prolog. The result is then processed by a personalisation engine for adaptive learning delivery. A learner feedback loop based on emotion and interaction provides real time input back into the system.

Overall architecture of the proposed framework

Figure 1.
A flowchart shows multimodal emotion detection and personalisation system architecture.The flowchart shows a system architecture starting with an input module that receives multimodal input including face, text and voice. These inputs are processed through neural emotion detection components including face using ResNet 50, text using BERT and voice using wav2vec 2.0. The outputs are combined through multimodal fusion to form an affective state vector. This is passed to a symbolic reasoning engine using rule-based inference in Prolog. The result is then processed by a personalisation engine for adaptive learning delivery. A learner feedback loop based on emotion and interaction provides real time input back into the system.

Overall architecture of the proposed framework

Close modal

The methodological design of this framework is grounded in established educational theories. Constructs such as learner engagement and learning gain are interpreted through the lens of self-regulated learning (SRL), motivation and engagement theory and cognitive load theory. Engagement metrics reflect sustained attention and interaction, while learning gains correspond to the effectiveness of scaffolded instructional adaptation. The affective states detected by the neural subsystem guide interventions in alignment with these theories, ensuring that system decisions are pedagogically meaningful rather than purely data-driven.

3.2.1 Emotion detection module.

Emotion detection was achieved through a hybrid neural pipeline that processes facial, textual and auditory modalities. For facial expression analysis, the ResNet-50 convolutional neural network architecture was used, pre-trained on the AffectNet data set. Textual sentiment and emotional cues were processed using the Bidirectional Encoder Representations from Transformers (BERT) model fine-tuned on the GoEmotions data set. For speech-based affect detection, wav2vec2.0 was adopted, leveraging self-supervised learning on raw audio waveforms.

Each modality output is represented as a probability distribution over discrete emotional states ei{bored,confused,engaged,frustrated,happy,neutral}. These distributions are combined using a soft attention mechanism that computes a weighted affective state vector:

where ei is the emotion vector from modality i and αi is the attention weight learned during training.

3.2.2 Symbolic reasoning module.

The symbolic reasoning engine is designed to emulate the logic of a pedagogical expert. A forward-chaining rule base, implemented in Prolog, applies declarative rules to interpret the dominant emotional state and determine the appropriate instructional intervention. The discretized emotion vector is converted into a symbolic fact via maximum-likelihood selection:

The resulting fact (e.g. emotion(frustrated)) is passed into the symbolic engine, which searches the knowledge base K for applicable inference rules. A few illustrative rules are as follows:

emotion(frustrated):- suggest(help_video).

emotion(bored):- suggest(interactive_quiz).

emotion(engaged):- continue(current_path).

Let F denote the set of observed emotional facts and R the rule base. The symbolic system computes the pedagogical adaptationA by applying forward chaining:

To ensure symbolic outputs remain consistent with underlying affective states, a fusion controller maps the continuous emotion vector into discrete triggers only when confidence is high or entropy is low. Otherwise, symbolic reasoning is suppressed in favor of direct neural intervention.

The symbolic knowledge base was constructed through a collaborative expert-driven process. Two domain experts in educational psychology and affective computing curated 142 pedagogically relevant rules linking learner affective states to instructional interventions. Each rule was verified for internal consistency and relevance to learning objectives. For example:

emotion(frustration), activity(difficulty_high) => suggestion(provide_hint)

emotion(boredom), activity(difficulty_low) => suggestion(increase_challenge)

emotion(confusion), repetition_count > 2 => suggestion(rephrase_instruction)

The rule base was made to go through iterative validation, including peer review by additional educators and simulation testing within the Moodle VLE to ensure that proposed adaptations aligned with pedagogical expectations and learner engagement principles. Rules were applied using forward chaining in the symbolic engine, and adaptations were triggered only when neural confidence and entropy thresholds indicated sufficient certainty.

3.2.3 Fusion mechanism.

The fusion between the neural and symbolic components is implemented as a decision-level, confidence-aware control mechanism, rather than a static integration strategy. Specifically, the system uses a fusion controller that dynamically determines whether symbolic reasoning should be activated based on the confidence distribution of the multimodal affective predictions. Confidence is estimated using the maximum predicted probability and entropy of the emotion vector. When the predicted affective state exhibits high confidence (i.e. low entropy), the dominant emotion is mapped into a symbolic fact and passed to the reasoning engine for pedagogical inference. Conversely, in cases of uncertainty, the system suppresses or attenuates symbolic reasoning to avoid unreliable rule activation, relying instead on direct neural adaptation. This design ensures that symbolic reasoning is invoked only when it is both meaningful and reliable, thereby improving interpretability without compromising robustness.

This mapping is achieved using a thresholding mechanism defined by:

Once the dominant emotion is inferred, it is encoded as a fact and passed into the symbolic engine for reasoning. This fusion approach maintains interpretability while leveraging neural precision. In cases of ambiguity, where predictions are uncertain, symbolic reasoning is either deferred or adjusted with fallback strategies.

This dynamic fusion strategy distinguishes the proposed framework from conventional neuro-symbolic systems, where neural and symbolic components are often combined in a static or sequential manner. By introducing conditional symbolic activation, the model achieves a more flexible and context-aware integration, enabling more effective translation of affective signals into pedagogically relevant actions.

3.2.4 Personalization engine.

The personalization engine applies symbolic recommendations in ways that are explicitly informed by learning theory. For example, interventions designed to reduce frustration or confusion aim to lower extraneous cognitive load and maintain motivation, consistent with cognitive load and self-regulated learning principles. Conversely, increasing challenge during boredom is intended to sustain engagement and promote deeper cognitive processing. These theory-grounded strategies provide a rationale for the mapping from affective states to instructional adaptations.

Let Lc denote the current learning content and A the recommended adaptation strategy. The transformed content Lc is computed via:

where Φ is a content transformation function parameterized by symbolic adaptation rules. For example, if the detected emotion is boredom, Φ may replace static text explanations with an interactive animation or quiz to increase engagement.

The effectiveness of the proposed system was assessed through a controlled experiment involving 80 participants, stratified into three groups: a control group (n = 26) with no emotion-based adaptation, a neural-only group (n = 27) using only neural inferences and a neuro-symbolic group (n = 27) using the full hybrid framework. All participants engaged with identical content under the same VLE conditions. The only variable was the presence and nature of real-time instructional adaptation informed by affective signals.

The system was evaluated using a controlled experimental design to ensure that differences in engagement and learning outcomes could be attributed to the adaptive interventions rather than extraneous factors. A total of 80 participants were recruited, stratified across educational backgrounds, age and technical familiarity to ensure diversity and minimize confounding variables. Participants were randomly assigned to one of three experimental conditions:

  1. Control (no adaptation);

  2. Neural-Only Adaptation; and

  3. Neuro-Symbolic Adaptation.

Each participant completed all learning modules under their assigned condition in a single session within the Moodle-based VLE. While the study was short-term, this design allowed rigorous comparison aligned with the research questions linking affective adaptation → engagement → learning gain.

The Moodle-based VLE was structured into three instructional modules, comprising reading comprehension exercises, interactive quizzes and scenario-based simulations. Detailed interaction logs captured time-on-task, click activity and dropout events, providing quantitative measures of engagement. The neuro-symbolic agent dynamically adapted content, pacing and modality based on detected affective states, guided by the symbolic rule base validated by domain experts. The neural-only adaptation group received interventions solely from the neural models, while the control group experienced static, non-adaptive content.

Data for multimodal emotion detection was sourced from benchmark data sets (AffectNet, GoEmotions, IEMOCAP), while real-time learner interactions were collected within the VLE. The symbolic reasoning engine was implemented using SWI-Prolog, and neural modules were developed in PyTorch, leveraging HuggingFace transformers and torchvision libraries.

Performance was evaluated along three dimensions such as accuracy of affective state prediction, learner engagement and academic performance. Emotion detection accuracy was measured using macro-averaged F1 score across emotional classes. Engagement was quantified through time-on-task, click-through rates and dropout incidence. Learning gain was measured as the normalized difference between pre-test and post-test scores, defined as:

where Spre and Spost are the learner’s pre- and post-test scores respectively.

Results from the neuro-symbolic configuration were compared against both the control and neural-only conditions. Statistical significance was assessed using repeated-measures analysis of variance (ANOVA) at α=0.05.

This section presents a comprehensive evaluation of the proposed neuro-symbolic framework, with results organized to assess the performance of each component emotion detection, symbolic reasoning and personalization as well as their collective effect on learner engagement and academic performance. Additional analysis includes interpretability, model calibration and ablation studies to quantify the contributions of key subsystems.

The emotion detection module was evaluated using a multimodal test set derived from the DEAP data set, supplemented with interaction logs from the experimental Moodle-based VLE. Performance metrics were calculated per emotion category using macro-averaged F1-scores, presented in Table 1 alongside class-wise support values to contextualize the results.

Table 1.

Emotion classification performance by modality (F1-score with support)

Emotion labelSupportText (BERT)Audio (wav2vec2)Face (ResNet50)Multimodal fusion
Bored3400.780.740.760.83
Confused3100.720.690.700.77
Engaged3800.810.760.790.85
Frustrated2950.730.700.710.79
Happy4000.850.800.830.88
Neutral3500.760.720.750.81
Average0.770.730.760.82

Multimodal fusion consistently outperformed unimodal models across all categories, achieving a 5–6% improvement in average F1-score over the strongest individual modality (text). This highlights the efficacy of the attention-based fusion in amplifying relevant emotional cues and mitigating noisy signals. The classification distribution is visualized in Figure 2.

Figure 2.
A grouped bar chart shows F 1 score comparison across modalities for different emotional states.The chart shows F 1 score values for bored, confused, engaged, frustrated, happy and neutral across four methods including text BERT, audio wav2vec 2, face ResNet 50 and multimodal fusion. For bored, scores are about 0.78 for text, 0.74 for audio, 0.76 for face and 0.83 for fusion. For confused, values are about 0.72, 0.69, 0.70 and 0.77. For engaged, values are about 0.81, 0.76, 0.79 and 0.85. For frustrated, values are about 0.73, 0.70, 0.71 and 0.79. For happy, values are about 0.85, 0.80, 0.83 and 0.88. For neutral, values are about 0.76, 0.72, 0.75 and 0.81. Multimodal fusion shows the highest scores across all emotional states.

Emotion classification by modality

Figure 2.
A grouped bar chart shows F 1 score comparison across modalities for different emotional states.The chart shows F 1 score values for bored, confused, engaged, frustrated, happy and neutral across four methods including text BERT, audio wav2vec 2, face ResNet 50 and multimodal fusion. For bored, scores are about 0.78 for text, 0.74 for audio, 0.76 for face and 0.83 for fusion. For confused, values are about 0.72, 0.69, 0.70 and 0.77. For engaged, values are about 0.81, 0.76, 0.79 and 0.85. For frustrated, values are about 0.73, 0.70, 0.71 and 0.79. For happy, values are about 0.85, 0.80, 0.83 and 0.88. For neutral, values are about 0.76, 0.72, 0.75 and 0.81. Multimodal fusion shows the highest scores across all emotional states.

Emotion classification by modality

Close modal

The symbolic reasoning engine achieved 97.4% alignment with expert-defined emotion-action rules across 500 test cases. To assess the impact of personalization strategies on learner experience, a user study was conducted involving three experimental groups. Learner engagement metrics were derived from interaction logs and normalized for session duration. Table 2 reports the average values, standard deviations (SD) and p-values from ANOVA comparisons.

Table 2.

Learner engagement and retention metrics (mean ± SD)

GroupTime-on-task (min)Click activity (/session)Dropout rate (%)
Control19.4 ± 3.142.6 ± 5.421.8
Neural-Only adaptation24.8 ± 4.5 (p = 0.012)58.1 ± 6.2 (p = 0.008)13.2
Neuro-Symbolic31.6 ± 4.7 (p < 0.001)71.3 ± 7.3 (p < 0.001)5.5

The neuro-symbolic group exhibited significantly higher engagement and reduced dropout rates. ANOVA results yielded F(2,77) = 8.74, p < 0.001 and a partial eta-squared of 0.34, indicating a large effect size. These results are visualized in Figure 3.

Figure 3.
A combined bar and line chart shows time on task and dropout rate across three groups.The chart shows average time on task in minutes and dropout rate in per cent for control, neural only and neuro symbolic groups. Time on task increases from about 19 minutes for control to about 25 minutes for neural only and about 32 minutes for neuro symbolic. Dropout rate decreases from about 22 per cent for control to about 14 per cent for neural only and about 5 per cent for neuro symbolic. The results show increasing engagement with decreasing dropout across the groups.

Engagement and retention metrics

Figure 3.
A combined bar and line chart shows time on task and dropout rate across three groups.The chart shows average time on task in minutes and dropout rate in per cent for control, neural only and neuro symbolic groups. Time on task increases from about 19 minutes for control to about 25 minutes for neural only and about 32 minutes for neuro symbolic. Dropout rate decreases from about 22 per cent for control to about 14 per cent for neural only and about 5 per cent for neuro symbolic. The results show increasing engagement with decreasing dropout across the groups.

Engagement and retention metrics

Close modal

Beyond the observed numerical improvements, these findings can be interpreted through the lens of self-regulated learning and motivation theory. Increased time-on-task and click activity suggest heightened learner autonomy and sustained attention, likely facilitated by affect-aware adaptations that respond to frustration, boredom or confusion. Reduced dropout rates indicate that the system effectively maintained motivation and engagement by dynamically adjusting instructional content, supporting the theoretical premise that emotional scaffolding enhances learning persistence.

Normalized learning gain was calculated as the difference between pre- and post-test scores, adjusted for maximum possible improvement. Table 3 includes the means, standard deviations and p-values.

Table 3.

Learning gains across experimental groups

GroupPre-Test score (mean ± SD)Post-Test score (mean ± SD)Normalized gain (%)p-value
Control0.42 ± 0.090.61 ± 0.1133.0
Neural-Only adaptation0.44 ± 0.080.70 ± 0.1046.00.019
Neuro-Symbolic0.43 ± 0.070.79 ± 0.0963.0<0.001

The neuro-symbolic group significantly outperformed the others. ANOVA confirmed this with F(2,77) = 12.3, p < 0.001, η2 = 0.39. These gains validate the hypothesis that affect-aware, rule-informed personalization improves learning outcomes.

The significant improvements in learning gain observed for the neuro-symbolic group can be understood in terms of adaptive scaffolding and cognitive load management. By tailoring instructional interventions to the learner’s emotional state, the system reduces extraneous cognitive load during frustration or confusion and provides optimal challenge during low-engagement periods. This supports deeper cognitive processing, consistent with principles of self-regulated learning and effective instructional design. These theory-driven interpretations explain why the neuro-symbolic framework outperforms neural-only and control conditions, beyond purely technical superiority.

Qualitative feedback indicated that learners found the neuro-symbolic agent more transparent and responsive. Participants used phrases like “it adapted to how I felt” and “the switch to video helped when I was confused.” To visualize interpretability, SHAP (SHapley Additive exPlanations) analysis was applied to the BERT model. Figure 4 shows top contributing words for emotion predictions. High-weight tokens like “confused”, “stuck” and “amazing” aligned with expected emotional categories. If the figure is not included, this section should be rephrased or removed accordingly.

Figure 4.
A horizontal bar chart shows S H A P value impact for different input features.The chart shows SHAP value impact for features including lost, interesting, bored, help, amazing and confused. The confused feature has the highest value at about 0.35. Amazing follows at about 0.28. Help shows a value near 0.22. Bored has a value around 0.18. Interesting shows about 0.17, and Lost has the lowest value near 0.16. The values indicate the relative contribution of each feature to the model output.

SHAP feature importance for Bert emotion classifier

Figure 4.
A horizontal bar chart shows S H A P value impact for different input features.The chart shows SHAP value impact for features including lost, interesting, bored, help, amazing and confused. The confused feature has the highest value at about 0.35. Amazing follows at about 0.28. Help shows a value near 0.22. Bored has a value around 0.18. Interesting shows about 0.17, and Lost has the lowest value near 0.16. The values indicate the relative contribution of each feature to the model output.

SHAP feature importance for Bert emotion classifier

Close modal

The neural modules (BERT, ResNet50, wav2vec2.0) converged smoothly with early stopping triggered between six and eight epochs. Figure 5 shows the training vs validation loss for BERT, with no signs of overfitting.

Figure 5.
A line plot shows training and validation loss over epochs.The plot shows training loss and validation loss across ten epochs. Training loss decreases steadily from about 0.52 at epoch 1 to around 0.20 at epoch 6, after which it stabilises slightly above 0.20. Validation loss decreases from about 0.58 at epoch 1 to around 0.25 at epoch 6, then increases slightly and plateaus near 0.27 from epochs 7 to 10. The trend indicates effective learning in early epochs, followed by mild overfitting as validation loss stops improving while training loss continues to stabilise.

Training and validation loss for BERT emotion classifier

Figure 5.
A line plot shows training and validation loss over epochs.The plot shows training loss and validation loss across ten epochs. Training loss decreases steadily from about 0.52 at epoch 1 to around 0.20 at epoch 6, after which it stabilises slightly above 0.20. Validation loss decreases from about 0.58 at epoch 1 to around 0.25 at epoch 6, then increases slightly and plateaus near 0.27 from epochs 7 to 10. The trend indicates effective learning in early epochs, followed by mild overfitting as validation loss stops improving while training loss continues to stabilise.

Training and validation loss for BERT emotion classifier

Close modal

Confidence calibration was assessed via reliability diagrams (Figure 6), showing a close alignment between predicted probabilities and actual correctness, particularly in the 0.5–0.9 range. Minor overconfidence was noted above 0.9, but overall calibration was acceptable.

Figure 6.
A calibration plot compares model accuracy with confidence against a perfect calibration line.The plot shows model accuracy as a function of prediction confidence, alongside a dashed line representing perfect calibration. Accuracy increases from about 0.12 at 0.1 confidence to about 0.86 at 0.9 confidence. The model curve closely follows the perfect calibration line, with slight deviations at lower and higher confidence levels. This indicates that the model is generally well calibrated, with predicted probabilities aligning closely with observed accuracy.

Reliability diagram for emotion prediction confidence

Figure 6.
A calibration plot compares model accuracy with confidence against a perfect calibration line.The plot shows model accuracy as a function of prediction confidence, alongside a dashed line representing perfect calibration. Accuracy increases from about 0.12 at 0.1 confidence to about 0.86 at 0.9 confidence. The model curve closely follows the perfect calibration line, with slight deviations at lower and higher confidence levels. This indicates that the model is generally well calibrated, with predicted probabilities aligning closely with observed accuracy.

Reliability diagram for emotion prediction confidence

Close modal

To better understand how each part of the system contributes to overall performance, an ablation study was carried out. This involved testing the system by removing or modifying specific components, such as different emotion sensing methods and the symbolic reasoning module. The goal was to see how these changes affected learner engagement and learning outcomes, rather than only looking at the final performance of the complete system.

Table 4 shows the results obtained when key components were disabled one at a time while keeping all other conditions the same. Engagement and learning gain were compared with the full neuro-symbolic system to make the differences easier to understand.

Table 4.

Component-Level Ablation and symbolic rule depth effects

Configuration/rule depthEngagement (%)Learning gain (%)Accuracy (%)Dropout rate (%)
Full model (baseline)100.0100.091.84.3
Without symbolic reasoning83.088.091.810.2
Facial-Only emotion detection92.696.285.46.7
Auditory-Only detection91.895.883.97.1
Rule Depth = 191.292.591.86.9
Rule Depth = 295.496.891.85.8
Rule Depth = 398.398.991.84.7
Rule Depth = 4100.0100.091.84.3
Rule Depth = 5Capped at 100Capped at 10091.84.3

Percentages above 100% were capped for interpretability. When the symbolic reasoning module was removed, there was a clear drop in learner engagement and improvement in retention. This suggests that structured decision-making plays an important role in turning emotional signals into useful teaching actions. On the other hand, using only a single emotion detection method led to a moderate decrease in performance. This indicates that combining multiple sensing methods makes the system more reliable, although it is not enough on its own to achieve the best learning support.

Beyond performance improvements, the results highlight the conceptual contribution of the proposed framework as a decision-level neuro-symbolic system. The observed gains can be attributed not only to multimodal affect detection but also to the conditional activation of symbolic reasoning, which enables more interpretable and pedagogically meaningful adaptations. This demonstrates the advantage of moving from purely data-driven personalization toward hybrid reasoning models that explicitly incorporate structured decision logic.

The findings of this study have several practical implications for educators and instructional designers. The neuro-symbolic framework can be integrated into VLEs to provide real-time, affect-aware personalization, supporting learners through adaptive content delivery, pacing adjustments and challenge modulation. For instance, learners exhibiting signs of frustration may receive additional hints or explanatory videos, while learners showing boredom could be guided toward interactive quizzes or more challenging activities. Instructors can supervise these adaptations, ensuring alignment with curriculum goals and learner needs, while also contextualizing interventions based on class-level observations.

Scalability considerations include expanding the symbolic rule base to accommodate diverse learner profiles and integrating the system with existing educational platforms. Automated rule induction and modular design strategies can facilitate broader deployment without overburdening technical maintenance.

Ethical considerations are essential when implementing affect-aware systems. This includes obtaining informed consent, ensuring privacy and confidentiality of emotion data and responsibly using affective information to enhance learning rather than manipulate or penalize learners. Transparent reporting and explainable system behavior are critical to maintaining trust and promoting responsible educational technology practices.

This study presented a new neuro-symbolic reasoning paradigm for affect-aware personalization in virtual learning systems to address the need of adaptive learning systems as being emotionally intelligent. By combining an emotion recognition subsystem powered by a deep neural network comprising BERT for text input, ResNet50 for visual signals and wav2vec 2.0 for voice signals with a symbolic reasoning component simulated via Prolog-based inference over an affective learner ontology, the given system demonstrated significant improvements in learner engagement, retention and performance.

Empirical evaluation validated that the hybrid model consistently performed well across several measures. The neuro-symbolic architecture outperformed baseline models, including a neural-only adaptation model and a control system without affective personalization. Specifically, the hybrid method provided an overall boost in engagement of 23.6% and learning gain of 19.8% over the control, as indicated by Table 2. Symbolic reasoning was also shown to play a critical role; ablation tests revealed its removal led to a statistically significant reduction in engagement by 17% and learning outcome by 12% (p < 0.01), highlighting the explanatory and adaptive capability of structured reasoning over affective states.

Additionally, SHAP-based interpretability analysis confirmed that symbolic rules especially those derived from prolonged negative affect or conflict of affects were integral to adaptive choice-making and thus trust and explainable model forecasting. Training dynamics analysis guaranteed stability of convergence as well as effectiveness of generalization, and further experimentation on depth of rules revealed a gains saturation point for personalization, demonstrating best complexity threshold values for symbolic rules.

While these encouraging results, several limitations need to be pursued. Firstly, the rule-based engine itself, while helpful, rests on pre-specified expert rules, which will limit scalability to diverse learning settings or evolving learner behavior. Neuro-symbolic program synthesis or reinforcement learning for rule adaptation is a promising direction in the future. Second, even though the model currently recognizes face and voice cues, inclusion of physiological cues such as galvanic skin response or heart rate variability can be more holistic affective modeling. Finally, extension of personalization to include culturally situational emotional norms or learning styles might make the system more inclusive.

Future research should explore both technical and educational dimensions. On the technical side, this includes automated symbolic rule induction, reinforcement learning for dynamic adaptation and integration of additional affective modalities such as physiological signals. From an educational perspective, longitudinal studies are needed to evaluate the long-term impact on learner engagement, knowledge retention and motivation across diverse learning environments. Investigations into cross-cultural generalizability and inclusive design will help ensure that adaptive interventions are effective for learners with varied socio-emotional and cultural backgrounds.

The author would like to express special thanks to all the contributors of this paper.

This research received no external funding.

Akintayo
,
O.T.
,
Eden
,
C.A.
,
Ayeni
,
O.O.
and
Onyebuchi
,
N.C.
(
2024
), “
Integrating AI with emotional and social learning in primary education: developing a holistic adaptive learning ecosystem
”,
Computer Science and IT Research Journal
, Vol.
5
No.
5
, pp.
1076
-
1089
.
Artanto
,
H.
and
Arifin
,
F.
(
2023
), “
Emotions and gesture recognition using affective computing assessment with deep learning
”,
IAES International Journal of Artificial Intelligence (IJ-AI)
, Vol.
12
No.
3
, pp.
1419
-
1427
.
Berezi
,
I.U.
(
2025
), “
Virtual learning environment: redefining higher educational delivery for efficiency and accessibility
”,
International Journal of Educational Management, Rivers State University
, Vol.
1
No.
1
, pp.
451
-
467
.
Busso
,
C.
and
Bulut
,
M.
, et al (
2008
), “
IEMOCAP: Interactive emotional dyadic motion capture database
”, Language Resources and Evaluation.
Cui
,
Y.
,
Jia
,
M.
,
Lin
,
T.-Y.
,
Song
,
Y.
and
Belongie
,
S.
(
2019
),
Class-Balanced Loss Based on Effective Number of Samples
,
CVPR
, doi: .
Demszky
,
D.
and
Movshovitz-Attias
,
D.
et al. (
2020
), “
GoEmotions: a dataset of fine-grained emotions
”,
arXiv preprint
.
Eisbach
,
S.
,
Langer
,
M.
and
Hertel
,
G.
(
2023
), “
Optimizing human-AI collaboration: Effects of motivation and accuracy information in AI-supported decision-making
”,
Computers in Human Behavior: Artificial Humans
, Vol.
1
No.
2
, p.
100015
.
Engelbrecht
,
J.
and
Oates
,
G.
(
2022
), “Student collaboration in blending digital technology in the learning of mathematics”,
Handbook of Cognitive Mathematics
,
Springer International Publishing
,
Cham
, pp.
1
-
39
.
Evain
,
S.
,
Nguyen
,
M.H.
,
Le
,
H.
,
Boito
,
M.Z.
,
Mdhaffar
,
S.
,
Alisamir
,
S.
, … and
Besacier
,
L.
(
2021
),
Task agnostic and task specific self-supervised learning from speech with lebenchmark.
Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021).
Filatro
,
A.
and
Cavalcanti
,
C.C.
(
2024
),
Technology-Enabled Learning and Design Methodologies: Lessons from Creative, Agile, Immersive, and Analytical Advancements
,
Taylor and Francis
.
Gil
,
Y.
and
Selman
,
B.
(
2019
), “
A 20-year community roadmap for artificial intelligence research in the US
”,
arXiv preprint
.
Ismail
,
I.A.
and
Aljabr
,
F.S.
(
2025
), “The interplay of emotion and cognition in adult learning: learning engages emotion alongside cognition”,
Utilizing Emotional Experience for Best Learning Design Practices
, pp.
17
-
44
.
Johnson
,
N.
,
Veletsianos
,
G.
and
Seaman
,
J.
(
2020
), “
US faculty and administrators’ experiences and approaches in the early weeks of the COVID-19 pandemic
”,
Online Learning
, Vol.
24
No.
2
, pp.
6
-
21
.
Karakose
,
T.
(
2021
), “
The impact of the COVID-19 epidemic on higher education: opportunities and implications for policy and practice
”,
Educational Process: International Journal (EDUPIJ)
, Vol.
10
No.
1
, pp.
7
-
12
.
Kratzwald
,
B.
,
Ilić
,
S.
,
Kraus
,
M.
,
Feuerriegel
,
S.
and
Prendinger
,
H.
(
2018
), “
Deep learning for affective computing: text-based emotion recognition in decision support
”,
Decision Support Systems
, Vol.
115
, pp.
24
-
35
.
Kumar
,
A.
(
2023
), “
Neuro symbolic AI in personalized mental health therapy: bridging cognitive science and computational psychiatry
”,
World J Adv Res Rev
, Vol.
19
No.
2
, pp.
1663
-
1679
.
Lantz-Wagner
,
S.
(
2022
),
Paths to Pathways: Exploring Lived Experiences of International Students to andThrough Third-Party Pathway Programs
,
University of Dayton
.
Lee
,
M.
,
Lee
,
S.Y.
,
Kim
,
J.E.
and
Lee
,
H.J.
(
2023
), “
Domain-specific self-regulated learning interventions for elementary school students
”,
Learning and Instruction
, Vol.
88
, p.
101810
.
Liang
,
B.
,
Wang
,
Y.
and
Tong
,
C.
(
2025
), “
AI reasoning in deep learning era: from symbolic AI to neural–symbolic AI
”,
Mathematics
, Vol.
13
No.
11
, p.
1707
.
Liu
,
X.
and
Ardakani
,
S.P.
(
2022
), “
A machine learning enabled affective E-learning system model
”,
Education and Information Technologies
, Vol.
27
No.
7
, pp.
9913
-
9934
.
Lu
,
Q.
,
Li
,
R.
,
Sagheb
,
E.
,
Wen
,
A.
,
Wang
,
J.
,
Wang
,
L.
and
Liu
,
H.
(
2025
), “
Explainable diagnosis prediction through neuro-symbolic integration
”.
AMIA Summits on Translational Science Proceedings
, p.
332
.
Mai
,
N.D.
,
Lee
,
B.G.
and
Chung
,
W.Y.
(
2021
), “
Affective computing on machine learning-based emotion recognition using a self-made EEG device
”,
Sensors
, Vol.
21
No.
15
, pp.
5135
.
Mileo
,
A.
(
2025
), “
Towards a neuro-symbolic cycle for human-centered explainability
”,
Neurosymbolic Artificial Intelligence
, Vol.
1
, p.
240740
.
Mollahosseini
,
A.
,
Hasani
,
B.
and
Mahoor
,
M.H.
(
2017
), “
AffectNet: a database for facial expression, valence, and arousal computing in the wild
”,
IEEE Transactions on Affective Computing
, Vol.
10
No.
1
.
Mutawa
,
A.M.
and
Sruthi
,
S.
(
2024
), “
Enhancing human–computer interaction in online education: a machine learning approach to predicting student emotion and satisfaction
”,
International Journal of Human–Computer Interaction
, Vol.
40
No.
24
, pp.
8827
-
8843
.
Ofori
,
S.D.
,
Olateju
,
M.
,
Frempong
,
D.
and
Ifenatuora
,
G.P.
(
2024
), “
Integrating AI with emotional and social learning in primary education: developing a holistic adaptive learning ecosystem
”.
Rind
,
I.A.
(
2026
), “
Conceptualizing the impact of AI on teacher knowledge and expertise: a cognitive load perspective
”,
Education Sciences
, Vol.
16
No.
1
, p.
57
.
Shi
,
S.
and
Zhang
,
H.
(
2025
), “
EFL students’ motivation predicted by their self-efficacy and resilience in artificial intelligence (AI)-based context: from a self-determination theory perspective
”,
Learning and Motivation
, Vol.
91
, p.
102151
.
Shifa
,
S.
,
Hameed
,
S.
,
Suhail
,
H.
and
Kiren
,
A.
(
2025
), “
Emotional intelligence meets artificial intelligence affective computing and student Well-Being online classrooms
”,
The Critical Review of Social Sciences Studies
, Vol.
3
No.
3
, pp.
271
-
289
.
Singh
,
A.
(
2024
), “
Neural Network-Based affective computing for education. SSRN 5194180
”.
Tilak
,
J.B.
and
Kumar
,
A.G.
(
2022
), “
Policy changes in global higher education: what lessons do we learn from the COVID-19 pandemic?
”,
Higher Education Policy
, Vol.
35
No.
3
, p.
610
.
Vistorte
,
A.O.R.
,
Deroncele-Acosta
,
A.
,
Ayala
,
J.L.M.
,
Barrasa
,
A.
,
López-Granero
,
C.
and
Martí-González
,
M.
(
2024
), “
Integrating artificial intelligence to assess emotions in learning environments: a systematic literature review
”,
Frontiers in Psychology
, Vol.
15
, p.
1387089
.
Wei
,
L.
(
2023
), “
Artificial intelligence in language instruction: impact on English learning achievement, L2 motivation, and self-regulated learning
”,
Frontiers in Psychology
, Vol.
14
, p.
1261955
.
Zhang
,
J.
,
Guo
,
Q.
,
Hu
,
R.
,
Zhou
,
Y.
and
Hu
,
Z.
(
2026
), “
Linking emotional intelligence and AI-enhanced learning to goal-setting behavior: a cognitive load theory framework
”,
Learning and Motivation
, Vol.
93
, p.
102216
.
Alharbi
,
L.
(
2021
).
A Conceptual Framework for Monitoring the Emotional State of Students in VLEs.
The University of Liverpool (United Kingdom)
.
Chitrakar
,
N.
and
Nisanth
,
P.M.
(
2023
), “
Frustration and its influences on student motivation and academic performance
”,
International Journal of Scientific Research in Modern Science and Technology
, Vol.
2
No.
11
, pp.
1
-
9
.
Lin
,
C.C.
,
Huang
,
A.Y.
and
Lu
,
O.H.
(
2023
), “
Artificial intelligence in intelligent tutoring systems toward sustainable education: a systematic review
”,
Smart Learning Environments
, Vol.
10
No.
1
, p.
41
.
Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence maybe seen at Link to the terms of the CC BY 4.0 licenceLink to the terms of the CC BY 4.0 licence.

or Create an Account

Close Modal
Close Modal