This study examines the role of voice-based (vs. text-based) educational robots in shaping students’ engagement. By focusing on the cognitive, emotional and behavioral dimensions of student engagement, it aims to provide insights into how different interaction styles influence the educational experience.
A mixed-methods approach was adopted, comprising two studies. Study 1 employed qualitative methods, including student diaries and group discussions, to identify the dimensions of students’ engagement. Deploying a structured questionnaire, Study 2 conducted quantitative analysis, including paired t-tests and Cohen’s d, to compare student engagement outcomes between the studied voice-based and text-based robots.
The results highlight how voice- and text-based educational robots shape student engagement differentially, identifying key hallmarks of students’ cognitive, emotional and behavioral engagement. The comparative analysis reveals that the voice-based robot is particularly effective in enhancing emotional and cognitive engagement, improving concentration, motivation and emotional connection through multisensory and personalized interactions. In contrast, the text-based robot excels in supporting autonomous learning by facilitating content review. The two robots thus promote behavioral engagement through different mechanisms, revealing their role as complementary tools for creating a holistic educational experience for students.
This study provides novel insights into the role of voice-based (vs. text-based) educational robots in shaping students’ engagement. By combining qualitative and quantitative approaches, our analyses offer academic and practical implications for the adoption of educational robots to foster student engagement.
1. Introduction
Rapid technological advancements are reshaping service sectors (De Keyser et al., 2019; Mele and Russo-Spena, 2025), including education (Fisk et al., 2022). Here, artificially intelligent educational tools (e.g. robots) are increasingly deployed to provide services more efficiently and/or more effectively (Timotheou et al., 2023), facilitating students’ learning journeys. For example, artificially intelligent educational tools may offer personalized guidance to learners, allow them to work at their own level and pace (e.g. through interactive learning platforms that offer real-time feedback), and/or automate tasks like grading and resource management (Mele et al., 2022; Onesi-Ozigagun et al., 2024). Correspondingly, private and public educational establishments are increasingly leveraging artificial intelligence (AI) to enhance students’ educational experiences.
Among these innovations, AI-driven educational robots, or robots designed to facilitate learning and teaching through the integration of AI technologies (Chiu et al., 2023), stand out as emblematic of the observed digital transformation in the sector. These robots are designed to simulate communication and replicate the nuanced, personalized delivery of key content that is traditionally offered by human teachers (Flavián et al., 2023; Selwyn, 2019). Educational robots come in different forms, including text- and voice-based robots (e.g. Im et al., 2023). While text-based robots are designed to interact with users primarily through written communication with text exchanges (Okonkwo and Ade-Ibijola, 2021), voice-based robots are AI-powered systems designed to interact with users through spoken language, with speech recognition and voice synthesis (Im et al., 2023). Though voice- and text-based robots may both facilitate students’ learning processes, they are expected to offer their users different benefits. For example, text-based robots are instrumental in advancing learners’ critical thinking and written communication skills (Traxler and Gernsbacher, 1993), facilitating deeper reflection and learning (Barrett et al., 2021). This reflective nature makes them well-suited for supporting individual knowledge development.
Conversely, by taking their users’ spoken instructions, voice-based robots enable hands-free and more intuitive communication, particularly for users with diverse needs or in dynamic environments (Katsarou et al., 2023). By mimicking human conversational patterns and reducing barriers to participation, voice-based robots may enhance immediate engagement.
Given these differences, it is likely that voice- and text-based educational robots differentially influence students’ engagement—an essential construct encompassing cognitive, emotional, and behavioral dimensions (Hollebeek et al., 2014, 2019; Sim et al., 2018). However, despite their growing presence in educational environments, we still lack a comprehensive understanding of how these distinct modalities shape student engagement across its dimensions (Northey et al., 2015; Kahu, 2013). This knowledge gap limits educators’ ability to make informed decisions about which robot modality best fosters student engagement and learning outcomes (Mennens et al., 2025). Without such insight, there is a risk that these technologies, while well-intentioned, may fail to stimulate meaningful engagement, ultimately undermining their value. Addressing this issue, we explore how voice- and text-based educational robots shape students’ cognitive, emotional, and behavioral engagement, thus adding novel acumen to the service literature. Specifically, by comparing the dynamics and effects of voice- vs. text-based educational robots on students’ engagement, our analyses permit us to draw inferences about the performance of these respective technologies in the sector. The attained insight matters, given the currently limited understanding of the effectiveness of these technologies, warranting further investigation.
We conduct two empirical studies. First, Study 1 explores the following research question (RQ1): How do voice-based and text-based educational robots shape students’ cognitive, emotional and behavioral engagement? We endeavor to answer this question through qualitative exploration comprising an abductive coding approach. Next, Study 2 extends the first study by undertaking a quantitative, survey-based investigation that addresses RQ2: To what extent do voice- vs. text-based educational robots differ in their ability to enhance students’ cognitive, emotional, and behavioral engagement? In both studies, we adopt the following AI-based technologies: (1) a virtual robot that automates written conversations and interactions with students, and (2) a social robot that tutors students in specific subjects (e.g. math/reading) through voice-based interactions. By comparing these communication modalities, this research advances theoretical understanding of how different AI technologies shape the multifaceted student engagement concept.
The study highlights how voice- and text-based educational robots shape student engagement in different ways, identifying specific items for cognitive, emotional, and behavioral dimensions. Our comparative analysis shows that the voice-based robot is particularly effective in enhancing emotional and cognitive engagement, improving concentration, motivation, and emotional connection through multisensory and personalized interactions. The text-based robot, on the other hand, excels in supporting autonomous learning by facilitating content review and conceptual clarity. Both robots promote behavioral engagement but through different approaches: the text-based robot encourages self-directed study, while the voice-based robot stimulates active participation and collaboration. The findings suggest that these robots are complementary tools for creating a more holistic educational experience.
Moreover, our findings extend beyond the educational setting. While text-based robots are increasingly being adopted, as is evident in the widespread use of chatbots across e-commerce, healthcare, and public administration (Maar et al., 2023), voice-based robots remain relatively limited in their deployment within service contexts. However, voice, as a service interface, holds the potential to play a distinctive role in smart service systems due to its inherently social, expressive, and relational nature (Mende et al., 2025). As our findings show, voice-based communication allows for dynamic, multisensory, and emotionally rich exchanges that mimic human conversation. This enables voice interfaces to transcend mere functional support, fostering deeper emotional connection, heightened attention, real-time engagement, and a sense of companionship. These outcomes underscore the unique ability of voice to humanize AI interactions, thereby offering service systems a relational conduit that engages users on both cognitive and emotional levels, benefits that are less readily achieved through text alone.
The article proceeds as follows: Section 2 reviews key literature underpinning the study. Sections 3 and 4 present the research design and main findings of Study 1 and Study 2, respectively. Section 5 offers a discussion of the results and outlines the major theoretical and practical implications derived from the research.
2. Theoretical background
2.1 Student engagement
The engagement concept has gained prominence in the marketing literature over the last fifteen years (Brodie et al., 2011; Hollebeek et al., 2022), including in the marketing education discourse (e.g. Díaz-Méndez and Gummesson, 2012; Huang et al., 2022). In this context, user engagement, defined as a user’s cognitive, emotional, and behavioral investment in specific interactions (Hollebeek, 2011), provides a useful lens to understand students’ active participation in educational experiences. However, while its prominence continues to grow, debate regarding the conceptualization of student engagement persists. Despite this dissent, we identify the following areas of key agreement across authors in this topic area.
First, student engagement is generally thought to reflect a student’s investment of their cognitive, emotional, and behavioral resources in their own learning journey (Northey et al., 2015; Parsons and Taylor, 2011). The more engaged students are with their learning, the more of their personal resources they will typically invest in it (Sim et al., 2018; Huang et al., 2022). The literature suggests that more engaged students tend to attain superior performance outcomes, while also displaying heightened commitment and satisfaction (Bravo et al., 2016), highlighting the strategic importance of student engagement (Ngoc Hoi, 2023).
Second, student engagement is typically viewed as a multidimensional concept comprising cognitive, emotional, and behavioral facets (Nkomo et al., 2021; Wong and Liem, 2022), as shown in Table 1. Cognitive engagement has been referred to as a psychological state in which students invest their personal resources to deeply concentrate on the learning process (Rotgans and Schmidt, 2011), revealing the student’s interest in and willingness to expend the mental effort required to acquire specific knowledge and skills (Northey et al., 2015). Scholars suggest that cognitively engaged students are more likely to independently take on learning tasks than their less engaged counterparts (Chiu, 2021) and typically self-regulate their learning progress (Wong and Liem, 2022), allowing them to persist through challenges to achieve their objectives (Pohl, 2020).
Table 1. Overview of cognitive, emotional and behavioral student engagement

| Dimension | Facets | Definition |
|---|---|---|
| Cognitive engagement | Mental efforts; willingness; concentration | … can be characterized as a psychological state in which students put in a lot of effort to truly understand a topic and in which students persist studying over a long period of time (Rotgans and Schmidt, 2011, p. 465); … involves the level of effort, willingness, and cognitive resources students invest to acquire the required knowledge and skills (Northey et al., 2015, p. 172); … reflects a student’s level of concentration and mental focus given to their education experience (Conduit et al., 2016, p. 232) |
| | Self-regulation; goal setting; taking on the learning task; perceived learning relevance | … defined as the extent to which students are willing and able to take on the learning task at hand (Chiu, 2021, p. 2); … describes students’ self-regulation and perceived relevance and value of school and learning in relation to their goals and aspirations (Wong and Liem, 2022, p. 115); … refers to students’ goal setting, perception of relevance of learning, effort directed toward learning, and use of self-managed learning strategies (Pohl, 2020, p. 253) |
| | In-depth comprehension; learning persistence | … keen interest in delving into and comprehending tasks (Chiu, 2021, p. 2); … persistence of learning activities over time (Rotgans and Schmidt, 2011, p. 465) |
| Emotional engagement | Positive emotion; affective reaction | … refers to students’ affective reactions in the classroom, including interest, boredom, happiness, sadness, and anxiety (Fredricks et al., 2004, p. 63); … reveals the level of positive emotion toward a focal engagement object and hence how students feel about their education experience (Conduit et al., 2016, p. 232) |
| | Motivation | … consists of several mechanisms including motivation, commitment, and a sense of comfort and belonging (Northey et al., 2015, p. 172); … relates to positive reactions to the learning environment, peers and teachers, as well as their sense of belonging and interest (Bond and Bedenlier, 2019, p. 2) |
| | Commitment; sense of belonging; attitude toward teachers and peers | … including enjoyment, support, belonging and attitudes towards teachers, peers, learning and school in general (Pietarinen et al., 2014, p. 40) |
| Behavioral engagement | Task achievement; positive conduct | … refers to the actions and practices that students direct toward school and learning (Wang et al., 2011, p. 466); … it includes positive conduct (e.g. attending class and completing schoolwork), involvement in learning and academic tasks (Wang et al., 2011, p. 466) |
| | Active participation; involvement in different activities; social interaction | … to active, observable participation in learning activities as typified by exertion, time and persistence (Bråten et al., 2018, p. 682); … focuses on interactions for task achievement and has historically included measures of class participation, attendance, …, and task completion (Conduit et al., 2016, p. 232); … in social interaction to share knowledge and learn from more capable others (Ngoc Hoi, 2023, p. 9) |
| | Learning process control; taking responsibility | … learning opportunities provide students with a degree of control over the learning process (Northey et al., 2015, p. 3); … take responsibility for working out what they need to know and where to find that knowledge (Montenegro, 2022, p. 137) |
Emotional engagement refers to a student’s investment of their emotive resources in their learning process (Hollebeek et al., 2019). The more of their emotional resources (e.g., enthusiasm, passion, boredom, or anxiety) students invest in their learning (Fredricks et al., 2004), the more intense their educational experience will tend to be. Scholars suggest that students who are genuinely excited about their learning activities tend to develop a sense of comfort and belonging related to their learning (Northey et al., 2015), as exemplified by their positive connections with their institution, peers, and teachers (Bond and Bedenlier, 2019), in turn facilitating their subjective well-being (Wong et al., 2024).
Behavioral engagement denotes a student’s investment of time, energy, and effort in their learning process (Northey et al., 2015; Hollebeek et al., 2014). For example, students exhibiting elevated behavioral engagement tend to attend class, spend time studying, complete their assignments, ask questions, discuss study materials or content with their peers, and actively participate in academic and extracurricular activities (Wang et al., 2011; Hospel et al., 2016).
As prior research has typically taken an aggregate perspective of student engagement (Montenegro, 2022), insight into the effects of specific interventions on its particular dimensions remains scarce (Ngoc Hoi, 2023). We therefore explore these effects in the context of voice- vs. text-based educational robots. Advancing understanding of the effects of specific technologies on users’ engagement is of value (Hollebeek et al., 2021), particularly in the context of smart technology (Nkomo et al., 2021).
2.2 Educational robots
Recent advances in robotics have opened up new horizons for education, allowing service robots (i.e. autonomous, adaptable interfaces that interact, communicate, and deliver service; Wirtz et al., 2018; Wu et al., 2021) to increasingly play a transformative role in service delivery (Čaić et al., 2019; Fisk et al., 2022; Mennens et al., 2025). As a specialized subset of service robots, educational robots are specifically designed to facilitate teaching and learning, including by helping students acquire key knowledge or by boosting their meaningful social interactions (Belpaeme and Tanaka, 2021). Educational robots can thus be used to cater to individual learning needs, facilitate personalized educational experiences, manage teachers’ workload, and address post-pandemic knowledge/learning gaps, among others, yielding their growing adoption globally (Mele et al., 2022; Mennens et al., 2025).
Educational robots are particularly apt at engaging students (e.g. through spoken/written interactions), fostering meaningful connections with learners (Flavián et al., 2023). Recent advances in AI equip these robots with features like facial and speech recognition and dialog models powered by speech synthesis, enabling seamless, responsive interactions (Belpaeme and Tanaka, 2021; Bartneck et al., 2024). Prior studies demonstrate that educational robots can elicit students’ positive behavioral change, including by raising their classroom participation, trust in, and engagement with their learning (Flavián et al., 2023; Onesi-Ozigagun et al., 2024).
Tangible robots that students physically interact with (e.g. through touch/voice interfaces) play a particularly important role in building or maintaining engagement. Prior human-robot interaction research highlights that the embodied form, multi-modal communication, and attributes of educational robots (e.g. their name, appearance, verbal, visual, and physical attributes) tend to enhance learners’ sustained attention and engagement (Klos et al., 2021). Students’ perceived autonomy and clarity around their interactions with these robots are also important (e.g. in terms of how their data will be stored/used; Mennens et al., 2025).
Despite significant advancements in robotics and AI (Steins et al., 2024; Kumar et al., 2025b), understanding of the effect of educational robots on student engagement remains limited, particularly in terms of the impact of different (e.g. voice- vs. text-based) educational robots (Mende et al., 2025), warranting further investigation.
Recent human-machine engagement research in services, which focuses on students’ voluntary contribution of resources (e.g. time, skills, or knowledge) directed toward machines during interactions with other stakeholders (Kumar et al., 2025b), offers valuable insight into how robotic interactions influence human behavior. While cognitive drivers (e.g. concern) reflect students’ assessments of trust and reliability, emotional drivers like excitement underpin positive anticipatory behaviors and heightened engagement (Ahn and Shin, 2015).
Though these studies advance understanding of students’ overall engagement with specific machines, insight into the effect of particular technologies on specific engagement dimensions remains scant (Hollebeek et al., 2021), meriting further research, particularly in the educational context. Specifically, prior research not only offers limited insight into the dynamics characterizing users’ responses to and engagement with voice- vs. text-based robots but has also tended to examine engagement as an aggregate construct (vs. assessing the impact of specific technology on its specific dimensions; Nkomo et al., 2021; Wong and Liem, 2022).
However, this distinction is critical, as text-based robots, like chatbots, and voice-based robots offer unique affordances that may elicit students’ cognitive, emotional, and behavioral engagement in different ways. For example, with their ability to simulate human conversation, voice-based robots are likely to foster stronger emotional connections and reduce communicational barriers, while text-based robots may offer greater textual precision or understanding. Based on this rationale, we conduct an empirical assessment of the effect of voice- (vs. text-) based educational robots on students’ cognitive, emotional, and behavioral engagement.
3. Research approach
To explore the RQs, we take a mixed-methods approach, allowing us to integrate and benefit from qualitative and quantitative approaches. This pluralistic approach is suited to address the RQs, which focus on students’ social interactions with voice- and text-based educational robots (Creswell and Creswell, 2005). First, in Study 1, we adopted a qualitative approach, which is widely regarded as effective for exploring the nuanced dimensions of student engagement and their respective themes (e.g. Saeed and Zyngier, 2012; Steen-Utheim and Foldnes, 2018). We then conducted Study 2, a quantitative analysis that builds on Study 1, in line with the proposed mixed-methods design framework (Tashakkori and Creswell, 2007). By translating the emergent themes from Study 1 into measurable variables, Study 2 provides robust complementary evidence (please refer to the Web Appendices for further detail about the research process).
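To illustrate the kind of within-subject comparison that a paired t-test and Cohen’s d involve, the calculation can be sketched as follows. This is a minimal sketch under stated assumptions, not the analysis script used in this research: the function name `paired_t_and_cohens_d` and the example scores are hypothetical, and the effect size shown is the d_z variant (mean difference divided by the standard deviation of the differences), which is only one of several conventions for paired designs.

```python
import math
from statistics import mean, stdev


def paired_t_and_cohens_d(voice_scores, text_scores):
    """Return (t statistic, degrees of freedom, Cohen's d_z) for paired scores.

    Each position in the two lists is assumed to hold the same student's
    rating of the voice-based and the text-based robot on one engagement item.
    """
    if len(voice_scores) != len(text_scores) or len(voice_scores) < 2:
        raise ValueError("need two equally long samples with at least two pairs")

    # Work on the per-student differences, as a paired design requires.
    diffs = [v - t for v, t in zip(voice_scores, text_scores)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")

    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    cohens_d = mean_diff / sd_diff  # assumption: d_z variant for paired data
    return t_stat, n - 1, cohens_d


# Hypothetical 5-point ratings from six students; not data from the study.
voice = [5, 4, 5, 3, 4, 5]
text = [4, 4, 3, 3, 2, 4]
t_stat, df, d = paired_t_and_cohens_d(voice, text)
print(f"t({df}) = {t_stat:.2f}, d = {d:.2f}")
```

A p-value would additionally require the t distribution with `df` degrees of freedom, which a statistics package such as SciPy (`scipy.stats.ttest_rel`) provides.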
4. Study 1. Exploring student engagement dimensions through qualitative analysis
Study 1 explored students’ cognitive, emotional, and behavioral engagement with text- and voice-based educational robots. We recruited a sample of 179 undergraduate management students (90 males; 89 females). This relatively homogeneous sample minimizes potential bias linked to generational or skill-based differences, thereby enhancing the reliability and validity of the findings (Fusch and Ness, 2015).
Students engaged with identical lesson content delivered by the two robots: (1) a text-based virtual chatbot and (2) a physical, voice-interactive robot. The data were collected through post-lesson reflections conducted in multiple phases, beginning with individual diaries, and culminating in group discussions (Bolger et al., 2003), fostering a nuanced examination of the proposed engagement dimensions. The research team provided students with detailed open-ended questions to guide their reflections. This step was crucial for capturing immediate and unfiltered reactions to the learning sessions. Individual diaries captured students’ immediate thoughts and reflections, providing insights into personal and cognitive processes such as information processing, comprehension of topics, and time efficiency. These were further enriched through group and choral discussions (Cooper, 2014), facilitating comparative reflections and collective dialog about the cognitive, emotional, and behavioral aspects of engagement. The full procedure, detailing how students were engaged throughout these activities, is outlined in the Web Appendices.
Following the collection of individual diaries and group discussion transcripts, the research team employed NVivo to carry out a structured thematic analysis aimed at exploring patterns of student engagement across cognitive, emotional, and behavioral dimensions. The coding framework in NVivo was developed through an abductive process, characterized by iterative movement between empirical data and theoretical constructs. Drawing on key literature in the field (e.g. Fredricks et al., 2004; Rotgans and Schmidt, 2011; Hollebeek, 2011), an initial set of categories was constructed around the three core dimensions of engagement: cognitive, emotional, and behavioral (see Table 1). These categories served as provisional labels for the first cycle of coding.
As the analysis progressed, students’ reflections and group discussions revealed unexpected patterns and context-specific nuances, prompting refinements to the initial coding structure. Sub-dimensions were introduced, and emergent themes—not extensively addressed in the literature—were added as new nodes within NVivo to capture the full spectrum of the student experience.
This hybrid approach allowed for a balanced integration of theoretical grounding and inductive sensitivity, enhancing both the depth and flexibility of the analysis. NVivo’s capabilities in managing, visualizing, and cross-referencing codes supported a systematic and transparent process. To ensure analytical rigor, discrepancies in code application were jointly reviewed by the research team and were resolved through collaborative discussion. Within the identified dimensions, the following key aspects emerged:
Cognitive engagement: Information processing, immediate feedback, independent studying, concentration, comprehension of topics, memorization, time efficiency, and reinforcement in the learning process.
Emotional engagement: Effective motivation, sense of companionship, entertainment, empathy, anxiety reduction, trusting relationships, and dedication to studying.
Behavioral engagement: Self-directed learning, task completion, achievement of learning objectives, improvement in classroom participation, dynamic interactions, individual support, paced learning, and stimulated attention.
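The three-dimensional coding structure listed above can be pictured as a simple lookup from items to dimensions. The sketch below is purely illustrative and does not reproduce the NVivo project: the names `CODEBOOK`, `ITEM_TO_DIMENSION`, and `tally_by_dimension` are hypothetical, as are the example excerpts. It shows how coded excerpts could be tallied per robot and engagement dimension once each excerpt carries an item label.

```python
from collections import Counter

# Items per engagement dimension, as listed for Study 1.
CODEBOOK = {
    "cognitive": (
        "information processing", "immediate feedback", "independent studying",
        "concentration", "comprehension of topics", "memorization",
        "time efficiency", "reinforcement in the learning process",
    ),
    "emotional": (
        "effective motivation", "sense of companionship", "entertainment",
        "empathy", "anxiety reduction", "trusting relationships",
        "dedication to studying",
    ),
    "behavioral": (
        "self-directed learning", "task completion",
        "achievement of learning objectives",
        "improvement in classroom participation", "dynamic interactions",
        "individual support", "paced learning", "stimulated attention",
    ),
}

# Reverse lookup: item label -> engagement dimension.
ITEM_TO_DIMENSION = {
    item: dimension for dimension, items in CODEBOOK.items() for item in items
}


def tally_by_dimension(coded_excerpts):
    """Count coded excerpts per (robot, dimension) pair.

    `coded_excerpts` is an iterable of (robot, item) tuples. An item that is
    not in the codebook raises KeyError, flagging a label that would need to
    be added as a new node first.
    """
    counts = Counter()
    for robot, item in coded_excerpts:
        counts[(robot, ITEM_TO_DIMENSION[item])] += 1
    return counts


# Hypothetical coded excerpts; not data from the study.
example_excerpts = [
    ("voice", "concentration"),
    ("voice", "empathy"),
    ("text", "independent studying"),
    ("text", "self-directed learning"),
    ("voice", "sense of companionship"),
]
example_counts = tally_by_dimension(example_excerpts)
```

Keeping the codebook as plain data means that emergent themes can be appended to a dimension without changing the tallying logic.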
4.1 Findings of Study 1
Text-based and voice-based educational robots offer innovative opportunities to raise students’ cognitive, emotional, and behavioral engagement. Their unique attributes and interaction styles enable them to meet different learning needs, as summarized in Table 2.
Table 2. Impact of voice-based and text-based educational robots on student engagement

| Engagement dimension | Item | Text-based robot | Voice-based robot |
|---|---|---|---|
| Cognitive engagement | Information processing | Facilitates immediate feedback and repeated content review | Enhances processing through conversational style and multisensory interactions |
| | Immediate feedback | Provides structured communication for clear and timely responses | Delivers real-time interaction for direct guidance |
| | Independent studying | Allows students to revisit content at their own pace | Complements studying with supportive conversational interactions |
| | Concentration | Promotes focus through structured responses | Engages focus using multisensory features like voice and visuals |
| | Comprehension of topics | Reinforces understanding through repeated engagement | Encourages deep exploration and active comprehension |
| | Memorization | Enhances retention with structured, reviewable communication | Strengthens memory through multisensory experiences |
| | Time efficiency | Supports efficient study by providing easy access to reviewed content | Optimizes study time by guided communication |
| | Reinforcement in learning | Encourages repeated engagement for structured learning reinforcement | Reinforces learning through interactive engagement |
| Emotional engagement | Anxiety reduction | Provides a non-judgmental, adaptable environment | Reduces stress through supportive and understanding interactions |
| | Effective motivation | Reinforces learning with positive written reinforcements | Inspires motivation through rich interactions |
| | Empathy | Supports stress-free learning with consistent responses | Encourages emotional connection with empathetic communication |
| | Entertainment | Creates a positive learning environment through structured engagement mechanisms | Makes learning enjoyable |
| | Sense of companionship | Provides judgment-free support to foster confidence | Builds connection through compassionate conversational style |
| | Trusting relationships | Builds reliability through clear and consistent communication | Establishes trust through social presence and empathy |
| | Dedication to studying | Inspires commitment through accessible and structured guidance | Fosters dedication with emotional bonding |
| Behavioral engagement | Achievement of objectives | Stores and reviews information to help meet learning goals | Fosters collaboration through group discussions |
| | Stimulated attention | Engages attention with structured, focused communication | Captures attention using voice and visual elements |
| | Classroom participation | Promotes participation by reviewing and storing key information | Encourages group involvement through real-time interactivity |
| | Dynamic interactions | Provides structured responses for engagement | Facilitates dynamic engagement with voice-based features |
| | Individual support | Offers personalized feedback via stored messages | Delivers tailored guidance during interactions |
| | Paced learning | Allows for review of content on demand | Adjusts pace through conversational dynamics |
| | Self-directed learning | Enables self-directed learning through accessible content review | Promotes autonomy with guided interactions |
| | Task completion | Assists with structured guidance for task execution | Provides real-time feedback to enhance task achievement |
4.1.1 Cognitive engagement
Cognitive engagement encompasses a student’s cognitive resource investment in their learning (e.g. their mental elaboration). The studied educational robots facilitate cognitive engagement by offering tailored opportunities for students to interact with learning materials at their own pace and based on their preferred learning style. By delivering personalized, timely, and context-specific guidance, these robots enrich the educational experience, promoting deeper comprehension and active involvement with the content. The findings suggest that both robots play a pivotal role in fostering resilience, motivating students to persist through challenges and maintain their focus on achieving their educational goals.
Text-based robot. The text-based robot supports cognitive engagement by enabling immediate feedback and content review, allowing learners to revisit learning materials to reinforce their understanding. This robot is particularly useful in enhancing information processing, where its structured communication allows students to systematically break down complex topics. By offering repeated content review, the text-based robot helps students memorize key content and facilitates content retention. For example, one respondent states:
The fact that I could read and reread the answer multiple times with the text-based robot helped me remember the topics better. I appreciated being able to ask any question that came to mind and getting an immediate response, almost like my teacher, but with the added benefit of deciding how to proceed on my own. (Student 35)
The text-based robot encourages independent study by allowing students to manage their learning at their own pace. It supports their timely access to specific content, facilitating a seamless educational experience. Moreover, by offering immediate feedback, the text-based robot helps maintain students’ concentration, improving both their comprehension and the accuracy of their learning. One respondent notes:
Its interactive features, like the ability to go back to a topic by simply clicking in the chat, allowed me to revisit what I didn’t understand several times. I was more focused, and I can say that thanks to the text-based robot, I understood the lesson better. Receiving immediate feedback on assignments served as a motivating factor. It felt akin to having a dedicated tutor readily available whenever I needed assistance. (Student 23)
Voice-based robot. The voice-based robot employs a conversational, multisensory approach to foster students’ cognitive engagement. Its dynamic interactions, combining auditory and visual elements, enhance attention and concentration, while also promoting their learning-related comprehension. The voice-based robot’s ability to provide real-time guidance while adapting to learners’ responses fosters a deeper exploration of material and encourages critical thinking. One participant notes:
The way the voice-based robot expressed itself made it clear whether I had answered correctly or not. When I was wrong, it encouraged me to do better next time. It tracked my answers and suggested how to proceed but left the decision up to me. (Student 78)
The voice-based robot’s positive and negative voice reinforcement stimulates information processing, as its conversational cues direct learners to reflect on their errors and refine their understanding, while also facilitating content recall and memorization, as illustrated by the following respondent:
Simply having the voice-based robot speak kept me on track and focused during my learning sessions. Its voice was so engaging, almost like my professor. Thanks to the combination of reading and listening, it seemed easier to learn and memorize. (Student 98)
4.1.2 Emotional engagement
Students’ emotional engagement encompasses their investment of affective resources in their learning (e.g. by being passionate about their learning). The studied educational robots were found to boost emotional engagement. For example, by tailoring learning resources to students’ emotive needs, these robots foster positive emotional responses (e.g. heightened satisfaction, a deeper commitment to learning, and a sense of belonging) and cultivate favorable attitudes toward learning.
The findings highlight the ability of these robots to create supportive, enjoyable learning environments by providing interactive activities that elicit students’ positive affective responses. These interactions enable students to connect with their learning on an emotive level, while also fostering positive relationships with their teachers and peers, signaling the strategic value of these robots.
Text-based robot. The text-based robot was likewise found to foster students’ emotional engagement (e.g. by creating a judgment-free environment, enabling students to explore topics without fear of criticism). Its ability to adapt to individual learning preferences is conducive to the development of trust, raising students’ self-confidence. By consistently affording clear communication, the text-based robot ensures that students feel supported through their learning journey, in turn boosting their emotional state while fostering their positive attitude toward their learning.
The text-based robot was also found to reduce anxiety and stress through its structured, adaptable design. Students appreciate its non-judgmental stance, encouraging them to ask questions. One student remarks:
Knowing that the text-based robot provides reliable answers makes me feel at ease and less nervous about learning tasks. It creates a stress-free space where I can ask any kind of question without feeling intimidated, as might happen with a teacher. (Student 129)
Its consistently supportive, clear communication nurtures students’ perceived empathy of, and connection to, the robot, aligning with their emotional needs and motivating them to complete their learning tasks. The robot was also found to create a sense of social presence and companionship, raising students’ dedication to their learning. One respondent asserts:
The text-based robot is like a 24/7 companion, always with me and accessible, whether I use my mobile phone, tablet or computer, no matter where I am. Its constant availability allows me to easily access and review or resolve any doubts that arise at the time. (Student 152)
Another aspect of students’ emotional engagement with the text-based robot lies in its reinforcement of their dedication to studying. By simplifying complex topics and providing timely feedback, it sustains their commitment to their learning. One participant observes:
The text-based robot’s clear and efficient communication made me feel supported. It simplified the learning process, which increased my confidence and helped me stay committed to my studies. (Student 17)
Voice-based robot. Through its empathetic, expressive interactions, the voice-based robot creates an emotionally rich educational experience. By using a carefully modulated tone, thoughtful pacing, and vocal encouragement, the voice-based robot not only fosters a sense of companionship but also encourages a dynamic emotional connection that makes students feel supported through their learning journey. One participant remarks:
Learning with the voice-based robot is like having a real classmate! I was attracted by the way it spoke and reacted. It was like having a real conversation, which makes learning much more comfortable! (Student 23)
The voice-based robot’s conversational features also motivate students. By providing vocal reinforcement (e.g. praise for correct answers, guidance to fix mistakes), it encourages students to persist in their learning and improve their academic performance. This positive reinforcement creates an emotionally uplifting environment that helps sustain students’ interest and effort.
Another aspect of the voice-based robot’s emotional engagement is its ability to reduce anxiety and stress. Through its calm, non-threatening presence and empathetic interactions, the voice-based robot helps alleviate the stress that is often associated with learning. The respondents frequently highlighted its ability to make challenging tasks more manageable, not only boosting their confidence but also promoting a positive attitude to learning. One participant notes:
In my opinion, the voice-based robot promotes a sense of empathy; its sometimes pleasant and sometimes scolding reactions create a stimulating environment. Having sweet reinforcement reactions when I answered correctly made me more excited to learn. (Student 139)
Moreover, the voice-based robot’s entertaining teaching strategy boosts students’ emotional engagement. By combining humor with engaging, interactive dialog, it renders the learning process more enjoyable, reinforcing students’ emotional connection to their learning. The voice-based robot also builds trust through its consistent, supportive interactions. Its ability to adapt its responses to individual learners fosters a sense of reliability and safety, encouraging students to more openly engage. The voice-based robot’s empathetic tone and personalized guidance help create a meaningful bond, enhancing students’ emotional investment in their learning process.
4.1.3 Behavioral engagement
Behavioral engagement refers to the level of effort, time, and energy that students invest in their learning activities. Both the text- and voice-based robots play an important role in fostering behavioral engagement (e.g. by supporting task completion). At the same time, these robots may save students time (e.g. by guiding them through their learning processes), reducing their required resource investment in their learning and promoting their commitment to their learning, encouraging their active participation. The findings suggest that both the robots enable students’ meaningful interactions (e.g. with their learning material, others, or the technology), facilitating or accelerating the achievement of their learning objectives.
Text-based robot. The text-based robot’s clear, structured communication style enhances self-directed learning, enabling students to manage and monitor their progress. Its ubiquitous availability renders the text-based robot a reliable, convenient platform for students to actively engage with the material. By providing accessible content and allowing users to revisit materials, it facilitates task completion and helps students stay on track with their learning objectives. The respondents frequently cite the text-based robot’s role in encouraging their participation and maintaining focus:
The text-based robot was really helpful for me in completing my classwork. For example, since the messages were stored in the chat and it was easy to go back and forth between messages, I could repeat the lessons whenever I wanted. (Student 34)
The robot enables students to learn at their own pace, empowering them to take ownership of their learning. By facilitating dynamic interactions, the text-based robot ensures students remain actively engaged with the material, including during repeated tasks. One respondent notes:
The text-based robot’s messages were very clear and gave you the option to follow up simply by asking questions. While this was very helpful and encouraged me to actively participate in the conversation, the interaction was between me and the chatbot. (Student 74)
Voice-based robot. The voice-based robot boosts students’ behavioral engagement by creating a dynamic, interactive learning environment. Its real-time feedback and conversational approach encourage students to actively participate in their learning tasks. The voice-based robot’s ability to engage groups fosters collaboration (e.g. as students interact to exchange ideas). One participant stated:
The voice-based robot’s presence encouraged us to collaborate as a group. It would ask questions and prompt us to discuss answers together, which made the educational experience more interactive and enjoyable. (Student 154)
The voice-based robot’s focus on dynamic interactions and tailored guidance strengthens students’ connection to the material, fostering their sustained effort and attention. One respondent stated:
The voice-based robot’s presence in the class brought us together as a group. Not only did we have to pay attention while listening, but at the end of each explanation, it would ask questions and encourage us to discuss the answers. (Student 24)
4.2 Discussion of Study 1
This initial study has been essential in exploring how educational robots contribute to students’ cognitive, emotional, and behavioral engagement. By analyzing students’ qualitative feedback, the research has identified key components underlying each dimension of engagement. These components result from an abductive approach that iteratively combines established theoretical frameworks on student engagement with patterns that emerged from the empirical material (see Tables 1 and 2). The study focused on capturing the richness of students’ experiences to gain a deeper, exploratory understanding of how engagement unfolds in interactions with educational technologies.
Our findings shed light on how voice- and text-based educational robots influence students’ cognitive, emotional, and behavioral engagement. The results for RQ1 (How do voice-based and text-based educational robots shape students’ cognitive, emotional and behavioral engagement?) suggest that both text- and voice-based educational robots represent effective tools to foster students’ engagement, albeit in different ways. The text-based robot adopts a structured communication style to consistently facilitate students’ clarity on the subject matter. By providing immediate feedback and content review, the text-based robot allows students to revisit the learning materials as needed, reinforcing their understanding, improving retention, and allowing them to break down complex topics, in turn facilitating their information processing and comprehension. Its ubiquitous accessibility further supports student engagement, permitting continuous learning progress. The text-based robot’s design also helps students monitor their progress and align their efforts with their academic goals, fostering self-directed learning. While operating solely through written communication, the text-based robot creates a judgment-free, supportive environment, fostering students’ dedication to their learning. This attribute particularly appeals to learners who value autonomy and efficiency.
By contrast, the voice-based robot primarily engages students through its dynamic, immersive design. The voice-based robot’s multisensory, conversational approach boosts concentration and comprehension. By integrating auditory cues, visual feedback, and real-time interaction, the voice-based robot supports students in more effectively processing and retaining information, stimulating their curiosity, and encouraging them to explore complex topics. The voice-based robot’s ability to provide tailored guidance and real-time feedback mirrors the support of a personal tutor, rendering an interactive educational experience. The voice-based robot’s empathetic tone and emotional responsiveness also foster students’ motivation to learn, addressing their emotional needs. By reducing anxiety and building trust, the voice-based robot creates a supportive environment where students feel understood and encouraged, promoting their emotional connection to their learning. This emotional resonance is critical, as it not only increases students’ motivation but also helps them overcome learning challenges.
Importantly, the set of items developed through this abductive process serves as a valuable tool for the next phase of research. These items enable a structured and theory-informed means of measuring student engagement and will support a subsequent study aimed at quantitatively comparing the impact of text-based and voice-based educational robots. This next phase allows for a more systematic assessment of how different interaction modalities influence the various dimensions of engagement, thereby informing evidence-based design and implementation strategies in educational contexts.
5. Study 2. Comparative assessment of student engagement dimensions
Study 2 adopted a sequential research design, building on the qualitative insights derived from Study 1. Its main objective was to quantitatively assess student engagement across two modes of human-robot interaction: text-based and voice-based. To ensure the instrument’s validity and reliability, the study followed established scale development procedures (DeVellis and Thorpe, 2021). The process began with the operationalization of engagement constructs, synthesizing theoretical models and empirical findings from Study 1. Each item was designed to reflect specific indicators of cognitive, emotional, or behavioral engagement. The preliminary scale was reviewed by an expert in the field (a top-cited author on engagement studies indexed in Web of Science) to assess content validity and conceptual coherence.
A pilot study involving 30 students was then conducted to evaluate item clarity, language, and overall structure. Descriptive statistics and item-total correlations were used to assess initial reliability, while participant feedback informed refinements to ensure clarity and alignment with the intended dimensions. These procedures reinforced the instrument’s validity and improved its sensitivity to the context of human-robot interaction.
The finalized questionnaire was administered to 100 undergraduate students enrolled in a management course. After excluding eight incomplete responses, the final sample consisted of 92 participants (54 males, 46 females; mean age = 21). The instrument employed a five-point Likert scale (1 = strongly disagree to 5 = strongly agree), with items carefully mapped to the cognitive, emotional, and behavioral dimensions, as well as their sub-components, emerging from the initial qualitative analysis. This structure enabled a comprehensive and nuanced measurement of student engagement with both the text-based and voice-based robots.
To assess questionnaire reliability, Cronbach’s alpha was computed for each student engagement dimension, confirming the internal consistency of the items (Taherdoost, 2016; Aithal and Aithal, 2020). Descriptive statistics, including means and standard deviations, were computed to summarize central tendencies and variability. Paired t-tests, which compare the means of two related groups to determine whether they differ significantly (Hsu, 2005), were applied to: (1) compare students’ cognitive, emotional, and behavioral engagement in their interactions with the text- and the voice-based robot; and (2) evaluate engagement-based differences for individual items (e.g. “immediate feedback” or “sense of companionship”) across the robots. This test is particularly suitable because the same participants interacted with both robots, so that each participant’s engagement scores for the two robots can be paired and compared directly.
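As an illustration of the reliability check and the paired comparison described above, the following minimal Python sketch computes Cronbach’s alpha and a paired t-statistic using only the standard library. It is not the authors’ analysis script: the function names, the text-minus-voice ordering of the differences (chosen so that negative values favor the voice-based robot, in line with the sign of the statistics reported in this study), and the example scores are assumptions made for illustration.

```python
import math
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items, each a list of respondent scores."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(statistics.variance(item) for item in item_scores)
    # Sample variance of each respondent's total score across all items.
    totals = [sum(responses) for responses in zip(*item_scores)]
    total_variance = statistics.variance(totals)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def paired_t(text_scores, voice_scores):
    """Paired t-statistic and degrees of freedom for two related samples.

    Differences are taken as text minus voice (an assumption), so a
    negative t-statistic means higher scores for the voice-based robot.
    """
    if len(text_scores) != len(voice_scores):
        raise ValueError("paired samples must have the same length")
    diffs = [t - v for t, v in zip(text_scores, voice_scores)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1


if __name__ == "__main__":
    # Hypothetical five-point Likert responses from five students.
    text = [3, 4, 3, 5, 2]
    voice = [4, 4, 5, 5, 3]
    t_stat, df = paired_t(text, voice)
    print(f"t = {t_stat:.3f}, df = {df}")
```

The p-value then follows from the t distribution with n − 1 degrees of freedom; statistical packages (e.g. SciPy’s `scipy.stats.ttest_rel`) return it directly.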
We also computed Cohen’s d, a measure of effect size that quantifies the magnitude of the difference between two groups (Cohen, 1988). For cognitive, emotional, and behavioral student engagement, Cohen’s d quantifies how much more effective one robot is (vs. the other) in engaging students. For individual items, it assesses the strength of differences in specific features like “time efficiency” or “anxiety reduction”.
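The text does not specify which variant of Cohen’s d was computed. A minimal sketch, assuming the paired-samples variant (the mean of the paired differences divided by their standard deviation, which equals t/√n), is given below; the function names are hypothetical. Under this assumption, the item-level effect sizes reported in Study 2 can be reproduced from the reported t-statistics with n = 92 (e.g. −3.539/√92 ≈ −0.369 for concentration).

```python
import math
import statistics


def cohens_d_paired(text_scores, voice_scores):
    """Cohen's d for paired samples: mean difference / SD of differences.

    Assumes text-minus-voice differences, so negative values favor the
    voice-based robot.
    """
    if len(text_scores) != len(voice_scores):
        raise ValueError("paired samples must have the same length")
    diffs = [t - v for t, v in zip(text_scores, voice_scores)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def d_from_t(t_stat, n):
    """Recover the paired-samples d from a paired t-statistic and sample size."""
    return t_stat / math.sqrt(n)


if __name__ == "__main__":
    # Reported t-statistic for "concentration" with the final sample of 92.
    print(round(d_from_t(-3.539, 92), 3))  # -0.369
```

By Cohen’s (1988) conventional benchmarks, absolute values around 0.2 are read as small, 0.5 as medium, and 0.8 as large.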
Collectively, these analyses ensured a robust quantitative comparison of the impact of the studied voice- vs. text-based educational robots on students’ engagement. Radar charts were also employed to visualize the results, offering a clear representation of the engagement dimensions and allowing researchers to identify patterns and trends (Cleveland and McGill, 1984). For example, the text-based robot was found to support independent study and immediate feedback, while the voice-based robot excelled in fostering dynamic interactions and emotional support. By overlaying the data for both robots on a radar plot, areas of overlap and divergence became evident, facilitating a detailed comparison of the strengths and weaknesses of the respective robots. This visual representation complemented the statistical findings, enhancing the interpretability of the results.
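The geometry behind such a radar plot can be sketched as follows: each item is assigned an axis at an equal angular step, and its mean score sets the distance from the center. The function name, the top-start clockwise layout, and the scaling to a five-point maximum are assumptions for illustration, not details of the authors’ charts.

```python
import math


def radar_vertices(scores, max_score=5.0):
    """Return the (x, y) polygon vertices of a radar chart.

    The first axis points straight up and the remaining axes follow
    clockwise at equal angles; each radius is the score scaled to [0, 1].
    """
    n = len(scores)
    vertices = []
    for k, score in enumerate(scores):
        # Start at 12 o'clock (pi / 2) and step clockwise by 2 * pi / n.
        angle = math.pi / 2 - 2 * math.pi * k / n
        radius = score / max_score
        vertices.append((radius * math.cos(angle), radius * math.sin(angle)))
    return vertices


if __name__ == "__main__":
    # Hypothetical mean scores for three items of one robot.
    for x, y in radar_vertices([3.2, 3.8, 3.5]):
        print(f"({x:.3f}, {y:.3f})")
```

Computing one vertex list per robot and drawing both polygons on the same axes yields the kind of overlay described above.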
5.1 Findings of Study 2
Cronbach’s alpha demonstrated high internal consistency for both the voice- and text-based robots. The text-based robot’s Cronbach’s alpha was 0.918, while that for the voice-based robot was 0.933, both exceeding the commonly used 0.7 threshold. The voice-based robot’s slightly higher alpha suggests the somewhat higher cohesiveness of responses to its items (vs. those for the text-based robot), possibly owing to its ability to evoke more uniform participant experiences (e.g. given its multisensory, interactive attributes). To understand how the two robots influence student engagement, descriptive statistics were analyzed to examine the central tendencies and variability of responses across the engagement dimensions.
Text-based robot. The text-based robot’s mean engagement scores ranged from 3.185 (SD = 0.741) to 3.837 (SD = 1.135), indicating moderate engagement, with some cross-item variability. The higher standard deviation observed for specific items reflects greater diversity in student perceptions, which may be attributed to individual differences in their preferences for text-based interactions.
Voice-based robot. The voice-based robot’s mean engagement scores ranged from 3.435 (SD = 0.768) to 3.880 (SD = 1.124). These slightly higher means (vs. those for the text-based robot) suggest the voice-based robot’s capacity to elicit students’ higher engagement. Moreover, the smaller standard deviations for some items indicate more consistent student experiences, likely due to the voice-based robot’s conversational style and interactive features.
We next analyze the effects of voice- and text-based educational robots on students’ cognitive, emotional, and behavioral engagement by integrating the paired t-test and Cohen’s d results (Tables 3 and 4) with the radar chart visualizations. Particular attention is paid to those items that see a marked benefit of the voice-based robot over the text-based robot. While the two robots attain similar results for most items, the voice-based robot exhibits meaningful advantages on several. However, although the radar charts visually corroborate these differences, the effect sizes for most items remain small, suggesting that practical implications may vary depending on context.
Table 3. Paired t-test and Cohen’s d results
| Dimension | T-statistic | p-value | Cohen’s d |
|---|---|---|---|
| Cognitive | −2.229 | 0.028 | −0.195 |
| Emotional | −3.116 | 0.002 | −0.215 |
| Behavioral | −1.493 | 0.139 | −0.109 |
Table 4. Student engagement items
| Dimension | Items | T-statistic | p-value | Cohen’s d |
|---|---|---|---|---|
| Cognitive engagement | Information processing | −1.101 | 0.274 | −0.115 |
| | Immediate feedback | −0.709 | 0.480 | −0.074 |
| | Independent studying | −0.815 | 0.417 | −0.085 |
| | Concentration | −3.539 | 0.001 | −0.369 |
| | Comprehension of topics | −0.483 | 0.630 | −0.050 |
| | Memorization | −1.642 | 0.104 | −0.171 |
| | Time efficiency | 0.528 | 0.6 | 0.055 |
| | Reinforcement in learning process | −0.871 | 0.385 | −0.09 |
| Emotional engagement | Effective motivation | −0.757 | 0.451 | −0.079 |
| | Sense of companionship | −0.363 | 0.717 | −0.038 |
| | Entertainment | −1.558 | 0.123 | −0.162 |
| | Empathy | −2.637 | 0.010 | −0.275 |
| | Anxiety reduction | −2.257 | 0.026 | −0.235 |
| | Trusting relationships | −1.578 | 0.118 | −0.165 |
| | Dedication to studying | −1.880 | 0.063 | −0.196 |
| Behavioral engagement | Self-directed learning | −1.995 | 0.049 | −0.208 |
| | Task completion | 0.097 | 0.923 | 0.010 |
| | Achieving of learning objectives | 0.345 | 0.731 | 0.036 |
| | Improvement in classroom participation | −1.116 | 0.268 | −0.116 |
| | Dynamic interactions | −1.107 | 0.271 | −0.115 |
| | Individual support | −1.503 | 0.136 | −0.157 |
| | Paced learning | 0.477 | 0.635 | 0.050 |
| | Stimulated attention | −0.560 | 0.577 | −0.058 |
Cognitive engagement: The paired t-test for cognitive engagement reveals a statistically significant difference favoring the voice-based robot, with a t-statistic of −2.229 (p = 0.028) and a small effect size (Cohen’s d = −0.195), indicating the voice-based robot’s slightly greater ability to cognitively engage students (vs. the text-based robot). The radar chart supports this result, showing that the voice-based robot outperforms the text-based robot on several axes (see Figure 1).
Figure 1. Cognitive student engagement radar, comparing the text-based and voice-based robots across eight axes (clockwise from the top): information processing, immediate feedback, independent studying, concentration, comprehension of topics, memorization, time efficiency, and reinforcement in the learning process. Source: Created by authors
Among the cognitive engagement items, concentration stands out as a significant differentiator (p = 0.0001, Cohen’s d = −0.369), as explained by voice-based robot’s ability to sustain students’ attention through auditory cues and visual feedback. The radar charts emphasize the voice-based robot’s relative advantage, with a notable extension in its polygon (vs. the text-based robot). The difference, also observed in memorization support (while not statistically significant), demonstrates that both effect sizes suggest a relative advantage of the voice- vs. text-based robots. Other cognitive engagement items (e.g. information processing, time efficiency) did not show statistically significant differences, suggesting the equal effectiveness of the text-based robot’s structured design in these areas.
Emotional engagement: Emotional engagement exhibited the most significant difference, with a t-statistic of −3.116 (p = 0.002) and a small but meaningful effect size (Cohen’s d = −0.215), suggesting that the voice-based robot more effectively fosters students’ emotional engagement. Among the emotional engagement items, empathy (p = 0.0098, Cohen’s d = −0.275) and anxiety/stress reduction (p = 0.026, Cohen’s d = −0.235) stand out as significant.
Other items (e.g. dedication to study) approached significance (p = 0.063, Cohen’s d = −0.196). Beyond the voice-based robot’s ability to foster emotional bonds, its performance in entertaining and motivating students points to its capacity to make learning more enjoyable, although these latter differences were not statistically significant. The radar chart likewise highlights the voice-based robot’s relative strength in fostering emotional connection and reducing anxiety/stress (Figure 2).
The radar chart, titled “Emotional Student Engagement”, compares the text-based and voice-based robots on seven radial axes, with eight concentric rings numbered 1 (center) to 8 (outermost). Plotted ring positions, listed clockwise from the top as (text-based robot; voice-based robot): effective motivation (7; between 7 and 8); sense of companionship (7; between 7 and 8); entertainment (7; between 7 and 8); empathy (between 6 and 7; 7); anxiety reduction (between 6 and 7; 7); trusting relationships (7; between 7 and 8); dedication to study (7; between 7 and 8).
Emotional engagement radar. Source: Created by authors
Behavioral engagement: Behavioral engagement did not show a statistically significant difference between the two robots, with a t-statistic of −1.493 (p = 0.139) and a negligible effect size (Cohen’s d = −0.109), suggesting that the robots are comparably effective in fostering students’ behavioral engagement.
Self-directed learning is the only behavioral engagement item that reaches significance at the conventional 0.05 level, and only marginally (p = 0.049, Cohen’s d = −0.208). The voice-based robot’s guided interactions, which adapt to students’ needs in real time, likely give it a marginal advantage in helping students monitor or adjust their learning behaviors. However, items including task completion and paced learning showed no significant differences, highlighting the complementary capabilities of the text- and voice-based robots in supporting behavioral engagement. The radar chart shows that the text-based robot performs better in terms of paced learning and the achievement of learning objectives, while the voice-based robot excels in dynamic interactions, underscoring that the robots cater to different behavioral needs (see Figure 3).
The radar chart, titled “Behavioral Student Engagement”, compares the text-based and voice-based robots on eight radial axes, with eight concentric rings numbered 1 (center) to 8 (outermost). Plotted ring positions, listed clockwise from the top as (text-based robot; voice-based robot): self-managed learning (3; 5); task completion (between 5 and 6; between 5 and 6); achievement of learning objectives (between 6 and 7; between 6 and 7); improvement in classroom participation (5; between 6 and 7); dynamic interactions (5; 6); individual support (between 3 and 4; 5); paced learning (between 6 and 7; 7); stimulated attention (between 5 and 6; between 6 and 7).
Behavioral engagement radar. Source: Created by authors
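The dimension- and item-level comparisons in this section rest on paired t-tests and Cohen’s d. As a minimal sketch of how such statistics can be computed, the Python snippet below works through one paired comparison. Several details are assumptions rather than the authors’ procedure: the function name `paired_t_and_cohens_d` and the ratings are hypothetical, differences are taken as text-based minus voice-based scores (so negative values favor the voice-based robot, matching the sign of the reported statistics), and Cohen’s d is computed in its paired form (mean difference divided by the standard deviation of the differences), since the variant used in the study is not stated here.

```python
import math
import statistics


def paired_t_and_cohens_d(text_scores, voice_scores):
    """Return (t statistic, Cohen's d_z) for paired ratings.

    Differences are computed as text minus voice (an assumption), so
    negative values mean the voice-based robot was rated higher.
    """
    if len(text_scores) != len(voice_scores):
        raise ValueError("paired samples must have the same length")
    diffs = [t - v for t, v in zip(text_scores, voice_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    # Paired t statistic: mean difference over its standard error.
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    # Paired-samples Cohen's d (d_z): mean difference over SD of differences.
    d_z = mean_diff / sd_diff
    return t_stat, d_z


# Made-up ratings for illustration only (not the study's data).
text = [6, 7, 5, 6, 7, 6, 5, 7]
voice = [7, 7, 6, 8, 7, 7, 6, 7]
t_stat, d_z = paired_t_and_cohens_d(text, voice)
print(f"t = {t_stat:.3f}, d_z = {d_z:.3f}")
```

The p-value would then be read from a t distribution with n − 1 degrees of freedom; in practice, a library routine such as `scipy.stats.ttest_rel` returns the t statistic and the p-value together.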
5.2 Discussion of Study 2
We compared the capacity of the voice- vs. text-based robots to foster students’ cognitive, emotional, and behavioral engagement (RQ2: To what extent do voice- vs. text-based educational robots differ in their ability to enhance students’ cognitive, emotional, and behavioral engagement?). The voice-based robot was found to have a particular capacity to cognitively and emotionally engage students. The key difference between the robots was found for emotional engagement, where the voice-based robot outperformed the text-based robot in fostering students’ emotional connection to their learning. The result for cognitive engagement corroborates this finding, in that the voice-based robot, through its auditory cues and visual feedback, significantly surpasses the text-based robot in enhancing concentration. These findings underscore the distinctive strengths of voice-based robots in creating emotionally supportive, immersive educational experiences. In particular, the voice-based robot’s ability to foster concentration, motivation, and emotional connection positions it as an ideal tool for collaborative, stimulating, and rich learning environments. Conversely, while the robots were found to elicit similar behavioral engagement among students overall, the text-based robot’s capacity to store and present information for repeated review affords students greater learning-related autonomy, as the radar charts show. The text-based robot therefore emerges as a tool that is particularly useful in learning areas requiring structure and precision, revealing its suitability for those valuing predictable, self-paced learning environments.
Though the results suggest that both robots have their respective strengths in terms of fostering students’ engagement, our findings clarify the ways in which these robots differ. The voice-based robot comprehensively engages students in a lifelike manner, while the text-based robot excels in engaging students through task efficiency and paced learning. However, the advantages of voice-based interactions render the voice-based robot a more versatile, impactful tool for fostering students’ emotional engagement, suggesting its greater transformative potential.
The findings also suggest that the text- and voice-based robots are not direct competitors but rather complementary tools, each being particularly apt at eliciting different dimensions of students’ engagement. Collectively, they highlight the potential for combining structured, self-directed learning with dynamic, empathetic interactions to create a more holistic educational experience.
6. Theoretical implications
This study offers new insights for marketing and service scholars on how voice- and text-based robots enhance user engagement. It contributes to the academic discourse by examining the role of AI-powered robots in education and their potential to transform the learning journey. By integrating perspectives from the student engagement and service robot literature, our findings provide a foundation for understanding the impact of these technologies on educational experiences.
First, by comparing the capacity of voice- and text-based educational robots to shape students’ cognitive, emotional, and behavioral engagement, our analyses respond to the need for further insight into AI adoption in service contexts (Bond and Bedenlier, 2019; Nkomo et al., 2021; Wirtz et al., 2018; Mende et al., 2025). Prior student engagement research widely recognizes the multidimensional nature of engagement (e.g. Huang et al., 2022; Nkomo et al., 2021). Building on this body of work, our study adds to this literature stream by exploring the effect of voice- vs. text-based robots on students’ cognitive, emotional, and behavioral engagement. While prior authors have often examined these dynamics in isolation, focusing on the engagement characteristics of specific types of robots, we offer a direct comparison of voice- vs. text-based robots in the education context. These comparisons provide a key strategic resource to educational and service researchers. Specifically, text-based robots were found to be particularly useful in fostering thoughtful reflection, mental elaboration, and self-paced, reflective learning, highlighting students’ desire for individual agency, reflection, and deep mental processing (Chiu, 2021). Conversely, the immediacy and social presence offered by voice features emphasize the importance of interaction, feedback, and shared educational experiences (Wang et al., 2023). These findings enrich scholarly understanding of technology-facilitated engagement by highlighting the nuanced ways in which different modalities influence student outcomes (Hollebeek and Belk, 2021).
Second, our work adds to the engagement in service literature (e.g. Brodie et al., 2011; Viglia et al., 2023) by emphasizing the role of different service robots in enhancing user engagement. Specifically, our work extends existing theory on human-machine engagement (e.g. Hollebeek et al., 2024) by identifying how voice- and text-based robots influence users’ voluntary investment of their resources in their interactions with these technologies. The findings reveal that engagement is not merely a function of the robot’s technological capabilities but is deeply intertwined with the user’s perceived relevance of the task to be done and their emotional response to it, extending prior research (e.g. Nkomo et al., 2021; Wong and Liem, 2022). To validate and extend our findings, researchers are encouraged to compare and contrast the roles and effects of additional or multi-modal service robot types on users’ engagement and behavioral outcomes.
Third, we address a critical gap in the service robot literature by focusing on educational robots, a domain that remains relatively under-represented in the scholarly discourse (Mennens et al., 2025; Mende et al., 2025), which, to date, has focused on hospitality, retailing, and healthcare settings (Mele and Russo-Spena, 2025). By contrasting the capabilities of voice- and text-based educational robots, we provide a foundation for understanding their potential to enhance educational experiences. This analysis not only informs the adoption of educational robots but also offers theoretical insights into their role as agents of personalized and interactive learning (Mele et al., 2022).
7. Practical implications
The findings also offer significant practical implications for professors, higher education institutions, and manufacturers and marketers of educational robots. By leveraging the benefits of voice- and text-based robots, stakeholders can create more inclusive, engaging educational ecosystems by optimizing the benefits of each technology (Kumar et al., 2025a).
For professors and higher education institutions, our research underscores the importance of integrating robotic technologies into teaching practices, highlighting in particular the transformative potential of voice-based robots in fostering engagement. Their ability to create emotionally rich interactions can boost students’ educational experience (e.g. by fostering collaboration, stimulating motivation, or encouraging active participation; Chiu, 2021; Huang et al., 2023). However, while voice-based robots emerged as particularly apt at raising students’ engagement, text-based robots are also important, particularly to facilitate independent, self-paced learning and content review. A blended approach, in which voice-based robots are used to promote emotional engagement while text-based robots support individualized, task-focused learning, could yield optimal outcomes.
Institutions are advised to integrate these technologies into their curricula, aligning their capabilities with specific learning objectives and student preferences. Professional development programs are essential to equip educators with the skills to use these robots effectively. Institutions are also advised to invest in infrastructure to support seamless integration (e.g. rapid, reliable Internet connectivity; adaptable classroom layouts), enhancing the usability of these robots. We recommend that professors experiment to find suitable ways to responsibly optimize students’ cognitive, emotional, and behavioral engagement (Kumar et al., 2025b). Overall, by emphasizing the strengths of voice- and text-based robots, learning environments can be substantially enriched.
For the manufacturers and marketers of educational robots, this research highlights the importance of designing robots that cater to different educational needs while emphasizing adaptability. Our findings stress the need to prioritize developing advanced voice-based robots, as their attributes are particularly well suited to creating effective learning environments. Enhancements like the deployment of an empathetic tone, adaptive feedback, and the ability to guide group dynamics can further elevate their impact. At the same time, providers should continue refining text-based robots to ensure they remain a powerful tool for self-regulation, detailed feedback, and content mastery.
Providers should also explore ways to seamlessly integrate these robot types, offering interoperable systems that allow educators to deploy them in tandem for complementary purposes. Close collaboration with educators in the design/testing phases can yield valuable insight, ensuring the robots address practical challenges and meet particular classroom needs. Providers should also prioritize robot user-friendliness (Blut et al., 2021), offering training and support to help institutions and educators unlock the full potential of these tools.
8. Limitations and further research
Despite its contribution, this study is not without limitations, opening further research avenues. First, a key methodological limitation arises from the inherent differences in the physicality and design of robots. Specifically, while the voice-based robot used in the study features a physical embodiment, the text-based robot is delivered via a screen-based interface. This divergence may introduce variations in interactivity, perceived anthropomorphism, and group dynamics, which could influence student engagement beyond the core distinction between voice- and text-based communication. Future research could address this limitation by employing a single robot platform capable of supporting multiple interaction modalities, or by systematically isolating and manipulating specific design features to enhance the robustness and generalizability of the findings. For instance, a promising direction would be to compare two screen-based robots (one utilizing text-based communication and the other voice-based) in order to disentangle the effects of communication modality from those of physical embodiment. This approach would enable researchers to explore questions such as: How might student engagement differ when both robots share the same screen-based format? and To what extent does a non-physical presence diminish a voice-based robot’s capacity to foster cognitive and emotional engagement?
Second, our reliance on paired t-tests and Cohen’s d may also limit the conclusions that can be drawn. While these tests focus on mean differences and effect size quantification, they may not capture nuanced patterns within participant sub-groups or interactions among engagement dimensions. This raises questions such as: How do individual learning styles or preferences influence engagement with voice- versus text-based robots? Might underlying factors (e.g. knowledge of AI) mediate the observed differences? Incorporating more complex statistical methods like multivariate analysis or machine learning could provide deeper insights into these questions. Additional questions include: How might using different engagement measures, such as physiological or behavioral indicators (e.g. eye tracking or task completion time), affect the findings? How could combining multiple engagement metrics (e.g. emotional and behavioral indicators) enhance the reliability and validity of measuring engagement in robot-student interactions?
Third, the research design compared voice-based and text-based robots independently, but it did not examine how these technologies might function together. Sample questions include: How might hybrid systems that integrate text-based and voice-based robots create synergies to enhance students’ cognitive, emotional and behavioral engagement? Could dual-modality systems, where these robots work collaboratively, amplify their impact in real-time classroom settings? Moreover, this research was conducted in a specific educational and cultural context that primarily involved undergraduate Management students. Different results may be attained in other contexts (e.g. primary/special education or across different regions/countries). Sample questions include: How do cultural perceptions, variations in educational norms, and technological infrastructure shape the acceptance and effectiveness of educational robots? Would the findings differ across educational settings or traditions, and how can cross-cultural studies address these issues?
Finally, as AI continues to advance, the capabilities of educational robots will continue to improve. Therefore, future studies may wish to replicate the proposed research design with future generations of text- and voice-based robots or conduct longitudinal investigations into the evolution of these dynamics. Sample questions include: What are the long-term effects of educational robots on learning outcomes, motivation, and engagement over extended periods? Are there diminishing returns or unintended consequences, such as habituation or reduced effectiveness, with students’ sustained exposure to these technologies?
The supplementary material for this article can be found online.

