The main goal of this meta-analytic study was to explore the effects of the gamified flipped classroom (GFC) in higher education, considering both cognitive (i.e. academic achievement) and non-cognitive (i.e. motivation and engagement) learning outcomes.
The search in the Scopus, Web of Science and EBSCO databases yielded 26 relevant experimental and quasi-experimental studies. The meta-analysis followed the PRISMA guidelines (Preferred Reporting Items for Systematic Reviews and Meta-Analyses), and CMA (Comprehensive Meta-analysis version 4) was used for statistical analyses. Overrepresentation of studies reporting multiple outcomes was controlled by using one effect size per outcome category per study.
The overall pooled effect sizes indicate positive GFC effects on both cognitive (g = 0.921) and non-cognitive outcomes (g = 0.735). These estimates are interpreted cautiously, given the substantial heterogeneity across studies and small-study effects for cognitive outcomes. Moderator analyses for cognitive outcomes indicate that incorporating gamification in pre-class activities was associated with higher effects. Other examined moderators did not explain the variation across studies. Publication bias checks for cognitive outcomes (funnel plot, Egger's regression, fail-safe N and trim-and-fill) revealed some asymmetry and possible small-study effects, but suggested that the positive overall effect is unlikely to be explained by publication bias alone.
The study provides additional evidence that GFC is a promising pedagogical approach in higher education, especially when gamification is employed pre-class. It uniquely investigates variations in GFC effectiveness by the type of comparison condition (traditional lecture or non-gamified flipped classroom), contextual factors (intervention length, gender composition and geographic location) and gamification timing, issues that are seldom explored in previous studies but highly relevant for instructional design.
Introduction
University faculty continuously seek to make learning more meaningful and engaging. Aiming at this purpose, flipped classroom and gamification have been intensively examined in recent years. The flipped classroom offers more valuable class time for collaboration and problem-solving by moving direct instruction to pre-class activities. Gamification integrates gaming elements into serious learning contexts to enhance motivation and enjoyment. When these two approaches are combined, they result in a gamified flipped classroom (GFC), associated with improved learning outcomes and more positive learning experiences.
Literature review
Gamification and flipped classroom: intersections
Gamification has gained visibility in educational research as a convergent result of technology-enhanced and student-centred learning. It involves incorporating game-design elements, such as points, badges, leaderboards, challenges and immediate feedback, into non-game contexts (Deterding et al., 2011). It is therefore a process of redesigning learning rather than a product and differs from game-based learning centred on serious games (Landers et al., 2018). Meta-analyses show small to moderate benefits for cognitive, motivational and behavioural outcomes (Sailer and Homner, 2020; Bai et al., 2020; Li et al., 2024), although effects depend on instructional design, cooperation–competition balance, task suitability and sustained implementation (Hamari et al., 2014; Khaldi et al., 2023).
The flipped classroom has become a major topic in higher education, adding interactivity to traditional lectures by replacing direct instruction with pre-class activities supported by readings, videos and quizzes, while class time is used for problem-solving, debate, collaboration and facilitation (Bergmann and Sams, 2012). Several meta-analyses and systematic reviews have examined the flipped classroom’s effects across academic fields (Turan and Akdag-Cimen, 2020; Evans et al., 2019; Lo and Hew, 2017). Findings report improvements in student satisfaction, involvement and academic achievement (Bredow et al., 2021; Cheng et al., 2019; van Alten et al., 2019). However, its efficacy relies on instructional design and implementation, especially when paired with gamification, while theoretical and instructional inconsistencies still limit comparisons across studies. At the intersection of gamification and the flipped classroom, GFC emerged as a dynamic learning environment in which students prepare in advance using various resources and gamified elements are incorporated (Guerrero-Quiñonez et al., 2023). Evidence suggests that GFCs and gamified traditional classrooms enhance learners’ engagement, commitment and achievement, both outperforming traditional lecture-based approaches (Ng and Lo, 2022a, b). Recent research highlights contextual GFC effectiveness, reporting gains in academic performance, engagement and self-regulated learning (Maimaiti and Hew, 2025; El-Thalji, 2025). Emerging evidence from AI-enriched GFCs underscores the role of learner variability in explaining effects on achievement (Phanwiriyarat et al., 2025). Reviews confirm benefits for cognitive and non-cognitive outcomes but emphasize the effects of instructional design, academic discipline, learning materials complexity and learner profiles (Fernández-Velásquez et al., 2025; ElGamal and Zawacki-Richter, 2025).
Overall, research is shifting from demonstrating GFC’s effectiveness to mapping the conditions that maximize its impact.
However, GFC has its limitations: it lacks articulated theoretical grounding and contextual moderators are rather neglected (Majid et al., 2024). The literature suggests some relevant moderators for GFC’s success, at least from the gamification perspective: pre-class use of gamification, in contrast with in-class or both (Chen et al., 2023); gamification congruence with learning objectives (García Magro and Martín Peña, 2021); learner and contextual factors (Ng and Lo, 2022a, b); sustainable integration and periodic renewal of gamification into flipped activities (Rodrigues et al., 2022).
A theoretical framework for analyzing moderated GFC’s cognitive and non-cognitive effects
Recent studies resulted in mixed findings on GFC’s effectiveness, as it may depend not only on instructional design but also on underlying learning processes. The present meta-analysis relies on an integrated theoretical framework grounded in cognitive load theory (CLT; Sweller, 1988; Sweller et al., 2019), self-determination theory (SDT; Deci and Ryan, 2000; Ryan and Deci, 2017) and control-value theory of achievement emotions (CVT; Pekrun, 2006) to explain how GFC influences cognitive and non-cognitive outcomes.
CLT emphasizes working memory limits and the management of intrinsic, extraneous and germane cognitive load (Sweller, 1988; Sweller et al., 2019; Sweller, 2020). Flipped classrooms may regulate load through self-paced pre-class activities and promote germane processing during class (Abeysekera and Dawson, 2015), but poorly structured materials or overabundant multimedia can amplify extraneous load. Gamification may either reduce extraneous load through clear goals and feedback or increase it through irrelevant rules and overstimulation (Mayer, 2020). Based on SDT, motivation depends on the satisfaction of autonomy, competence and relatedness needs (Deci and Ryan, 2000; Ryan and Deci, 2017). Gamified designs may either undermine deep engagement with controlling incentives or foster intrinsic motivation when they support autonomy, competence and relatedness (Li et al., 2024). Flipped classrooms enable autonomy and competence (Avakyan and Taylor, 2024), but gamification may either strengthen or weaken these effects. CVT additionally suggests that learning outcomes are shaped by achievement emotions, arising from learners’ appraisals of perceived control and task value (Pekrun, 2006). In GFCs, gamification may enhance perceived value (Perez-Aranda et al., 2024), while flipped designs may increase perceived control (Liu, 2022).
Although pedagogical design is critical for GFC success, effectiveness also depends on aligning instructional strategies, technologies and assessment with learner profiles (Baig and Yadegaridehkordi, 2023; Youhasan et al., 2021). Consistent with SDT, gamification can enhance engagement, but its effects depend on differences in learner motivation and self-regulation (Ng and Lo, 2022a, b; Zainuddin et al., 2019). CLT and CVT further indicate that cognitive capacity, prior knowledge and emotional dispositions shape learners’ engagement and outcomes. Empirical evidence shows that learning style and proficiency influence outcomes, revealing insufficient personalization in current GFC (Elzeky et al., 2022; Pratiwi et al., 2024).
According to CLT, SDT and CVT, some heterogeneity in empirical findings on GFCs’ outcomes is anticipated, highlighting the relevance of a meta-analytic study. Previous studies considered several meaningful moderators. Gender does not typically moderate cognitive outcomes, but it affects motivation and engagement, especially for specific competitive game elements (Hanus and Fox, 2015) and gamification mechanics (Sailer et al., 2017). Intervention timing further conditions effects: pre-class activities enhance readiness and explain achievement (Chen et al., 2017), in-class active learning fosters cognitive gains (Freeman et al., 2014) and reduces achievement gaps (Theobald et al., 2020). Intervention length (Sailer and Homner, 2020; van Alten et al., 2019) and geographical location, a proxy for cultural background, may further shape GFCs’ outcomes (Baig and Yadegaridehkordi, 2023; Hamari et al., 2014). These theoretical perspectives guided the analytical framework for moderator selection (intervention placement, instructional design and contextual variables) and results discussion. Overall, CLT, SDT and CVT indicate that GFC’s efficacy depends on the alignment between cognitive demands, motivation and learners' emotional evaluations.
Recent meta-analyses on GFC: the research gap
Researchers examined GFC effectiveness for various outcomes. Yu and Yu's (2024) meta-analysis of 17 studies across educational levels found that GFC has an upper-medium effect on student performance compared with non-gamified flipped activities. Sun and Sailer (2025) reported GFC benefits for cognitive and non-cognitive outcomes, analyzing 79 effect sizes from 29 studies conducted at all educational levels. Although studies offer valuable insights, they also raise questions. For example, Yu and Yu (2024) did not investigate the role of gamification placement or the control groups' definitions, while Sun and Sailer’s (2025) meta-analysis limits conclusions for higher education, given the variability across educational levels, where course structures and learners’ needs differ substantially.
The present study adopts a more focused approach, covering exclusively higher education studies, where flipped and gamified designs are frequently employed in large and diverse classrooms. Beyond estimating overall effects, it also aims to highlight how different GFC implementation strategies matter. Does it work better when game elements are placed pre-class, in-class, or both? Does the intervention length matter? Are the effects stronger when compared with traditional lectures or when compared with flipped classes without gamification? Does gender composition, or geographical location/continent, make a difference? Although rarely addressed in previous reviews, these questions are highly relevant to teachers designing their courses. While the study of GFC is well-established, the present work narrows the focus to higher education and design-related moderators relevant for instructional decisions. Its contribution lies less in asking whether GFC works, and more in clarifying under which conditions it shows stronger effects.
Purpose and research questions of the study
The aim of this meta-analysis is to provide clear and practical answers regarding GFC effects. By analyzing design and contextual factors as moderators, this study goes beyond asking whether GFC is effective to explore how to optimize it for higher education. The following research questions were addressed:
What is the impact of embedding GFCs in university courses on cognitive and non-cognitive outcomes?
Which features of design and context (i.e. potential moderators) explain when and why these outcomes are stronger?
Method
Literature search
For identifying the relevant articles, we searched three databases: Scopus, Web of Science and EBSCO (EBSCOhost database). The time range for searching articles was from January 2015 until June 2025.
The following terms were used in the search process in the Web of Science database: “gamification” AND “flipped classroom” AND “higher education”; “gamified” AND “flipped classroom” AND “higher education”; “gamification” AND “flipped classroom” AND “university”; “gamified” AND “flipped classroom” AND “university”. Scopus publications search was based on the following terms and conditions: “gamification” AND “flipped classroom” AND “higher education”; “gamified” AND “flipped classroom” AND “higher education”; “gamification”. The EBSCO prompt used the following combination of terms: (gamification OR gamified) AND (“flipped classroom” OR “flipped learning” OR “flipped pedagogy”) AND (“higher education” OR university OR college OR undergraduate OR “tertiary education”). We also investigated bibliographic references of recent meta-analyses (Yu and Yu, 2024; Sun and Sailer, 2025) and added the missing studies. We identified 26 experimental or quasi-experimental studies (Figure 1) that tested GFC in university settings. From these, we extracted 23 effect sizes for cognitive outcomes and 10 effect sizes for non-cognitive outcomes. We applied a decision-rule approach (López-López et al., 2014) to select one effect size per outcome category per study (e.g. one cognitive effect, one non-cognitive effect), thereby maintaining the independence assumption and preventing undue weight of studies with multiple, correlated measures. While this approach reduces statistical dependency, it may also result in some loss of information in multidimensional interventions. Although motivation, engagement, affect and enjoyment represent distinct constructs, they were grouped into a broader non-cognitive category to ensure sufficient statistical power for meta-analytic synthesis.
The flowchart details the selection process of studies for a meta-analysis. It is divided into two main sections: Identification of studies via databases and registers, and Identification of studies via other methods. In the first section, records are identified from Web of Science, Scopus, and EBSCO databases. Duplicate records are removed before screening. The remaining records are screened based on title and abstract, and reports are sought for retrieval. Reports are then assessed for eligibility, and those excluded are listed with reasons. In the second section, records are identified from reference list searches. These reports are also sought for retrieval and assessed for eligibility, with reasons for exclusion provided. The final studies included in the meta-analysis are noted.
PRISMA flow chart for the selection process of the studies
Inclusion and exclusion criteria
To be included in the present meta-analysis, studies were required to: (a) implement a flipped classroom design in a higher education setting; (b) integrate gamification elements into the flipped model; (c) employ an experimental or quasi-experimental design with at least one control or comparison group; (d) report sufficient statistical information to calculate an effect size (e.g. means and standard deviations, sample sizes or other statistics convertible to standardized effect sizes for the post-test measurements); and (e) be published in English and peer-reviewed. All studies focussing on non-higher-education populations were excluded, as well as qualitative studies, case reports, theoretical papers and conference abstracts without full empirical data.
Data extraction
After completing the search process, the duplicate records were removed and the remaining titles/abstracts were screened for relevance. Potentially eligible full texts were reviewed for accuracy and consistency. From each study we extracted and coded into an Excel sheet the following information: outcome type (cognitive or non-cognitive), specific learning outcome details, total number of participants, proportion of female participants, geographic location/continent of the study, length of intervention (weeks), placement of gamification in the flipped design (pre-class, in-class or both) and control group type (traditional lecture vs flipped without gamification). When both traditional and flipped control groups were available, the flipped comparison was extracted to provide a conservative estimate of gamification’s added value. To evaluate inter-rater agreement, the two authors coded one-third of the studies. To ensure consistency in coding decisions, differences were discussed and resolved by consensus.
Data analysis
The present meta-analysis was conducted following the PRISMA guidelines (Page et al., 2021) and statistical analyses were performed using CMA (Comprehensive Meta-Analysis version 4, Borenstein, 2022). For each eligible study, the means and standard deviations from the post-test measurements of the experimental/intervention and control groups were extracted. For one study, we extracted the mean differences and the t-test results (Jeong and Gonzalez-Gomez, 2025). These values were used to compute standardized mean differences. We used Hedges’ g as the effect size index, since it corrects for small sample bias (Borenstein et al., 2009). Cohen’s (1988) benchmarks were used to interpret effect sizes. In situations where multiple outcomes within the same category (cognitive and/or non-cognitive) were reported in a single study, only one effect size was chosen per category to avoid undue influence of any single study (López-López et al., 2014). Although this decision ensured statistical independence and prevented the overrepresentation of some studies, it may have reduced sensitivity to within-study variability and limited granularity in examining construct-specific differences. All calculations were performed using the random-effects model, which allows for anticipated heterogeneity among studies and facilitates broader generalization (Borenstein et al., 2010). The Q statistic and the I² index were used to measure heterogeneity. I² values of 25%, 50% and 75% were interpreted as low, moderate and high heterogeneity, respectively (Higgins et al., 2003).
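The computations described above can be illustrated with a minimal Python sketch. It assumes the standard formulas for Hedges' g given in Borenstein et al. (2009) and a DerSimonian-Laird estimate of the between-study variance; the estimator used inside CMA is not stated in this paper, so that choice, the function names and the example numbers are illustrative assumptions only.

```python
import math


def hedges_g(m_exp, sd_exp, n_exp, m_ctl, sd_ctl, n_ctl):
    """Return (g, variance of g) from post-test means, SDs and group sizes."""
    df = n_exp + n_ctl - 2
    # Pooled within-group standard deviation
    sd_pooled = math.sqrt(
        ((n_exp - 1) * sd_exp**2 + (n_ctl - 1) * sd_ctl**2) / df
    )
    d = (m_exp - m_ctl) / sd_pooled
    var_d = (n_exp + n_ctl) / (n_exp * n_ctl) + d**2 / (2 * (n_exp + n_ctl))
    # Small-sample correction factor J turns Cohen's d into Hedges' g
    j = 1 - 3 / (4 * df - 1)
    return j * d, j**2 * var_d


def random_effects(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random-effects model.

    Assumption: DerSimonian-Laird is used here for illustration only.
    """
    k = len(effects)
    w = [1 / v for v in variances]
    fixed_mean = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (g - fixed_mean) ** 2 for wi, g in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance tau^2
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * g for wi, g in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "g": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "df": df,
        "tau2": tau2,
        "i2": i2,
    }


if __name__ == "__main__":
    # Hypothetical study-level inputs, not data from the included studies
    studies = [(78, 10, 30, 70, 11, 30), (65, 12, 45, 63, 12, 44)]
    pairs = [hedges_g(*s) for s in studies]
    print(random_effects([g for g, _ in pairs], [v for _, v in pairs]))
```

The dictionary returned by `random_effects` mirrors the quantities reported in the Results: the pooled g, its 95% confidence interval, Q with its degrees of freedom, and I².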
To further explore variability, we conducted subgroup analyses (using mixed-effects models) for categorical moderators, such as geographic location/continent (Europe, Asia, and Emerging and Developing Regions, which included countries such as South Africa, Mexico, Chile, Brazil, the UAE and Egypt), gamification placement (pre-class, in-class, both), flipped format (partial vs full) and type of control group (traditional vs flipped without gamification). For continuous moderators, such as intervention length or proportion of female students, we applied random-effects meta-regression using the Z-distribution approach (Borenstein et al., 2009). Additionally, publication bias was assessed for cognitive outcomes by inspecting the symmetry of the funnel plot and applying Egger’s regression test (Egger et al., 1997).
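For readers unfamiliar with Egger's regression, the following sketch shows the usual formulation: each study's standardized effect (g divided by its standard error) is regressed on its precision (1 divided by the standard error), and an intercept far from zero signals funnel plot asymmetry. The function name and inputs are hypothetical, and the code returns only the t statistic (with k − 2 degrees of freedom) because a p-value requires a t-distribution routine outside the Python standard library.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression: regress g/SE on 1/SE by ordinary least squares.

    Returns (intercept, slope, SE of intercept, t statistic).
    """
    k = len(effects)
    if k < 3:
        raise ValueError("Egger's regression needs at least three studies")
    x = [1 / se for se in std_errors]                     # precision
    y = [g / se for g, se in zip(effects, std_errors)]    # standardized effect
    x_mean = sum(x) / k
    y_mean = sum(y) / k
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("All standard errors are equal; slope is undefined")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual variance with k - 2 degrees of freedom
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_var = sse / (k - 2)
    se_intercept = math.sqrt(resid_var * (1 / k + x_mean**2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, se_intercept, t_stat


if __name__ == "__main__":
    # Hypothetical data in which less precise studies report larger effects
    print(egger_test([0.2, 0.5, 0.9, 1.4], [0.1, 0.2, 0.3, 0.4]))
```

When every study shares the same underlying effect regardless of its precision, the intercept is close to zero; a clearly positive intercept, as in the hypothetical data above, is the pattern usually described as a small-study effect.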
Results
The final sample consisted of 26 studies (Table 1) with experimental or quasi-experimental designs, conducted exclusively in higher education and covering various academic disciplines. Together they yielded 23 independent effect sizes for cognitive outcomes and 10 for non-cognitive outcomes, from a total of 2046 participants. Sample sizes varied considerably, ranging from small groups (N < 30 students) to large groups (N < 300 participants), with undergraduate populations most frequently studied. Cognitive outcomes were commonly measured through achievement tests and quizzes, while non-cognitive outcomes included measures of motivation and learning engagement. Most studies compared GFCs with non-gamified flipped classrooms, while others used traditional lecture-based instruction as the control, allowing us to evaluate both the effect of the flipped structure and the added value of gamification.
Characteristics of the studies included in the meta-analysis
| No. | Authors | Continent/Region | Gamification placement | Length (weeks) | Control type | Outcome |
|---|---|---|---|---|---|---|
| 1 | Ahmed and Asiksoy (2021) | Europe | pre-class and in-class | 10 | FC | cognitive |
| 2 | Arias and Esteve Mon (2025) | Europe | in-class | 6 | TC | cognitive |
| 3 | Asiksoy and Canbolat (2021) | Europe | pre-class | 9 | FC | cognitive |
| 4 | da Silva Garcia et al. (2022) | Emerging and Developing Regions | in-class | 17 | TC | cognitive |
| 5 | Durrani et al. (2022) | Emerging and Developing Regions | in-class | 14 | TC | cognitive |
| 6 | Elzeky et al. (2022) | Emerging and Developing Regions | pre-class | 8 | FC | cognitive |
| 7 | Gisbert et al. (2024) | Europe | pre-class | 1 | FC | cognitive |
| 8 | Ho (2019) | Asia | in-class | 13 | FC | cognitive |
| 9 | Hung (2016) | Asia | pre-class and in-class | 3 | FC | both |
| 10 | Hung (2018) | Asia | in-class | 6 | FC | both |
| 11 | Jeong and Gonzalez-Gomez (2025) | Emerging and Developing Regions | pre-class and in-class | 15 | FC | both |
| 12 | Kim and Kim (2022) | Asia | pre-class and in-class | 7 | TC | both |
| 13 | Maimaiti and Hew (2025) | Asia | pre-class and in-class | 10 | FC | both |
| 14 | Ng and Lo (2022a, b) | Asia | in-class | 4 | FC | cognitive |
| 15 | Recabarren (2021) | Emerging and Developing Regions | pre-class and in-class | 12 | FC | non-cognitive |
| 16 | Sailer and Sailer (2020) | Europe | in-class | 2 | FC | non-cognitive |
| 17 | Toriz (2019) | Emerging and Developing Regions | pre-class and in-class | 16 | FC | cognitive |
| 18 | Tsay et al. (2018) | Europe | pre-class and in-class | 12 | TC | cognitive |
| 19 | Zainuddin (2021) | Asia | in-class | 7 | FC | both |
| 20 | Zvarych (2019) | Europe | in-class | 12 | TC | cognitive |
| 21 | Hwang and Chang (2020) | Asia | in-class | 2 | FC | both |
| 22 | Huesca et al. (2023) | Emerging and Developing Regions | in-class | 16 | FC | cognitive |
| 23 | Gündüz and Akkoyunlu (2020) | Europe | in-class | 9 | FC | cognitive |
| 24 | Do et al. (2023) | Emerging and Developing Regions | in-class | 1 | FC | both |
| 25 | Taşkın and Çakmak (2022) | Emerging and Developing Regions | in-class | 12 | FC | cognitive |
| 26 | Smith et al. (2023) | Emerging and Developing Regions | in-class | 5 | FC | cognitive |
Cognitive outcomes
The random-effects model showed that GFCs had a significant positive effect on students’ academic achievement (Hedges’ g = 0.92, 95% confidence interval (CI) [0.59, 1.24], p < 0.001). This represents a large effect, although heterogeneity was high (Q(22) = 276.84, p < 0.001, I² = 92.1%; Figure 2), indicating wide variation in effects across studies. The pooled value therefore reflects an average estimate across diverse conditions, not a uniform GFC effect. Table 2 presents the pooled effects for cognitive and non-cognitive outcomes.
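As a quick consistency check, the reported heterogeneity index can be recovered from Q and its degrees of freedom using the standard definition of I² (Higgins et al., 2003). The arithmetic below uses only the values reported for cognitive outcomes:

```latex
I^2 = \frac{Q - df}{Q} \times 100\%
    = \frac{276.84 - 22}{276.84} \times 100\%
    \approx 92.1\%
```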
The table presents a comparison of the effect of GFC on cognitive outcomes across various studies. It includes columns for study names, Hedges’s g, standard error, variance, lower limit, upper limit, Z-value, and p-value. The table has 27 rows and 9 columns. Notable studies include Ahmed & Asiksoy 2021 with a Hedges’s g of 0.633 and a p-value of 0.009, and Gisbert et al. 2024 with a Hedges’s g of 3.646 and a p-value of 0.000. The pooled estimate shows a Hedges’s g of 0.921 with a p-value of 0.000.
Forest plot displaying pooled estimates of the effect of GFC on cognitive outcome
Overall pooled effect sizes (Hedges’ g) for GFC interventions in higher education
| Outcome type | K | Hedges’ g | 95% CI | Z | p |
|---|---|---|---|---|---|
| Cognitive | 23 | 0.921 | [0.595, 1.247] | 5.53 | 0.000 |
| Non-cognitive | 10 | 0.735 | [0.428, 1.043] | 4.68 | 0.000 |
Moderator analyses examined potential sources of variability. The funnel plot showed some asymmetry, and Egger’s regression was significant (intercept = 5.21, p = 0.038), suggesting potential small-study effects. Although the overall direction remains positive, the cognitive effect may be partly inflated by small-study effects. The high heterogeneity indicates large variability in effect sizes across studies, suggesting further dependence on factors like instructional design, implementation or sample characteristics. Nevertheless, Orwin’s fail-safe N indicated that more than 1,200 missing studies with trivial effects would be needed to reduce the observed effect to non-significance, signalling that the overall effect is unlikely to be entirely explained by publication bias. Duval and Tweedie’s trim-and-fill analysis (Shi and Lin, 2019) suggested that 6 studies may be missing; imputing them adjusted the pooled effect upward, and it remained large (g = 1.08) (Figure 3).
A scatter plot representing the relationship between Hedges’s g and Standard Error. The horizontal axis represents Hedges’s g, ranging from -5 to 6. The vertical axis represents Standard Error, ranging from 0.0 to 0.5. The plot includes dozens of data points, each represented by a circle. The data points are concentrated around the center, forming a funnel shape. A few outliers are visible on the right side of the plot. Two dashed lines form a funnel shape around the central data points, indicating a reference range.
Funnel plot of Standard Error by Fisher’s Z
Moderator analyses
Separate meta-regressions tested continuous moderators. The percentage of female participants was used as the indicator of gender composition. The proportion of female participants was not a significant predictor of cognitive outcomes (β = −0.003, p = 0.596), and intervention length also showed no significant moderating effect (β = 0.024, p = 0.488). Table 3 shows that neither continuous moderator significantly predicted cognitive effects.
Meta-regression analyses for testing the potential moderating role of intervention length and proportion of female participants on cognitive outcome
| Moderator | K | Estimate | SE | Z | p |
|---|---|---|---|---|---|
| Intervention length (weeks) | 23 | 0.024 | 0.034 | 0.69 | 0.488 |
| Female proportion (%) | 12 | −0.003 | 0.006 | −0.53 | 0.596 |
Subgroup analyses tested categorical variables as moderators. For geographic location/continent, subgroup means differed descriptively (Emerging and Developing Regions: g = 1.266, Europe: g = 0.986, Asia: g = 0.549), but the between-group difference was not statistically significant (Q(2) = 2.963, p = 0.227). For the control group type, comparisons against flipped non-gamified controls showed a larger mean effect (g = 1.071) than comparisons against traditional lecture controls (g = 0.510). However, the between-group difference was not statistically significant (Q(1) = 2.162, p = 0.142). For cognitive outcomes, placement of gamification significantly moderated the effects (Q(2) = 6.648, p = 0.036). Pre-class gamification showed the largest effect (g = 1.793, 95% CI [0.993, 2.593], p < 0.001), followed by the combined pre- and in-class implementation (g = 1.036, 95% CI [0.395, 1.677], p = 0.002), whereas the in-class-only placement yielded the smallest, although still significant, effect (g = 0.610, 95% CI [0.173, 1.046], p = 0.006). Table 4 shows that placement significantly moderated cognitive effects, with pre-class placement yielding the largest effect.
Subgroup analyses for testing the potential moderating role of continent, control type and placement for cognitive outcome
| Moderator | Hedges’ g | 95% CI | p | K | Q (between) | p (between) |
|---|---|---|---|---|---|---|
| Continent | | | | | 2.963 | 0.227 |
| Asia | 0.549 | [−0.019; 1.118] | 0.058 | 8 | | |
| Europe | 0.986 | [0.417; 1.556] | 0.001 | 8 | | |
| Emerging and Developing Regions | 1.266 | [0.667; 1.865] | 0.000 | 7 | | |
| Control type | | | | | 2.162 | 0.142 |
| Flipped (FC) | 1.071 | [0.686; 1.457] | 0.000 | 17 | | |
| Traditional (TC) | 0.510 | [−0.131; 1.151] | 0.119 | 6 | | |
| Placement of gamification | | | | | 6.648 | 0.036 |
| Pre-class | 1.793 | [0.993; 2.593] | 0.000 | 4 | | |
| In-class | 0.610 | [0.173; 1.046] | 0.006 | 13 | | |
| Both | 1.036 | [0.395; 1.677] | 0.002 | 6 | | |
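The between-group Q statistics for cognitive outcomes can be approximately reconstructed from the reported subgroup means and confidence intervals. The following is a minimal sketch, not the procedure implemented in CMA: it assumes that each 95% CI is symmetric and normal-based (so the standard error is the CI width divided by 2 × 1.96) and that Q is the inverse-variance weighted sum of squared deviations of subgroup means from their weighted average. The helper names are hypothetical.

```python
import math

Z_95 = 1.96  # assumed critical value behind the reported 95% CIs


def se_from_ci(lower: float, upper: float) -> float:
    """Back out a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * Z_95)


def q_between(subgroups: list[tuple[float, float, float]]) -> float:
    """Weighted sum of squared deviations of subgroup means (g, lower, upper)."""
    weights = [1.0 / se_from_ci(lo, hi) ** 2 for _, lo, hi in subgroups]
    means = [g for g, _, _ in subgroups]
    pooled = sum(w * g for w, g in zip(weights, means)) / sum(weights)
    return sum(w * (g - pooled) ** 2 for w, g in zip(weights, means))


def chi2_sf(q: float, df: int) -> float:
    """Upper-tail chi-square probability (closed forms for df = 1 and 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(q / 2.0))
    if df == 2:
        return math.exp(-q / 2.0)
    raise ValueError("only df = 1 or df = 2 are supported in this sketch")


# (Hedges' g, CI lower, CI upper) copied from the subgroup table.
placement = [(1.793, 0.993, 2.593), (0.610, 0.173, 1.046), (1.036, 0.395, 1.677)]
control = [(1.071, 0.686, 1.457), (0.510, -0.131, 1.151)]
continent = [(0.549, -0.019, 1.118), (0.986, 0.417, 1.556), (1.266, 0.667, 1.865)]

q_placement = q_between(placement)
q_control = q_between(control)
q_continent = q_between(continent)

p_placement = chi2_sf(q_placement, df=2)
p_control = chi2_sf(q_control, df=1)
p_continent = chi2_sf(q_continent, df=2)

for label, q, p in [("Placement", q_placement, p_placement),
                    ("Control type", q_control, p_control),
                    ("Continent", q_continent, p_continent)]:
    print(f"{label}: Q = {q:.3f}, p = {p:.3f}")
```

Under these assumptions the recomputed values land close to the reported Q statistics, with small differences attributable to rounding in the published intervals.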
Non-cognitive outcome
The pooled random-effects model revealed a significant positive effect of GFCs on motivation and engagement (Hedges’ g = 0.73, 95% CI [0.42, 1.04], p < 0.001). This represents a medium-to-large effect, suggesting relatively higher levels of motivation and engagement during the learning process under GFC conditions (Figure 4).
Forest plot: estimates and pooled estimates of the effect for the relationship between GFC and non-cognitive outcomes
Figure description: The plot compares the effect of GFC on non-cognitive outcomes across multiple studies. It includes data for twelve studies, with columns for Hedges’s g, standard error, variance, lower limit, upper limit, Z-value, and p-value. Notable studies include Do et al. (2023) with a Hedges’s g of 0.362 and a p-value of 0.019, and Hwang and Chang (2020) with a Hedges’s g of 2.702 and a p-value of 0.000. The pooled estimate shows a Hedges’s g of 0.735 with a p-value of 0.000. The prediction interval ranges from −0.343 to 1.814.
Again, heterogeneity was high (Q(9) = 54.82, p < 0.001, I² = 83.58%), suggesting that effect sizes differed meaningfully across contexts. Regarding publication bias, the visual inspection of the funnel plot did not indicate major asymmetry and Egger’s test was non-significant (intercept = 3.28, p = 0.201), indicating that publication bias is unlikely to have distorted results (Figure 5).
Funnel plot of Standard Error by Fisher’s Z
Figure description: A funnel plot with the horizontal axis representing Hedges’s g and the vertical axis representing standard error. The data points are scattered around the central vertical line at Hedges’s g equals zero and fall within a triangular funnel shape, with most points concentrated near the top centre and fewer points further from the centre along the horizontal axis. A diamond at the bottom centre represents the overall effect size. The plot shows a few outliers, particularly one data point far to the right, around Hedges’s g equals 3.
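The reported heterogeneity index for non-cognitive outcomes follows directly from the Q statistic and its degrees of freedom. A minimal sketch using the standard Higgins and Thompson definition, I² = max(0, (Q − df) / Q) × 100, is shown below; the function name `i_squared` is illustrative.

```python
def i_squared(q: float, df: int) -> float:
    """Percentage of total variation attributable to between-study heterogeneity."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Q and degrees of freedom as reported for non-cognitive outcomes.
i2_noncognitive = i_squared(54.82, 9)
print(f"I^2 = {i2_noncognitive:.2f}%")
```

With Q = 54.82 and df = 9, this reproduces the reported value of roughly 83.58%.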
Moderator analyses
A separate meta-regression investigating the role of intervention length as a continuous moderator produced non-significant results for non-cognitive outcomes (β = −0.052, p = 0.156). The female proportion in the group was not tested due to insufficient studies providing this information. Table 5 shows that intervention length was not a significant predictor.
Meta-regression analyses for testing the potential moderating role of intervention length on non-cognitive outcome
| Moderator | k | Estimate | SE | Z | p |
|---|---|---|---|---|---|
| Intervention length (weeks) | 10 | −0.052 | 0.037 | −1.42 | 0.156 |
Subgroup analyses for categorical moderators yielded non-significant differences for geographic location/continent (Q(2) = 2.257, p = 0.323), although mean effects varied descriptively (g = 0.897 in Asia, g = 0.815 in Europe, g = 0.267 in Emerging and Developing Regions), and for placement of gamification (Q(1) = 2.230, p = 0.136). The control type was not included in the subgroup analyses due to insufficient studies in each group. Descriptively, larger mean effects were observed in comparisons with traditional lectures, although the between-group difference was not statistically significant. These findings indicate positive GFC effects but should be interpreted cautiously given the high heterogeneity and the limited number of non-cognitive effect sizes. Publication bias indicators did not suggest major distortions. Table 6 presents subgroup comparisons for non-cognitive outcomes.
Subgroup analyses for testing the potential moderating role of continent and placement for the non-cognitive outcome
| Moderator | Hedges’ g | 95% CI | p | k | Q (between) | p (between) |
|---|---|---|---|---|---|---|
| Continent | | | | | 2.257 | 0.323 |
| Asia | 0.897 | [0.457; 1.336] | < 0.001 | 6 | | |
| Europe | 0.815 | [0.127; 1.503] | 0.020 | 2 | | |
| Emerging and developing regions | 0.267 | [−0.438; 0.973] | 0.458 | 2 | | |
| Placement of gamification | | | | | 2.230 | 0.136 |
| In-class | 0.992 | [0.531; 1.452] | < 0.001 | 5 | | |
| Both | 0.507 | [0.065; 0.948] | 0.024 | 5 | | |
Discussion
The observed cognitive effect size indicates a large but context-sensitive impact that should be interpreted in relation to the substantial heterogeneity, the presence of small-study effects and the upward-adjusted trim-and-fill results. In comparison with prior meta-analytic estimates, this result appears higher but should not be interpreted as a general impact of GFC. This pattern may also reflect study characteristics, such as small sample sizes or implementation contexts, which are known to inflate effect size estimates in meta-analytic studies (Borenstein et al., 2009). The observed variability suggests that gamification operates as a context-dependent mechanism rather than as a uniform enhancer of outcomes. The outcomes may also depend on the alignment between cognitive load regulation, motivational support and learners’ perceived control and value, consistent with the integrated CLT–SDT–CVT framework.
Previous meta-analyses on gamification consistently report small to medium gains in motivation, affective outcomes and achievement across academic fields (Bai et al., 2020; Ho et al., 2022; Sailer and Homner, 2020). Meta-analyses on the flipped classroom show benefits when integrated into instructional design, including higher achievement (Yu and Yu, 2024; Karagöl and Esen, 2018). Some recent evidence (Sun and Sailer, 2025) indicates GFC’s moderate effects on behavioural and affective outcomes, but minimal effects on cognitive outcomes in university contexts. In contrast, Yu and Yu (2024) reported a large positive effect for university students, attributed to higher self-regulation abilities.
Aggregating K–12 and higher education data may obscure contextual differences, as university students often show greater autonomy, self-regulation and digital competence. By focussing on higher education, the current study helps clarify mixed prior findings and adds originality through moderator analysis. Findings indicate positive pooled effects, particularly improvements in cognitive performance. However, they should be interpreted not as a general GFC impact but as a possible upper-bound estimate under specific implementation conditions in higher education.
These results are broadly consistent with the growing literature indicating benefits of adding gamification into flipped classrooms, as both are associated with higher participation, engagement and autonomy (Zainuddin et al., 2019; Ng and Lo, 2022a, b). Beyond course-level instructional design, these findings also have institutional and societal implications, as the scalability of GFCs in large courses depends on adequate digital infrastructure, organizational support and alignment with graduate skill development and higher education strategy. At the institutional level, GFC implementation requires learning management systems, instructional design support, teacher training and recognition of teachers’ workload for redesigning the courses. Economically, the value lies less in the cost of gamification tools and more in scalable course redesign, improved student retention and transferable labour-market skills. In the context of current digital transformation in higher education, such approaches drive flexible, student-centred learning ecosystems, supporting the development of digital and self-regulated learning competencies, beyond discipline-specific gains (OECD, 2023). However, implementation raises equity concerns: social disparities result in unequal access to technology and differences in digital literacy, as acknowledged during the pandemic (Williamson et al., 2020). Differences in national and institutional support may lead to uneven learning opportunities and outcomes, ultimately deepening the digital divide (van Dijk, 2020). Policy-level support should ensure that digitally enhanced pedagogies reduce inequalities and promote inclusive access to high-quality education. While GFCs hold transformative potential, their effectiveness depends on pedagogical readiness and inclusive, equitable implementation, ensuring full participation of all students in digital learning environments.
The findings further suggest the largest cognitive effects for pre-class gamification, aligned with prior research grounded in CLT, emphasizing the need to avoid overstimulation during in-class activities, and to ensure time for germane cognitive processing (Mayer, 2020). Likewise, they are consistent with core premises of SDT, indicating that GFC can support autonomy and competence (Avakyan and Taylor, 2024) if gamification is coherently integrated to prevent potential undermining effects (Li et al., 2024). Similarly, perceived control may be enhanced by the flipped structure, while pre-class gamification can increase perceived task value, as evidenced in prior studies grounded in CVT (Perez-Aranda et al., 2024; Liu, 2022). For teachers, these results emphasize that the placement of gamification elements within the instructional sequence is associated with differences in effectiveness. To enhance the classroom experience, it may be particularly beneficial to introduce pre-class gamification activities to stimulate student preparation. The effects on non-cognitive outcomes were also positive and consistent across different study designs.
Limitations
Nonetheless, several limitations should be acknowledged. First, although the meta-analysis included 26 experimental and quasi-experimental studies, the number of effect sizes for some outcomes (especially non-cognitive) was limited, affecting the power of moderator analyses. Second, the high heterogeneity observed across studies suggests that contextual factors might have an important role and should be systematically investigated. Aggregating non-cognitive outcomes may have obscured construct-specific differences, and the small number of effect sizes limits robustness and generalizability. Third, the decision to include only one effect size per outcome category per study, while methodologically justified, might have resulted in less detailed results. Finally, most studies used short-term interventions and self-reported outcomes, which emphasizes the need for longitudinal studies and more objective assessments in future research.
Practical implications for instructional design in higher education
For full integration into higher education, GFC should be anchored in transformative competencies (e.g. creating values, managing tensions and assuming responsibility), as key curricular outcomes. Gamification, fostering autonomy and competence, can therefore advance current labour-market skills (Rogoz, 2024; Crețu and Rogoz, 2014). The strategic use of gamification in pre-class activities is probably the main recommendation resulting from the findings, although excessive use may reduce its motivational impact due to novelty attenuation. However, GFC implementation faces challenges, including sustaining motivation and preparation out of class, as students may skip tasks (Smith et al., 2023), as well as a greater workload for teachers, especially for developing contextual game scenarios (Hwang and Chang, 2020). Successful implementation also depends on teacher training and organizational culture aligned to similar, if not identical, values (Rogoz, 2018). In addition, scalability depends on available digital infrastructure and disciplinary differences, affecting transferability across higher education contexts.
Conclusion
Overall, these results indicate that GFC is a promising pedagogical approach, particularly under specific implementation conditions. By supporting both knowledge acquisition and students’ motivational gains, GFCs show the potential for enhancing learning in digital and student-centred educational contexts. To summarize, the GFC in university settings is associated with multiple benefits, as the literature already acknowledges. Nonetheless, the significant heterogeneity across studies suggests that effectiveness could be influenced by design quality, implementation context and learner characteristics. The present meta-analysis does not present a one-size-fits-all model; instead, it highlights the benefits of theoretically informed and inclusive GFC designs.

