1. Introduction
Structural equation modeling (SEM) is a relevant quantitative method due to its capacity to analyze complex relationships between variables in a comprehensive and statistically rigorous manner. In quantitative methodology, where numerical data is used to test hypotheses and evaluate theoretical models, SEM provides a robust framework for both confirmatory and exploratory research. It allows researchers to move beyond traditional linear regression techniques by modeling multiple dependent and independent variables simultaneously, thus offering a more nuanced view of relationships within the data (Ramlall, 2016; Zyphur et al., 2023).
SEM is a versatile and powerful research method for examining complex relationships between observed and latent variables. Its relevance stems from the ability to test hypotheses about the structure of relationships within a theoretical model, making it particularly useful in fields such as psychology, social sciences, marketing and management (Kline, 2023). Unlike traditional regression analysis, SEM incorporates both measurement and structural models, enabling the simultaneous examination of multiple dependent relationships. A key advantage of SEM is its ability to handle latent constructs, which are not directly observable but are inferred from measured variables (Hair et al., 2021; Sterner et al., 2024). This feature lets researchers model abstract concepts such as self-efficacy, attitudes or customer satisfaction. SEM also makes it possible to assess the overall fit of a model, that is, how well the hypothesized structure aligns with the empirical data. Another important aspect of SEM is its flexibility in dealing with complex models that involve mediating and moderating effects, allowing for a more comprehensive exploration of causal pathways (Hopwood, 2007). This makes it especially relevant in contemporary research, where relationships between variables are rarely straightforward. In addition, Rademaker et al. (2019) point out that SEM can handle measurement errors more effectively than traditional techniques, leading to more accurate and reliable results.
SEM can be expected to be highly relevant in innovation science studies, as it provides a rigorous method to explore and quantify complex relationships between the factors that drive innovation. As recognized by Brea (2023), Coletto et al. (2024) and Ryszko and Szafraniec (2022), innovation processes often involve multiple interconnected variables, including technological advancements, organizational capabilities, market dynamics and external environmental factors. Using SEM, researchers can examine these variables simultaneously, gaining insights into both direct and indirect relationships within a comprehensive model. Furthermore, SEM allows for the integration of latent variables, which represent abstract concepts such as creativity, organizational culture and innovation capacity. These constructs are difficult to measure directly, but SEM facilitates their examination through observable indicators, allowing for a more accurate analysis of their role in fostering innovation.
2. Methodology
This study followed a comprehensive research method to explore the phases of employing SEM, with the aim of ensuring the validity, accuracy and theoretical robustness of the research process. These elements are critical in a research study (Neumayer and Plümper, 2017). A comprehensive approach is needed to provide both a theoretical foundation and practical insights into the potential of SEM. First, the theoretical foundation of SEM requires an in-depth understanding of the constructs being modeled, which can only be achieved through a rigorous literature review and hypothesis development phase. Second, it is crucial to translate complex theoretical findings into actionable, real-world applications. SEM, by nature, deals with intricate relationships between observed and latent variables, often revealing hidden patterns that are not immediately obvious through simpler statistical techniques. A well-rounded, systematic research approach helps ensure that these insights are not only statistically valid but also practically relevant.
The potential of SEM in innovation science is also explored using two case studies. This approach offers real-world examples that help clarify how SEM can be applied to investigate the complex dynamics within the innovation process. By focusing on two distinct cases, the goal is to help researchers explore variations and similarities in how SEM functions across different contexts, enhancing both theoretical and practical understanding of the model’s utility in innovation studies. Furthermore, as Takahashi and Araujo (2020) recognize, case studies provide context-specific insights that are essential for understanding the practical application of SEM in innovation science. Innovation processes are often complex and influenced by multiple factors such as organizational culture, market dynamics and technological change. SEM is particularly useful for modeling these multi-dimensional relationships, but applying it in real-world scenarios can reveal challenges and opportunities that are not apparent in purely theoretical models. Applying SEM to two case studies can therefore provide a nuanced understanding of how SEM works in practice, making the findings more applicable to a wider range of real-world situations.
3. Performing structural equation modeling
Figure 1 captures the fundamental phases of SEM. The process begins with the theoretical background and the establishment of research hypotheses. Before moving into the technical phases of SEM, it is crucial to lay a strong theoretical foundation by examining existing theories and prior studies. This helps in identifying the key constructs (variables) and the relationships between them. Then, based on the theoretical framework, the researcher formulates specific hypotheses. These hypotheses represent the expected relationships between variables, which the model will test. The hypotheses can involve both direct and indirect effects, as well as mediating or moderating relationships.
The technical component of SEM begins in the third phase, with the establishment of the measurement model. This phase establishes how the variables are expected to relate to each other. Both observed variables (directly measurable) and latent variables (not directly measurable but inferred from observed ones) are incorporated into this model. This phase involves drawing a path diagram that visually represents the hypothesized relationships among variables.
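As an illustration of this phase, the sketch below encodes a small path diagram as plain Python dictionaries and renders it in lavaan-style model syntax, where `=~` defines a latent variable by its indicators and `~` defines a regression path. The construct names (`culture`, `capacity`, `innovation`), the indicator names and the helper `build_lavaan_syntax` are hypothetical and chosen only for illustration; they are not taken from the case studies discussed later.

```python
from typing import Dict, List


def build_lavaan_syntax(
    measurement: Dict[str, List[str]],
    structural: Dict[str, List[str]],
) -> str:
    """Render a model specification as lavaan-style syntax.

    measurement maps each latent variable to its observed indicators.
    structural maps each outcome variable to its hypothesized predictors.
    """
    lines = []
    # Measurement part: latent =~ indicator1 + indicator2 + ...
    for latent, indicators in measurement.items():
        lines.append(f"{latent} =~ " + " + ".join(indicators))
    # Structural part: outcome ~ predictor1 + predictor2 + ...
    for outcome, predictors in structural.items():
        lines.append(f"{outcome} ~ " + " + ".join(predictors))
    return "\n".join(lines)


# Hypothetical constructs and indicators (assumptions for illustration only).
measurement = {
    "culture": ["c1", "c2", "c3"],
    "capacity": ["k1", "k2", "k3"],
    "innovation": ["i1", "i2", "i3"],
}
structural = {"innovation": ["culture", "capacity"]}

model_syntax = build_lavaan_syntax(measurement, structural)
print(model_syntax)
```

Keeping the specification as data rather than as a hand-written string makes it easier to check that every latent variable in the structural part also has indicators in the measurement part.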
The fourth step involves the data-gathering process. Empirical data can be collected through surveys, experiments or other observational methods. After collecting data, it is important to clean and prepare data for analysis. This may include handling missing data, and testing for multicollinearity or normality (Cangur and Ercan, 2015). It is also fundamental to ensure the sample size is sufficient for SEM, which typically requires a relatively large data set to ensure model stability and accuracy.
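The screening steps mentioned above can be sketched in a few lines of Python. The example below is a minimal sketch under stated assumptions: it uses listwise deletion for missing values (only one of several possible strategies), population-moment formulas for skewness and excess kurtosis (statistical packages often report bias-corrected versions) and the standard definition of the variance inflation factor, VIF = 1 / (1 − R²). The function names and the sample values are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def listwise_delete(rows: Sequence[Sequence[Optional[float]]]) -> List[List[float]]:
    """Drop every row that contains at least one missing value (None)."""
    return [list(r) for r in rows if all(v is not None for v in r)]


def skewness_kurtosis(values: Sequence[float]) -> Tuple[float, float]:
    """Return (skewness, excess kurtosis) based on population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def vif(r_squared: float) -> float:
    """Variance inflation factor of a predictor, given the R-squared obtained
    when that predictor is regressed on all other predictors."""
    return 1.0 / (1.0 - r_squared)


# Hypothetical survey responses; None marks a missing answer.
raw_rows = [[4.0, 5.0], [3.0, None], [2.0, 1.0], [5.0, 4.0], [1.0, 2.0]]
clean_rows = listwise_delete(raw_rows)
first_item = [row[0] for row in clean_rows]
skew, excess_kurt = skewness_kurtosis(first_item)
print(len(clean_rows), skew, excess_kurt, vif(0.5))
```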
The phase of estimation and evaluation is central in SEM. This phase involves computing the parameters of the model using sample data. SEM uses methods such as Maximum Likelihood Estimation (MLE), Generalized Least Squares (GLS) or Weighted Least Squares (WLS) to estimate the values of the parameters, including regression coefficients and variances of latent constructs. These estimations are based on the covariance matrix of the observed data. Once the model is estimated, the next phase is model testing. In this step, the fit of the model is evaluated to assess how well it explains the data. Researchers examine goodness-of-fit indices such as the Chi-Square Test, Comparative Fit Index (CFI), Goodness of Fit Index (GFI), Tucker–Lewis Index (TLI), Standardized Root Mean Square Residual (SRMR) and Root Mean Square Error of Approximation (RMSEA), typically alongside Confirmatory Factor Analysis (CFA) and measurement-quality indicators such as Composite Reliability (CR) and Average Variance Extracted (AVE). Table 1 provides an overview of the main robustness indicators. As demonstrated by Padgett and Morgan (2020) and Savalei (2014), a model with a good fit shows that the theoretical structure aligns with the empirical data. Model modifications are often needed to improve the model fit. This step involves re-specifying the model by adding or removing parameters based on theoretical and statistical considerations. After modifications, the model should be re-tested to ensure its adequacy.
Table 1. Robustness indicators in SEM
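To make the evaluation step more concrete, the following sketch computes three of the fit indices named above from chi-square statistics, using their commonly cited textbook formulas. It assumes that the chi-square and degrees of freedom of both the hypothesized model and the baseline (independence) model are already available from the SEM software; the numeric inputs are invented for illustration. Note that some programs use N rather than N − 1 in the RMSEA denominator, so small differences from software output are possible.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation for a model with the given
    chi-square and degrees of freedom, estimated on n observations."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index: hypothesized model (m) versus baseline model (b)."""
    numerator = max(chi2_m - df_m, 0.0)
    denominator = max(chi2_m - df_m, chi2_b - df_b, 0.0)
    return 1.0 if denominator == 0 else 1.0 - numerator / denominator


def tli(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Tucker-Lewis index: hypothesized model (m) versus baseline model (b)."""
    ratio_b = chi2_b / df_b
    ratio_m = chi2_m / df_m
    return (ratio_b - ratio_m) / (ratio_b - 1.0)


# Hypothetical values (assumptions for illustration only).
chi2_model, df_model = 85.0, 50
chi2_base, df_base = 1200.0, 66
sample_size = 300

fit = {
    "RMSEA": rmsea(chi2_model, df_model, sample_size),
    "CFI": cfi(chi2_model, df_model, chi2_base, df_base),
    "TLI": tli(chi2_model, df_model, chi2_base, df_base),
}
for name, value in fit.items():
    print(f"{name}: {value:.3f}")
```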
| Indicator | Objective | Source |
|---|---|---|
| Average variance extracted (AVE) | Assesses the amount of variance captured by a latent variable relative to the amount due to measurement error. Minimum value of 0.5 indicates a good measure of convergent validity | Cheung et al. (2024) |
| Chi-square test | Compares the discrepancy between the observed covariance matrix and the model-implied covariance matrix. A nonsignificant chi-square indicates good model fit | McHugh (2013) |
| Comparative fit index (CFI) | Measures how well the hypothesized model fits relative to an independence model (random associations between variables). Values close to 1 indicate good fit (typically CFI ≥ 0.95) | Shi et al. (2019) |
| Composite reliability (CR) | Evaluates the extent to which the indicators of a latent construct reliably measure that construct. A value of 0.70 or higher is typically considered acceptable | Cheung et al. (2024) |
| Goodness of fit index (GFI) | Measures the proportion of variance and covariance explained by the model. Values close to 1 indicate good fit | Cho et al. (2020) |
| Root mean square error of approximation (RMSEA) | Measures the discrepancy between the observed data and the model, adjusted for model complexity. Values ≤ 0.05 indicate close fit, and ≤ 0.08 acceptable fit | Xia and Yang (2019) |
| Standardized root mean square residual (SRMR) | Assesses the average standardized residuals between observed and predicted covariance matrices. Values ≤ 0.08 indicate good fit | Pavlov et al. (2021) |
| Tucker–Lewis index (TLI) | Similar to CFI, measures improvement in fit over the null model. Values close to 1 (typically TLI ≥ 0.95) indicate good fit | Xia and Yang (2019) |
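The AVE and CR thresholds listed in the table can be checked directly from standardized factor loadings. The sketch below uses the usual formulas, AVE as the mean of the squared loadings and CR as the squared sum of loadings divided by that quantity plus the summed error variances, and assumes standardized loadings with uncorrelated measurement errors. The loadings are hypothetical values, not results from any study cited here.

```python
from typing import Sequence


def average_variance_extracted(loadings: Sequence[float]) -> float:
    """AVE: mean of the squared standardized loadings of one construct."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings: Sequence[float]) -> float:
    """CR: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    where each error variance is 1 - loading^2 for standardized loadings."""
    squared_sum = sum(loadings) ** 2
    error_sum = sum(1.0 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_sum)


# Hypothetical standardized loadings for a three-indicator construct.
loadings = [0.7, 0.8, 0.9]
ave = average_variance_extracted(loadings)
cr = composite_reliability(loadings)

print(f"AVE = {ave:.3f}, meets 0.50 threshold: {ave >= 0.5}")
print(f"CR  = {cr:.3f}, meets 0.70 threshold: {cr >= 0.7}")
```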
The last phase is the interpretation of the results provided by SEM. The focus is on understanding what the results mean in the context of the theoretical framework and research hypotheses. This involves linking the statistical findings back to the theoretical framework and explaining how the results advance or challenge existing knowledge in the field. It is also important to discuss the implications of the findings, suggest potential applications and provide recommendations for future research based on the results.
Finally, it is important to highlight the need for software to assist researchers, given the complexity and technical requirements of SEM. Software allows researchers to handle the complexity of the method with precision, speed and accuracy. Many SEM programs offer user-friendly graphical interfaces where researchers can visually construct path diagrams. These diagrams represent the relationships between variables and allow for easy model specification. Popular SEM software tools are AMOS, R (lavaan), LISREL, EQS, Mplus, SmartPLS and OpenMx.
4. Applications in innovation science studies
SEM is widely used in innovation science to analyze complex relationships between variables, including both latent and observed constructs. It helps model how factors such as R&D investment, organizational culture and leadership influence innovation outcomes. SEM is particularly valuable for revealing mediating and moderating effects, such as how organizational learning mediates the relationship between knowledge management and innovation performance, or how firm size moderates the impact of innovation strategy on success.
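The logic of mediating and moderating effects can be expressed with a few lines of arithmetic. In the sketch below, the indirect (mediated) effect is the product of the path from predictor to mediator (a) and the path from mediator to outcome (b), tested here with the Sobel z statistic; the moderated effect is shown as a simple slope, that is, the effect of the predictor at a given level of the moderator. The Sobel test is used only because it is compact; bootstrapped confidence intervals are a common alternative, and the text does not prescribe either. All coefficient values are hypothetical.

```python
import math


def indirect_effect(a: float, b: float) -> float:
    """Mediated effect: path a (X -> M) times path b (M -> Y)."""
    return a * b


def sobel_z(a: float, se_a: float, b: float, se_b: float) -> float:
    """Sobel z statistic for the indirect effect a * b."""
    se_ab = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    return (a * b) / se_ab


def simple_slope(b_x: float, b_interaction: float, moderator_value: float) -> float:
    """Effect of X on Y at a given moderator value, for a model of the form
    Y = b_x * X + b_w * W + b_interaction * X * W."""
    return b_x + b_interaction * moderator_value


# Hypothetical path estimates and standard errors (assumptions only).
a, se_a = 0.5, 0.1   # e.g. knowledge management -> organizational learning
b, se_b = 0.4, 0.1   # e.g. organizational learning -> innovation performance

ab = indirect_effect(a, b)
z = sobel_z(a, se_a, b, se_b)
slope_high = simple_slope(0.3, 0.2, 1.0)    # moderator one unit above its mean
slope_low = simple_slope(0.3, 0.2, -1.0)    # moderator one unit below its mean
print(ab, z, slope_low, slope_high)
```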
The first case study evaluates the entrepreneurial intentions of higher education students in the specific context of health science (Chams-Anturi et al., 2024). Four hypotheses are established. Entrepreneurial intention is modeled by analyzing the impact of personal attitude (H1), family role (H2), socio-cultural context (H3) and educational support (H4). No moderating or mediating factors are considered. All constructs are supported by previous literature. Reliability and validity measures were considered, such as Cronbach’s alpha, factor loadings, CFA, CR, AVE, CFI, TLI and RMSEA. SEM analysis was performed using R version 4.2.2 with the lavaan package. Two hypotheses (i.e. H1 and H4) are supported, while H2 and H3 are not supported.
The second case study investigates sustainable business practices and corporate innovation in Pakistan (Munir and Watts, 2024). Three constructs are established: green practices, organization size and corporate innovation. The first two hypotheses explore the impact of green practices and organization size on corporate innovation. An additional hypothesis proposes a moderating effect of organization size on the relationship between green practices and corporate innovation. A pilot test was conducted to ensure the validity and reliability of the measurement instruments and the proposed model before conducting a full-scale study. Several indicators were used to assess the structural model. For example, the Heterotrait-Monotrait ratio of correlations (HTMT) was adopted to verify discriminant validity, the variance inflation factor (VIF) to evaluate multicollinearity and CR to measure internal consistency. H1 and H3 are accepted, while H2 is rejected, which indicates that there is no effect of organization size on corporate innovation.
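For readers unfamiliar with HTMT, the sketch below shows how the ratio can be computed from an item correlation matrix: the mean correlation between items of different constructs is divided by the geometric mean of the average within-construct item correlations. It is a minimal illustration, not the procedure used in the cited study; the use of absolute correlations is an assumption, each construct needs at least two items, and the correlation values and item groupings are invented.

```python
from itertools import combinations, product
from math import sqrt
from typing import List, Sequence


def htmt(corr: Sequence[Sequence[float]], items_a: List[int], items_b: List[int]) -> float:
    """Heterotrait-monotrait ratio for two constructs.

    corr is the item correlation matrix; items_a and items_b hold the row and
    column indices of the items belonging to each construct.
    """
    def mean(xs: List[float]) -> float:
        return sum(xs) / len(xs)

    # Correlations between items of different constructs.
    hetero = [abs(corr[i][j]) for i, j in product(items_a, items_b)]
    # Correlations between distinct items of the same construct.
    mono_a = [abs(corr[i][j]) for i, j in combinations(items_a, 2)]
    mono_b = [abs(corr[i][j]) for i, j in combinations(items_b, 2)]
    return mean(hetero) / sqrt(mean(mono_a) * mean(mono_b))


# Hypothetical correlation matrix: items 0-1 form one construct, items 2-3 another.
corr = [
    [1.0, 0.8, 0.4, 0.4],
    [0.8, 1.0, 0.4, 0.4],
    [0.4, 0.4, 1.0, 0.5],
    [0.4, 0.4, 0.5, 1.0],
]
ratio = htmt(corr, [0, 1], [2, 3])
print(f"HTMT = {ratio:.3f}")
```

Lower values indicate that the two constructs are empirically more distinct; the cutoff to apply depends on the guideline the researcher follows.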
5. Conclusions
A clear understanding of the phases of SEM, specifically model specification, identification, estimation, evaluation and modification, is critical for applying SEM successfully in innovation science and for accurately capturing the complexity of innovation systems. The ability to specify, estimate and refine models ensures that the findings are reliable and aligned with theoretical frameworks, ultimately advancing innovation science through more precise and actionable insights.
The two empirical case studies demonstrate that innovation science often involves multi-dimensional constructs such as organizational context, technological capability, market dynamics and innovation performance. The application of SEM makes it possible to clarify and explore how these complex, interrelated factors contribute to innovation outcomes. This study also guides innovation researchers in selecting appropriate robustness indicators, such as CR, CFI and RMSEA, and in understanding their implications. In innovation science, poor model fit may indicate that key innovation drivers have been overlooked, or that certain variables interact in unanticipated ways. Furthermore, theoretical models often need to be adapted to real-world conditions. The model modification phase is where researchers adjust their models to improve fit or better capture innovation dynamics. Therefore, this study also offers guidelines for making these modifications without compromising the integrity of the theoretical framework.
Finally, for practitioners in fields such as R&D, technology management and policymaking, this study explores the SEM phases in innovation science and provides a practical toolkit for assessing innovation initiatives. It offers a roadmap for applying SEM to real-world data, helping organizations evaluate the success of their innovation strategies, identify bottlenecks or predict future innovation performance.

