Case studies are a primary qualitative research method, particularly in the field of management. However, their validity and reliability have been criticized over the years. This study aims to address this issue by systematically mapping and categorizing all validity (including construct, external, internal and content) and reliability instruments used by positivist case study researchers in management.
This study used a method-based systematic literature review comprising two primary components: a bibliometric analysis of the final sample and a content analysis. From an initial search of 1,543 articles in Scopus and Web of Science, 155 articles were included in the final sample.
This study revealed several key findings. First, 95 instruments were used to assess validity and reliability. Second, the use of case tests has increased in recent years. Third, reliability tests are the most commonly used, followed, in descending order of usage, by construct validity, external validity, internal validity and content validity.
This study makes a significant methodological contribution to the literature. First, by inductively categorizing all instruments and case tests and grouping them into a single model, it enables researchers to identify and map the potential uses of these tools in their studies. Second, it introduces a novel methodological conceptual model that consolidates the 95 identified instruments across the six stages of case study research it proposes. Finally, it elaborates on and proposes the concept of “integral validity” in methodology.
Introduction
Several authors have emphasized the importance of case studies as a primary qualitative research method (Brigley, 1995; Ding and Ferràs Hernández, 2023; Ebneyamini and Sadeghi Moghadam, 2018; Eisenhardt, 1989, 2021; Rashid et al., 2019; Welch et al., 2022; Yin, 2018). For example, in her foundational work, Eisenhardt (1989) highlighted the relevance of this method for theory building, noting that “theory developed from case study research is likely to have important strengths such as novelty, testability and empirical validity, which arise from the close connection with empirical evidence” (Eisenhardt, 1989, p. 584). More recently, Ebneyamini and Sadeghi Moghadam (2018, p. 1) argued that “there is no doubt that this research strategy is one of the most powerful methods used by researchers to achieve both practical and theoretical objectives.” The method is also particularly suited to exploring the complex and diverse aspects of business ethics (Brigley, 1995, p. 219). It is used across various disciplines within the social sciences to investigate phenomena related to individuals, groups and organizations (Ding and Ferràs Hernández, 2023, p. 2) and is regarded as an essential research method in management studies (Calandra et al., 2023; Ebneyamini and Sadeghi Moghadam, 2018; Mees-Buss et al., 2022). Indeed, case studies are widely used in management and are considered valuable for analyzing and addressing management challenges (Ebneyamini and Sadeghi Moghadam, 2018, p. 8).
Despite its advantages, the qualitative case study method faces several criticisms. These include concerns about rigor (Amadi, 2023; Ding and Ferràs Hernández, 2023; Gioia et al., 2013; Gunn et al., 2020; Massaro et al., 2019), as well as bias, low credibility, vague definitions of study concepts, limited generalizability, lack of standardization in reporting findings and an emphasis on results rather than on the processes used to obtain them (Amadi, 2023; Ding and Ferràs Hernández, 2023; Gioia et al., 2013; Ito et al., 2024; Mdleleni, 2022; Peña Häufler et al., 2021; Rousso et al., 2023; Vanpoucke and Klassen, 2023). Ding and Ferràs Hernández (2023, p. 3) highlighted two significant challenges in case study research: credibility and generalizability. Mdleleni (2022) also noted that despite the method’s benefits, its reliability and validity are often questioned. Therefore, it is crucial to better understand how qualitative case study researchers in management strive to improve the quality of their studies by using instruments that ensure validity and reliability.
Moreover, it is important to examine whether the use of validity and reliability instruments and case tests has evolved, and to identify the prominent management journals that use these instruments, as well as key authors and significant articles on the subject. Furthermore, can a novel methodological conceptual model be developed that incorporates valid and reliable instruments applicable at each research phase? Mdleleni (2022, p. 129) notes that “there is no single, coherent set of validity and reliability tests for each research phase in case study research available in the literature.” Such questions have become increasingly important: although “most business researchers are trained extensively in quantitative methods compared to qualitative methods,” reflecting the enduring importance of quantitative research, a rising number of studies in the management field use case studies (Rashid et al., 2019). This study addresses the following research questions (RQ):
RQ1: Which types of case tests (content, construct, external, internal and/or reliability) are most commonly used by authors conducting qualitative case study research in management?
RQ2: How has the use of case tests evolved in the literature? Which management journals incorporate case tests, and what quartiles do they belong to? Within the “Business, Management and Accounting” category, which thematic areas use validity and reliability tests the most? Who are the primary authors and what key articles address this topic?
RQ3: How can validity and reliability instruments be categorized into a novel methodological conceptual model organized by the research stages of case studies?
A method-based systematic literature review was deemed the most suitable approach to achieve the study objectives (Palmatier et al., 2018; Paul and Criado, 2020). As Palmatier et al. (2018, p. 3) note, “method-based review articles synthesize and extend the body of literature that uses an underlying method.” Starting from an initial sample of 1,543 articles, this study conducted an inductive analysis to categorize and group the validity and reliability instruments found in 155 published articles sourced from Scopus and Web of Science, which formed the final sample. Inductive analysis involved direct engagement with the data to organize and code it effectively (Gioia et al., 2013; Locke et al., 2022). Although several studies have reviewed validity and reliability instruments in case study research, including one from 2024 (Burnard, 2024; Gibbert et al., 2008; Gibbert and Ruigrok, 2010), none has addressed the critical aspects considered here. For instance, Burnard (2024) limited his analysis to 2008–2018 and focused exclusively on management and operations research in 15 journals on the ABS list. In contrast, this study examined all published articles regardless of publication period, journal or subject area. Furthermore, it inductively analyzes all the instruments discussed in the final sample articles, identifying new and distinct instruments and tests not included in Burnard’s (2024) primary deductive analysis. Similarly, Gibbert et al. (2008) and Gibbert and Ruigrok (2010) analyzed only tests published between 1995 and 2000 in ten influential journals.
Although method-based reviews have recently become increasingly popular in various fields, such studies remain notably scarce in business administration (Paul and Criado, 2020). According to Paul and Criado (2020, p. 3), “there are still great opportunities for developing such method-based review articles.” This review is valuable in several ways. First, it inductively identifies and categorizes 95 instruments for assessing validity (including content, construct, external and internal validity) and reliability, thereby expanding the organized knowledge currently available in the field and benefiting case study researchers. Second, it maps the temporal evolution of case tests across leading journals and thematic areas, including the quartiles, authors and articles that use these case tests and instruments, allowing readers to identify the most relevant resources for their research. Third, it introduces a novel methodological conceptual model that organizes the 95 validity and reliability instruments into six complementary research stages, helping readers discern which instruments apply at each stage of research execution. Finally, it proposes a new methodological concept, integral validity, which aims to highlight the most significant qualitative case studies.
Background literature
Several studies have underscored the significance of conducting validity and reliability tests in case study research (Amadi, 2023; Eisenhardt, 2021; Glavee-Geo and Engelseth, 2018; Mdleleni, 2022; Yin, 2018). Amadi (2023) highlighted that these tests address criticisms regarding subjectivity in qualitative research. Mdleleni (2022, p. 129) stressed that these tests are crucial for assessing data stability and quality. However, there is no consensus on the terminology and tests used in case studies (Solimun and Fernandes, 2017). Various types of validity have been defined, including criterion, concurrent, convergent, discriminant, measurement and analytical validity (Amadi, 2023; Miller and Simmering, 2023; Rönkkö and Cho, 2022; Solimun and Fernandes, 2017). In fact, qualitative researchers have used over 50 different terms to refer to validity (Li and Ross, 2021, p. 386).
The terminology and criteria used in case studies differ depending on the authors’ perspectives and their epistemological and methodological paradigms (Kappal and Mishra, 2023; Masele, 2024; Pratt et al., 2022; van der Walt and Myres, 2024). For instance, positivist researchers adhere to Yin’s guidelines, emphasizing internal, external and construct validity, as well as reliability (Pratt et al., 2022, p. 219; Esrar et al., 2023; Massaro et al., 2019; Matsson, 2023). However, “reliability and validity in qualitative research are also measured in different ways in qualitative research” (Ladyshewsky, 2010, p. 299). Researchers following realism, constructivism or critical studies, for example, use terms such as credibility, confirmability, dependability and transferability (Jadhav et al., 2020; Kappal and Mishra, 2023; Masele, 2024; Reddy, 2015; Senshaw and Twinomurinzi, 2022; van der Walt and Myres, 2024). Credibility refers to confidence in the truthfulness of the study’s findings, whereas confirmability indicates the extent to which the findings are consistent and can be replicated. Dependability relates to the stability of the data throughout the study, and transferability refers to the applicability of the findings to different contexts and scenarios (Senshaw and Twinomurinzi, 2022). These four criteria parallel reliability and validity and “support trustworthiness in the data to offer qualitative rigour” (van der Walt and Myres, 2024, p. 5).
This study adopts a positivist perspective, which assumes that there is a single, objective organizational reality that can be understood and explained (Zilber and Zanoni, 2022, pp. 380–381). According to Street and Ward (2012, p. 162), positivist researchers emphasize the importance of accuracy and precision, which they refer to as validity and reliability. Internal validity is particularly relevant for explanatory case studies that aim to establish causal relationships, ensuring the results are free from bias, measurement errors and selection errors (Esrar et al., 2023; Reddy, 2015; Vlachos and Malindretos, 2023). This relates to the extent to which there are no alternative explanations for the phenomenon being studied (Eisenhardt, 2021; Vlachos, 2023; Yin, 2018). “Addressing alternative explanations sharpens the theoretical arguments and strengthens confidence in the internal validity of the theory” (Eisenhardt, 2021, p. 152). However, it is important to note that internal validity is not a quality test for exploratory or descriptive case studies (El Baz and Laguir, 2017; Laguir et al., 2019; Zeng et al., 2023).
External validity is essential in case studies because it relates to the generalizability of the findings (Amadi, 2023). Whereas quantitative studies aim for statistical generalization, qualitative case studies pursue analytical generalization, connecting the findings to the existing literature (Vlachos and Malindretos, 2023; Vlachos et al., 2023; Yin, 2018). External validity refers to the extent to which the findings can be applied to broader contexts (Esrar et al., 2023, p. 2469). Construct validity, by contrast, concerns what should be measured and how accurately those measurements reflect the constructs and variables in question (Dohale et al., 2022; Vlachos and Malindretos, 2023). It examines the degree to which the operationalization of variables aligns with their theoretical meaning (Vlachos et al., 2023; Yin, 2018). Eisenhardt (1989, p. 532) describes construct validation as “similar to hypothesis-testing.”
Reliability refers to the consistency of the data, ensuring that other researchers can arrive at the same conclusions if they conduct the same study (Eisenhardt, 2021; Vlachos, 2023; Vlachos and Malindretos, 2023; Zeng et al., 2023). It addresses the extent to which results can be replicated (Esrar et al., 2023). According to Yin (2018), reliability involves demonstrating that study operations, such as data collection procedures, can be repeated to produce the same results. The emphasis is on reexamining the same case rather than replicating the study in different contexts (Yin, 2018). Content validity, on the other hand, concerns the degree to which an instrument measures a construct of interest (Solimun and Fernandes, 2017). For example, it evaluates whether the questions accurately represent attitudes toward a training program (Solimun and Fernandes, 2017) and whether the measurement items encompass all relevant aspects while excluding anything irrelevant (Sureeyatanapas et al., 2015).
Method
This study uses a systematic literature review method to synthesize both theoretical and methodological literature (Corrêa et al., 2022; Palmatier et al., 2018; Paul and Criado, 2020). Paul and Criado (2020, p. 2) highlighted the limited availability of well-designed method-based reviews in respected journals. Such reviews aim to synthesize and expand knowledge of qualitative and quantitative methods. This study aims to describe the current state of research on validity and reliability instruments and case tests, while addressing existing gaps and contributing to the future development of qualitative case study research (Kraus et al., 2023). In this context, the term “instrument” refers to any initiative undertaken by the researcher to enhance the validity (including content, construct, external or internal validity) and reliability of the research (Solimun and Fernandes, 2017; Yin, 2018).
Search strategy
The literature presents varying perspectives on the stages involved in conducting a systematic review. This study followed three recognized stages: planning, conducting and reporting the review (Kraus et al., 2020, 2022; Tranfield et al., 2003). In the planning stage, the authors identified the need for a systematic review of reliability and validity tests in case study research and developed the RQs along with a study protocol (Boersma and Bedford, 2023; Kappal and Mishra, 2023). Kraus et al. (2020) emphasized the importance of a transparent process supported by a well-defined research protocol. During the conducting stage, the authors defined the search terms, established the inclusion and exclusion criteria and searched for relevant studies. Finally, they performed a descriptive and inductive thematic analysis of the findings (Corrêa et al., 2024).
Research protocol
Table 1 presents our research protocol. While some authors recommend incorporating grey literature for more comprehensive coverage (Behl et al., 2022; Haddaway et al., 2015), this study focused solely on peer-reviewed articles from Scopus (Elsevier®) and Web of Science (WoS) (Clarivate®) because of their extensive journal coverage and reliability for systematic reviews (Corrêa et al., 2022; Mariani, Al-Sultan, et al., 2023; Mariani, Hashemi, et al., 2023; Piccarozzi et al., 2022; Vishwakarma et al., 2023, p. 3). The authors used broad search terms, including “case stud*” in the titles, abstracts or keywords (for Scopus) and in the titles, abstracts, keywords or Keywords Plus (for WoS). They included papers with terms associated with validity and reliability testing. For instance, articles citing “construct validity,” “internal validity” or “external validity” were included, whereas the standalone term “reliability” was not used, to avoid retrieving irrelevant quantitative studies. Furthermore, the authors searched for articles containing the phrases “reliability and validity” or “validity and reliability” in the text.
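The search logic above combines a wildcard condition on titles, abstracts and keywords with a set of validity phrases. The following Python sketch emulates that Boolean logic on a single record; it is an illustration under stated assumptions, not the authors' Table 1 query, and the function name, record fields and the choice to search the validity phrases across all fields are hypothetical.

```python
import re

# Illustrative emulation of the search logic described above; this is NOT the
# authors' Table 1 query, and the record fields are assumptions.
CASE_STUDY = re.compile(r"\bcase stud\w*", re.IGNORECASE)  # mimics the "case stud*" wildcard
VALIDITY_PHRASES = (
    "construct validity",
    "internal validity",
    "external validity",
    "reliability and validity",
    "validity and reliability",
)
# A bare "reliability" is deliberately absent: it would also retrieve
# quantitative studies that are irrelevant to this review.


def matches_search(title: str, abstract: str, keywords: str, full_text: str) -> bool:
    """Return True if a record meets both parts of the sketched query."""
    title_abs_key = " ".join((title, abstract, keywords))
    if not CASE_STUDY.search(title_abs_key):
        return False
    searchable = f"{title_abs_key} {full_text}".lower()
    return any(phrase in searchable for phrase in VALIDITY_PHRASES)
```

For example, a record titled "Two case studies of lean adoption" whose text mentions "construct validity" matches, while one that mentions only "the reliability of the scale" does not.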
Search process
In early July 2024, the databases were searched using the terms listed in Table 1. No date restrictions were imposed, so articles from all publication years were included. The search was limited to English-language articles, excluding books, editorials and other publication types. In Scopus, results were filtered to the “Business, Management and Accounting” subject area; in WoS, articles from journals indexed under Management, Public Environmental Occupational Health, Business, Business Finance or Public Administration were selected. The initial search yielded 1,543 articles: 1,450 from Scopus and 93 from WoS. After 78 duplicates were removed, 1,465 articles remained.
Inclusion and exclusion criteria
The authors established five inclusion and six exclusion criteria. Under the inclusion criteria, they focused on theoretical and empirical studies that used case studies as a qualitative research method (Criterion 1); theoretical essays, systematic reviews, quantitative articles, purely quantitative case studies and studies using other methods were excluded. Only articles written in English that acknowledged at least one validity or reliability test were considered (Criterion 2). The authors selected positivist case studies and excluded nonpositivist works (Criterion 3) (Bille and Hendriksen, 2023; Esrar et al., 2023; Massaro et al., 2019). In addition, only articles whose authors recognized the use of at least one validity or reliability instrument were included (Criterion 4). Finally, mixed-methods studies were accepted, provided that one of the methods used was a positivist qualitative case study (Criterion 5). The six exclusion criteria were as follows:
Theoretical or empirical articles that did not use case studies as a qualitative research method, including systematic literature reviews, quantitative studies, theoretical essays and similar works.
Studies not published in English.
Articles whose full text was not accessible.
Articles from conference proceedings and grey literature.
Articles that did not acknowledge the use of at least one validity or reliability test.
Purely quantitative case studies.
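Read together, the inclusion and exclusion criteria act as a conjunction of checks applied to each candidate article. The sketch below encodes them as a single Python predicate; the Article class and its fields are hypothetical, the study describes manual screening by the authors rather than any software, and mixed-methods designs are assumed to satisfy the first check whenever they contain a positivist qualitative case study.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical screening attributes coded for one candidate article."""
    qualitative_case_study: bool   # alone or within a mixed-methods design
    positivist: bool
    language: str
    full_text_accessible: bool
    journal_article: bool          # False for proceedings and grey literature
    acknowledges_test: bool        # at least one validity or reliability test
    purely_quantitative: bool


def passes_screening(a: Article) -> bool:
    """Keep an article only if no exclusion criterion applies and it is positivist."""
    return (
        a.qualitative_case_study            # exclusion criterion 1
        and a.language == "English"         # exclusion criterion 2
        and a.full_text_accessible          # exclusion criterion 3
        and a.journal_article               # exclusion criterion 4
        and a.acknowledges_test             # exclusion criterion 5
        and not a.purely_quantitative       # exclusion criterion 6
        and a.positivist                    # inclusion criterion 3
    )


candidate = Article(True, True, "English", True, True, True, False)
print(passes_screening(candidate))  # True
```

Encoding the criteria as one predicate makes explicit that failing any single check removes an article, which is how the exclusion counts in the refinement process are reported.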
Refinement process
After deduplication, 1,465 articles remained from the initial search. The authors reviewed their titles, abstracts, keywords and methodology sections to assess eligibility against the criteria. For instance, 220 articles referred to a “case study” but did not meet the criteria for qualitative research methods (exclusion criteria 1 and 6). Specifically, the exclusions included:
120 quantitative studies that used “case study” merely as an example of a regional context (exclusion criterion 6);
22 theoretical essays;
11 systematic reviews;
five qualitative comparative analysis (QCA) articles;
five experiments;
four grounded theory articles;
four articles proposing “case study*” for future research; and
49 other studies, encompassing portraits, nonpositivist case studies, ethnographic studies, quasi-experimental research, documentary studies, simulations, phenomenological studies, action research and studies published in languages other than English.
Various other factors also led to exclusion from the final sample. For example, 187 articles required payment to access the full text (exclusion criterion 3). Other articles were excluded because the search terms appeared only in the titles of cited references (exclusion criterion 5):
429 articles had “Reliability and Validity” only in references.
259 articles had “Validity and Reliability” only in references.
186 articles had “Construct Validity” only in references.
23 articles had “External validity” only in references.
Six articles had “Internal validity” only in references.
After reviewing the articles and evaluating the inclusion and exclusion criteria, 155 articles were included in the final sample. A complete list of these articles can be found at the following permanent link: https://doi.org/10.5281/zenodo.14680846. Figure 1 illustrates the process used to refine the systematic review of the articles included in this study.
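The screening counts reported above can be reconciled with a simple tally. The Python sketch below uses only figures stated in the text and assumes that each article was excluded for exactly one reason (the text does not say how overlapping reasons were handled). Under that assumption, the exclusions add up to the final sample when subtracted from the 1,465 deduplicated records.

```python
# Counts reported in the "Search process" and "Refinement process" sections.
retrieved = {"Scopus": 1_450, "Web of Science": 93}
duplicates = 78

not_qualitative_case_study = {
    "quantitative, case study as regional context": 120,
    "theoretical essays": 22,
    "systematic reviews": 11,
    "QCA": 5,
    "experiments": 5,
    "grounded theory": 4,
    "case study proposed for future research": 4,
    "other designs and non-English studies": 49,
}
paywalled = 187
terms_only_in_references = {
    "Reliability and Validity": 429,
    "Validity and Reliability": 259,
    "Construct Validity": 186,
    "External validity": 23,
    "Internal validity": 6,
}

identified = sum(retrieved.values())        # records from both databases
screened = identified - duplicates          # records left after deduplication
# Assumes each article was excluded for exactly one reason (no double counting).
excluded = (
    sum(not_qualitative_case_study.values())
    + paywalled
    + sum(terms_only_in_references.values())
)
final_sample = screened - excluded

print(identified, screened, excluded, final_sample)  # 1543 1465 1310 155
```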
Data analysis
The articles were analyzed using both deductive and inductive methods (Hiebl, 2023). One of the study’s authors reviewed the Methods sections of the 155 articles to identify and categorize the instruments used for validity and reliability testing. First, a deductive pass classified every instrument that the articles’ authors explicitly recognized as a validity and/or reliability test contributing to research quality. For instance, Forslund et al. (2009) noted that they developed a study protocol to enhance the construct validity of their research. Even well-known instruments were included if the authors highlighted their significance for the study’s validity or reliability; for example, “elaborating the conceptual framework based on literature and the researchers’ experience” was recorded. This process revealed an additional validity test not included in the original search terms: content validity. As a result, all instruments aimed at improving study quality were categorized. Second, an inductive categorization identified further instruments that the articles’ authors did not label as tests but that still contributed to research validity and reliability. To this end, every initiative highlighted in the Methods sections was thematically categorized following an inductive-thematic logic in which “statements/quotations” were later refined into code names assigned for this analysis (Naeem et al., 2023).
The process of categorization and recategorization was iterative throughout the analysis of the articles (Boersma and Bedford, 2023; Frangeskou et al., 2024; Li et al., 2024a, 2024b; Naeem et al., 2023; Ng and Hamilton, 2024). Each validity and reliability instrument identified in an article generated a unique record under the corresponding case test: content validity, construct validity, external validity, internal validity or reliability. For instance, Ruppenthal and Rückert-John (2024) produced 12 records related to eight reliability instruments, one construct validity instrument and three external validity instruments. This initial categorization yielded 1,294 records representing 112 distinct instruments. The categorization created a three-level hierarchy in which the second level decomposes the first and the third decomposes the second. For example, “carrying out triangulation” (secondary level) enhances construct validity (primary level), and the types of triangulation identified, namely data, researcher, methodological and theoretical triangulation, form the tertiary level (Ancillai and Pascucci, 2023; Esrar et al., 2023; Boersma and Bedford, 2023; Kappal and Mishra, 2023; Park and Kim, 2023; Damasceno et al., 2023; Guo et al., 2024).
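The record structure and three-level hierarchy described above can be pictured as nested mappings plus a flat list of records. The following is a minimal Python sketch, assuming that each record is an (article, case test, instrument) triple; the names HIERARCHY and tertiary_items and the placeholder instrument labels are illustrative only and do not reproduce the instruments reported in Table 2.

```python
from collections import Counter

# Three-level hierarchy from the text: primary level (case test) ->
# secondary level (instrument) -> tertiary level (variants of the instrument).
HIERARCHY: dict[str, dict[str, list[str]]] = {
    "construct validity": {
        "carry out triangulation": [
            "data triangulation",
            "researcher triangulation",
            "methodological triangulation",
            "theoretical triangulation",
        ],
    },
}


def tertiary_items(primary: str, secondary: str) -> list[str]:
    """Return the third-level entries under a case test and instrument, if any."""
    return HIERARCHY.get(primary, {}).get(secondary, [])


# One record = one (article, case test, instrument) triple. The per-test counts
# mirror the Ruppenthal and Rückert-John (2024) example; instrument labels are
# placeholders, not the instruments that article actually reports.
ARTICLE = "Ruppenthal and Rückert-John (2024)"
records = (
    [(ARTICLE, "reliability", f"reliability instrument {i}") for i in range(1, 9)]
    + [(ARTICLE, "construct validity", "construct validity instrument 1")]
    + [(ARTICLE, "external validity", f"external validity instrument {i}") for i in range(1, 4)]
)

per_test = Counter(test for _article, test, _instrument in records)
print(len(records), dict(per_test))
# 12 {'reliability': 8, 'construct validity': 1, 'external validity': 3}
```

Keeping records flat while storing the hierarchy separately means the same instrument can be counted once per article and case test, which is how a single article yields several records.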
The allocation of instruments and their details to the validity and reliability case tests was based on a deductive-inductive analysis. First, the researchers assessed how the authors of the articles in the final sample assigned instruments to case tests. Next, analytical generalization (Yin, 2018), grounded in the theoretical background on validity and reliability, was used to allocate the inductively categorized instruments. The same procedure was used to allocate the instruments to research stages: the authors generalized each instrument to the stages at which researchers can use it, duplicating instruments that can be used in more than one stage. After the initial thematic categorization (Naeem et al., 2023), two authors of the study engaged in researcher triangulation, individually regrouping the 112 identified instruments and comparing their results. Following sessions aimed at achieving intercoder agreement (De Rooij, 2015), the two authors confirmed 95 instruments suitable for assessing validity and reliability (Boersma and Bedford, 2023; Monticelli et al., 2023). The instruments were labeled with infinitive verbs indicating the actions to be performed. Table 2 displays all the case tests, the identified instruments and the details of each instrument, including its code. The full version of Table 2, including the authors who used each instrument, is available as supplementary material.
Findings and discussion
RQ1: Which types of case tests (content, construct, external, internal and/or reliability) are most commonly used by authors conducting qualitative case study research in management?
Reliability was the most commonly used case test among management case study researchers, accounting for 40% of the instrument records (n = 522). It was followed by construct validity (28%, n = 363), external validity (22%, n = 285), internal validity (5%, n = 64) and content validity (5%, n = 60). Table 2 provides details about the instruments used in these tests and the authors who used them in their studies.
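The shares above follow directly from the record counts per case test. A short Python sketch that recomputes them, using only the counts reported in the text:

```python
# Instrument records per case test, as reported for RQ1.
records = {
    "reliability": 522,
    "construct validity": 363,
    "external validity": 285,
    "internal validity": 64,
    "content validity": 60,
}
total = sum(records.values())  # all instrument records in the final sample

for test, n in sorted(records.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{test:<20} n={n:>4}  {100 * n / total:.1f}%")
```

The unrounded shares (about 4.9% for internal validity and 4.6% for content validity) show why both round to 5% even though internal validity has slightly more records.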
RQ2: How has the use of case tests evolved in the literature? Which management journals incorporate case tests, and what quartiles do they belong to? Within the “Business, Management and Accounting” category, which thematic areas use validity and reliability tests the most? Who are the primary authors and what key articles address this topic? (Figure 2)
Between 1986 and 1990, only one instrument record was found, related to construct validity. From 1991 to 1996, construct validity records increased (n = 4), and instruments related to external validity (n = 1) and reliability (n = 5) appeared in the sample for the first time. Notably, no instruments related to validity were recorded between 1997 and 2001. The period from 2002 to 2007 was significant because it was the first in which records for all case tests (content, construct, external and internal validity, as well as reliability) were documented. External validity records grew markedly, from one in 1991–1996 to 11 in 2002–2007, an elevenfold rise (a 1,000% increase). Reliability and construct validity records also rose substantially, reaching 480% and 425%, respectively, of their 1991–1996 levels. Internal validity and content validity appeared for the first time, with two records and one record, respectively. Overall, from 2002 to 2007, instruments measuring external validity, reliability, construct validity, internal validity and content validity began to gain importance in qualitative case studies, reflecting an emerging recognition of quality criteria in this field.
From 2008 to 2012, all case tests grew relative to 2002–2007. Content validity records rose sixfold (n = 6), followed by internal validity (4.5-fold, n = 9), external validity (about 3.3-fold, n = 36), construct validity (threefold, n = 51) and reliability (about 2.9-fold, n = 69). Despite this growth, the combined records from these two periods (2002–2007 and 2008–2012) did not exceed 20% of the total records for any case test over the entire timespan: construct validity accounted for 19%, reliability for 18%, internal validity for 17%, external validity for 16% and content validity for 12%. Consequently, roughly 80% or more of the records for each case test are associated with the most recent 12 years (2013–2024), highlighting the developing nature of instruments and tests in case study research. Between 2013 and 2018, the number of records grew significantly, with content validity showing the highest increase, followed by external validity, reliability, construct validity and internal validity. The period from 2019 to 2024 is particularly noteworthy, as it concentrated over 50% of all case test records. Because the database searches were conducted in July 2024, only part of that year was captured, so the figures for 2019–2024 likely understate the period’s final total.
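Growth between periods can be expressed either as the ratio of the new count to the old one or as a percentage increase, which is that ratio minus one; a count rising from 1 to 6 is 600% of the earlier level but a 500% increase. The Python sketch below computes both, along with each test's 2002–2012 share of its total records. The 2002–2007 counts for construct validity (17) and reliability (24) are not stated in the text; they are an assumption, back-calculated from the reported percentages (for example, 51 / 3.00 = 17), and they reproduce the reported 19% and 18% shares.

```python
# Records per case test. 2002-2007 counts for construct validity and
# reliability are back-calculated assumptions; all other counts are reported.
p2002_2007 = {"content": 1, "internal": 2, "external": 11, "construct": 17, "reliability": 24}
p2008_2012 = {"content": 6, "internal": 9, "external": 36, "construct": 51, "reliability": 69}
totals = {"content": 60, "internal": 64, "external": 285, "construct": 363, "reliability": 522}

for test in p2008_2012:
    before, after = p2002_2007[test], p2008_2012[test]
    ratio = after / before                          # e.g. 6 / 1 = 6.0 (sixfold)
    pct_increase = 100 * (after - before) / before  # e.g. (6 - 1) / 1 -> 500%
    share = 100 * (before + after) / totals[test]   # share of all records, 2002-2012
    print(f"{test:<12} ratio={ratio:.2f}  increase={pct_increase:.0f}%  "
          f"2002-2012 share={share:.0f}%")
```

Reporting the ratio as an "x-fold" figure avoids the ambiguity of calling 600% of a previous level a "600% increase".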
Case tests’ leading journals
The final sample consisted of 155 articles published in 102 different journals. Table 3 highlights the leading journals that published at least two articles. For a complete list of all the journals included in the final sample, including those that published only one article, please refer to the permanent link provided via the following DOI: https://doi.org/10.5281/zenodo.13629434.
The two leading journals in the final sample were the International Journal of Operations and Production Management and Production Planning and Control, with nine articles each. The Journal of Business and Industrial Marketing contributed the third-highest number of articles. Notably, 26 journals, only 25% of the total (n = 102), collectively accounted for 50% of all articles in the final sample, indicating a strong concentration of case studies in a small number of journals. Figure 3 details this analysis further, displaying the number of records over time and comparing the quartile rankings of the journals.
The classification of journals is based on Scopus quartiles. Of the 102 journals evaluated, three were not indexed in the Elsevier® database and were therefore classified using the quartiles of the Journal Citation Reports (JCR) of WoS: the African Journal of Business Ethics [Q4], the Journal of Information Technology [Q1] and Management Research: The Journal of the Iberoamerican Academy of Management [Q3]. Overall, 78% of the 1,294 records related to validity and reliability instruments (n = 1,015) were published in first-quartile [Q1] journals, indicating a strong concentration of quality tests in case studies published in high-impact journals. Q1 and Q2 journals together account for 94% of all records (n = 1,210) on content, construct, external and internal validity and reliability tests. Only 6% (n = 84) of the records appeared in Q3 or Q4 journals: 5% (n = 64) in Q3 and 2% (n = 20) in Q4. In Q1 journals, the first validity and reliability tests were published between 1991 and 1996, totaling ten records: five related to reliability, four to construct validity and one to external validity. The last three periods (2008–2012, 2013–2018 and 2019–2024) showed the highest concentration of quality tests, collectively representing 95% of all validity and reliability records in Q1 journals.
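The quartile shares can be recomputed from the reported counts. In this Python sketch, the Q2 count is an assumption derived by subtracting the Q1 count (1,015) from the combined Q1 and Q2 count (1,210); the Q3 and Q4 counts are taken directly from the text.

```python
# Q1 and Q1+Q2 totals are reported; Q2 is derived as 1,210 - 1,015.
records_by_quartile = {"Q1": 1_015, "Q2": 1_210 - 1_015, "Q3": 64, "Q4": 20}
total = sum(records_by_quartile.values())


def share(*quartiles: str) -> float:
    """Percentage of all instrument records falling in the given quartiles."""
    return 100 * sum(records_by_quartile[q] for q in quartiles) / total


print(total)                          # 1294
print(f"{share('Q1'):.1f}")           # 78.4
print(f"{share('Q1', 'Q2'):.1f}")     # 93.5
print(f"{share('Q3', 'Q4'):.1f}")     # 6.5
```

The derived counts sum to the 1,294 records in the sample, which serves as a consistency check on the reported quartile figures.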
Between 1991 and 1996, authors publishing in Q1 journals began using validity and reliability instruments more intensively, whereas studies in journals from other quartiles adopted these instruments later. There is one isolated instance of a construct validity instrument in a Q2 journal between 1986 and 1990. However, significant use of validity and reliability instruments in Q2 journals emerged only from 2002 to 2007 and, more markedly, from 2013 to 2018; almost 90% of the records for these instruments in Q2 journals were published between 2013 and 2024. Adoption in Q3 and Q4 journals came later still, with the first records appearing between 2013 and 2018. For Q4 journals, all records date exclusively from 2013 to 2018, with none in the most recent period (2019–2024).
Validity and reliability tests by journals’ thematic categories
Table 4 displays the number of records by the Scopus thematic categories of the indexed journals. Three journals in the final sample are not indexed in Elsevier®'s Scopus: the African Journal of Business Ethics, the Journal of Information Technology and Management Research: The Journal of the Iberoamerican Academy of Management. The categories for these journals were assigned inductively on the basis of their aims and scope: the African Journal of Business Ethics is categorized under Business, Management and Accounting – Miscellaneous; the Journal of Information Technology under Management Information Systems; and Management Research under both Business, Management and Accounting – Miscellaneous and General Business, Management and Accounting. Each of the 1,294 instrument records was categorized according to the subject areas of its journal; when a journal was indexed in multiple areas, the record was assigned to each relevant category.
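The multi-category rule means that category totals count assignments, not records. The following Python sketch illustrates this using the three category mappings described above. The record list is hypothetical, and `tally_by_category` is an illustrative name rather than part of the study's procedure.

```python
from collections import Counter
from collections.abc import Iterable, Mapping, Sequence


def tally_by_category(record_journals: Iterable[str],
                      journal_categories: Mapping[str, Sequence[str]]) -> Counter:
    """Count one assignment per (record, category) pair: a record from a
    journal indexed in k categories contributes k assignments."""
    tally: Counter = Counter()
    for journal in record_journals:
        for category in journal_categories[journal]:
            tally[category] += 1
    return tally


# Category assignments for the three non-Scopus journals, as described in the text.
journal_categories = {
    "African Journal of Business Ethics": ["Business, Management and Accounting – Miscellaneous"],
    "Journal of Information Technology": ["Management Information Systems"],
    "Management Research": ["Business, Management and Accounting – Miscellaneous",
                            "General Business, Management and Accounting"],
}

# Hypothetical records: each entry is the journal of one instrument record.
records = ["Management Research", "Management Research", "Journal of Information Technology"]
tally = tally_by_category(records, journal_categories)
print(len(records), sum(tally.values()))  # 3 5
```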
Within the thematic area of Business, Management and Accounting, the Strategy and Management category stands out with the highest number of records, 540 (26% of the total). Because records from journals indexed in several categories were counted once per category, these shares are relative to the 2,097 category assignments rather than to the 1,294 records. Strategy and Management was followed by Business and International Management, with 419 records (20%), and Management of Technology and Innovation, with 285 records (14%). The other categories were General Business, Management and Accounting with 170 records (8%); Marketing with 152 records (7%); Business, Management and Accounting – Miscellaneous with 147 records (7%); Organizational Behavior and Human Resource Management with 115 records (5%); Tourism, Leisure and Hospitality Management with 82 records (4%); Accounting with 77 records (4%); Management Information Systems with 63 records (3%); and Industrial Relations with 47 records (2%). The three leading categories, Strategy and Management, Business and International Management and Management of Technology and Innovation, together accounted for nearly 60% of all assignments (1,244), highlighting the importance of case study research and of validity and reliability instruments in these contexts. Strategy and Management led in the number of records for all types of validity (content, construct, external and internal) and for reliability instruments, closely followed by Business and International Management. From the third position onward, however, the ranking differs by test: Marketing ranks third for content validity instruments, whereas Management of Technology and Innovation ranks third for all other types (construct validity, external validity, internal validity and reliability).
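The category counts reported above add up to 2,097. This exceeds the 1,294 records because of multi-category assignment, and the reported percentages only match when they are divided by 2,097. The Python check below uses only the counts from the text.

```python
category_counts = {
    "Strategy and Management": 540,
    "Business and International Management": 419,
    "Management of Technology and Innovation": 285,
    "General Business, Management and Accounting": 170,
    "Marketing": 152,
    "Business, Management and Accounting – Miscellaneous": 147,
    "Organizational Behavior and Human Resource Management": 115,
    "Tourism, Leisure and Hospitality Management": 82,
    "Accounting": 77,
    "Management Information Systems": 63,
    "Industrial Relations": 47,
}

# Records from multi-category journals are counted once per category.
assignments = sum(category_counts.values())
shares = {name: round(100 * n / assignments) for name, n in category_counts.items()}
top_three = sorted(category_counts.values(), reverse=True)[:3]

print(assignments)                                                  # 2097
print(shares["Strategy and Management"])                            # 26
print(sum(top_three), round(100 * sum(top_three) / assignments, 1))  # 1244 59.3
print(round(100 * 540 / 1294, 1))  # 41.7, the share if divided by records instead
```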
Main authors and articles
A total of 275 authors contributed to the 155 articles in the final sample, an average of 1.8 authors per article. Only two authors stood out: Laguir, I., with four publications, and Krempl, R., with two. Only 10 of the 155 articles (6%) used instruments addressing all forms of validity (content, construct, external and internal) and reliability (Alshurafat et al., 2020; Gong et al., 2024; Hawke and Heffernan, 2006; Laya et al., 2016; Li et al., 2024a, 2024b; Liu et al., 2021a; Najmaei, 2016; Olsson et al., 2021; Rai et al., 2021; Zeng et al., 2023); each of these articles included at least one instrument for every type of validity and reliability. In addition, 50 articles used instruments covering four validity and reliability tests (DOI: 10.5281/zenodo.13629561), and 62 articles used instruments from three different tests (DOI: 10.5281/zenodo.13629567). The remaining 33 articles used instruments from only one or two tests, as shown in Table 5. Authors addressing a given test, such as construct validity, often combined several instruments for that test.
RQ3: How can validity and reliability instruments be categorized into a novel methodological conceptual model organized by the research stages of case studies?
Several authors have proposed stages for conducting case study research (El Baz et al., 2018; Cassanego Júnior et al., 2019; Korkmaz et al., 2011; Krichanchai and MacCarthy, 2017; Laguir et al., 2019; Liu et al., 2021b; Najmaei, 2016; Rai et al., 2021; Song et al., 2017), but there is no consensus on the exact steps involved. Laguir et al. (2019) and El Baz et al. (2018) identified four stages: research design, interview protocol, data collection and analysis (including assessment or interpretation). Rai et al. (2021), Liu et al. (2021b) and Najmaei (2016) also delineated four stages but used different terminology: Rai et al. (2021) referred to them as research design, data collection, data analysis and composition; Liu et al. (2021b) described them as research plan, research design, data collection and case analysis; and Najmaei (2016) identified them as research design, case selection, data gathering and data analysis. Other authors have suggested only three stages: define and design; data collection; and analysis and synthesis (Cassanego Júnior et al., 2019), or simply research design, data collection and data analysis (Korkmaz et al., 2011; Krichanchai and MacCarthy, 2017; Song et al., 2017).
This study uses a novel structure that integrates the stages proposed by previous authors with inductive categorization, suggesting new applications for validity and reliability instruments. The proposed phases of case study research are as follows: Research Design (Korkmaz et al., 2011; Krichanchai and MacCarthy, 2017; Liu et al., 2021b; Najmaei, 2016; Rai et al., 2021; Song et al., 2017); Case Selection (El Baz et al., 2018; Laguir et al., 2019; Najmaei, 2016); Research Protocol (El Baz et al., 2018; Laguir et al., 2019); Data Collection (El Baz et al., 2018; Cassanego Júnior et al., 2019; Korkmaz et al., 2011; Krichanchai and MacCarthy, 2017; Laguir et al., 2019; Liu et al., 2021b; Rai et al., 2021; Song et al., 2017); Data Analysis (Korkmaz et al., 2011; Krichanchai and MacCarthy, 2017; Najmaei, 2016; Rai et al., 2021; Song et al., 2017); and Case Report (an inductive thematic category), termed “composition” by Rai et al. (2021). The sixth stage is critical because of the range of validity and reliability instruments that can be applied at this point and because research reports lack standardization, which is a vital criterion for both validity and reliability (Rousso et al., 2023). Table 6 presents a conceptual methodological framework that categorizes all identified instruments into the six proposed phases; instruments were assigned to multiple phases when applicable. We have also made Table 6 available in an alternative format that lists the instruments by title rather than by code for easier identification; it can be accessed at https://doi.org/10.5281/zenodo.16320858.
Because instruments were assigned to every applicable phase, the 95 instruments yield 152 stage assignments. Most of these fell in the data analysis stage (n = 51, 34%), followed by data collection (n = 35, 23%), research design (n = 25, 16%), research protocol (n = 21, 14%), case report (n = 13, 9%) and case selection (n = 7, 5%). The top three stages – data analysis, data collection and research design – accounted for over 70% of the assignments, underscoring their significance. In the research design phase, content validity and external validity were the primary tests, each with six instruments (24%). In the case selection stage, the emphasis was on external validity and reliability (29% of the stage's instruments each). In the research protocol stage, content validity (n = 7, 33%) and reliability (n = 6, 29%) were the most prominent. Data collection involved the highest numbers of reliability (n = 14, 40%) and construct validity (n = 8, 23%) instruments. Within data analysis, reliability was particularly emphasized (n = 22, 43%), along with internal validity (n = 15, 29%). For case reports, internal validity was the most common (n = 4, 31%), followed by reliability and construct validity, with three instruments each (23%). Each type of test was most prominent at a particular stage: content validity in the research protocol stage and construct validity in the research design phase, while external validity, internal validity and reliability were highlighted in the case selection, case report and data analysis stages, respectively.
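The stage percentages are shares of stage assignments, not of the 95 instruments. The Python sketch below recomputes them from the counts in the text. It also shows the within-stage share used for the per-test figures. The helper name `within_stage_share` is illustrative.

```python
stage_assignments = {
    "Data analysis": 51,
    "Data collection": 35,
    "Research design": 25,
    "Research protocol": 21,
    "Case report": 13,
    "Case selection": 7,
}
N_INSTRUMENTS = 95

total = sum(stage_assignments.values())
shares = {stage: round(100 * n / total) for stage, n in stage_assignments.items()}
top_three = sorted(stage_assignments.values(), reverse=True)[:3]


def within_stage_share(count: int, stage: str) -> int:
    """Percentage of a stage's instruments that address a given test."""
    return round(100 * count / stage_assignments[stage])


print(total, round(total / N_INSTRUMENTS, 2))          # 152 1.6 (stages per instrument)
print(shares)
print(sum(top_three), round(100 * sum(top_three) / total, 1))  # 111 73.0
print(within_stage_share(22, "Data analysis"))          # 43 (reliability)
```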
Across the research stages, the studies in the sample predominantly used validity and reliability instruments that are particularly relevant to case studies. At the “research design” stage, the most commonly used instrument was the plan to use multiple cases (EV2), with 60 mentions, followed by the intention to “carry out an iterative process between data, concepts and coding” (RE3), with 59 mentions, and the plan to “use replication logic” in multiple cases (EV3A and EV3B), with 46. When selecting cases, researchers most commonly used multiple cases (EV2) and sought expert support, as noted in 27 records; a detailed description of the data collection and analysis processes (RE10) was also referenced in 21 records. During the research protocol phase, researchers primarily focused on creating a structured research protocol (RE2), which appeared in 64 records. They also emphasized replication logic (EV3A and EV3B), noted in 46 records, and the development of a study database (RE5), mentioned in 30 records. In the data collection stage, researchers primarily used triangulation, particularly data triangulation (COV1A), followed by the use of multiple sources of evidence (COV2), with 94 records; recording interviews (RE1), noted in 75 records, was another significant method. In the data analysis phase, researchers primarily aimed at researcher triangulation (COV1B) and methodological triangulation (COV1C), followed by the pursuit of analytical generalization (EV1), which appeared in 74 records, and an iterative process connecting data, concepts and coding (RE3), which appeared in 59 records in this phase. Finally, in the case report, researchers again focused on analytical generalization (EV1). They also had key informants review the draft (COV3A and COV3B) and established a chain of evidence (COV4), found in 27 records. One notable finding was the initiative “Ask experts for support,” which appeared in various forms (CV1A to CV1J) across all six stages of the case study research process, highlighting its importance for the quality and validity of the research.
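Because several instruments serve more than one stage, the stage framework can be represented as a mapping from stages to instrument codes, which can then be inverted to look up the stages a given instrument serves. The Python sketch below is a partial excerpt containing only the codes named for each stage in this section. It is not a reproduction of Table 6, and it omits the expert-support family (CV1A to CV1J) because the text does not say which variant applies at which stage.

```python
from collections import defaultdict

# Partial excerpt: only the instrument codes named in the text for each stage.
STAGE_INSTRUMENTS = {
    "Research design": ["EV2", "RE3", "EV3A", "EV3B"],
    "Case selection": ["EV2", "RE10"],
    "Research protocol": ["RE2", "EV3A", "EV3B", "RE5"],
    "Data collection": ["COV1A", "COV2", "RE1"],
    "Data analysis": ["COV1B", "COV1C", "EV1", "RE3"],
    "Case report": ["EV1", "COV3A", "COV3B", "COV4"],
}


def stages_by_instrument(stage_map: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the stage -> instruments map so each code lists every stage it serves."""
    inverted: defaultdict[str, list[str]] = defaultdict(list)
    for stage, codes in stage_map.items():
        for code in codes:
            inverted[code].append(stage)
    return dict(inverted)


index = stages_by_instrument(STAGE_INSTRUMENTS)
multi_stage = sorted(code for code, stages in index.items() if len(stages) > 1)
print(index["EV2"])  # ['Research design', 'Case selection']
print(multi_stage)   # ['EV1', 'EV2', 'EV3A', 'EV3B', 'RE3']
```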
Conclusion
Methodological implications
This study makes significant contributions to the literature and methodology, summarized in five key points. First, it inductively identifies and categorizes all validity instruments (content, construct, external and internal validity) and reliability tools used by qualitative case study researchers in the management field. By providing a comprehensive inventory of these instruments, this study enables future researchers to select the most suitable instruments to enhance research quality. In addition, it helps students and researchers understand the differences among these tools and explore underused opportunities to improve research quality. Inductive mapping plays a crucial role because authors often use more instruments than they explicitly cite. For instance, Dooley et al. (2013) mentioned returning case reports for review to enhance validity and reliability. However, they also used other quality initiatives that they did not cite, such as “recording interviews” and “taking field notes for unrecorded interviews,” both of which relate to reliability.
The second contribution is a conceptual and methodological framework that organizes all identified validity and reliability instruments across the six proposed research stages. This framework helps researchers identify relevant instruments for each stage, facilitating more structured planning and strengthening validity and reliability. The model also highlights the most relevant tests in each phase. The third contribution identifies the most and least frequently used tests and instruments, thereby encouraging their broader adoption. The study revealed that the application of case tests is a recent trend, with over 80% of instrument records dating from 2013 to 2024, which suggests considerable potential for wider application. The fourth contribution maps the instruments used by leading journals and relates them to journal quartiles and thematic areas. This identifies the key subareas within Business, Management and Accounting, as well as the impact level of the journals that use case tests. Finally, the inductive identification of instruments and their allocation within a new conceptual model allow us to introduce the concept of “Integral Validity,” which applies to two types of qualitative case studies. For exploratory or descriptive studies, Integral Validity is achieved by using one or more instruments that ensure content validity, construct validity, external validity and reliability throughout all six research stages. For explanatory studies, Integral Validity requires instruments for all case tests (content, construct, external and internal validity and reliability) across all stages of research. The Integral Validity proposition serves as a “seal” signifying the thorough application of these instruments and tests in the field.
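To make the definition operational, the Python sketch below encodes it as a check over a study's instrument plan. Two points are assumptions rather than statements from this study. First, following Yin (2018), internal validity is required only for explanatory studies. Second, “throughout all six research stages” is read by default as requiring every stage to use at least one instrument, with the required tests covered somewhere across the study. The stricter reading, in which every required test appears in every stage, is available through the `per_stage` flag. The function names and the study plan are hypothetical.

```python
from collections.abc import Mapping

STAGES = ("Research design", "Case selection", "Research protocol",
          "Data collection", "Data analysis", "Case report")
BASE_TESTS = frozenset({"content validity", "construct validity",
                        "external validity", "reliability"})


def required_tests(study_type: str) -> frozenset:
    """Tests required for Integral Validity (assumption: internal validity only if explanatory)."""
    if study_type in ("exploratory", "descriptive"):
        return BASE_TESTS
    if study_type == "explanatory":
        return BASE_TESTS | {"internal validity"}
    raise ValueError(f"unknown study type: {study_type!r}")


def meets_integral_validity(study_type: str, tests_by_stage: Mapping[str, set[str]],
                            per_stage: bool = False) -> bool:
    """Check an instrument plan (stage -> tests addressed) against the criterion."""
    required = required_tests(study_type)
    if any(not tests_by_stage.get(stage) for stage in STAGES):
        return False  # at least one stage uses no instrument
    if per_stage:
        return all(required <= set(tests_by_stage[stage]) for stage in STAGES)
    covered = set().union(*(tests_by_stage[stage] for stage in STAGES))
    return required <= covered


# Hypothetical plan: tests addressed by the instruments used at each stage.
plan = {
    "Research design": {"content validity", "external validity"},
    "Case selection": {"external validity"},
    "Research protocol": {"reliability"},
    "Data collection": {"construct validity", "reliability"},
    "Data analysis": {"reliability"},
    "Case report": {"construct validity"},
}
print(meets_integral_validity("exploratory", plan))                  # True
print(meets_integral_validity("explanatory", plan))                  # False: no internal validity
print(meets_integral_validity("exploratory", plan, per_stage=True))  # False
```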
Practical and managerial implications
This study has practical and managerial implications for editors-in-chief and associate editors. Because high-impact journals (Q1 and Q2) show a high concentration of test applications, editors of Q3 and Q4 journals should promote the use and development of test instruments. Such initiatives can enhance the quality of case study research. The data indicate that the use of high-quality instruments and tests is concentrated in higher-impact journals, suggesting that increasing their use could improve the visibility of both research and journals. This recommendation is particularly relevant for journals in Strategy and Management, Business and International Management and Management of Technology and Innovation, which frequently publish qualitative case study research.
Limitations
This study has several limitations. One limitation concerns the search terms and inclusion criteria: only articles that explicitly acknowledged the use of validity or reliability instruments were included, potentially excluding other relevant articles. Another limitation involves the inductive classification of instruments and of the stages at which they can be applied. This classification may be subject to interpretive bias and could differ if conducted by other researchers. To address this, the authors triangulated their findings and confirmed the categorizations (Dohale et al., 2022; Hawke and Heffernan, 2006; Johansen, 2008). Despite thorough research and triangulation, some instruments may have been overlooked in this review. The classification of the three journals not indexed in Scopus was interpretive and may contain errors. In addition, the term “content validity” was not included in the research protocol but was identified through a close reading of the full texts of the articles. The authors consider it unlikely that any study would rely solely on a content validity instrument without combining it with other case study tests (Della Corte et al., 2013; Vlachos, 2023). We recognize the importance and variety of case studies from other perspectives and epistemological or methodological paradigms that are not addressed in this review. In particular, we did not analyze case studies based on realism, constructivism or critical approaches, which often use terms such as credibility, confirmability, dependability and transferability to describe the trustworthiness of qualitative research. Regardless of the epistemological perspective or paradigm, we agree with Eisenhardt et al. (2016) on the significance of the method, especially its inductive dimension, in addressing “grand challenges.” We recognize the importance of maintaining methodological rigor while avoiding what Eisenhardt et al. (2016) refer to as “rigor mortis.” They emphasize that “the key is balancing essential information with a parsimonious use of journal space and reader time” (Eisenhardt et al., 2016, p. 1120). Like these and other authors, this study seeks genuine rather than superficial rigor and quality in qualitative case study research (Eisenhardt et al., 2016; Gioia et al., 2013).
Suggestions for future research: a proposed research agenda
Future research could address the limitations mentioned above and explore new opportunities in this area. For example, it could investigate the quality tactics used in qualitative case studies within nonpositivist paradigms and compare them with those identified in this study. What strategies do researchers associated with realism, constructivism or critical studies use to enhance the credibility, confirmability, dependability and transferability of their work? What similarities exist between these strategies and those developed by positivist researchers? Methodological studies could also propose additional instruments or actions to ensure validity and reliability, particularly in the Case Selection, Research Protocol and Case Report stages, which currently have fewer instruments than the Data Analysis, Data Collection and Research Design stages. One important initiative for improving the Research Protocol and Case Report stages is the adoption of open science practices, such as making interview transcripts and categorization codes accessible via permanent links for external validation. O’Kane et al. (2021) highlighted the need for greater transparency in qualitative research, and open science initiatives can support this goal.
References
Further reading
Supplementary material
The supplementary material for this article can be found online.


