The purpose of this study is threefold. The authors define and provide a typology for generative AI-enhanced stimuli (GES); map how GES have been used in organizational behavior (OB) experiments; and outline a research and practice agenda for integrating GES into OB designs.
The authors used the scoping review methodology and followed conventional best practices for reporting, via Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR). The authors searched peer-reviewed articles published between 2014 and 2024 through Scopus, PsycINFO, Web of Science, Business Source Complete and Google Scholar.
This study identifies critical trends to assist scholars in using GES as part of their experimental research designs. It then provides an overview of the different theoretical, practical and methodological considerations linked to using GES. In addition, it identifies key outcomes affected by GES use and provides key avenues for future research, including theoretical, practical and methodological considerations.
This study provides new and timely insights regarding GES in OB research. This scoping review guides scholarship by considering theoretical, methodological and ethical factors.
Introduction
Over the past decade, human subject experimental research has continued to evolve. Over the same period, scholarly interest in artificial intelligence (AI) has surged. AI is an umbrella term spanning various technologies and algorithms used to overcome human-specific creative, intellectual and computational limitations (Banh & Strobel, 2023; Dwivedi et al., 2021) by allowing capabilities such as recognizing patterns, aiding with decision-making, understanding natural language and learning over time (Castelvecchi, 2016). Scholars already note its potential to reshape society and the relationships among its citizens (Loureiro, Guerreiro, & Tussyadiah, 2021), given anticipated paradigm shifts across various business-related disciplines including marketing, psychology and healthcare (Benoit, 2023; Zamudio, Grigsby, & Michelsen, 2024). We specifically examine a subset of AI, generative AI (GenAI), given its unique capabilities to generate or create content (Andrieux, Johnson, Sarabadani, & Van Slyke, 2024) while displaying anthropomorphic features (Li & Suh, 2022; Munnukka, Talvitie-Lamberg, & Maity, 2022). These features, we argue, enable research to achieve new levels of realism for experimental designs.
Realism is increasingly emphasized in studies incorporating GenAI, particularly in the marketing field (see van Berlo, Campbell, & Voorveld, 2024; Zamudio et al., 2024). By contrast, organizational behavior (OB) has paid relatively little attention to methods for enhancing realism in experiments. New trends and calls to explore this topic in experimental research are emerging, with a focus on technology to engage participants (Keeler, Brock Baskin, & McKee, 2025a; Leavitt, Qiu, & Shapiro, 2021; Philip, 2022). For example, consider vignettes, which commonly use a written scenario, detail a hypothetical situation or entail a scripted video to replicate workplace events (see Aguinis & Bradley, 2014). While vignettes have historically proven effective in experimental research, GenAI may provide valuable options for a more immersive experience because of its adaptive capabilities in human-computer interaction. Related to this scholarly discussion, recent research has evidenced that chatbots and visual media enhance participant engagement, realism and persuasion (Keeler, Brock Baskin, & McKee, 2025b, Supplementary Analyses; see also Wang & Peng, 2023). And yet, because the technology is still in its early stages, many questions remain as to how it impacts OB research contexts (Bankins, Ocampo, Marrone, Restubog, & Woo, 2024; Chen, Liu, Guo, & Zhang, 2024).
We sought to examine the prevalence of AI technology in OB research, focusing on experiments. While there have been calls within OB for methodological innovation (Leavitt et al., 2021), contributions that involve direct deployment of operational GenAI are fragmented, and contributions that deploy it as enhanced stimuli in experimental settings are rarer still. This fragmentation limits our understanding of how OB scholarship can tap into these capabilities and benefit from increased ecological validity. Ecological validity is understood as whether generalization occurs beyond what was observed in a laboratory setting (Schmuckler, 2001). We scope GenAI use in OB experiments, addressing fragmentation in terminology and implementation. We focus on when and how GES may enhance realism and under what conditions they may (or may not) improve ecological validity. A scoping review is appropriate when: a research area is disjointed, thus necessitating a synthesis of scattered studies; new methodologies are emerging, thus requiring a mapping to formal theory development; and practical and methodological shortfalls need further investigation to guide future research (Arksey & O'Malley, 2005; Colquitt, Hill, & De Cremer, 2023; Munn et al., 2018).
This research makes several contributions. First, we provide a definition and typology for GenAI-enhanced stimuli (GES). Next, we provide a current overview of GenAI research relating to experimental research in OB. From this, we answer three research questions, informed by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) method, related to usage, ethics and participant outcome effects. Finally, our review of the literature highlights future research avenues to advance conversations revolving around GES uses in OB.
Generative AI-enhanced stimuli: Definition and typology
We define GES as experimental materials created or augmented by GenAI and presented to participants. Beyond this definition, we also provide a GES typology: interactive GES (bi-directional, adaptive exchanges; e.g. chatbots/voice-bots) and noninteractive GES (AI-generated but fixed media; e.g. video/audio/images). Both GES types can be engineered to increase realism, though only interactive GES require participants to interact dynamically (e.g. in a back-and-forth fashion) with the GES. This process should be indicative of increased ecological validity as part of an experimental manipulation. Building on work by Costello, Pennycook, & Rand (2024), Peng & Yang (2025) suggest experimental stimuli derived from GenAI can allow for reimagination of complex social dynamics that more accurately capture situations occurring in the real world (e.g. conversations). Furthermore, van Berlo et al. (2024) describe AI experimental stimuli as creating an interactive and real-world experience for participants. Importantly, van Berlo et al. (2024) limit their description of increased ecological validity to the advertising context and do not tie their work to experimental designs widely used in behavioral research. We thus go a step beyond these initial efforts and posit that GES encompass other types of stimuli within an experimental setting.
Research shows that AI can offer advice on innovative ideation (e.g. AI-assisted crowd voting), help reduce attention deficits and minimize human bias in decision-making contexts (Freisinger, Unfried, & Schneider, 2024). Thus, AI used in a dynamic experimental setting can help enhance a user’s experience, engagement and decision-making, which leads us to consider it a promising tool, albeit one with trade-offs that require careful design and reporting. We use the term GES because the technology can augment or enhance the stimuli used to elicit desired or theorized cognitions, emotions and/or behaviors from a study’s participants beyond what traditional techniques allow. Put simply, GES used in such contexts are likely to extend the potential of LLMs (e.g. chatbots and voice-bots) by leveraging a dynamic two-party interaction between the stimuli and a study’s participants. Other applications of GenAI, such as visual (image and video creation) or audio (music and podcasts) content, may offer a different level of ecological validity because they are static rather than conversationally dynamic. That said, we do not minimize the value of using GenAI experimentally in such ways.
Our definition of GES goes beyond the subset of LLM usage. It concerns using GenAI in a fashion similar to the MADE framework outlined by van Berlo et al. (2024). While their work is discussed through the lens of advertising, we submit that its eight steps could be generalized as a standard practice for scholars seeking a foundation for experimental GenAI deployment. Under our definition, merely using or invoking an AI tool once is not necessarily sufficient to constitute an enhanced stimulus. Our view is more holistic and concerns elements such as purpose, context and manipulation, which guide the deployment of a GenAI tool type (conversation, visual, or audible) for immersion via added realism and ecological validity. GES are based on thoughtful design and implementation by the researcher, with the intent of active engagement from participants. This interpretation and understanding aligns with the Mapping and Assembly components by van Berlo et al. (2024). Thus, as a preliminary step, the researcher must be actively involved in partially (co-creation) or fully engaging an AI tool to generate material. Second, we posit that a participant must actively engage with an AI stimulus. Such engagement can occur dynamically via a conversational stimulus or cognitively while interacting with visual or audio media developed with GenAI. The central focus of GES is to increase realism and, through it, garner ecological validity from experimental designs in ways that non-GenAI tools are unable to deliver. Per recent literature review work (Fahmie et al., 2023), ecological validity for applied behavior analysis studies should emphasize alignment with real-world phenomena, which we argue GenAI accomplishes in multifaceted ways. Third, GES can integrate elements of gamification (e.g. puzzles, simulations and role-play), provided GenAI was used to generate content. Fourth, we believe the GES term and typology can be applied to other technologies (e.g. robotics), especially when GenAI capabilities simulate human-like interaction and foster authentic, real-world engagement.
Method
This study uses a scoping review methodology. The selection of a scoping review methodology was appropriate for three main reasons, including the:
exploration of a novel topic;
availability concerns regarding how much literature could help address our research questions; and
lack of conceptualization clarity regarding the topic chosen (Davis et al., 2009).
After refining our research questions throughout the process of charting the data, we arrived at three research questions, allowing us to:
map how GES have been used in OB research;
compare GES effectiveness with traditional researcher-created stimuli; and
identify theoretical, methodological and ethical challenges inherent to using GES.
These three elements define our research questions. Given the nascent nature of our topic, coupled with its fragmented and interdisciplinary nature, a scoping review is well suited to provide scholars with both structure and clarity (Munn et al., 2018). This review follows the six-stage framework proposed by Arksey & O'Malley (2005) and later refined by Levac, Colquhoun, & O'Brien (2010). We decided to include the optional sixth stage (consultation) because of its potential to bring additional relevant articles into the final sample. Recent work by Napier, Slemp, & Vella-Brodrick (2024) advanced methodological scholarship by providing scoping review authors with conventional best practices for reporting. This scoping review accordingly adopts these best reporting standards. This scoping review also adopted the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) checklist (Tricco et al., 2018). The six stages of the framework adopted for this scoping review are detailed below.
Stage 1: Identifying research questions
The research questions were developed iteratively through discussion among the research team and through refinement during the scoping review process in and of itself, as recommended by Levac et al. (2010). They were based on a review of recent advancements in AI applications in experimental research and the need for methodological innovation in OB. Similar to Napier et al. (2024), three broad research questions with subcomponents emerged. This refinement process resulted in questions addressing: how GES are used in OB experiments and how they compare to traditional stimuli; the theoretical, methodological and ethical challenges of GES; and participant outcomes and boundary conditions.
Stage 2: Identifying relevant studies
A comprehensive literature search was designed to capture a wide range of sources that use, evaluate, or conceptualize the use of GES in OB research. The search strategy was developed by the authorship team and then reviewed with a research librarian, who checked the Boolean search strings to ensure comprehensiveness and reproducibility. Searches were performed using Boolean search strings wherein our selected terms (keywords) were combined using “AND”, “OR” and “NOT” as operators. These searches were conducted in the following databases: Scopus, PsycINFO (via EBSCOhost), Web of Science, Business Source Complete (via EBSCOhost) and Google Scholar. The procedures pertaining to article selection criteria (inclusion and exclusion) are detailed in Table 1.
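To make the combination of keywords with “AND”, “OR” and “NOT” concrete, the following is a minimal sketch of how such a search string could be assembled. The keyword lists and the helper names `or_group` and `build_query` are hypothetical illustrations; they are not the search strings used in this review, and each database applies its own syntax rules on top of this general pattern.

```python
def or_group(terms):
    """Join synonyms for one concept with OR, quoting multi-word phrases."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concept_groups, excluded=()):
    """Require every concept group (AND) and optionally exclude terms (NOT)."""
    query = " AND ".join(or_group(group) for group in concept_groups)
    if excluded:
        query += " NOT " + or_group(excluded)
    return query


# Hypothetical keyword lists, for illustration only.
technology_terms = ["generative AI", "chatbot", "large language model"]
design_terms = ["experiment", "vignette"]
field_terms = ["organizational behavior", "management"]
excluded_terms = ["editorial"]

query = build_query([technology_terms, design_terms, field_terms], excluded_terms)
print(query)
```

Grouping synonyms with OR inside parentheses and joining the groups with AND keeps the string readable and easy to adapt across databases.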
Stage 3: Study selection
Articles were included if they met the following inclusion criteria:
Explicitly used or examined GES (e.g. chatbots, LLM-generated vignettes, virtual agents) in the context of OB research and/or related fields (e.g. management, I-O psychology).
Reported empirical findings, methodological frameworks and/or conceptual discussions relevant to experimental research design enhanced by GenAI.
Literature published between 2014 and 2024 in peer-reviewed academic journals.
Studies were excluded if they:
Focused solely on AI adoption or AI as a research subject, rather than a research tool or method.
Did not include an experimental design or did not use at least one GenAI-enhanced stimulus within an experimental framework.
Were editorials, opinion pieces or outside the OB disciplinary scope.
Stage 4: Data charting
After assessing each article for eligibility based on the inclusion/exclusion criteria used by the authors and summarized in Table 1, a structured data extraction template was developed and piloted by the research team using several criteria. This process was done through reaching consensus among the authorship team and included basic information about the final articles composing our sample. After two authors independently charted the data, discrepancies were resolved through further discussion and template refinement until consensus was reached.
Stage 5: Collating, summarizing and reporting the results
Following item 24 of the PRISMA-ScR checklist listed in Tricco et al. (2018), we used qualitative thematic analysis, allowing us to group studies based on themes relevant to the research questions, after retaining the articles composing our final sample. We organized findings into three main domains:
(a) Type(s) and (b) comparison of GES to traditional experimental designs not relying on AI-enhanced stimuli regarding effectiveness and validity.
(a) Theoretical, (b) methodological and (c) ethical considerations.
(a) Participant outcomes and (b) conditional effects of GES methods.
The results are reported following PRISMA-ScR guidelines, including a flow diagram of the screening and selection process (Figure 1), and a table summarizing the decisions pertaining to inclusion/exclusion (Table 2). Specific information allowing us to answer our research questions was extracted from the retained articles and composed our themes, constituting our final results, discussed later in this article.
Stage 6: Consultation (optional)
After consulting among the authorship team, the articles that were in the field of Marketing were ultimately retained, considering a pattern emerged as authors charted the data. Specifically, while not in the OB field, these articles informed on GES and relied on an experimental design, making them relevant for the purpose of this scoping review.
AI use statement
During the preparation of this work, the authors responsibly used Generative AI in accordance with the AI Policy at Emerald Publishing. For transparency, an editor of OMJ was contacted and approval was requested based on our intent to use such a tool. The intended use raised no concerns. The tool used was ChatGPT Pro (o1), and its use was limited solely to the following purpose: Boolean query refinement for improved search engine assistance. After using this tool, the authors thoroughly reviewed and edited all content and take full responsibility for the publication.
Results
Our initial search results yielded 694 articles across all databases used in our searches. These databases were Web of Science (91 results), Scopus (229 results), APA PsycINFO via EBSCOhost (108 results), Business Source Complete via EBSCOhost (62 results) and Google Scholar (204 results). After full screening and elimination of duplicate articles, our final sample was composed of 11 articles that met our inclusion criteria, detailed in the next sections.
Characteristics of studies examined
After reaching Stage 3, 56 articles were retained for further screening to fully assess their eligibility based on inclusion/exclusion criteria. As the authors screened these articles for retention in the final sample, it became apparent that the adoption of GES in OB and adjacent or distinct disciplines is slow. The majority of the articles ultimately excluded were published in fields that were not closely related to OB (n = 38). Out of the 56 articles screened, 7 were initially retained. After the authors met, a decision was made to proceed to stage 6 (optional consultation). This resulted in retaining 4 additional articles that met all our eligibility criteria except for the field (these articles were in the field of marketing).
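As a simple arithmetic check on the screening counts reported above, the following sketch tallies the database results and the retained articles. It only restates numbers given in the text; the variable names are illustrative.

```python
# Search results per database, as reported in the Results section.
database_hits = {
    "Web of Science": 91,
    "Scopus": 229,
    "APA PsycINFO (EBSCOhost)": 108,
    "Business Source Complete (EBSCOhost)": 62,
    "Google Scholar": 204,
}

identified = sum(database_hits.values())   # total records identified
screened_for_eligibility = 56              # retained after Stage 3
retained_initially = 7                     # met all criteria
retained_after_consultation = 4            # marketing articles added at stage 6
final_sample = retained_initially + retained_after_consultation

print(f"Records identified: {identified}")
print(f"Screened for eligibility: {screened_for_eligibility}")
print(f"Final sample: {final_sample}")
```

The totals match the 694 records identified and the final sample of 11 articles.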
Of the studies screened for inclusion/exclusion, the vast majority were published between 2020 and 2024, indicating the recency and rapid development of this methodological domain. Most studies were situated within the fields of OB (e.g. Jiang, Wang, Li, & Liu, 2022; Trifilo & Blau, 2024), management (Freisinger et al., 2024) and human–computer interaction (e.g. Granulo, Caprioli, Fuchs, & Puntoni, 2024; Grundke, 2024), although several came from adjacent domains such as robotics (You & Robert, 2023) and computer science (Othlinghaus-Wulhorst & Hoppe, 2020). Notably, many of the included studies used vignettes or simulation-based tasks, incorporating AI content in the form of chatbots, algorithmic decision-makers, embodied robots, or AI-generated outputs, such as resumes or interview questions. We found that 33 out of 56 articles referred to AI usage (in the keywords, the abstract or the manuscript) but did not actively use AI (see Table 2). Further, the majority of these 33 articles did not indicate pertinent GES characteristics to support added realism leading to increased ecological validity. Rather, many scholars stated that they relied on vignettes that described an AI component using inconsistent terminology (n = 20).
In terms of methodological design, many articles used randomized controlled trials (e.g. Langer, König, & Busch, 2021; Suen, Hung, & Lin, 2020; Terblanche, 2024), vignette experiments (e.g. Gonzalez et al., 2022; Ossadnik, Muehlfeld, & Goerke, 2023) and online scenario simulations (e.g. Grundke, 2024; Klein, Depping, Wohlfahrt, & Fassbender, 2024). A smaller number of studies used physical prototypes or embodied interactions in virtual or hybrid settings (e.g. Richert, Müller, Schröder, & Jeschke, 2018; Robb et al., 2023). Although most research was experimental, some hybrid methods emerged, such as combining experimental interventions with multi-wave surveys (e.g. He et al., 2023; Liao, Lin, & Chen, 2023) or leveraging biometric sensors and AI-assisted video analysis in field-like experiments (e.g. Jindo et al., 2020).
Across the corpus, AI stimuli varied not only in form but also in function. Some studies used AI to simulate decision agents (e.g. Fahnenstich, Rieger, & Roesler, 2024; Schweitzer & De Cremer, 2024), others embedded AI in training simulations (e.g. Othlinghaus-Wulhorst & Hoppe, 2020) and a few involved real-time AI-driven behavior recognition or feedback (e.g. Manitsaris, Senteri, Makrygiannis, & Glushkova, 2020; Suen et al., 2020). Overall, this diversity in experimental stimuli highlights the methodological flexibility and interdisciplinary influence that shape OB research in the era of AI. Table 2 provides a comprehensive overview of the 11 articles composing our final sample. According to the recommendations of Arksey & O’Malley (2005), other articles reviewed as part of a scoping review should be used when possible and feasible, in contrast to systematic reviews, which focus only on the articles retained. As such, we also considered insights and themes from 49 additional articles that underwent a cross-check verification process. Accordingly, the citations in the thematic synthesis supporting each research question come from both the final sample and the preliminary inclusion set.
(a) How have GenAI-enhanced stimuli been used in OB research, and/or (b) how do they compare to traditional researcher-created stimuli in terms of effectiveness and validity?
Below, we discuss how GES have been used in OB research or how they compare to traditional researcher-created stimuli. These themes provide an informative foundation to understand potential pathways and attributes to consider when incorporating AI tools to simulate realism and workplace complexity.
Selection and Hiring. For instance, studies by Gonzalez et al. (2022) and Grundke (2024) used vignettes that explicitly described AI and human decision-makers in hiring contexts, demonstrating that AI-based conditions can elicit distinct perceptions of fairness and acceptability. Suen et al. (2020) implemented an AI-based video interview system to predict communication skills and personality traits in job applicants. The system outperformed traditional self-report (human-coded) assessments in both predictive accuracy and scalability. Finally, research by Trifilo & Blau (2024) used an experiment involving GenAI animated avatars representing racially ambiguous job candidates to simulate video interviews. Hiring professionals found the avatars plausible and promising. As such, the authors proposed advancing to full field experiments and improving avatar realism to explore intersectional identity cues. Gonzalez et al. (2022) found that job candidates responded more negatively to AI-only selection systems compared to hybrid or human-based systems, indicating that the medium of the stimulus (AI vs human) itself can shape outcome perceptions.
Anthropomorphic characteristics. Terblanche (2024) directly compared generative AI chatbots (GenBot) with rule-based chatbots (ScriptBot), revealing that the former was more effective in enhancing perceptions of autonomy, competence and relatedness. Studies using intelligent embodied agents, such as humanoid robots or smart speakers, have reported improved realism and trust in experimental tasks (Robb et al., 2023; Richert et al., 2018). These agents enabled researchers to investigate social dynamics, including trustworthiness, empathy and credibility, in human-AI collaboration, which are challenging to capture through text-based methods (Freisinger et al., 2024). Compared to traditional stimuli, such as written vignettes or hypothetical role plays, AI-enhanced methods appear to produce more immersive, interactive and contextually relevant experimental experiences.
Task collaboration. Research by Hayashi (2023) used chatbots as an intervention to aid group members in task collaboration. This research found that newcomers are more effective than insiders at changing group member perspectives when solving problems. Using GenAI in this experimental capacity makes it possible to study group social dynamics in ways that may otherwise be unfeasible. Based on their experimental use of GenAI-based chatbots, Jia, Luo, Fang, & Liao (2024) concluded that firms should implement tailored training and adapt job designs. In these instances, an emerging theme is that GenAI can be effective as a stimulus, although its validity warrants further assessment.
(a) What theoretical, methodological, and ethical challenges are associated with GenAI-enhanced stimuli in OB research, and/or (b) how do these challenges shape future research?
Addressing the second research question, several theoretical, methodological, and ethical challenges emerged from our findings. These themes are discussed below.
Theories used. Conceptual frameworks used to interpret AI-enhanced experiments are often borrowed from adjacent fields, such as the Uncanny Valley (Liao et al., 2023), psychological ownership theory (Liu, Yuan, & Jiang, 2025), or socio-technical systems theory (Liang, Gou, Wang, & Dabić, 2024), indicating a need for OB specific theorization.
Design-specific deception. Notably, several articles relied on a Wizard-of-Oz approach (Bartosiak & Modlinski, 2022; Robb et al., 2023), which involves deceiving participants into thinking that they are interacting with an AI tool or a robot when they are, in fact, interacting with a researcher who pretends to be a robot. When GES are implemented, such deception can sometimes be reduced, though disclosure, consent, and expectation management remain essential. In other words, experimentalists do not need to ask participants to report whether they believed that they were indeed interacting with a tool or robot, thereby potentially limiting the occurrence of participant attrition.
Implementation barriers. Real-time generative AI responses (e.g. in chatbot interactions) can create inconsistencies across participants, as noted in Terblanche (2024) and Wang & Peng (2023); scholars should therefore carefully think through their research design to mitigate such risks. In addition, multimodal stimuli, such as avatars or embodied agents, often require complex programming and hardware integration (Richert et al., 2018; Robb et al., 2023), posing barriers to replication, as scholars need appropriate knowledge and skills to design similar experiments. While frameworks such as Wang & Peng (2023) and Zamudio et al. (2024) propose systematic approaches for AI stimulus development, empirical studies still vary in rigor, particularly with manipulation checks and validation. In terms of methodological rigor, several studies pre-registered their experiments and conducted power analyses (e.g. Granulo et al., 2024; Yam, Tang, Jackson, Su, & Gray, 2023), thereby enhancing the internal validity of studies using GES.
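One possible way to limit inconsistencies across participants, at least for noninteractive GES, is to generate each stimulus once, freeze it and document it before data collection. The sketch below illustrates that idea under stated assumptions: `generate_stimulus` is a hypothetical placeholder for whatever GenAI tool a researcher uses (no real API is called), and the condition names and prompts are invented for illustration. It is not a procedure reported by any study in this review, and it would not directly apply to interactive GES, where responses are produced in real time.

```python
import hashlib
import json
import random


def generate_stimulus(prompt):
    """Hypothetical stand-in for a GenAI call; returns placeholder text."""
    return f"[generated text for: {prompt}]"


def freeze_stimuli(prompts_by_condition):
    """Generate one stimulus per condition and record it in a manifest."""
    manifest = {}
    for condition, prompt in prompts_by_condition.items():
        text = generate_stimulus(prompt)
        manifest[condition] = {
            "prompt": prompt,  # documented for replication
            "text": text,      # every participant in the condition sees this
            "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        }
    return manifest


def assign_conditions(participant_ids, conditions, seed):
    """Shuffle participants with a fixed seed, then alternate conditions."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {
        pid: conditions[index % len(conditions)]
        for index, pid in enumerate(shuffled)
    }


# Hypothetical conditions and prompts, for illustration only.
prompts = {
    "condition_a": "Feedback message written in a formal tone",
    "condition_b": "Feedback message written in a warm tone",
}
manifest = freeze_stimuli(prompts)
participants = ["p1", "p2", "p3", "p4", "p5", "p6"]
assignment = assign_conditions(participants, list(prompts), seed=42)

print(json.dumps(manifest, indent=2))
print(assignment)
```

Storing the prompt and a hash of the frozen output alongside the stimulus gives other researchers a way to verify that they are reusing the same material in a replication.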
Ethical challenges. Ethical concerns emerged regarding transparency and psychological risk. Participants’ responses to AI systems often reflected perceptions of fairness and threat to human dignity (Granulo et al., 2024; Zhang & Amos, 2024). Studies found that algorithmic management reduced perceptions of employee dignity (Zhang & Amos, 2024) and creativity (Schweitzer & De Cremer, 2024), highlighting the potential for dehumanizing effects when AI is used to simulate management. Moreover, some studies have offered interventions such as justification mechanisms (Zhang & Amos, 2024) or leader behavior cues (He et al., 2023) to mitigate ethical concerns; however, broader ethical guidelines for AI stimuli in behavioral research remain underdeveloped.
(a) What participant outcomes (e.g. engagement, realism, ethical perceptions, decision-making accuracy) are most affected by GenAI-enhanced stimuli, and/or (b) under what conditions are these effects most pronounced?
The findings revealed that key outcomes emerged when GES were used. In particular, the review revealed insights pertaining to AI use disclosure, employee creativity and key participant attitudes and behaviors for non-GES in the context of managerial instructions.
Disclosure distrust. Arango, Singaraju, & Niininen (2023) used images generated by AI in the charitable giving context and showed that when GenAI use was disclosed to potential donors (awareness of falsity), a backfiring effect resulted such that intentions to donate were lower. In this instance, the emergency context appears to moderate the effect on intentions to donate. Scholars should keep in mind that revealing to participants that GES are deployed may alter hypothesized outcomes. This is likely due to participants’ varied thoughts and perceptions about AI being used on them.
Employee creativity. Mixed evidence was found for studies investigating employee creativity. Schweitzer & De Cremer (2024) found that teams managed by algorithms were perceived to be less creative and were allocated fewer resources dedicated to innovation relative to teams led by humans. Jia et al. (2024), however, showed evidence of higher employee creativity, leading to higher performance when employees were assisted through AI. This phenomenon was present, in particular, when employees were highly skilled relative to their low-skill peers, who also experienced more negative emotions at work. These findings may highlight potentially adverse consequences of mass AI adoption for organizations and human resource managers (HRMs), who might inadvertently increase the skill gaps between low-skilled and high-skilled employees. Yet, remedies may exist through training and development (e.g. Maity, 2019) along with alignment of organizational needs and communications of required skills and knowledge in updated job descriptions and job specifications.
Employee feedback. Tong, Jia, Luo, & Fang (2021) showed that AI can assist in providing more accurate, relevant, and consistent feedback to employees. However, these scholars also found that disclosing to employees that AI was used to provide feedback to them backfired, thereby leading to a “value-destroying disclosure effect”. This suggests the need to communicate proactively to employees about why AI applications would be used, so as to alleviate employees’ concerns (Tong et al., 2021).
Behavioral mechanisms. Lanz, Briker, & Gerpott (2024) captured the participants’ intentions to adhere to unethical instructions when the manager was human versus non-human, using non-GES (vignettes). These scholars found that participants’ intentions to adhere to those unethical instructions were higher when the manager was human. Through a qualitative inquiry, these scholars also asked participants to justify their decision to adhere to instructions and explain the role their supervisor played in this decision. Participants reported that when instructions were provided by a human, they feared retaliation, perceived a real mind, attributed higher prejudicial motivation and anticipated future outcome interdependence (Lanz et al., 2024). To explore these mechanisms further, Lanz et al. (2024) performed a follow-up study manipulating the level of perceived mind (high/low) using non-GES, specifically, AI-generated voice messages versus real voice messages from supervisors. Their results replicated those of their initial study using non-GES, with higher intentions to adhere to instructions under the high mind condition (using t-tests). While these scholars did not rely on GES, future work could expand on their work and establish more robust findings through constructive replications (Köhler & Cortina, 2021, 2023) using GES, thereby allowing researchers to test stronger causal arguments among study variables.
Discussion
This scoping review sought to accomplish four goals. Given the paucity of research investigating GenAI uses in experimental research in OB, we establish a consistent and comprehensive definition of GES, along with a typology to differentiate interactive from non-interactive GES. Considering the potential for GenAI to add realism to experimental designs, and thereby strengthen ecological validity, an investigation of the different types of GES was necessary. We also identify theoretical, ethical and methodological considerations linked to using GES. Our findings reveal a lack of consistent guidelines or recommendations linked to using GES in OB. Our last research question sought to understand how GES most strongly affect participant outcomes and the specific conditions under which such effects occur. We propose that simply invoking the name or term of AI in experimental OB research in a passive manner does not capitalize on the engagement capabilities of the technology compared to active usage of GenAI output. Our definition and typology are appropriate for use across an array of experimental stimuli commonly used in behavioral research. GES should be used to closely mirror a real-world scenario. The unique capacity for GenAI to create or generate content should add realism to research designs (Leavitt et al., 2021) and ultimately allow for added ecological validity (Leavitt et al., 2021; van Berlo et al., 2024; Zamudio et al., 2024).
Several important takeaways can be highlighted from this work. First, GES can enhance realism, increase participant engagement and affect study outcomes. This is particularly the case when the research question pertains to human-machine interaction, automation, or trust in the technology. As a caveat, it is important to note that GenAI-based experimental human subject research is limited and still emerging. Furthermore, despite the methodological challenges identified, using GES can add realism beyond the capabilities of non-GES, thus leading to increased ecological validity. That said, as a consequence of being human-like, the technology can present challenges and concerns for controlled experimental research, which could pose a limitation. This is because GenAI-derived content may, at times, introduce spuriousness and confounded results based on the dynamic content it provides. For this reason, standardization and documentation are necessary. We found that AI elicits ethical concerns, in particular regarding fairness or threats to human dignity (Granulo et al., 2024; Zhang & Amos, 2024), and technology-related deception (Arango et al., 2023). Despite the benefits of using GES, AI-based content creation for video, images, music and text via LLMs has an element of randomness, which requires scrutiny in the experimental design and deployment phase. Otherwise, challenges could be raised for several forms of replication – literal, quasi-random, confounded and/or constructive (see Köhler & Cortina, 2021 for details). To mitigate such risks, scholars could rely on standardization and documentation (see Keeler et al., 2025a; Keeler et al., 2025b). In these studies, scholars used AI to derive video reels for subject condition consistency in an experiment.
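One way to make such standardization and documentation concrete is to record, for every generated stimulus, the settings that produced it together with a fingerprint of the exact content shown to participants. The following is a minimal sketch in Python under stated assumptions, not a procedure drawn from the studies cited above; the field names (`condition`, `model_name`, `prompt`, `seed`) and the helper functions are hypothetical and chosen for illustration only.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StimulusRecord:
    """Documentation entry for one generated stimulus (all fields illustrative)."""
    condition: str       # experimental condition the stimulus belongs to
    model_name: str      # hypothetical label for the generative tool used
    prompt: str          # exact prompt given to the tool
    seed: int            # random seed, if the tool exposes one
    content_sha256: str  # fingerprint of the exact content shown to participants


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 hex digest of the stimulus content."""
    return hashlib.sha256(content).hexdigest()


def document_stimulus(condition: str, model_name: str, prompt: str,
                      seed: int, content: bytes) -> StimulusRecord:
    """Build a record that ties generation settings to the resulting content."""
    return StimulusRecord(condition, model_name, prompt, seed, fingerprint(content))


def same_stimulus(record: StimulusRecord, content: bytes) -> bool:
    """Check whether content is byte-for-byte identical to the documented stimulus."""
    return record.content_sha256 == fingerprint(content)


def to_manifest(records: list[StimulusRecord]) -> str:
    """Serialize all records as JSON so the manifest can be archived with the study."""
    return json.dumps([asdict(r) for r in records], indent=2, sort_keys=True)
```

Under these assumptions, a researcher would generate each stimulus once, store the file, and share the manifest alongside the study materials. A replication team could then use `same_stimulus` to confirm that participants in a later study saw identical content, rather than regenerating stimuli and risking variation across runs.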
Theoretical implications
From a theoretical standpoint, conversations revolving around role theory (Biddle, 1986; DeRue & Ashford, 2010) and functional interdependence theory (FIT) (Balliet, Tybur, & Van Lange, 2017) are likely to inform future research using GenAI-enhanced stimuli. From a structural-functional role theory perspective (e.g. Vandenberghe, Bentein, & Panaccio, 2017), scholars could investigate whether individuals behave differently when presented with GES as a result of changing role-occupant-specific behavioral expectations. This is rather important considering several of the articles reviewed in this work mentioned that AI was used as a team member (e.g. Hayashi, 2023) or an aide to decision-making (Freisinger et al., 2024; Ling, Dong, & Cai, 2025). Scholars may use social cognitive theory (Bandura, 1986) and social information processing theory (Salancik & Pfeffer, 1978) to answer research questions pertaining to whether study participants’ individual differences and environments, respectively, may lead to the emergence of key relationships when participants are exposed to GES. Through a social cognitive lens, scholars could investigate whether individuals with low or high self-efficacy display approach or avoidance motivation behaviors (Elliot, 2013; Elliot & Thrash, 2002) towards GES, in line with recent empirical results from Jia et al. (2024). Alternatively, social information processing theory may help us understand whether and how social cues shape study participants’ attitudes towards GES. Organizational justice scholars have also highlighted how AI influences workplaces, and specifically, how related technology adoption affects broad management and human resource management (HRM) initiatives across organizations (Bennett & Martin, 2025; Colquitt et al., 2023).
Organizational justice scholars could, for example, rely on longitudinal designs to evaluate the long-term consequences of GenAI adoption at varying stages of the organizational life cycle. Another line of inquiry may reside in team contexts. Specifically, Hayashi (2023) noted that we lack clarity when assessing changes in perspectives in team contexts when bias and conflictive states occur. Another promising area of investigation for scholars may reside in advancing situational strength theory (Meyer, Dalal, & Hermida, 2010). In their work, Meyer and colleagues define situational strength as “implicit or explicit cues provided by external entities regarding the desirability of potential behaviors” (p. 122). Future work could examine whether such cues differ when individuals are exposed to GES versus non-GES. If so, this could imply that individuals would choose to engage in or avoid specific behaviors due to cue-specific psychological pressure, which could ultimately affect behavioral variance and subsequent trait-to-outcome relationships (Meyer et al., 2010).
Methodological considerations
In light of recent research calling for a more thorough consideration of ecological validity in behavioral research (Fahmie et al., 2023), this scoping review is the first attempt at conceptualizing GES and providing a typology of GES, paving the way for future methodological considerations in experimental designs. GES offer a unique opportunity for experimentalists to study real-world phenomena by providing their study participants with more realism (e.g. Leavitt et al., 2021). From a methodological standpoint, our work provides several considerations for scholars who might consider implementing GES as part of their experimental research design. In addition, scholars should keep in mind the potential technical challenges associated with the use of such tools for research purposes, especially when chatbots are used to elicit desired responses from participants. Methodological ambiguity, such as conflating textual AI cues with interactive AI agents, also highlights the value of standardized protocols like the MADE framework, which promotes ethical disclosure, manipulation checks and stimulus realism (Wang & Peng, 2023).
Beyond these issues, the predominance of one-time, culturally homogeneous samples (e.g. Lanz et al., 2024; Terblanche, 2024; Wang & Peng, 2023) calls for longitudinal and cross-cultural studies to better understand evolving perceptions of AI agency and fairness (Robb et al., 2023; You & Robert, 2023). Scholars often rely on convenience samples accessible through services provided by online panel platforms (OPP; Porter, Outlaw, Gale, & Cho, 2019) and argue that such services allow accounting for more demographic diversity within samples (e.g. Aguinis et al., 2021). For example, crowdsourcing services such as those offered by Prolific rely on OECD countries (Prolific, 2025), as highlighted by Terblanche (2024). Despite current research highlighting the diversity available in online panel data (OPD) samples, in particular for the MTurk platform (Behrend, Sharek, Meade, & Wiebe, 2011), MTurk workers (paid survey takers) reside primarily in the U.S. or India (Ipeirotis, 2010) and answer questions written in English (Keith & Harms, 2016). Further, according to Keith & Harms (2016), many scholars set survey participant selection criteria aimed at targeting U.S.-based workers. This situation potentially neglects diversity stemming from culture-specific differences among participants from populations living in other parts of the world (e.g. Asian or African countries). Thus, current research limits the generalizability of findings to populations living outside of Western, educated, industrialized, rich and democratic (WEIRD) nations (Henrich, Heine, & Norenzayan, 2010).
A third point in our definition and typology of GES made note of gamification opportunities and applications. We submit that a novel use of GES in this fashion could be to enhance the traditional anagram technique (see Gino & Pierce, 2009; Schweitzer, Ordóñez, & Douma, 2004). Work in the OB area concerning unethical behavior (Bonner, Greenbaum, & Quade, 2017; Greenbaum, Hill, Mawritz, & Quade, 2017) and organizational citizenship behavior (Mawritz et al., 2024) has used anagrams as experimental stimuli. Accordingly, considering the ethical, interactional and behavioral themes from our work, future research would benefit from developing GES-based anagrams. We suggest this would be a methodological advancement, given the dynamic capabilities afforded by GES to engage participants. Uses may include experimental priming, novelty for attention check administration, evaluation of cognitive processing, or individual difference segmentation.
Practical considerations
From a practical standpoint, this scoping review highlights key trends and considerations for managers and human resource (HR) practitioners who rely on AI to support decision-making (Bartosiak & Modlinski, 2022; Gonzalez et al., 2022; Granulo et al., 2024). First, where this applies, managers should plan on further AI adoption and anticipate needs to remedy skill-related gaps to address potential performance issues. Such initiatives could encompass employee training (e.g. Maity, 2019). Freisinger et al. (2024) found that when participants were compensated, AI use aversion could be mitigated, especially through a compensation system over time (Shao et al., 2025). Second, our findings show that unethical behavior emerges more frequently when employees are instructed to perform unethical behaviors by a human (Lanz et al., 2024). This raises questions about which tasks or responsibilities can be delegated to technology to potentially reduce such behaviors. Third, AI reliance was found to backfire when communication from management was not proactive in the context of providing employee feedback (Tong et al., 2021). Managers and HR practitioners should thus not simply focus on the performance-related benefits of GenAI but rather accompany change initiatives with timely and transparent communication, to avoid employee deception and backfiring effects when feedback is automated. To address ethical challenges linked to GES, interdisciplinary collaboration with AI developers and ethicists is critical for embedding transparency and participant dignity into research design (Lanz et al., 2024; Schweitzer & De Cremer, 2024; Zhang & Amos, 2024).
In the marketing literature, Arango et al. (2023) investigated context-specific limitations of such technology by looking at how GenAI use for image generation and its disclosure in the context of charity can backfire in nonemergency contexts, resulting in lower intentions to donate. Their work raises questions about the ethical use of GenAI-enhanced stimuli in practice (e.g. consumer or donor deception). Going back to OB, these issues further highlight the need to carefully consider justice norms across the different types of justice and in various contexts when AI-related decisions are implemented. The consequences of increased adoption of AI-powered decision-making pose questions about how employees react to such managerial initiatives (Granulo et al., 2024). Ling et al. (2025) notably highlight the benefits of AI in resume screening (cost efficiency) when used as an aide to decision-making but simultaneously recognize fairness-related concerns for applicants subject to those decisions. Employees should be given a chance to have input in processes, and doing so may be difficult as adoption continues across organizations (Gonzalez et al., 2022). In sum, “employees remain just as focused on justice and fairness” (Colquitt et al., 2023, p. 414) and this phenomenon, we argue, should persist as workplaces increasingly embrace technology over time.
Limitations
Several limitations of this work should be acknowledged. First, the literature reviewed ranged from 2014 through 2024. In a rapidly emerging research space, it is quite possible that some of the concerns or benefits discussed have since been further researched and examined. The team discussed including 2025, but at the time of writing, such an undertaking would likely have led to a perpetual dilemma over an appropriate literature cutoff. To some degree this could be a form of publication bias. Regardless, at a roughly three-month lag into 2025, the work presented in this review is still very timely. A second limitation is that GES studies are most likely taking place whose results have not yet been released to the public. Potentially this is due to private purposes or options for researchers to withhold their work from conference proceedings but still get peer-reviewed feedback. Third, this review only included English-language publications. AI-based research from other countries published in journals not translated into English is not included. This is not expected to be material, but it does remain a slight limitation in accumulating literature on the topic. Future research should consider these potential limitations to extend this work.
Conclusion
This review is an early initiative to explore a rapidly emerging conversation at the intersection of generative artificial intelligence (GenAI), management and OB, specific to experimental designs. We offer a definition of GES, along with a typology that distinguishes interactive and non-interactive GES. We provide an overview of the types of GES and highlight the main theoretical, methodological and practical considerations linked to using GES. This preliminary effort is rather critical and should be viewed as a seminal foundation to build upon as scholars navigate the rapidly evolving space of GenAI. In sum, we hope that this scoping review spearheads more efforts aimed at understanding the full scope of GES and how they will affect OB research. Critical questions remain unanswered so far regarding how humans might react to such designs, whether specific personality characteristics and/or environmental settings might shape these reactions and whether perceptions change over time as technology becomes increasingly the norm both in academia and across organizations.


