Artificial intelligence (AI) has the potential to dramatically change how people approach work, and specifically learning and development. While AI coaching can reduce costs and increase accessibility, it also presents both opportunities and threats to human coaches. The objective of this study was to conduct a systematic literature review of peer-reviewed research on the use of AI in coaching.
A systematic literature review (SLR) method was used to search eight databases for articles produced up to March 2024. Data extraction was conducted, with quality assessment undertaken independently and in parallel by two researchers, with a third acting as arbiter. The ROBINS-I tool was used to assess the risk of bias in the included studies. A narrative synthesis was conducted of a total of 16 quantitative, qualitative or mixed-method studies covering n = 2312.
The SLR identified four key themes: research design and AI integration, AI usefulness in coaching, impact of AI coaching and ethical considerations. The findings suggest that AI coaches can be effective, accepted, useful and match human coaches in competence for specific tasks.
AI coaching is a growing area of practice and research. This paper brings together the literature and identifies future research priorities and potential next steps in AI coach development.
The paper applies clinical research SLR methods to the field of organisational research, aiming to set a new standard through the use of a pre-determined research protocol, quality assessment and risk of bias (ROB) assessment, while providing a comprehensive literature review of AI coaching.
Introduction
Artificial intelligence (AI), defined as machines that can mimic human capabilities such as problem-solving, pattern recognition and decision-making (Russell and Norvig, 2021), has been forecast to put 300 million jobs at risk over the next decade (Briggs and Kodnani, 2023). Simultaneously, AI is projected to create many new jobs and is currently dramatically affecting various sectors, from healthcare, finance, education, manufacturing and retail to the creative industries (Anantrasirichai and Bull, 2022; Dwivedi et al., 2023; Lee and Yoon, 2021).
One application of AI is large language models (LLMs), which can understand, generate and interact with human language based on vast amounts of data (Brynjolfsson and McAfee, 2014). The creation of the transformer architecture (Vaswani et al., 2017) accelerated the development of LLMs such as Google’s BERT, OpenAI’s GPT-3 and GPT-4, Meta’s LLaMA and Microsoft’s Co-pilot (Giattino et al., 2023). Notably, completion LLMs like GPT-3 and BERT differ from chat-based LLMs like GPT-4 and LLaMA in their training and development. This paper focuses on the latter, emphasising conversational LLMs that have evolved significantly with the help of reinforcement learning for aligning models, giving rise to more interactive AI systems.
LLMs can be highly effective in tasks such as text summarisation, coding, language translation and generating creative content (Liu et al., 2023; Dell'Acqua et al., 2023). In the context of coaching, LLMs can provide real-time feedback and assist with administrative tasks, thereby potentially enhancing the overall coaching experience while allowing coaches to focus more on complex, human-centric aspects of their work (Passmore and Tee, 2023).
LLMs are undoubtedly going to affect professional services, including the coaching industry. In this article, coaching is defined as a voluntary intervention involving a series of future-focused, structured, purposeful conversations characterised by open questions, listening, summaries, reflections and affirmations, intended to facilitate the client in generating and acting upon strategies which result in developing greater self-awareness, enhancing personal responsibility and achieving meaningful progress towards a desired change. Coaching is used across various contexts, including sports, education, healthcare and business (Athanasopoulou and Dopson, 2018; Smith and Smoll, 2017; van Nieuwerburgh, 2016; Wolever et al., 2013). The application of coaching in work settings has accelerated; in 2022, 89% of organisational buyers of coaching estimated an increased investment in coaching in the upcoming year (McKenzie et al., 2022). Within the empirical literature, meta-analyses indicate that coaching has a medium effect on various desirable outcomes, such as performance, goal attainment, subjective well-being, self-efficacy, hope, resilience and optimism (Cannon-Bowers et al., 2023; De Haan and Nilsson, 2023; Jones et al., 2016; Theeboom et al., 2014).
The uptake of LLMs in coaching will have major implications for coach training schools, coaching clients and coaches, as it will in therapy (Hatch et al., 2025). From the coaching client side, LLM applications in coaching have the potential to make coaching available and affordable for more people (Terblanche et al., 2022b). Traditional coaching requires two human beings to meet in person or through digital means, which requires energy, time and cost for coaching clients. By leveraging LLMs in coaching through AI text or voice chatbots, which enable users to engage in conversational exchanges (Chung and Park, 2019), more people can gain inexpensive 24/7 access to AI coaching, defined as a “Synchronous coaching experience, where the machine replaces the role of the human coach, facilitating their human client in goal setting, issue exploration, personal reflection and developing insights and actions” (Passmore and Tee, 2024).
From the coaches’ perspective, scholars have previously suggested that an AI coach is capable of replacing a beginner coach, who relies heavily on existing coaching models and techniques, while being unable to replace an experienced coach who has moved beyond a simplistic model, such as GROW or other equivalent frameworks (Graßmann and Schermuly, 2021; Terblanche et al., 2022b). This is unsurprising given that AI bots were script-based until November 2022. The arrival of ChatGPT has reignited progress: early generative AI bots still show problematic behaviour (Passmore and Tee), but progress is being made (Hatch et al., 2025). AI therefore has the potential to both radically change people development at work and create an existential threat to human coaches and therapists in the medium term. In the short term, AI can also be seen to complement human coaches by automating repetitive tasks such as writing notes from a session, writing up a client’s agreed actions or intersessional activities, as well as undertaking coach-client matching, analysing coach performance during a session, advising coaching clients on which intersessional activities may best support their development, monitoring progress and providing nudges to clients (Passmore and Tee, 2023). These complementary applications have the potential to make coaches and coaching providers more effective. Given these potential uses, AI is also likely to impact coach training schools, as upcoming coaches need to better understand how these tools work, and how they can design or leverage them to enhance their own practice (Passmore and Woodward, 2023).
Existing studies on AI in coaching are nascent but are expected to develop rapidly within the next decade. This systematic literature review aims to provide a platform for future research, focusing on a review of published peer-reviewed papers, as well as to inform the development of coachbots, which are starting to emerge from multiple coaching providers.
In formal terms, the primary objective of this study is to conduct a systematic literature review of peer-reviewed research relating to the application of AI technologies in coaching. The secondary objective of the study is to determine whether there is evidence for the effectiveness of AI as a coaching tool to generate desired and intended outcomes. Consequently, the research questions that guided this systematic review (SR) were:
How has AI been integrated into coaching in the empirical literature?
What methods and research designs have been used to study AI in coaching?
What does the empirical evidence suggest on the usefulness and impact of AI in coaching?
What ethical considerations do studies emphasise?
How can future studies and product developers advance the understanding and integration of AI in coaching?
By systematically reviewing the empirical literature on AI in coaching guided by these questions, we inform researchers and practitioners of the evidence that supports or does not support the effectiveness of certain AI applications when integrated into coaching. Consequently, the implications of the SR provide insight into evidence of the effectiveness of AI in coaching, which can direct product developers and future studies within this emerging field with evidence-based applications of AI in coaching.
Method
This SR was conducted in accordance with the “Preferred Reporting Items for Systematic Reviews and Meta-Analyses” (PRISMA) guidelines (Moher et al., 2009). This included an SR protocol being created, submitted for inclusion in online protocol databases and circulated amongst the researchers to guide the data gathering and analysis stages.
Eligibility criteria
The types of studies considered for the SR were based on the following criteria: (1) written in English; (2) published in peer-reviewed publications; (3) the intervention was labelled as coaching; (4) the coach was not required to have technical knowledge of the client’s role or desired outcome; (5) the term “AI” or “Artificial intelligence” was included in the title or abstract text; (6) the research produced primary data or was a systematic review/meta-analysis and (7) the study was published between 1st January 1990 (an arbitrary date assumed to precede all AI coaching research) and 1st March 2024.
Various techniques and approaches are used within coaching, typically rooted in psychological approaches. This SR did not restrict itself to any specific approach or domain. Therefore, the coaching could have been conducted by self-coaching, another human or by technology, and could have been a one-with-one or one-with-many intervention.
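The seven eligibility criteria can be read as a checklist applied to each candidate record. The following is a minimal sketch of such a checklist, assuming a hypothetical `Record` structure whose field names are illustrative only; it is not the screening procedure the reviewers used, which relied on human judgement and reference manager software.

```python
import re
from dataclasses import dataclass
from datetime import date

# Publication window taken from criterion (7).
WINDOW_START = date(1990, 1, 1)
WINDOW_END = date(2024, 3, 1)


@dataclass
class Record:
    """Hypothetical record; every field name here is an assumption."""
    language: str
    peer_reviewed: bool
    labelled_as_coaching: bool
    coach_needs_technical_knowledge: bool
    title_and_abstract: str
    primary_data_or_review: bool
    published: date


def failed_criteria(record: Record) -> list[int]:
    """Return the numbers (1-7) of the eligibility criteria the record fails."""
    text = record.title_and_abstract.lower()
    # Criterion 5: match "ai" only as a whole word, so that words such as
    # "training" do not count as a mention of AI.
    mentions_ai = "artificial intelligence" in text or bool(re.search(r"\bai\b", text))
    checks = {
        1: record.language == "English",
        2: record.peer_reviewed,
        3: record.labelled_as_coaching,
        4: not record.coach_needs_technical_knowledge,
        5: mentions_ai,
        6: record.primary_data_or_review,
        7: WINDOW_START <= record.published <= WINDOW_END,
    }
    return [number for number, passed in checks.items() if not passed]


# Illustrative (invented) record that satisfies every criterion.
example = Record("English", True, True, False,
                 "An AI chatbot for goal coaching", True, date(2022, 5, 1))
print(failed_criteria(example))  # an empty list means the record is eligible
```

Returning the failed criterion numbers, rather than a single yes/no, mirrors the way exclusion reasons are reported in a PRISMA flow diagram.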
Search strategy and information source
The search term coach* + artificial intelligence was used to identify peer-reviewed qualitative, quantitative and mixed methods studies in the following databases: Business Source Complete, Emerald Journals, ProQuest, PubMed, PsycArticles, PsycINFO, Science Direct and Web of Science. A forward and backward search of the identified studies was subsequently conducted. In addition, the researchers checked specialist coaching publication websites for recently accepted but unpublished articles. Finally, coaching researchers with published studies featuring coaching using AI were contacted to ensure that any “in press” studies could be included in this SR.
Selection process
Once duplications from the searches across different databases were removed, the search strategy produced an initial set of 60 peer-reviewed articles. The titles and abstracts for each of these articles were independently screened against the eligibility criteria by two reviewers using reference manager software. Full texts were obtained for the 42 remaining relevant articles, which were again independently screened by two reviewers. At both screening stages, any disagreement regarding eligibility was resolved through escalation to the third researcher. As detailed in Figure 1, following the screening of the full texts, a backward/forward search then identified three further articles.
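Removing duplicates across eight databases is usually handled by reference manager software. As a rough illustration of the idea only, the sketch below keys each record on its DOI where one exists and on a normalised title otherwise. The record layout, field names and example DOIs are hypothetical, and this is not a description of the tool the reviewers used.

```python
import re


def dedup_key(record: dict) -> str:
    """Build a matching key: DOI if present, otherwise a normalised title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    # Lower-case the title and collapse punctuation and spacing differences.
    title = re.sub(r"[^a-z0-9]+", " ", (record.get("title") or "").lower()).strip()
    return "title:" + title


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each record, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Invented search hits from different databases.
hits = [
    {"doi": "10.1000/ABC", "title": "AI coaching", "source": "PubMed"},
    {"doi": "10.1000/abc ", "title": "AI Coaching.", "source": "Web of Science"},
    {"doi": None, "title": "Chatbots in Coaching: A Study", "source": "ProQuest"},
    {"doi": "", "title": "chatbots in coaching - a study", "source": "PsycINFO"},
]
print(len(deduplicate(hits)))  # 2
```

A record indexed with a DOI in one database and without one in another would not be matched by this simple key, which is one reason a manual check of the de-duplicated set remains advisable.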
Figure 1. PRISMA flow diagram of the screening process (identification of studies via databases and registers). Identification: records identified from databases (n = 120) and registers (n = 0); records removed before screening: duplicate records (n = 59), records marked as ineligible by automation tools (n = 0), records removed for other reasons (n = 0). Screening: records screened (n = 60); records excluded based on title and abstract (n = 17); reports sought for retrieval (n = 43); reports not retrieved (n = 1); reports assessed for eligibility (n = 42); reports excluded (n = 27): “coach” was instructing (n = 21), article contained no primary data (n = 6). Included: studies included in review (n = 16).
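The counts within the screening stage of the flow diagram follow from simple subtraction. The short sketch below reproduces only those screening-stage steps, using the figures reported in the diagram; the helper name `remaining` is introduced here purely for illustration.

```python
def remaining(before: int, removed: int) -> int:
    """Number of records left after a removal step in the flow."""
    if removed > before:
        raise ValueError("cannot remove more records than are present")
    return before - removed


# Screening-stage figures as reported in the flow diagram.
records_screened = 60
excluded_on_title_abstract = 17
reports_not_retrieved = 1
full_text_exclusions = {
    "'coach' was instructing": 21,
    "article contained no primary data": 6,
}

reports_sought = remaining(records_screened, excluded_on_title_abstract)
reports_assessed = remaining(reports_sought, reports_not_retrieved)
reports_excluded = sum(full_text_exclusions.values())

print(reports_sought, reports_assessed, reports_excluded)  # 43 42 27
```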
Quality assessment and risk of bias
The “Quality Assessment Tool for Studies with Diverse Designs” (QATSDD) (Sirriyeh et al., 2012) was used to assess the quality of the included studies. In both instances, two researchers conducted independent parallel assessments of the included studies. Any disagreements were resolved by discussion or escalated to the third researcher, who also checked that appropriate judgement had been reached by the other researchers.
The “Risk of Bias in Non-randomised Studies - of Interventions” (ROBINS-I) tool (Sterne et al., 2016) was selected for assessing any risks of bias, as it is specifically designed to include non-randomised studies. This was undertaken by one researcher, with a 20% sample undertaken by a second researcher to ensure consistency.
Results
The full-text review resulted in 16 articles being identified that met the eligibility criteria. There were seven quantitative, three qualitative and five mixed-method studies, plus one scoping review, published between 2019 and 2024.
Following the PRISMA process (Page et al., 2021) a total of 27 articles were excluded. Six articles were excluded because they were not primary research or synthesis studies (predominantly being “thought piece” articles) and 21 were excluded because the “coaching” intervention was more akin to one-to-one instruction or training, requiring the “coach” to have specific technical knowledge. The forward/backward search identified 433 articles that were excluded, mainly because the intervention in these referenced studies did not involve AI or meet our definition of coaching. This high rejection rate points to the nascent nature of this field of research, with researchers as yet unable to draw on a large body of extant research and instead needing to “cast their net” over tangential fields of inquiry. A summary of each included study is provided in Table 1.
Table 1. Summary of papers
| Authors (doi) | Paper type (Qual, RCT, Quasi-Experimental design) | Sample characteristics | n (number of participants) | AI coaching delivery | AI coaching training | Results/Findings/Insight |
|---|---|---|---|---|---|---|
| Ellis-Brush (2021), https://doi.org/10.24384/er2p-4857 | Mixed-methods: Quasi-experimental design and qualitative | Banking sector; no age reported | 48 | Text-based | Cognitive behavioural therapy | Improved self-resilience; no significant working alliance |
| Figueroa et al. (2021), https://doi.org/10.3389/fdgth.2021.747153 | Qualitative | Low-income women, aged 27–41, majority Hispanic/Latine | 18 | Text-based | Behavioural activation, motivational interviewing, acceptance and commitment, and solution-focused therapy | Positive perception of the chatbot, showed interest in using chatbots for health improvements, concerns about data privacy |
| Hassoon et al. (2021), https://doi.org/10.1038/s41746-021-00539-9 | Randomised controlled trial (RCT) | Overweight or obese cancer survivors, mean age 62.1 years, 90% female, various cancer types | 42 | Text-based, voice-based | Physical activity interventions | Improved step count by voice bot compared to text bot and control, no significant difference between text bot and control |
| Kannampallil et al. (2022), https://doi.org/10.2196/38092 | Mixed methods (Observational): Quantitative and qualitative | Patients with mild to moderate depression and/or anxiety, mean age 43.9 years, 77% female, 73% racial or ethnic minorities | 26 | Voice-based | Problem-Solving treatment | High pragmatic usability and favourable user experience, higher temporal workload during a problem-solving session |
| Mai et al. (2021), https://doi.org/10.1007/978-3-030-90328-2_29 | Mixed-methods: Quasi-experimental design and qualitative | Students, aged 21–39, majority male (75%) | 12 | Text-based | Exam anxiety, solution-focused coaching | Disclosure to the chatbot and rapport, more self-disclosure and rapport found in the chatbot informational disclosure versus self-disclosure |
| Mai et al. (2022), https://doi.org/10.1038/s41746-021-00539-9 | Exploratory quantitative study | University students | 21 | Conversational AI (writing), Rule-based (clicking) | Exam anxiety, solution-focused coaching | Moderate to high working alliance, higher value observed for bonding in the conversational AI, coaching through chatbots well accepted |
| Movsumova et al. (2020), https://doi.org/10.15219/em86.1485 | Mixed-methods (qualitative and quantitative) | Varied demographics, including men and women of different ages, occupations, and positions | 33 | AI-based tool (Mentorbot) through Telegram, assisting human coaches in dialogue and session analysis | No specific training details for coaches on Mentorbot mentioned | Positive dynamics in clarity, willingness to act and stress reduction. Mentorbot was effective for novel and confidential requests, while human coaches were stronger in reducing stress and perceived overall usefulness |
| Passmore et al. (2021), https://doi.org/10.53841/bpstcp.2021.17.2.41 | Quantitative: Survey | Coaches from 79 countries, average age 54, 66% female | 1200 | N/A | N/A | Mixed views of the role of AI in coaching; respondents equally divided between seeing it as providing benefits and disbenefits |
| Passmore and Tee (2023), https://doi.org/10.1108/JWAM-06-2023-0057 | Cross-sectional, mixed-method study | Experts in coaching, academic program directors, experience in reviewing and marking coaching assignments | 14 | Text-based | Various prompts to evaluate GPT-4’s ability to define coaching, compare ethical codes, summarise meta-analyses, and conduct coaching | GPT-4 is capable of generating plausible content, but this often contains inaccuracies and falsified information. Concerns over GPT-4’s ethical judgement were highlighted |
| Stephens et al. (2019), https://doi.org/10.1093/tbm/ibz043 | Feasibility study | Youth enrolled in a weight management program, mean age 15.2 years, 57% female, 43% Hispanic | 23 | Text-based | Behavioural coaching | AI coach was feasible and helpful; high engagement (4,123 messages), 96% found it useful, 81% reported positive progress toward goals |
| Terblanche and Cilliers (2020), https://doi.org/10.22316/poc/05.1.06 | Exploratory study: Survey | Online participants, no age or demographics reported | 226 | Text-based | Goal-attainment theory, GROW model | Performance expectancy, social influence, and attitude significantly influence behavioural intent to use AI coach |
| Terblanche et al. (2022a), https://doi.org/10.24384/5cgf-ab69 | Longitudinal RCT | Undergraduate students, diverse demographics, average age 22 years | 168 | Text-based | Goal-attainment theory, GROW model | Improved goal attainment, no significant changes in psychological well-being, resilience, or perceived stress |
| Terblanche et al. (2022b), https://doi.org/10.1371/journal.pone.0270255 | Longitudinal RCT | Business school students, diverse demographics | 478 | Text-based | Goal-attainment theory, GROW model | Improved goal attainment compared to the control group, same effect on goal attainment as human coaches |
| Terblanche et al. (2023), https://doi.org/10.1080/17521882.2022.2094278 | Qualitative study | Final year undergraduate students, aged 20–22, diverse cultures, low socioeconomic background | 31 | Text-based | Goal-setting | Positive attitude and performance expectations promoted engagement; AI coach perceived as accessible, easy to use, and intelligent; minimal perceived risk; social influence and information about the AI coach influenced adoption |
| Terblanche et al. (2024), https://doi.org/10.1080/17521882.2024.2304792 | Qualitative study | Coaches and clients from a financial services organisation | 16 (9 coaches, 7 clients) | Text-based | GROW model | Coaches were concerned about potential negative interference with the coach-client bond, while clients found the chatbot useful for goal tracking, accountability, and convenience. Clients felt psychologically safe with the chatbot and appreciated its non-judgmental nature |
| Tropea et al. (2019), https://doi.org/10.2196/12805 | Scoping review | Mostly within medical care | Not applicable (Review of 49 studies) | Text, voice and avatars, ECA | Various methods, focusing on health improvement | Effectively support physical activity and weight management. ECA have significant potential in promoting healthy behaviours |
Source(s): Authors’ own
Risk of bias and quality results
Of the 16 studies, 11 were rated as having a “low” overall risk of bias, with 2 studies awarded a “high” categorisation and 2 rated as “unclear”, due to key information not being reported in the source studies. Using the 16 items in the QATSDD tool, an average quality rating of 77% was determined, with an inter-rater reliability (IRR) of 0.67. The most prevalent indicator of quality was a clear rationale for the choice of data collection tool, which was detected in all 16 studies. The lowest scoring quality determinant was “Statistical assessment of reliability and validity of measurement tool(s)”, which was absent from six of the quantitative data studies.
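The text reports an inter-rater reliability of 0.67 without naming the statistic used. Purely as an illustration of how agreement between two raters can be quantified, the sketch below computes Cohen's kappa, one common choice for two raters scoring the same items on a categorical scale. The choice of kappa is an assumption made here, not something the paper states, and the ratings in the example are invented rather than taken from the review.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters scoring the same items on a categorical scale."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal score frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[score] * counts_b[score] for score in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Invented scores on a 0-3 scale for eight items, for illustration only.
scores_rater_1 = [3, 3, 2, 2, 1, 1, 0, 0]
scores_rater_2 = [3, 3, 2, 2, 1, 1, 0, 1]
print(round(cohens_kappa(scores_rater_1, scores_rater_2), 2))  # 0.83
```

Kappa corrects raw percentage agreement for the agreement that would occur by chance, which is why it is typically lower than the simple proportion of matching scores (0.875 in the invented example).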
Research design and AI integration
The summary table reveals heterogeneity in the design characteristics across the included studies. Sample sizes ranged from 4 to 1200, multiple dependent variables have been studied and research samples include low-income Hispanic women, banking sector employees and overweight/obese cancer survivors. An important realisation was the usage of “AI coaching” as an umbrella construct. AI technology has advanced markedly over the five-year period in which the included studies were published, meaning the term “AI” is used across a wide range of studies, from those which use a “Wizard of Oz” design (with a remote researcher creating faux “AI” responses in real time that appear on the participant’s screen) to text-based models and voice-based avatar interactions. Further, AI coaches have been trained on diverse approaches, including cognitive behavioural therapy, goal theory, motivational interviewing and solution-focused coaching.
AI usefulness in coaching
Our SR suggests that various AI coaching modalities, including voice-, text-, script- and conversation-based approaches, are useful for and accepted by people (Figueroa et al., 2021; Mai et al., 2021; Kannampallil et al., 2022; Stephens et al., 2019). AI coaches can provide feedback, help with goal-tracking, serve as accountability partners and are seen by participants in the studies published to date as accessible, convenient and psychologically safe (Terblanche et al., 2023, 2024). Furthermore, AI coaches can create a non-judgmental environment that encourages self-disclosure (Ellis-Brush, 2021; Mai et al., 2021; Terblanche et al., 2024).
In addition, an AI-based bot, assisting human coaches with questions to ask during a coaching session and creating a report to reflect after a coaching session, has been shown to broaden coaches’ perspectives (Movsumova et al., 2020). This hybrid approach to coaching was more effective for novel and confidential requests than human coaching without AI bot assistance. However, while both showed a positive influence on clarity, willingness to act and stress reduction, human coaches were stronger in reducing stress and perceived overall usefulness.
The impact of AI coaching
The four studies that utilised either randomised controlled trials or quasi-experimental research methodologies suggest that AI coaching chatbots designed with a narrow purpose, aimed at improving specific outcomes, are effective in achieving those outcomes. Terblanche et al. (2022a, b) showed that Vici, an AI coach trained solely in goal attainment, was able to match the level of impact of human coaches on goal attainment while being ineffective in increasing resilience and psychological well-being. Ellis-Brush (2021) showed that an AI coaching chatbot trained in psychological techniques and resources via cognitive behavioural therapy increased self-resilience. Hassoon et al. (2021) showed that an AI coach trained in delivering physical activity interventions was able to increase the step count of cancer survivors compared to a control group.
In terms of working alliance, Mai et al. (2022) found that both script-based (clicking) and conversational (writing) AI coaches demonstrated a medium to high working alliance, a finding supported by other studies in the review (see Kannampallil et al., 2022; Terblanche et al., 2024). However, Ellis-Brush (2021) found that an AI coach was able to increase self-resilience without demonstrating a working alliance. Terblanche and Cilliers (2020) argued that, for AI coaches, the focus should be on technology acceptance rather than working alliance. Their findings suggest that performance expectations, positive attitude and social influence predict behavioural intentions of using an AI coach. These findings were supported in a later study, where positive attitude and performance expectations promoted engagement while social influence affected adoption (Terblanche et al., 2023).
Ethical considerations
Surprisingly, few of the papers included in this review discuss ethical concerns or implications raised by participants. However, a few discussed and reported ethical principles and risks, including data privacy, security, avoidance of harm and bias.
Figueroa et al. (2021) reported that most participants had privacy concerns relating to sharing their location and had limited technological understanding. Ellis-Brush (2021) noted that biases in AI can result in greater social prejudice and that no ethical framework exists to guide the applications of digital agents. Passmore and Tee (2023), in their review of GPT-4, expressed concerns over its ability to make ethical judgments and to identify risk during conversations. While GPT-4 communicated empathy, it did not provide referrals to crisis hotlines, emergency services or relevant professional services. Terblanche et al. (2023) reported that the perceived risk of using an AI coach was low, as students saw it as unbiased, impartial and private to use. Terblanche (2020) proposed “best practices” for designing an AI coach, including ethical considerations such as trust, transparency, reliability, benevolence and integrity.
Discussion
We identified 16 papers from the review that met our inclusion criteria for the period from the first of January 2000 to the first of March 2024. As can be noted from the summary table (Table 1), the first papers emerged in 2019, with the volume of papers building during 2021 and 2022. We expect that process to continue given the developments of AI and, in part, this was a motivator for us to provide a platform for future research.
Our SR suggests that people are willing to use a range of methodologies in AI coaching and benefit from doing so (Mai et al., 2021; Stephens et al., 2019; Terblanche et al., 2022a, b). Given these insights, and the prediction that AI coaching methodologies and features will improve and become more efficient, AI has considerable potential in learning and development, and specifically within coaching.
One such potential future use case is the development of embodied conversational agents (ECAs) using generative AI with multimodal features (i.e. visual, audio, text), which have become the most popular interfaces in healthcare and psychology (Tropea et al., 2019). Other features of ECAs include gender, aesthetics, voice, personality and behavioural patterns (Tropea et al., 2019). Thus, ECAs might even represent celebrities, political figures or intellectuals in the future, allowing users to receive AI coaching from Marshall Goldsmith, Albert Einstein or Donald Trump.
Independent of AI coaching methodologies and features, the papers in this SR indicate that AI coaches designed with a narrow purpose, aimed at improving specific outcomes, are effective in enhancing desirable outcomes such as goal attainment, self-resilience and physical activity (Ellis-Brush, 2021; Hassoon et al., 2021; Terblanche et al., 2022a, b). This is echoed by evaluations of coachbots against the ICF coach competencies, which have shown that AI coaches are capable of meeting ICF credentialing standards at ACC level and show evidence of PCC behaviours (Passmore, 2024).
These findings might suggest that AI coaching is a genuine threat to novice coaches who rely heavily on simple models and approaches in their coaching practice (Graßmann and Schermuly, 2021; Terblanche et al., 2022b). However, caution is needed regarding the reliability and validity of the findings of the above studies, for three reasons. Firstly, the studies have generally relied on student samples. Secondly, they have typically addressed simple, often health-related goals. Finally, questions can be raised about individual studies that are often cited as key evidence. For example, Terblanche et al.’s (2022a, b) studies were conducted among undergraduate students, and the AI data were gathered during the COVID-19 lockdown, which may have influenced participants’ reactions given that they were unable to leave home. More research is needed to validate these results. Furthermore, the rigour of the Ellis-Brush (2021) study is questionable, as it featured a within-subjects design, did not report the validity and reliability of the questionnaire on self-resilience and had a limited sample size. Similarly, although it used a randomised controlled trial design, Hassoon et al. (2021) had a limited sample size of cancer patients.
That said, these insights have some important implications for human coaches and coaching training schools. First, coaching training schools should equip their students with knowledge about AI and support them in gaining the skills to design and deploy AI coachbots. This may involve collaboration with programmers to leverage a retrieval-augmented generation (RAG) architecture to enhance the quality of the coaching bot’s performance. Although only two studies in the review covered client benefits associated with AI-assisted methodologies that complement coaching praxis (Terblanche et al., 2024), coaches should be open to exploring how AI systems can enhance the quality and efficiency of their coaching through reflective practice, client goal-tracking and accountability, automation of repetitive tasks (such as coaching session reports), inter-sessional activities, assessments and client discovery before the human coaching session (Passmore and Tee, 2023).
Second, although AI’s potential is enormous, human coaches currently have unique features that AI lacks. For instance, while AI can demonstrate empathy through data-driven responses, arguably better than human professionals (Ayers et al., 2023), a machine does not feel anything. Empathy is an emotional response that implies feeling what the other person is feeling (Cuff et al., 2016). Thus, “AI empathy” might be better viewed as cognitive empathy (understanding) rather than affective empathy (emotional response), and may therefore have a different effect on the human participant (client).
Other unique human coaching features include adaptability, cultural sensitivity, ethical judgment and privacy. Human coaches can adapt to changing needs and changes of direction in the coaching conversation while drawing on the various approaches, tools and techniques they have learned, which will be harder for an AI coach. Current AI coaches also lack the “meta-intervention” of selecting coaching approaches and techniques according to the culture in which the coaching occurs or the cultural background of the client. Furthermore, clients sometimes bring challenging ethical dilemmas to explore in the coaching conversation, which human coaches may be better placed to facilitate than an AI chatbot. In addition, humans can draw on intuition, which AI will struggle to replicate. Lastly, while AI bots may be lower priced than human coaches, many people will prefer a human coach who shares their identity and lived experience and to whom they feel accountable.
In terms of the client–AI coach relationship, the review suggests that an AI coach can demonstrate a working alliance, but it might not be as important to outcomes in AI coaching as it is in human coaching (Mai et al., 2021; Terblanche and Cilliers, 2020). Although the small number of studies in the review prevents a strong conclusion, technology acceptance might rival working alliance as a factor in the effectiveness of AI coaching. Expecting that an AI coach will perform effectively, holding a positive attitude towards AI coaches and seeing other people in one’s social environment use them do seem to be particularly important for behavioural intentions, engagement and adoption (Terblanche and Cilliers, 2020; Terblanche et al., 2023). Future studies could help discern the role of working alliance and technology acceptance in moderating the relationship between AI coaching and desirable client outcomes.
Related to human–AI relationships, ethical considerations are needed to prevent undesirable outcomes and unintended consequences. Although not emphasised in the majority of studies, our review suggests there are several ethical concerns when it comes to AI coaching, including data privacy and security, avoidance of harm and bias (Ellis-Brush, 2021; Figueroa et al., 2021; Passmore and Tee, 2023). Given these concerns from researchers and participants alike, and the multiple ethical questions AI raises, we believe there is a role for professional bodies in managing and regulating the development and applications of AI coaching. Both the ICF and EMCC have published ethical standards relating to AI and digital coaching (see EMCC, 2024; ICF, 2024). These standards, developed with AI developers and coaching platform providers, offer advice and guidance but have avoided setting compliance standards such as those set for human coaches through competency frameworks or ethical codes of conduct. A possible next step is for such standards to become quality assurance standards, using frameworks such as ISO, providing information and reassurance to organisational buyers and individual users on data privacy, data storage, bias and ethical practice. Further changes may also result from legislation such as the EU’s (2024) AI Act and changes to GDPR regulations. Such standards could bring AI bots into line with human coaches, reducing the risk of harm from AI coaches while promoting a focus on human flourishing and holding AI developers accountable for the unintended consequences of their AI coaches (Passmore et al., 2024).
Future research
Given the expected rise in research on AI coaching, there are many fruitful questions that future research could explore. Due to the lack of strong empirical research in the field, the main focus of future research should be to test the effectiveness of AI coaching using robust research designs, including randomised controlled trials (RCTs) and quasi-experimental methods, with different participant groups and sufficient participant numbers to ensure adequate statistical power. In addition, it will be invaluable to compare AI coaching with human coaches at different levels of expertise (beginner versus experienced coaches, or managers) on outcomes including goal attainment, self-insight, solution-focused thinking, subjective well-being and psychological well-being. Such studies will clarify under what circumstances and with which clients AI coaching can match or outperform human coaching.
Next, future research should compare different modalities for delivering AI coaching, including voice, text and ECAs. Only one study in the review compared voice- and text-driven AI bots (Hassoon et al., 2021). While the voice bot was shown to be more effective than the text bot in increasing physical activity, the mechanism of interaction between the user and the AI coach differed between the two. Therefore, future research should test text versus voice bots with different user groups, while ensuring that the interaction between the bot and the user is identical (e.g. bidirectional, or initiated mainly by the AI coach or by the participant).
In terms of ECAs, future research will need to discern which features of ECAs are most productive for coaching and which lead to unintended consequences, such as dependency, harm and the “uncanny valley” phenomenon, which refers to negative impressions of machines with overly human-like qualities and correlates negatively with the trustworthiness of ECAs (Passmore et al., 2024; Sasaki et al., 2017; Tropea et al., 2019). This would include comparing the effects of animated, human-like, machine-like (robot) and animal ECAs on important coaching outcomes.
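To illustrate what "sufficient participant numbers" might mean in practice, the following is a minimal sketch of an a priori sample size calculation for a two-arm trial comparing mean outcomes (for example, AI coaching versus a control group). It assumes a two-sided test and the standard normal approximation; the function name and the effect sizes used are hypothetical and are not taken from any study in the review. Exact calculations based on the t-distribution give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per arm in a two-arm comparison of means.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2,
    where d is the standardised mean difference (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Hypothetical effect sizes, chosen only to show how the requirement scales.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Under these assumptions, smaller expected effects require much larger samples, which is one reason small pilot studies can struggle to detect the modest effects that coaching interventions may produce.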
We hypothesise that the closer to the human experience, and the closer to known or respected individuals the AI bot appears, the better the outcomes that the AI coaching tool will be able to achieve as users project human qualities onto the experience and feel more accountable for their commitments.
Given the lack of ethical considerations in the coaching studies in this review, qualitative and quantitative studies with prospective users and AI designers on ethical principles in AI coaching will be important. From the user’s perspective, this means assessing which ethical features of the AI coach increase trust, engagement, adoption and behavioural intentions. This could be done through interviews alongside carefully designed quantitative assessments of the ethical principles that users deem valuable. On the AI design front, a similar quantitative study, interviews or a document review assessing existing AI coaching systems against ethical principles could provide important information on their current ethical status and applications (Passmore et al., 2024).
Future product development
Given the rapid advancements and immense potential of AI in coaching, the scope for future product development is vast. While numerous features could enhance the coaching experience, several key developments stand out for their potential to benefit prospective users.
Multimodal interaction: Offering a range of multimodal interaction features in AI coaching may provide enhanced benefits. These features include text, voice and video, and a range of types of ECAs, and users should be able to choose the interaction method that best meets their needs. While most people are expected to talk to and look at the AI coach, others may prefer to interact through writing. In addition, varied ECA features may add value, such as different animated figures, animals or human images, which offer both known personas and diversity across race, gender and nationality. These ECAs should also have different behavioural characteristics, personalities, genders, colours and voices associated with them. In each case, the designers should make clear what the user is interacting with. Equally, interaction styles can be varied. For instance, while some people may prefer a high level of challenge, others may prefer a more empathetic and compassionate interaction. This gives users the autonomy to select the features that are most compatible with them and with the issue they wish to explore. In addition, multimodal features are more likely to meet the needs of neurodiverse individuals and provide greater opportunities to offer experiences suited to diverse cultural and national backgrounds.
Elements of effective human coaching: AI coaches should incorporate the essential elements of coach–client relationships such as empathy, non-judgmental attitude, trust, transparency, benevolence and integrity (Terblanche and Cilliers, 2020). By offering cartoons or animal representations, AI coaches could prevent the impression that bots are human (Sasaki et al., 2017; Terblanche, 2020). For those that use a human image, user information should make clear that the user is engaging with a bot, not a human. Furthermore, consistent with a coaching style, AI coaches should ask one question at a time and should be coded to avoid giving advice.
Specificity: Until we have general AI that can perform any intellectual task that human beings can do, AI coaches should be designed with a narrow purpose aimed at achieving specific outcomes. Just like human coaches, who normally specialise in people development rather than also offering accounting services and legal advice, these AI coaches should focus on evidence-based approaches, such as solution-focused coaching, cognitive behavioural coaching or positive psychology coaching. By relying on theoretical models supported by research, AI coaches can give users greater confidence that positive outcomes can be achieved (Olafsson et al., 2025).
Advanced Personalisation: In addition to the features of the ECAs discussed above, features that increase personalisation are encouraged. These features should leverage the demographics, cultural background and context of the user. This could be achieved with simple questions that users answer before engaging with the AI coach, giving the AI coach more data with which to adjust its interaction to the client’s background and context. Furthermore, the ability to attach documents, such as resumes, journal entries, pictures and videos, while engaging with the AI coach can enhance personalisation. In addition, the integration of wearable technology, which provides data on things such as sleep, heart rate variability and exercise habits, is valuable since it gives the AI coach data-driven information on the physiology and lifestyle habits of the user, allowing it to personalise its questions and feedback more effectively. The balance needed here is that, while increased personalisation is beneficial for the user, the data should not be used to exploit or manipulate users with marketing, create dependency or lead to discriminatory outcomes.
Ethical principles: Covering all AI coach ethical principles is too vast a topic to articulate here (see Passmore and Tee, 2023; Passmore et al., 2024). That said, transparency with informed consent is encouraged: the AI coach should state explicitly and simply that it is not human, acknowledge its imperfections and biases, and provide clear information on data privacy, security and use. This allows prospective users to be mindful of the implications and risks associated with interacting with the AI coach. Furthermore, AI coaches should only work within their domain of training and expertise. If the user brings up topics that fall outside the AI coach’s role, such as suicidal ideation or irrelevant out-of-scope questions, the AI should make appropriate referrals in the first case and direct the conversation back to relevant topics in the second.
In addition, future AI coaching tools must address potential biases, which can arise from the data used to train these systems, leading to unfair or inappropriate outputs. Implementing robust bias detection and mitigation strategies is essential to ensure equitable and inclusive interactions across a global user population (Ferrer et al., 2021; Leong and Sung, 2024). Addressing biases not only enhances the fairness of AI tools but also fosters trust and reliability among users. By proactively managing these issues, developers can create AI systems that align with ethical principles while supporting effective coaching outcomes for varied global audiences (Khan et al., 2022).
Although various other product features could benefit coaches, they fall outside the scope of the findings of this systematic review.
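The scope rule described under Ethical principles (refer when risk is disclosed, redirect when a request is out of scope, otherwise continue coaching) can be sketched as a simple routing step placed in front of the coaching model. This is an illustrative sketch only: the names `Route` and `route_message` and the phrase lists are hypothetical, and keyword matching would miss many real disclosures of risk, so it should not be read as an adequate risk detector.

```python
from enum import Enum


class Route(Enum):
    REFER = "refer"        # signpost crisis or professional support
    REDIRECT = "redirect"  # steer the conversation back to the coaching topic
    COACH = "coach"        # continue the coaching conversation


# Hypothetical phrase lists, for illustration only. A deployed system would
# need a validated risk-detection approach rather than keyword matching.
RISK_PHRASES = ("suicide", "suicidal", "kill myself", "end my life", "self-harm")
OUT_OF_SCOPE_PHRASES = ("tax return", "legal advice", "medical diagnosis")


def route_message(message: str) -> Route:
    """Decide how the AI coach should handle a single user message."""
    text = message.lower()
    # Risk is checked first so that a referral always takes priority.
    if any(phrase in text for phrase in RISK_PHRASES):
        return Route.REFER
    if any(phrase in text for phrase in OUT_OF_SCOPE_PHRASES):
        return Route.REDIRECT
    return Route.COACH


print(route_message("I want to set a goal for this quarter"))  # Route.COACH
```

Checking for risk before anything else reflects the priority implied in the text: a message that is both out of scope and indicative of risk should lead to a referral, not a redirection.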
Limitations to the study
While the study set high standards in developing a research objective and a research protocol and in using dual evaluations throughout the process, along with an Extraction Template, a Quality Assessment process and a ROB assessment, all unusual in organisational research but common in clinical studies, it has several limitations that warrant consideration. First, although the researchers established a protocol for the systematic literature review, it was not possible to publish the protocol on an Open Access site, as no such site exists for this type of research. Not publishing the protocol in advance reduces transparency and increases the risk of design deviation.
Second, there was a change in personnel over the course of the study, as two researchers active at the planning stage of the research dropped out and were replaced by one substitute. Although this change occurred relatively early in the research process, it could have introduced some inconsistencies from the protocol to the data collection stage.
Third, while the average quality rating of the studies in the SR was 77%, the inter-rater reliability was 0.67, indicating some inconsistencies in how the researchers assessed the quality characteristics of the source studies, which could influence the reliability of the review findings. To mitigate this, a third researcher was involved in the arbitration process, independently assessing the quality of the studies and resolving any discrepancies between the initial researchers’ assessments. Including the third researcher increased the likelihood that the final quality assessment was reliable, thereby enhancing the credibility of the systematic review.
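For readers less familiar with inter-rater reliability, the sketch below shows how one common agreement statistic, Cohen's kappa, is computed for two raters. The paper does not state which statistic produced the value of 0.67, so the use of kappa here is an assumption, and the ratings in the example are invented purely for illustration; they are not the review's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each rated the same items once."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: proportion of items given identical ratings.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions per category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Invented ratings for ten quality criteria (not data from this review).
rater_1 = ["met"] * 6 + ["not met"] * 4
rater_2 = ["met"] * 5 + ["not met"] + ["met"] + ["not met"] * 3
print(round(cohens_kappa(rater_1, rater_2), 2))
```

Because kappa corrects for agreement expected by chance, it is lower than the raw proportion of matching ratings, which is why a coefficient can look modest even when raters agree on most items.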
Conclusions
In conclusion, by 2035, the coaching industry, like many others, will look radically different as a result of AI. Our review indicates that various versions of AI coaches can be useful, accepted and effective, and will increasingly match human levels of competence. Researchers should continue to assess, through robust research methods, how AI coaching can contribute to human development and wellbeing. Product developers should emphasise designing effective and ethical AI coaching tools that serve the needs of their users.
Declaration of interest statement: The authors declare that no funding was received for this research and that they have no financial interests or personal relationships that may have influenced the writing of this paper.
Data availability statement: Additional data relating to this paper is available on request from the authors.
References
References marked with * were included in the systematic review.
