While the proliferation of chatbots allows companies to connect with their customers in a cost- and time-efficient manner, it is undeniable that chatbots often fail to meet expectations and may even harm user experience. The purpose of this study is to empirically explore negative user experiences with chatbots and understand how users respond to service failure caused by chatbots.
This study adopts a qualitative research method and conducts thematic analysis of 23 interview transcripts.
It identifies common areas where chatbots fail user expectations and cause service failure. These include their inability to comprehend and provide information, over-inquiry of personal or sensitive information, fake humanity, poor integration with human agents, and their inability to solve complicated user queries. Participants experienced negative emotions such as anger, frustration, betrayal and passive defeat when they interacted with chatbots. We also reveal four coping strategies users employ following a chatbot-induced failure: expressive support seeking, active coping, acceptance and withdrawal.
Our study extends our current understanding of human-chatbot interactions and offers significant managerial implications. It highlights the importance for organizations to reconsider the role of their chatbots in user interactions and to balance the use of human agents and chatbots in the service context, particularly in customer service interactions that involve resolving complex issues or handling non-routinized tasks.
1. Introduction
Driven by rapid advancements in technology, including machine learning and deep learning, artificial intelligence (AI) has garnered significant attention from information systems (IS) researchers, practitioners and the public. Gartner (2023a) finds that 55% of companies that have previously deployed AI are expanding its application to new use cases. It further predicts that companies’ spending on AI software will grow from 124 billion to 297.9 billion dollars by 2027 (Gartner, 2023b). AI technologies can be applied to a wide range of business domains, such as fraud detection, voice recognition and natural language processing. Among these domains, McKinsey (2021) highlights that more than 50% of AI use cases are associated with service operations. AI chatbots are a good example.
AI chatbots are increasingly deployed to automate workplace processes (Gkinko and Elbanna, 2022; Pillai et al., 2024) and augment user experience (Di Silvestro, 2018). The chatbot market is predicted to grow from 190.8 million dollars in 2016 to 1.25 billion dollars by 2025 (Statista, 2023). Chatbots have been widely integrated into companies’ websites and social media platforms as a tool for customer relationship management. They feature social network and AI-based technologies to personalize service offerings and to meet users’ expectations in a timely manner (Kohler et al., 2011). Notable examples include voice-driven assistants (e.g. Alexa from Amazon, Siri from Apple, Cortana from Microsoft and Celia from Huawei) and text-based applications (e.g. Finn from Finnair, TOBi from Vodafone and Zara from Zurich Insurance).
Unlike human customer service or self-service technology, AI chatbots are a type of virtual assistant that uses advanced technologies, such as natural language processing (NLP) and machine learning, to converse with human users through text or audio (Hoyer et al., 2020). Such technologies allow AI chatbots to dehumanize customer service processes and deliver services in a cost- and time-effective manner, e.g. answering user queries about products and services, directing user queries to relevant departments, or even helping resolve issues raised by users. Thus, AI chatbots are not only an innovative service method, but also a new type of service provider. They have become prevalent in the retailing and service context as a replacement or substitute for frontline human employees in areas such as providing 24/7 customer service, connecting with potential users to answer pre-transaction queries, and engaging users in human-like conversations (Ramesh and Chawla, 2022; Roggeveen and Sethuraman, 2020).
Echoing the prevalence of AI chatbots in service contexts, the existing literature has been devoted to understanding the design, adoption and positive user experience of chatbots (e.g. Mende et al., 2019; Qiu et al., 2020). Although AI chatbots have showcased the advantages of enhancing efficiency and offering functional benefits, their constrained interaction capabilities have emerged as a significant obstacle to delivering high-quality services. Issues such as miscommunication, irrelevant responses and poor integration with human agents have often been raised (Elliott, 2018). One example is Facebook’s chatbot, Project M, which is reported to have a failure rate of 70% across all its interactions with users (Griffith and Simonite, 2018). This suggests that despite the technological advancement in the design of AI chatbots, they may often fall short of users’ expectations and cause miscommunication and even service failures.
AI chatbots are not always able to correctly interpret a user request and respond to it in a meaningful manner due to problematic scripts (Chong et al., 2021). Thus, it is not surprising that 64% of service providers are hesitant to implement AI chatbots due to customers’ reluctance to interact with them (Srinivasan et al., 2018). Given the notorious failure of chatbots to deliver customer service and meet customer expectations, frustrating incidents of human-chatbot interaction are seen as one of the key challenges in deploying chatbots in the service setting (van der Goot et al., 2021).
While the extant literature has been devoted to the exploration of positive human-chatbot interactions, very few studies have explored negative human-chatbot interactions (Blut et al., 2021; Chong et al., 2021). Existing studies have suggested that chatbot failures can adversely affect users’ overall experience with a company (Sheehan et al., 2020), harm user trust (de Andrés-Sánchez and Gené-Albesa, 2024) and induce negative word-of-mouth (Seeger and Heinzl, 2021), but it remains unknown how users respond to and cope with negative incidents caused by chatbots. Admittedly, the service failure literature provides adequate guidelines on how customers cope with human-induced service failures. Nevertheless, users interact with and respond to human employees and service robots very differently (Mou and Xu, 2017). Thus, scholars have called for more research to understand customer responses to service failures in technology-induced service encounters (Choi et al., 2021). Our research is motivated by this scholarly call and aims to explore (1) common incidents in relation to service failures induced by chatbots and (2) emotions and coping strategies following a service failure induced by chatbots.
In so doing, our study contributes to the literature in three main respects. First, our study provides a novel understanding of critical incidents associated with chatbot-induced service failure. This provides valuable insights to service providers regarding how to design their chatbots and integrate them into the service setting. Second, this study responds to the scholarly call for research exploring user reactions following a service failure in the chatbot context (e.g. Blut et al., 2021). It provides a novel understanding of user emotions following service failures induced by chatbots. Third, echoing the existing coping literature, we identify coping strategies similar to those users employ following a service failure in human-human interaction. More importantly, we uncover a coping strategy that is unique to user responses to chatbot-induced failure, which differentiates our work from studies focusing on customer coping with service failures caused by human employees.
The remainder of the paper is structured as follows: it opens with a comprehensive literature review on human-chatbot interaction, service failure and customer coping, and is followed by our research methodology and results sections. The manuscript concludes with a discussion on theoretical contributions and practical implications as well as recommendations for future research directions.
2. Literature review
2.1 Human-chatbot interactions
The existing literature on user interaction with AI chatbots lies in two major streams: the exploration of user experiences and behaviors in their interaction with chatbots, and the investigation of the consequences of AI-user interaction (Chen et al., 2022). For instance, studies have compared user experience when interacting with humans and with AI chatbots. Mou and Xu (2017) find that, compared to interacting with AI chatbots, users are more open, agreeable, extroverted, conscientious and self-disclosing with human agents. However, Chang (2022) proposes that “AI salespeople” and human salespeople perform differently in different stages of buyer-seller relationships. Specifically, she proposes that “AI salespeople” are more effective in providing information and making initial contact with users in the early sales process, but human salespeople are more effective in understanding users’ needs in the later sales process and in building relationships in the purchase process. Hill et al. (2015) discover that when interacting with chatbots, users tend to engage in longer conversations, use simpler vocabulary and express more profanity compared to interacting with human employees.
Other studies have investigated the contexts in which users prefer human employees over AI chatbots or vice versa. For instance, a recent study by Le et al. (2024) suggests that human employee-chatbot collaboration appeals to users, as making such collaboration visible during a service process can reinforce users’ perceptions of human employee-chatbot cohesiveness and service process fluency, and thereby drive user satisfaction. However, when the situation becomes embarrassing, e.g. obtaining embarrassing products, Holthöwer and van Doorn (2023) find that users prefer receiving services from a robot instead of a human employee because a robot helps them feel less judged and thereby overcome their reluctance to accept the offerings.
In addition to the exploration of user experience in human-chatbot interaction, studies in the information systems (IS), marketing and services literature have also examined user acceptance of and responses to AI chatbots. For instance, integrating the technology adoption literature, some IS studies have revealed that information quality, service quality, perceived enjoyment, usefulness and ease of use all affect users’ satisfaction and their continued use of chatbots (Ashfaq et al., 2020). Similarly, some services research also underlines that users value key features of chatbots, such as their ability to process and analyze user queries and emotions, memorize user preferences and provide personalized recommendations (Chen et al., 2022; Chen et al., 2023). A recent IS study also reveals that chatbots expressing positive emotions can significantly influence users’ evaluation of a service (Han et al., 2023a).
Anthropomorphic (or humanlike) features of AI chatbots are another key factor that has been widely explored. These features relate to a chatbot’s humanness, such as verbal communication, emotions and feelings (Hlee et al., 2023). However, research in this stream of literature reports contrasting findings. Some studies show that anthropomorphic features of service bots can influence users’ cognitive evaluation process (Yoganathan et al., 2021), and thereby increase positive outcomes. Other studies show that perceived humanness may not yield positive user outcomes, but instead lead to discomfort and less favorable user attitudes toward human-like service bots. For instance, Crolic et al. (2022) highlight that chatbot anthropomorphism can raise users’ pre-interaction expectations, and thereby negatively affect their satisfaction, purchase intention and evaluation of the brand. The inconsistent results can be explained by the fact that the effects of anthropomorphic chatbots depend on the service context (Blut et al., 2021) or the type of user (Han et al., 2023a, b). For example, humanlike robots are more effective in dealing with information-processing services but less effective for other services (Blut et al., 2021), whereas anthropomorphic AI robots are preferred by users with a competitive mindset over those with a collaborative mindset (Han et al., 2023a, b).
Despite the research progress, scholars have called for more research to understand the dark sides of using chatbots (Blut et al., 2021; Chong et al., 2021). In particular, given that chatbots have constantly been reported to fail to deliver customer services or meet user expectations (see van der Goot et al., 2021), it becomes increasingly relevant to explore how users respond to negative human-chatbot interaction incidents. Thus, our study addresses this issue and aims to explore how users respond to service failure following a negative incident interacting with a service chatbot. The following section explores the literature in relation to service failures, in particular, failures caused by chatbots.
2.2 Service failures and customer coping
According to stress-and-coping theory (Lazarus and Folkman, 1984), following a chatbot failure, customers may experience discomfort and stress, which trigger an internalized process to cope with these negative feelings. This internalized process is called customer coping, which refers to individuals’ response mechanisms to manage stressful situations by utilizing their cognitive, emotional and behavioral resources (Lazarus and Folkman, 1984). Prior studies have proposed various dimensional structures of customer coping (e.g. problem-focused and emotion-focused coping, Lazarus and Folkman, 1987; active coping, expressive support seeking and avoidance, Duhachek, 2005).
Despite the complexity and multi-dimensional nature of coping (Tsarenko and Strizhakova, 2013), customer coping strategies are aimed at changing the distressed person-environment relationship that is the source of stress, by managing the relationship and/or regulating emotions (Lazarus and Folkman, 1984). Existing literature has explored customers’ negative emotions and vindictive coping strategies following service failures in human-human interaction (e.g. Gelbrich, 2010), but little is known about customer coping following negative experiences with chatbots. Given that users interact with human employees and service robots differently (Hill et al., 2015; Mou and Xu, 2017), scholars have identified an important research gap in the literature on service failures “in technology-induced service encounters” (Choi et al., 2021, p. 364). Consequently, there is a lack of empirical evidence on “how consumers experience emotions when the outcome of AI-enabled service is negative” (Pavone et al., 2023, p. 57). It is, therefore, worthwhile to understand users’ responses to service failures induced by chatbots. In this study, we focus on users’ emotional responses and coping following a chatbot-induced failure.
2.2.1 Service failure involving service robots
Service failures are inevitable in service encounters, especially when service robots are involved. Given that the involvement of service robots in service encounters is in its infancy, it is not surprising that only a handful of studies have examined negative incidents caused by interactions with service robots, including chatbots. For instance, Leo and Huh (2020) draw on attribution theory and confirm that users attribute less responsibility to a service robot than to a human agent for service failures because they perceive that a service robot has less control over its tasks. However, users attribute more responsibility to the service company when the service failure is caused by a service robot than by a human. A similar conclusion is drawn by Belanche et al. (2020), who find that users perceive services delivered by a service robot as more stable than those delivered by a human agent in the hospitality context (i.e. hotel reception and restaurant waiter service).
Taking a different angle, Choi et al. (2021) explore how users react to robot-induced service failure and recovery and find that users exhibit higher dissatisfaction following a process failure caused by human-like than by non-humanlike robots. This is because users expect greater warmth from human-like service robots (vs. non-humanlike robots) and may therefore experience higher dissatisfaction following a service failure. Along the same lines, Pavone et al. (2023) highlight that when facing negative incidents caused by chatbots during a service process, customers blame the company more for the negative outcome compared with when they interact with a human agent. This is due to the widespread belief among users that chatbots do not have intentions or control over their operations, and thus should not be held responsible for negative incidents. While these studies have provided valuable insights into the attribution process following service failures caused by service robots, little is known about how users respond to and cope with service failures caused by service chatbots.
2.2.2 User responses to service failures
Information systems research regards emotions as an integral element in understanding user interaction with and experience of new technologies (e.g. Gkinko and Elbanna, 2022). For instance, Stam and Stanton (2010) underscore that users’ responses to new technologies follow from their emotional experiences with the technology. Similarly, in the service literature, customer emotions or emotional experience is central to the understanding of service experience (Mattila and Enz, 2002). When a service interaction is successful, users experience positive emotions and thus respond favorably in their interactions with the provider. On the contrary, when the interaction experience does not match users’ expectations, service failures occur (Gelbrich, 2010). In this case, users are likely to experience negative emotions (e.g. anger, frustration and helplessness) and thereby enact customer coping (Bougie et al., 2003). Such negative emotions can lead to negative consequences, such as vindictive negative word-of-mouth, complaints (Bougie et al., 2003; Strizhakova et al., 2012) and customer revenge (Grégoire et al., 2010).
Due to the negative consequences associated with users’ negative emotions, both the IS and service literature have been devoted to understanding the triggers of negative emotions in the service context. For instance, the service literature has uncovered negative emotions following negative interactions with service providers or technologies, such as anger, frustration, worry, anxiety and helplessness (Gelbrich, 2010; Wang et al., 2017). However, it is worth noting that emotions are context- or technology-specific. In other words, when studying emotions in human-technology interaction, scholars need to focus on a particular technology and relate emotions to the contexts in which the technologies are implemented or used (Steinert and Roeser, 2020). Given that few studies have examined chatbot-human interactions in a service setting, there is a need for research to understand users’ emotional responses to negative interactions with chatbots in the service context.
In summary, there is a clear need for research on users’ responses to negative incidents caused by human-chatbot interaction during a service process (e.g. Blut et al., 2021; Chong et al., 2021). Specifically, more research is warranted to understand customer coping in negative human-chatbot interactions (Pavone et al., 2023). Methodologically, the IS literature has “a strong bias towards quantitative methods” (Gkinko and Elbanna, 2022, p. 1720), and thus needs more qualitative studies to understand the nature and nuanced dynamics of emotional and behavioral responses to human-technology interactions. Qualitative methods allow researchers to gather rich contextual information and to delve into complex and diverse usage scenarios, leading to a comprehensive understanding of user-chatbot interactions. They also help extend existing theories to a new context and identify new factors. Thus, our study uses semi-structured interviews to explore the service contexts in which chatbots are commonly used, identify common incidents in relation to service failures induced by chatbots, and uncover emotions and coping strategies following a chatbot-induced service failure.
3. Methodology
3.1 Research design and data collection
This qualitative-exploratory study adopted an inductive approach to identify (1) common chatbot-induced service failures and (2) emotions and coping strategies in response to such failures (Myers, 2019). Semi-structured interviews were conducted to explore user experiences of different chatbot-induced service failures and the coping strategies users developed. An interview guide containing a set of open-ended questions and associated prompts in a logical sequence was used to ensure internal consistency and comparability, and to minimize possible interviewer effects. The interview guide was pilot-tested and revised prior to data collection: we rephrased a few questions to ensure clarity for participants and included prompts as an alternative way of asking the same question.
Given the exploratory, inductive nature of this study, questions were drafted with the flexibility to change their order, ask follow-up questions or rephrase questions when required. At the beginning of the interviews, we asked participants to talk about their overall experience with chatbots. A few generic questions were used to open up the conversation and build rapport with participants, for example: “Can you tell me about your recent experience using chatbots?” and “How often do you use chatbots?” Participants were then asked about their purposes for using chatbots, the service failure incidents they experienced with chatbots, the emotions triggered by the chatbot service failure, and their reactions to the failures.
Criterion sampling was used to select participants who could recall their recent experience of interacting with chatbots (Patton, 1990). We recruited participants who had used chatbot services in the past six months and had experienced negative incidents when interacting with chatbots. Participants were recruited via a pre-interview survey on Prolific, a widely used data collection platform in the UK. This pre-interview survey contained screening questions covering self-reported chatbot service incidents, digital competence and general scenarios of chatbot service use, and was made available to people aged 18 and above in the UK who were registered on Prolific. In total, 208 people responded to the survey, among whom 60 indicated their willingness to participate in a follow-up interview. These respondents were then approached by our team within one week with an invitation to arrange an online interview via Microsoft Teams. In the end, 23 respondents agreed and were interviewed. The interviews were semi-structured and conversational in nature, taking place between January and March 2022.
The profile of interview participants is summarized in Table 1. The participants’ ages ranged from 18 to the late 50s, and most participants held a university degree. Their occupations covered a wide range of professions and industries. The interviews lasted between approximately 26 and 116 minutes (see Table 1), and data collection continued until thematic saturation was reached, as indicated by recurrent patterns. All interviews were digitally recorded and transcribed via Microsoft Teams with the consent of the interviewees. All draft interview transcripts were then examined and edited by the researchers to ensure accuracy.
Table 1. Participant profile and interview duration
| Pseudonyms | Age group | Gender | Occupation | Education | Recently used chatbot services | Digital competence* | Duration (min) |
|---|---|---|---|---|---|---|---|
| Ava | 30–39 | Female | Researcher | Master | Transport, hospitality | 2.4 | 30 |
| Max | 50–59 | Male | Hospitality | High school and lower | Retail | 4.6 | 38 |
| Sophia | 20–29 | Female | Student | Master | Retail | 4.6 | 46 |
| Isabella | 20–29 | Female | Unemployed | High school and lower | Energy | 3 | 47 |
| Mia | 20–29 | Female | IT specialist | Undergraduate | Telecommunication, delivery | 4.8 | 60 |
| Ethan | 30–39 | Male | Police staff | Undergraduate | Finance | 4.2 | 52 |
| Emily | 40–49 | Female | Career support adviser | Undergraduate | Retailing | 3.6 | 116 |
| Harper | 20–29 | Female | Student | Undergraduate | Transport, tourism, real estate, retail, Finance | 4.8 | 47 |
| Jackson | 20–29 | Male | Student | Undergraduate | Delivery, retail | 4 | 44 |
| Oliver | 40–49 | Male | IT manager | Undergraduate | Telecommunication | 4.8 | 58 |
| Amelia | 30–39 | Female | Self employed | Undergraduate | Retail, transport | 3.6 | 46 |
| Benjamin | 20–29 | Male | Duty manager | Undergraduate | Not specified | 4.4 | 59 |
| Olivia | 20–29 | Female | Activity instructor | Undergraduate | Public service, finance, retail, energy | 4.2 | 41 |
| Chris | 30–39 | Prefer not to say | Unemployed | High school and lower | Technology, retail, healthcare, telecommunication | 3.4 | 43 |
| Evelyn | 30–39 | Female | Nurse | Master | Retail | 2.6 | 26 |
| Samuel | 40–49 | Male | Support engineer | Undergraduate | Technology | 4 | 47 |
| Lily | 18–19 | Female | Retail | Undergraduate | Retail | 3 | 29 |
| Alexander | 20–29 | Male | Hotel manager | Master | Insurance, telecommunication | 4.8 | 27 |
| Noah | 30–39 | Male | Health care worker | Master | Retail, energy | 4.6 | 36 |
| Daniel | 20–29 | Male | Teacher | Undergraduate | Telecommunication, technology | 5 | 49 |
| Chloe | 50–59 | Female | Lecturer | Doctorate | Retail, telecommunication, finance | 3.6 | 47 |
| Mason | 40–49 | Male | IT project manager | Master | Telecommunication, retail | 5 | 104 |
| Lucas | 40–49 | Male | Librarian | Master | Telecommunication, finance | 4.8 | 46 |
Note(s): * measured on a five-point Likert scale
Source(s): Authors’ own work
3.2 Data analysis process
Thematic analysis was applied to identify emerging codes and themes from interview transcripts (Braun and Clarke, 2006). The principle of thematic analysis is to capture external heterogeneity across themes and sub-themes, and internal homogeneity within themes and sub-themes. This study applied an inductive approach to identify factors of chatbot-induced service failures and how users subsequently cope with the failures and their negative emotions.
The first coder iteratively read and coded all interview transcripts using NVivo 12. This inductive analysis process included three main phases. Phase one was open coding, which allowed the identification of 58 initial codes. The main purpose of phase two was to develop categories: codes were organized under their representing themes by renaming codes with similar meanings, merging codes with similar characteristics and discarding irrelevant codes. For example, within the code “understanding of user emotions”, participants mentioned that chatbots had problems understanding or relating to their challenging circumstances and the emotions derived from those circumstances; the code was therefore renamed “lacking empathy and sympathy” as more appropriate. The codes “personal customer service” and “human-like characteristics/features” were combined into a single code capturing participants’ perceptions of a chatbot’s human-like characteristics/features, such as having a human name or using emoji in its replies. The codes “personalized response” and “irrelevant responses” were also merged into one code describing a chatbot’s capability of providing relevant and accurate responses to users. This phase led to the emergence of three main themes, namely “usage context,” “chatbot-induced service failure” and “emotion and coping.” Phase three allowed the researchers to go through all codes and themes to compare and abstract their meanings. The naming of themes and codes was revised and redefined in this phase to ensure internal homogeneity and external heterogeneity (Miles et al., 2014).
In addition, to ensure the reliability of the research, inter-coder agreement checks were conducted at the end of each phase (Braun and Clarke, 2006). Discrepancies in coding were discussed among the authors, who are knowledgeable about the subject matter, and agreed revisions were incorporated into the codebook before entering the next phase of coding. For example, after the first phase of coding, the first coder reviewed all quotes within each code against the updated codebook to ensure that all quotes were accurately coded and consistent with the codebook. The same procedure was performed in the second and third phases of the thematic analysis process. An analysis codebook featuring the themes and codes derived from the data was developed, and the coding hierarchical levels are illustrated in Figure 1.
4. Results
4.1 Chatbot usage contexts in a service setting
Derived from the analysis of the 23 interview transcripts, we observed two main contexts of chatbot use in the service process, in which users would (1) proactively choose a chatbot to fulfill timely service requests and conduct simple search tasks, or (2) passively use chatbots when no alternative options were available for customer support.
Participants appreciated instant and accurate chatbot responses to classic, simple queries: queries that could be easily answered by FAQs or that related to account information. In this case, chatbots were used to direct users to relevant information pages for straightforward search tasks. Furthermore, chatbot users who wanted instant answers and/or timely access to customer service were frequently observed. Chatbots were therefore chosen as an alternative to human agents to obtain quicker customer service, typically outside office hours or to avoid long waits on the customer service telephone line.
The chatbot is providing very basic information, and I suppose I wouldn't be the kind of customer who'd contact customer service over something really simple. It's only if I have an actual problem (Mia).
On the other hand, some users used chatbots because there were no other options. Chatbots were deployed as the only method of customer support, reflecting companies’ efforts to reduce the cost of human employees and mitigate staff shortages. This led to users’ passive adoption of chatbot services whenever they needed customer support:
[I am] expecting to talk to someone who can act access my account and help me with what I'm doing. But a lot of the time I end up using a chat bot because there's no other option (Mia).
4.2 Chatbots-induced service failures
While users appreciated the necessity of using chatbots as part of customer service, they perceived chatbots as underdeveloped. Participants showed a clearly negative perception of chatbots as a replacement for human agents:
I understand the necessity (of a chatbot), that's kind of my conclusion. [It is] maybe a necessary evil because I know there are not enough people in the world to answer everybody's customer service questions I'm sure they could be designed better. (Ava)
Reflecting on their negative experiences, our participants reported five predominant scenarios of chatbot-induced service failure: (1) failures in the comprehension and provision of information (e.g. inability to comprehend inquiries or provide relevant information), (2) over-inquiry of personal information, (3) fake humanity (e.g. lack of empathy, fake human-like characteristics, pretentious personalized responses), (4) poor integration with human agents, and (5) inability to solve complicated problems (e.g. getting stuck in a loop, worsening the situation).
Chatbots’ comprehension and provision of information: Many service failures that emerged from the interview data were associated with information-related issues, e.g. chatbots’ inability to comprehend user-inputted inquiries and their inability to provide required and relevant information. Participants sometimes perceived the information provided as unhelpful, generic and inaccurate when their inquiries were complex and fell outside standard procedures (see Figure 1). Samuel commented that chatbots were intelligent only when there were readily available answers to user inquiries. For example, in the case below, there was an article addressing the refund inquiry, and the chatbot was able to direct Samuel to the relevant information. But chatbots were not capable of solving problems when inquiries were ambiguous, such as queries with spelling mistakes or those with no pre-defined answers, and had limited capability to offer a solution to more complex and less common queries.
Sometimes they make mistakes […]. Because it's based with [keyword] tags if there was a spelling mistake and the robot was not trained to understand a spelling mistake then they would not understand the world like you know (Samuel)
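Samuel's account points to keyword-tag matching with no tolerance for misspellings. The following minimal sketch, written in Python, illustrates this failure mode; the FAQ tags, article texts and fallback message are hypothetical and are not drawn from any specific chatbot described by our participants.

```python
# Hypothetical sketch of keyword-tag intent matching, as described by Samuel.
# The FAQ tags, articles and fallback message are illustrative assumptions.
FAQ_ARTICLES = {
    "refund": "To request a refund, go to 'Your orders' and select 'Return or refund'.",
    "delivery": "Standard delivery takes 3-5 working days.",
}

def answer(user_message: str) -> str:
    words = user_message.lower().split()
    for tag, article in FAQ_ARTICLES.items():
        if tag in words:  # exact keyword match only, no fuzzy matching
            return article
    # A misspelling such as "refnud" matches no tag and falls through to a generic reply
    return "Sorry, I didn't understand that. Please rephrase your question."

print(answer("How do I get a refund"))   # -> the refund article
print(answer("How do I get a refnud"))   # -> generic fallback, the failure Samuel describes
```

As the sketch suggests, a single unrecognized word can push the whole exchange into a generic fallback, which is exactly the kind of "not trained to understand a spelling mistake" behavior participants reported.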
Additionally, some participants were not satisfied with chatbots because they believed chatbots were doing a repetitive job. One user received the same information he could get from the help section of the service provider’s website, which includes a list of frequently asked questions; the chatbot was simply doing what the participant could do by himself and was hence not “massively helpful”, as Benjamin described. In addition, the accuracy of chatbot responses was a major concern for users. Emily, for instance, struggled in the chatbot service encounter as it “didn't give the right information.” Participants were also fed up with the excessive promotional information they received during chatbot interactions. These cases show that chatbots failed to instill assurance of service quality among users.
[…]But I don't need to click on the chat button, go to ask a question and then just be fed all these random pop ups and things like that and it seems too much and it kind of puts you off the experience […] It almost seems like you're desperate for me to book with you and I don't. Yeah, it doesn't seem … accurate or safe? (Harper)
Chatbots’ over-inquiry of personal information: In addition to their inability to provide useful information, participants blamed chatbots for asking for excessive personal information that was neither helpful nor relevant to their inquiries. Chris shared her negative experience with the chatbot of a technology service provider. Her request was as simple as seeking a battery replacement, but before she could get an answer or talk to a human agent, the chatbot “forced” her to answer questions about her personal details, including contact information, email address and past purchases with the website. Comparing this with the in-store shopping experience, she noted that one could easily get a battery replacement without answering these questions, yet when communicating with the chatbot, she had to pass all these checks before getting a response to her focal inquiry. These details, according to Chris, were not necessary.
[It is] kind of forced me into answering questions. […] They wanted your email address, your name, your product references and all this kind of stuff. And I just had a simple question. They don't need all this information. It's not like I'm going to go into the shop. (Chris)
Fake humanity: Previous literature suggests that anthropomorphic characteristics are key selling points of chatbots as they are meant to make chatbots more agreeable and approachable (Hlee et al., 2023). However, our findings revealed the opposite, as some unpleasant experiences with chatbots could be induced by these very characteristics. Participants described these features as “clunky, slow and a bit fake” (Oliver) and “preten[tious]” (Mason) (see “fake humanity” in Figure 1). This was another widely acknowledged cause of chatbot-induced service failure. Participants criticized the chatbot-user interaction for having no emotional exchange and lacking “nonverbal communication” (Samuel). When they approached chatbots to seek help, they rarely received sympathy for their difficulties. This was particularly prominent in finance, insurance and healthcare services, where users require more emotional support in addition to routine service.
there is a different level of interaction[…] often if I want to contact the bank or mortgage or technology [related customer service], it's because something has gone wrong. So I'm emotionally … in quite a difficult state […] you want to feel that someone has sympathy and they act for you because of human sympathy (Chloe)
The problem of fake humanity was also evident in the lack of two-way interaction between users and chatbots. This intensified the difficulties of user communication with chatbots and led to failures in resolving user queries. As shown in the example of Sophia, when she wanted an in-depth discussion of her health conditions, the chatbot could only give a yes or no answer.
The chatbot had ….like … either a yes or no answer or something like that, and I couldn't persuade a computer, whereas I felt like if I spoke to a human, maybe I could, like, persuade them. And so I think I gave them more of an emotive story - it is a chronic condition which really affects me. And I've been told before that I could have (the treatment) covered by the insurance and the doctors recommended that I have this treatment. I felt like I could be more emotive (in this circumstance), where a human may be persuaded by them (Sophia)
Poor integration with human agents: Poor integration with human agents was another cause of service failure, echoing the findings of Fan et al. (2023). Users experienced low efficiency and responsiveness in chatbot interactions, due to constrained access to human agents and limited information being shared with human agents. As a result, when transitioning between service support channels, users were asked the same questions multiple times, causing delays in the service process and confusion in responses. The disconnection between chatbots and human agents was frequently observed:
My expectation would be like whenever they're designing this system they present the information already given to the customer service agents. And I get the impression that there's a disconnection there (Mia).
It was repetitive. […]when you contact (customer service), they should have put it somewhere that you know this person has issues with the order and a refund. There wasn't any update on the case or the file […] so there wasn't any communication whatsoever. So I had to start from scratch again and trace them again. (Emily)
This was even worse when chatbots were the only channel for receiving customer service. It was exceptionally frustrating when users tried to contact a company for the first time and there were “no other options” than the chatbot (Mia). To this end, chatbots were perceived as a barrier to actual customer service.
You know, especially when the chatbot is clearly there to avoid you getting in contact with customer service. (Mia)
Incapable of solving complicated problems: Chatbot-induced service failures were also observed when chatbots were incapable of solving complicated problems. Users easily got stuck with simple answers and were kept in an unhelpful service loop, as in Ava’s story with an airline chatbot:
[T]he bot would just do the thing send me round in circles so it will give me like two options and I will try each but neither would work and so will just send me back to the beginning and I did that a few times (Ava)
Beyond loops, users suffered from a “dead end” (Isabella) in their interactions with chatbots because the chatbots could not investigate the problem or assess the situation to provide a personalized response. These issues created further problems: in many cases, when users already knew that none of the given options could help, they had no choice but to try them, as this was the only way the chatbot would proceed.
Additionally, chatbots’ incapability sometimes made an already concerning situation even worse, described as “adding an insult to injuries” (Samuel). The unhelpful chatbot-user interaction was a waste of time when there could have been a direct and simple way to provide the information. Mason described chatbots as “a false front door on the provision of information.” The situation was even worse when chatbots created additional work for users and complicated the problem-solving process rather than providing service and support. Oliver echoed a similar perception of how the adoption of chatbots has affected customer service provision:
(the use of a chatbot) is worse because I think when a company makes you work to get what you want. That feels particularly bad. (Oliver)
This bad feeling of “making you work to get what you want” was observed in many cases. Oliver suffered from it when he used chatbots to deal with his pension and to fix issues with his Internet service.
[the pension service]they're making you jump through these hoops and go to the chatbot and fill in the stuff and type responses and play with it for a bit and then they say ‘Oh no, no, you've got a phone, this phone number’. So then you have to phone the phone number and I feel like I'm working for them. I feel like I'm having to work to get what I need and paying them to do it. (Oliver)
(if not speaking to a chatbot), you don't have to go through this- Have you tried turning it [the hub] off and on again six times? (Oliver)
A similar complaint was found in Mason’s example when he dealt with a faulty product. The unnecessary questions from the chatbot gave him additional work to understand the logic and filter the noise before he could get an answer.
I have to go through a lot of, like a unnecessary questions and kind of filtering as part of that logic that sits behind the system where if I just had a list, I would find it.[…] I really, fundamentally disagree with the whole. (Mason)
4.3 User emotions following chatbots-induced service failures
In line with Pavone et al. (2023), we found that participants seldom attributed these service failures to chatbots or blamed chatbots, though they often experienced emotions such as anger, frustration and a feeling of betrayal.
Anger. Anger was the most frequently reported emotion resulting from service failures in chatbot usage. Emily described the unpleasant experience of using a chatbot for a refund: when she had to repeatedly contact the chatbot and provide the same information, she started to get angry.
So it took me about 40 hours […] That was really sort of angry. (Emily)
Interviewees expected the chatbot to be capable of remembering basic user information from previous contacts and hence to be convenient to communicate with. When chatbots could not meet these expectations, this negative emotion was also triggered.
It just means that you have to sort of be a little bit more switched on in your own brain, and it sounds really lazy, but you're doing it for ease of use. So it's annoying when something was supposed to be easier and faster to use, that you have to sort of engage your mind a little bit more to use it. (Olivia)
Frustration. On many occasions, chatbot users experienced frustration when the service chatbot could not fulfill their tasks. Unlike the incidents triggering anger, frustration was often incurred when users received responses from the chatbot but then realized that the responses were not very helpful, making “the entire process pointless” (Mia). In other cases, requests might be handled correctly, but frustration was triggered by the inefficiency of the service, for example, making the user wait too long:
It took me about 40 hours, really frustrated. […] And after that, I had to call. I think few days later, as they waited and then I complained about the chatbot is not effective. It's not right. (Emily)
User frustration could also occur in the other common chatbot-induced service failures identified previously. Chloe described how her frustration intensified over time due to the chatbot’s “fake humanity” of pretending to be a polite and patient listener.
It's continual polite patience … it (the chatbot) seems to increase ones frustration in a way, but because it's almost as if … nothing will upset it. Nothing will ruffle its feathers. Whereas we are like and it makes it worse somehow. (Chloe)
Betrayal. Betrayal was also observed when participants talked about service failures induced by chatbots. Although there was no ‘fairness violation’ (Grégoire and Fisher, 2008) leading to the sense of betrayal in the chatbot service encounter, users still experienced the “love becomes hate” effect (Grégoire and Fisher, 2008, p. 250) when a chatbot failed to act in a desirable way. This was always related to a chatbot’s anthropomorphic features. In her interview, Chloe expressed a favorable attitude toward AI-enabled service in general. She agreed that there was a certain amount of pleasure in a service bot like Siri or Google Home. But the joy of teasing the service bot only existed when she entertained herself by talking to it; when it pretended to be a real person in the service encounter, her feelings changed.
[…]but maybe they shouldn't kind of pretend that you're interacting with a human[…] they always would say something like hi, my name's Julie and at first I thought it was real people and then, you know, slowly came to realize you know, more or less that it wasn't, and there's a certain sort of sense of betrayal (Chloe).
Failure to disclose the identity of chatbots led users to feel betrayed by the service provider. The feeling of being fooled by a chatbot was also evoked when the chatbot merely imitated human contact. Rather than having a chatbot pretend to be intimate and friendly like a real person, users wanted the company to be honest about the identity of its service agents (e.g. chatbots).
Passive defeat. Users could also feel defeated by the chatbot service. This emotion was unique to the context of chatbot-induced service failures. On one hand, the feeling of defeat was unlike other widely studied negative emotions in human-induced service failures, such as anger or helplessness, which can often lead to a desire for revenge (Tripp and Grégoire, 2011). On the other hand, it differed from the typical self-defeating emotion, which is negative, self-destructive and potentially leads to illogical, irrational and unrealistic behaviors (Ellis, 2013). The passive defeat emotion emerging from the interview data was negative in valence while being neutral in the sense that users did not take vengeful actions. Instead, users with this emotion reported an intention of “angry resignation,” according to Mason:
[…]but with a computer, you know there's no chance you've, you go through the loop twice. You know you've explored all the options. It's not like it's not the same kind of is like ….Uh-huh. I know this won't work. I don't know. There's no. There's not another option. There's nothing I can do. (Mason)
Overall, users demonstrated more than one type of emotional response when they tried to resolve their enquiries with chatbots. Such emotional responses were mentioned frequently throughout the interviews.
4.4 Customer coping strategies in chatbot service
Following these emotions, interviewees adopted four types of coping strategies to deal with chatbot-induced service failures, namely expressive support seeking (venting emotions), active coping (contacting human agents, complaints and feedback), acceptance and withdrawal.
Expressive support seeking. After experiencing negative emotions, users sought emotional support. This differed from the emotional venting strategy in the context of human-to-human interactions, where users can let their feelings out to service employees. Chatbot users chose to vent their negative emotions, e.g. anger and frustration, only to close family and friends, as Amelia did with her husband and mother.
[…] if I'm frustrated about things or kind of vice versa, then we'll kind of talk it through or he'll listen to my frustrations and stuff. And yeah, rather than me battling it up[…] I'll call my mum and my mum does the same to me. […] I just wanna be able to speak to someone and someone who can listen. Or who will agree with you. Yeah, it does kind of help me to vocalize my frustration rather than just keeping it in. (Amelia)
Users believed that it was meaningless to vent emotions to a computer. When adopting the emotional venting strategy, users wanted to see their emotions being taken care of; a chatbot created a problem while having no capacity to solve it or understand the user’s emotions. Therefore, when dealing with chatbot-induced service failures, emotional venting did not involve the service provider or other parties in the service encounter.
Active coping. We also observed active coping among our participants. Users actively engaged in actions to solve problems or assuage the sources of negative emotions. This type of coping was normally problem-focused: users could lodge complaints and give feedback to the chatbots through the automated rating system or by contacting human agents. However, this was not the best approach, according to our participants, as the feedback channel always failed to deal attentively with the complaint or provide reasonable advice:
They would say how would you rate our like experience or whatever[…] I may or may not, you know, give them a rating or whatever, because there is an option to give a rating, or maybe you can just close the chat. So I think all they said was ‘sorry to hear that. What is your query or whichever?’[ …]. (Lucas)
Acceptance. Not all chatbot users coped with their negative emotions vindictively. Some users decided to accept that the chatbot service failure had happened and nothing could be changed. In this case, they tried to rationalize the chatbot failure and acknowledged that the chatbot was not to blame for the problem. This was often observed when users had good knowledge of chatbots or had work experience in customer service. In these cases, users showed a greater understanding of and higher tolerance for the failure:
(I) know why they're using chatbots, so it's not their fault. They try to be present for their customers on a 24/7 basis, so that's why they're using the chatbot to answer general questions. But if my question is too complex and I understand the robot, I can't take it into consideration. I can't say well … no, but your robot is stupid because I'm asking too much of the robot, you know, so I it's fine. I'm fair. (Jackson)
Withdrawal. In line with previous studies on coping with service failure, users reacted negatively to a service failure when they believed the service provider could easily have prevented it. We observed that participants in our study would avoid using chatbots because of previous unhelpful experiences. Interestingly, they did not just avoid the service or brand to which the chatbot belonged, but also other brands and companies that adopted chatbots in customer service, regardless of their past experience with them. For example, Emily was annoyed by her experience with a chatbot from a fashion brand. She decided to withdraw from contact with that chatbot and, on discovering that other brands also used chatbots in customer service, immediately removed herself from those interactions as well. This “transferable withdrawal” is further depicted as follows:
As sometimes because of my experience with [the chatbot of a fashion brand], when I do online shopping with other stores, when I see a chatbot, I immediately remove myself because I get annoyed. I feel it's going to be the same thing happening […] because of my experience with the fashion brand chatbot. (Emily)
5. Discussion and conclusions
Companies have been increasingly relying on chatbots to connect with and provide services to customers. Our study aimed to understand the “dark side” of human-chatbot interactions by exploring common service failure incidents caused by chatbots and users’ subsequent emotional responses to and coping with such failures. Drawing on 23 interviews, it unravels five common chatbot-induced service failure incidents and explicitly identifies users’ emotional and behavioral responses following these incidents. The study thus makes several theoretical contributions to the IS and services literature.
Theoretical contributions. Our study identifies five common chatbot-induced service failure incidents: chatbots’ inability to comprehend and provide information, over-inquiry of personal information, fake humanity, poor integration with human agents, and limited ability to solve complicated problems. While some of these service failure incidents (e.g. inability to provide accurate information or to solve complicated issues) have been mentioned in industry reports and previous studies (e.g. Elliott, 2018; Gkinko and Elbanna, 2022; Pavone et al., 2023), our study uncovers two new scenarios – chatbots’ over-inquiry of personal information and fake humanity – which have not been fully understood in the existing literature. Thus, this study contributes to the IS-service interface by extending our current understanding of the negative side of human-chatbot interactions in various service settings.
Second, responding to the call for research to understand user emotions in human-chatbot interactions (e.g. Crolic et al., 2022), our study identifies users’ predominant emotions following their negative interactions with chatbots in customer service. These emotions include anger, frustration, betrayal and passive defeat. In congruence with the existing literature on emotions in human-induced service failures, our study reveals that anger (Bougie et al., 2003), frustration (González-Gómez et al., 2021), and betrayal (Grégoire and Fisher, 2008) are primary emotions that users experience after a chatbot-induced service failure. Moreover, the qualitative interpretive approach used in this study also contributes to the IS literature as it overcomes “the dominance of essentialist assumptions and positivist research” (Gkinko and Elbanna, 2022, p. 1733).
In addition, we identify a new emotion – passive defeat – which is similar to helplessness in the sense that it occurs when users feel that the problem is irreversible or cannot be resolved (Lazarus, 1991). It differs from helplessness, however, in that helplessness may lead to a higher desire for revenge (Obeidat et al., 2017), whereas passive defeat leads to passive acceptance of the incident without a desire for revenge. This may be because, when facing a service failure, users usually attribute less responsibility to a service robot than to a human employee (Leo and Huh, 2020). As a result, users run into a “dead-end” in the attribution process, resulting in passive acceptance of the incident rather than a more proactive pursuit of revenge.
Third, integrating stress-and-coping theory into the human-chatbot interaction literature, this study uncovers that, as in a human-induced service failure context (see Duhachek, 2005), users in a chatbot-induced service failure context tend to adopt similar coping strategies, such as expressive support seeking, active coping, and acceptance. However, subtle differences can be observed in expressive support seeking. In human-to-human interactions, users vent their negative emotions to service providers or to close ones (Duhachek, 2005); in human-chatbot interactions, by contrast, users can only turn to close ones, because venting to chatbots or to the companies using them feels meaningless (Pavone et al., 2023).
Moreover, our study extends stress-and-coping theory by identifying another coping strategy – withdrawal – which occurs when users withdraw from the interaction with the chatbot causing the service failure and subsequently avoid using all other chatbots. This transferable withdrawal, which is new to the IS and coping literature, can be explained by stress-and-coping theory, which suggests that users may avoid stressors (i.e. chatbots in our case) for emotion regulation purposes (Lazarus and Folkman, 1984). This implies that customer coping strategies in the chatbot-induced service failure context can be different from those in the human-induced service failure context, further supporting the necessity of our study. As such, our study contributes to both coping and service management literature by advancing our understanding of customer coping in a technology-charged service context.
Practical implications. This study contributes to practice by providing service providers and chatbot designers with useful recommendations for improving chatbot functionalities. First, our study advises that, since users utilize chatbots for different tasks, companies need to understand the types of service tasks their chatbots are capable of handling and then tailor the design of their chatbots to cater for different user demands. For instance, if companies mainly use chatbots to handle standard, repetitive, routine tasks, such as information provision, they need to increase chatbots’ capacity for identifying, processing, and providing useful information. If companies aim to use chatbots to provide more advanced or personalized services, they need to adopt mechanical AI or thinking AI (see Huang and Rust, 2021 for details) to increase efficiency and avoid service failures.
Second, when a chatbot reaches its limit in communicating with users or helping with their queries, an immediate solution (e.g. transferring user queries to a human agent) needs to be provided to mitigate users’ negative responses. This highlights the necessity of better integration between service chatbots and human agents. For instance, Libai et al. (2020) propose that users should be given the option to be handed over to a human agent once they feel that the chatbot is no longer helpful. An automatic call distribution (ACD) solution that facilitates the technical hand-over from chatbots to a human agent is recommended. This can also reduce users’ privacy concerns, as users can feel intruded upon when chatbots ask for too much personal information. Moreover, chatbots are not perceived to be as warm or sympathetic as human agents and are thus less effective in handling queries that require a human touch. As a result, we recommend that human employees be present at service touch points where users are in a stressful state, particularly in financial, healthcare and insurance services, to compensate for the potential drawbacks of using chatbots.
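To make the hand-over recommendation above more concrete, the following minimal sketch in Python illustrates one possible escalation rule: the conversation is transferred to a human agent when the user explicitly asks for one or when the chatbot repeatedly fails to understand. All names used here (Conversation, route_to_agent, FAIL_THRESHOLD) are hypothetical illustrations introduced for this sketch; they do not describe any specific ACD product or the systems encountered by our participants.

# Illustrative sketch only: a simplified chatbot-to-human hand-over rule.
# Names and thresholds are hypothetical, not drawn from the study or any vendor.
from dataclasses import dataclass, field

FAIL_THRESHOLD = 2  # consecutive misunderstood queries before escalation

@dataclass
class Conversation:
    user_id: str
    consecutive_failures: int = 0
    transcript: list = field(default_factory=list)

def should_escalate(conv: Conversation, user_message: str, bot_understood: bool) -> bool:
    """Escalate when the user asks for a person or the bot keeps failing."""
    if "human" in user_message.lower() or "agent" in user_message.lower():
        return True
    conv.consecutive_failures = 0 if bot_understood else conv.consecutive_failures + 1
    return conv.consecutive_failures >= FAIL_THRESHOLD

def route_to_agent(conv: Conversation) -> str:
    # Placeholder for an ACD-style hand-over: the conversation is queued for a
    # human agent together with its context, so the user need not repeat themselves.
    return f"Transferring you to a human agent (context: {len(conv.transcript)} messages kept)."

def handle_turn(conv: Conversation, user_message: str, bot_understood: bool) -> str:
    conv.transcript.append(user_message)
    if should_escalate(conv, user_message, bot_understood):
        return route_to_agent(conv)
    return "bot reply"

The design choice worth noting is that the full transcript travels with the hand-over, which addresses the poor-integration failure our participants described, namely having to restate their problem once a human agent finally joins the conversation.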
Limitations and future research agenda. This study adopted an interpretive research philosophy to explore the dark side of chatbots in various service settings. Quantitative research methods are warranted to further validate our exploratory findings. In addition, this study focused only on respondents in the UK, without taking into consideration cultural differences that may influence users’ expectations toward chatbots and their responses to chatbot-induced service failures. Likewise, previous literature suggests that social structures (Moschis, 2007) and personality (Mick and Fournier, 1998) lead to different choices of coping strategies. Therefore, future research should extend this study by including these factors and exploring how cultural drivers, social class and personality influence users’ responses to and coping with chatbot-induced service failures.
The IS literature highlights that the study of emotions is highly context- and technology-specific (Steinert and Roeser, 2020). For instance, it may be worthwhile to explore whether service failures induce different emotions and experiences depending on the type of chatbot involved; a service failure from a health chatbot may lead to a more negative experience than one from a sales chatbot. Future studies focusing on financial, healthcare and insurance service contexts, where users may be in an unstable emotional state or have complex demands for support when initiating interactions with chatbots, can offer nuanced and insightful recommendations for enhancing chatbot design. Another interesting research context is the public sector and government institutions (Verne et al., 2022). Future research can explore citizen responses to stressful chatbot interactions when using chatbots for public welfare queries or other public contacts, as well as identify chatbot failures that may have negative consequences for the economy and for the wellbeing and civic rights of human beings.
We see our study as a starting point for research on the dark side of human-chatbot interactions in business-to-consumer (B2C) domains. While AI implementation in B2C sectors is oriented toward creating personalized customer experiences, business-to-business (B2B) sectors employ AI mainly in internal business functions (Kushwaha et al., 2021). Given the differences in the nature of the two business domains and the wide application of AI technologies and chatbots in both, more studies are needed to understand the limitations and negative incidents associated with the use of chatbots in B2B areas.
Funding: This project is funded by Trinity College Dublin Arts and Social Sciences Benefactions Fund 2021-2022.

