This study aims to critically evaluate the applicability of generative artificial intelligence (GenAI) tools for academic research in international business (IB), specifically focusing on the topic of firms’ nonlinear internationalization. It assesses these tools’ key performance dimensions: correctness, hallucinations and thoroughness.
This research adopts an exploratory approach, examining a comprehensive set of GenAI tools: eight chatbots and four AI-driven applications designed for academic purposes. The evaluation focuses on the capabilities and limitations of these tools in generating accurate research-related content for IB scholars.
This study finds that while GenAI tools capture some aspects of nonlinear internationalization, they often produce partially accurate and/or biased results. Common issues include providing fictitious sources, incorrect publication data and vague or incorrect answers. Thus, substantial development is still needed for GenAI tools to become reliable for scientific research.
Researchers should use GenAI tools with caution, verifying the accuracy of generated content and citations independently. A cautious approach is crucial to maintain the integrity and quality of academic research.
This study raises awareness about ethical and practical challenges of using AI in academia, including issues related to plagiarism and misinformation. It underscores the importance of critical evaluation when using GenAI tools for research.
This paper contributes to the emerging literature on the role of GenAI in academic research by providing a critical assessment of the usability and limitations of current tools in studying complex IB phenomena. By using nonlinear internationalization as an example, it demonstrates how GenAI may support or hinder IB scholarship.
1. Introduction
The usefulness of generative artificial intelligence (GenAI [1]) applications has greatly expanded since late 2022 when OpenAI’s ChatGPT was released for public use, resulting in a wide variety of GenAI chatbots, applications and tools being piloted and launched in recent years. Meanwhile, many existing online tools have added AI functionality by incorporating features made possible by the application programming interfaces offered by ChatGPT and others. However, international business (IB) research has mostly neglected their impact on the field (Benmamoun, 2025), although they have been found increasingly useful from both empirical (Chen et al., 2022; Rossi et al., 2024) and theoretical (Gaessler and Piezunka, 2023; Mariani and Dwivedi, 2024) perspectives.
Moreover, the current research on the usefulness of GenAI applications is mixed. Several scholars have given positive feedback about their potential. For instance, according to Burger et al. (2023, p. 233), “AI already has the potential to make researchers’ work faster, more reliable and more convenient,” while Lahat et al. (2023, p. 3) stated that GenAI tools have “the potential to be a valuable resource for researchers.” On the other hand, scholars are also worried about the ethical implications of GenAI use in academia (e.g. Akpan et al., 2025; Dwivedi et al., 2023; Gatrell et al., 2024) and are concerned that relying too much on AI applications can result in producing low-quality research (Burger et al., 2023; Tong and Zhang, 2023). While GenAI has been found to be an enabler of entrepreneurship (Davidsson and Sufyan, 2023), the evidence regarding the usefulness of open AI chatbots and other GenAI solutions (see, for instance, González-Esteban and Calvo, 2022) is contradictory: evaluations have ranged from very positive to very negative. Thus, it is important to assess how useful they are for studying business and management research phenomena and for conducting scholarly research in general (Benmamoun, 2025).
This study aims to critically assess the applicability and limitations of GenAI tools for academic research in IB, using the specific phenomenon of nonlinear internationalization as a case study. Nonlinear internationalization is characterized by fluctuations in the extent of firms’ international engagement – including partial or full foreign market exits and reentries. It is an important and complex part of IB literature, as it emphasizes that firms do not only move toward increased international involvement, as was supposed by earlier research streams (Domínguez Romero et al., 2024). Moreover, nonlinear internationalization has still not received enough research attention, although most internationalizing firms eventually face it (da Fonseca and da Rocha, 2023; Fernández-Alles et al., 2023; Vissak, 2024). This focus fits the aims of the study because nonlinear internationalization is still a relatively narrow research area, and thus there is relatively little variation in the key terms, making it easier to identify the key studies on the topic. Therefore, the focal phenomenon is narrow and emerging enough that GenAI tools could provide non-trivial information about it, potentially aiding researchers who seek to contribute to the ongoing discussions on the topic.
The analysis focuses on evaluating three key dimensions: correctness (accuracy of information), hallucinations (instances of fabricated or inaccurate data) and thoroughness (depth and comprehensiveness of responses). The main purpose of this study is not to generate new theoretical advances on the topic of nonlinear internationalization itself (although, by giving an overview of its nature, it also contributes to the understanding of firms’ internationalization processes) but to use it as an exemplary context in which the capabilities of GenAI tools to contribute to conceptually ambiguous IB research areas can be tested. This empirical focus is appropriate because nonlinear internationalization, as an evolving IB topic, requires both interpretation and familiarity with specific literature, both of which can pose challenges for GenAI tools. In doing so, this study shows how these tools can support or impede IB scholars in studying this phenomenon. It contributes to the emerging literature on the role of GenAI in business and management scholarship by shedding light on the applicability and usefulness (but also shortcomings – e.g. hallucinations, providing incorrect information) of the main types of GenAI tools for researchers today, including:
five main GenAI chatbots (ChatGPT Plus, Google Gemini, Microsoft Copilot, Claude and Perplexity);
three other GenAI chatbots (Character.AI, ChatGPT Online and ColossalChat); and
four AI-driven applications specifically designed for academic research (Avidnote, Elicit, PaperDigest and Scispace).
As the paper focuses on the usefulness of GenAI tools for studying nonlinear internationalization, it is first necessary to explain the essence of the studied phenomenon. Thus, the paper begins with a short literature review about the nature of nonlinear internationalization and an overview of studies on the usefulness of GenAI tools. After a methodology section, the results are analyzed, assessing both the answers and the sources provided by the selected tools, as, according to several studies, they sometimes “hallucinate”: give factually incorrect answers and cite sources incorrectly or even “invent” sources (cf. Benmamoun, 2025; Khizar et al., 2025).
The main contribution of this study to the IB literature and scholarship is thus the critical evaluation of current GenAI tools’ capabilities and limitations in researching a complex yet important IB phenomenon – firms’ nonlinear internationalization – emphasizing both the potential and drawbacks of these technologies for IB scholars in areas where literature is limited, terminologies are specific and a high degree of contextual accuracy is required. Nonlinear internationalization can be a strategic response to many of the major issues prevalent in IB strategy, such as institutional uncertainty, market shocks and resource constraints. Thus, by evaluating how accurately GenAI tools are able to capture the phenomenon, the present study as a whole contributes to the understanding of how digital tools in general, and current GenAI tools in particular, can facilitate or constrain the discussion on important topics in IB research. In addition, the study contributes to the understanding of firms’ internationalization processes.
2. Literature review
2.1 Nonlinear internationalization
Nonlinear internationalization has been defined in several ways in IB literature (see Table 1). Although definitions vary to some extent, according to most authors, such internationalization should encompass single or multiple episodes of de- and/or re-internationalization. Thus, the definitions of these two concepts are also given in Table 1.
Table 1 shows that firms’ de-internationalization can be full/complete (all foreign activities cease) or partial (some foreign activities continue). Similarly, re-internationalization can also be full (all previous foreign activities are restored) or partial (only some are restored and/or new foreign activities are started instead of previous ones); moreover, it can follow a complete time-out period (no foreign activities) or a period of temporarily reduced foreign operations.
Firms can experience nonlinear internationalization because of several factors: for instance, changes in the local or foreign economic and institutional environment, ownership change, changed goals, strategies or expectations, changes in the firm’s capabilities or resources (for example, knowledge or financial resources), relying on sporadic orders, losing previous or finding new customers (for overviews, see, for instance, da Fonseca and da Rocha, 2023; Martins et al., 2022; Tang et al., 2021; Vissak, 2024). According to these studies, full or partial foreign market exits and reentries can be voluntary and planned and, in some cases, also forced (for example, occur because of economic sanctions).
Scholars differ on whether nonlinear internationalizers can be considered successful or not. According to Nummela et al. (2016, p. 53), “failure should be defined as the venture’s unexpected decreasing involvement in international activities,” and Gnizy and Shoham (2014, p. 279) mentioned that [de-internationalization] “experiences sometimes bear a stigma of failure.” On the other hand, Benito and Welch (1997), Martins et al. (2022), Mellahi (2003), Sousa et al. (2021) and Vissak et al. (2020) explained why de-internationalization does not always mean failure – for instance, because firms could exit some markets temporarily or permanently because of sudden unexpected changes in the business environment, or because exiting a market where a firm has very minor export activities could cost less than staying there; thus, exiting can also improve a firm’s profitability. Consequently, circumstances and the type of nonlinear internationalization (e.g. full vs partial, temporary vs permanent exit) could affect the evaluation of it: whether it demonstrates “success” or “failure” (Vissak, 2023).
Thus, it is evident that nonlinear internationalization can be defined differently, it can be caused by various internal and external factors, and it can be intentional or forced. Moreover, the “verdict” of whether nonlinear internationalizers could be considered successful or not could depend on why and how firms experienced it. Considering its relative definitional ambiguity in extant literature, nonlinear internationalization is a suitable concept with which the accuracy of GenAI tools can be tested from a research perspective.
2.2 Usefulness of generative artificial intelligence tools in scholarly work
Evidence about the usefulness of GenAI tools in general is mixed, and thus it is important to bring out both their potential and drawbacks. Several authors have noted that GenAI tools such as chatbots could be useful for conducting scientific research. For instance, it has been suggested that such tools could be used to some extent for writing abstracts or even initial drafts of full papers (Anderson et al., 2023; Dwivedi et al., 2023; Lund et al., 2023) or generating new research ideas (Dowling and Lucey, 2023; Ivanov and Soliman, 2023). GenAI tools can also be used for developing relevant scientific research questions (Graf and Bernardi, 2023; Lahat et al., 2023; Tong and Zhang, 2023) and for explaining challenging topics (Ivanov and Soliman, 2023).
For literature search, the tools can be useful in finding and/or analyzing relevant research articles (Burger et al., 2023; Farrokhnia et al., 2024; Meloni et al., 2023) and summarizing them (Akpan et al., 2025; Zangrossi et al., 2024). When writing, they can help develop hypotheses and even survey questions (Benmamoun, 2025; Graf and Bernardi, 2023; Ivanov and Soliman, 2023) and are usable for generating synthetic data (Akpan et al., 2025) and identifying themes, trends and patterns in the data (Burger et al., 2023; Dahal, 2024; Graf and Bernardi, 2023). According to some scholars, GenAI applications can also provide good ideas regarding how to title articles (Day, 2023; Emenike and Emenike, 2023; Vaishya et al., 2023), while also helping authors to translate texts (Graf and Bernardi, 2023; Halloran et al., 2023; Ray, 2023), improve their language and develop a logical structure for their work (Ariyaratne et al., 2023; Elali and Rachid, 2023; Ivanov and Soliman, 2023).
On the other hand, several authors have criticized GenAI tools. For instance, according to some studies, they can provide fictitious sources for their text (a behavior known as “hallucination”), provide incorrect publication data for existing studies or fail to recognize key literature on a given topic (Anderson et al., 2023; Dhane et al., 2024; Dwivedi et al., 2023).
In addition, these tools have other shortcomings. For example, they can sometimes plagiarize: rephrase text created by others without citing these sources (Farrokhnia et al., 2024; Lund et al., 2023; Salvagno et al., 2023). Because GenAI tools usually use open-access sources, their answers might also be based on articles published in “predatory” journals and other sources of questionable quality (see Burger et al., 2023) and provide misleading or biased information (Akpan et al., 2025; Khizar et al., 2025).
According to Tong and Zhang (2023), some GenAI tools are sometimes too repetitive; Dowling and Lucey (2023) noted that they might not be able to link multiple ideas, while Dwivedi et al. (2023), Lahat et al. (2023) and Salvagno et al. (2023) criticized them for proposing questions or ideas that are not original enough. In addition, these tools can misinterpret the content of scientific articles (Zangrossi et al., 2024) and create fake images that some authors could include in their scientific articles (Dash et al., 2024). Finally, such tools can produce texts that are not (directly) related to the topic and/or are not factually accurate or complete (e.g. Anderson et al., 2023; Farrokhnia et al., 2024; Vaishya et al., 2023). Overall, the extant literature on the usability of GenAI tools for academic research suggests that, while such tools can be used to some extent for doing scientific research, they also still have notable shortcomings, and not only because of ethical considerations (Akpan et al., 2025; Benmamoun, 2025; Khizar et al., 2025).
3. Method
To evaluate the applicability of current GenAI tools for academic research, this study adopts an exploratory, evaluation-based design. Specifically, we use the phenomenon of nonlinear internationalization as a testing topic for evaluating GenAI performance in a specific area of IB where we ourselves have domain-specific knowledge of the literature and thus are able to ascertain the accuracy of the data (GenAI output). This design allows us to examine how well the current GenAI tools can understand, define, and reference answers for a phenomenon that lacks a single consensus definition in literature and one that requires interpretation from scholars. The main goal is not to empirically study the phenomenon itself, but instead to evaluate the capabilities and limitations of GenAI tools in handling research that is typical in IB scholarship.
For the analysis examining the applicability of current GenAI tools for IB research, both the main GenAI tools in use today (ChatGPT Plus, Google Gemini, Microsoft Copilot, Claude, and Perplexity) and open AI chatbots were used. In addition, considering the focus of the study, four commonly used AI-enabled research tools (Avidnote, Elicit, Scispace and PaperDigest) were also included. Because there are several open chatbots available, three were chosen through Google searches for free chatbots, free AI text tools and free ChatGPT alternatives. ChatGPT Online (https://storysaverhd.io/chatgpt-online/), Character.AI (https://beta.character.ai/profile) and ColossalChat (https://chat.colossalai.org/) were chosen for the following reasons: the first was created by OpenAI and had similar goals to ChatGPT Plus; the second was also created by OpenAI but differed from all other tools as its purpose was different; and the third differed as it was created by others. This follows Patton (2015), who suggested using purposeful sampling for selecting similar and contrasting cases. The selected AI tools were first asked to describe themselves (see Table 2) and thereafter to answer four main questions about nonlinear internationalization.
Questions were developed based on the studies on nonlinear internationalization cited in the literature review. The first three questions focused on definitions of nonlinear internationalization, de-internationalization and re-internationalization as the latter are important components of nonlinear internationalization, while thereafter, all the GenAI tools were asked if nonlinear internationalizers can be considered successful or not. If necessary, two additional questions were asked: “Please provide sources for […] [the text given by the tool]” or “Please provide a full record for […] [the source(s) cited by the tool].” Initially, it was also planned to ask questions about the causes of these phenomena, but as some tools already mentioned causes in their answers to the other questions, this was not necessary. All questions were asked in March–June 2024 except in the case of ColossalChat, as it was taken down in spring 2023.
Each time, the whole answer was copied to a table without making any changes besides using a different format (Times New Roman 10 pt, changing apostrophes (‘instead of ’) and removing empty rows, when necessary), adding quotation marks (“”) to indicate that the text originated from an AI tool and making some slight changes in the format of the list of provided sources (e.g. using bullet points instead of numbers). Thereafter, the quality of each answer was analyzed. Correctness was assessed by comparing AI-generated outputs with established literature and checking for factual accuracy. Hallucinations were evaluated by identifying instances where tools cited non-existent or fabricated sources. Finally, thoroughness was evaluated by assessing the completeness and depth of responses in addressing the questions. All tools were treated as units of analysis. Their outputs were examined systematically across the three performance dimensions to evaluate their potential to support or mislead academic research in complex IB phenomena. The following sections show that while several parts of the text generated by these AI tools were at least to some extent correct, citing was very problematic for them.
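For readers who wish to replicate this kind of assessment, the three dimensions described above can be recorded per answer and aggregated per tool. The following is a minimal sketch only, not the coding procedure used in this study: the class and function names, the 0–2 scales for correctness and thoroughness, and the demonstration values (for two anonymous placeholder tools) are all hypothetical assumptions introduced for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AnswerAssessment:
    """One coded answer. All scales here are hypothetical illustrations."""
    tool: str
    question: str
    correctness: int         # assumed scale: 0 = wrong, 1 = partially correct, 2 = correct
    thoroughness: int        # assumed scale: 0 = vague, 1 = partial, 2 = comprehensive
    sources_cited: int       # number of sources the tool listed
    sources_fictitious: int  # how many of those could not be found at all


def summarize(assessments):
    """Aggregate coded answers per tool across the three dimensions."""
    grouped = defaultdict(list)
    for item in assessments:
        grouped[item.tool].append(item)

    summary = {}
    for tool, items in grouped.items():
        cited = sum(i.sources_cited for i in items)
        fictitious = sum(i.sources_fictitious for i in items)
        summary[tool] = {
            "mean_correctness": sum(i.correctness for i in items) / len(items),
            "mean_thoroughness": sum(i.thoroughness for i in items) / len(items),
            # None when a tool listed no sources, so that "no sources"
            # is not confused with "no fictitious sources".
            "fictitious_share": fictitious / cited if cited else None,
        }
    return summary


# Made-up demonstration data for placeholder tools (not results of this study).
demo = [
    AnswerAssessment("Tool A", "Q1", 2, 1, 4, 1),
    AnswerAssessment("Tool A", "Q2", 1, 1, 4, 3),
    AnswerAssessment("Tool B", "Q1", 0, 0, 0, 0),
]
demo_summary = summarize(demo)
```

Keeping the share of fictitious sources separate from the correctness score reflects the distinction drawn above between factual accuracy of the text and the existence of the cited sources; a tool can score well on one and poorly on the other.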
4. Findings
4.1 Definitions of nonlinear internationalization
Appendix 1 includes the full answers from the tools to the following question: “Please define “firms’ nonlinear internationalization” based on scientific sources.”
Of the five main GenAI chatbots, ChatGPT Plus, Google Gemini and Perplexity gave relatively comprehensive answers about the nature of nonlinear internationalization: all mentioned following a nonlinear path, experiencing exits (the first two tools also mentioned reentries) and several dimensions where nonlinearities can manifest (operation modes, customers, sales channels, products and markets). Microsoft Copilot was less detailed, and although it mentioned the same five dimensions, it did not explain if any exits or reentries should occur. Claude was also somewhat brief, and it did not mention these dimensions. These tools also mentioned some aspects they were not asked to focus on: ChatGPT Plus explained what happened in Estonia during the COVID-19 pandemic, Google Gemini stated that exits and reentries can occur because of internal and external factors, and Perplexity mentioned some managerial causes. Claude stated that it does not have access to the sources it cited, while Microsoft Copilot emphasized that this issue is currently being discussed and studied actively; thus, scholars might interpret the concept differently.
Regarding sources of their information, some tools were useful to some extent, yet they also came with issues: ChatGPT Plus cited three studies that all exist; moreover, the resulting text seems to originate from the cited sources. On the other hand, publication information for these sources was not fully complete – for instance, the author was not mentioned in the case of the first source and the editors in the other two – but, in principle, these sources can be found. Google Gemini cited five sources but provided partially incorrect (a wrong publication year for the first, a wrong link for the second) and somewhat incomplete information only about two. Most of the found information could originate from these two sources, but some could be compiled based on others not revealed by this tool. Microsoft Copilot provided information about three sources, but while citing the first source, it did not reveal that this information came from another one, as the authors had quoted another study for the definition. Moreover, Copilot did not use quotation marks despite copy-pasting relatively long parts of text (16 and 27 consecutive words in total, respectively). It also did not provide full publication information (editor and publishing house) for the third source. Perplexity made a mistake in the page numbers of the first source, “forgot” to mention two coauthors and the correct publication year of the third source; the information about the remaining two sources was also partially incorrect. Claude provided correct information about all three sources it mentioned, but most of its information seems to originate from the last one. Moreover, the latter two tools did not indicate which parts of their text originated from which source.
Of the three open GenAI chatbots, none provided a clear enough answer to the question: Character.AI only focused on skipping close-by countries but did not mention exits or reentries; ColossalChat did not explain the nature of nonlinearities at all (as achieving a lower level of internationalization compared to initial expectations is not necessarily nonlinear internationalization), while ChatGPT Online mentioned unpredictable changes and sudden divestments but ignored changes in firms’ export operations or other foreign operation modes. On the other hand, ChatGPT Online and ColossalChat listed several internal and external factors that could cause or influence nonlinear internationalization, although they were not asked to do that. These chatbots were also unclear regarding the sources of their information: ChatGPT Online refused to provide sources, while the other chatbots listed only fictitious sources: although these journals exist, they have never published such articles; moreover, books with such titles have not been published (yet), either.
Of the four AI-driven applications specifically designed for academic research, Avidnote and PaperDigest only focused on skipping pre-determined stages; they did not clearly mention exits or reentries. Elicit provided a relatively vague answer: it mentioned that something untraditional should happen but again ignored de- and re-internationalization. Only Scispace gave a comprehensive enough answer. Some of these tools also had problems with revealing the sources of their information: Avidnote refused to provide article titles, journal names, etc. for the sources it cited in its answer, although it was asked to do that. Its answer could partially originate from these sources, but probably not from the first source as that one focused on innovation. Elicit did not mention the coauthors of three studies in its answer; however, its list of sources was almost correct (except for missing page numbers of the last source). The answer could be based on these sources. PaperDigest gave correct publication information about the mentioned source; however, that article did not focus on all these aspects; thus, most probably, this tool “forgot” to mention some other source(s). Scispace did not provide source information for every sentence; moreover, it made mistakes in providing information about some sources (e.g. regarding which were authors’ first and last names) and provided incomplete publication information (e.g. journal volume number, issue number and page numbers were missing) for all sources.
Overall, these findings indicate that the tools should be used with extreme caution: in addition to not getting comprehensive and thorough enough answers about the nature of nonlinear internationalization, the potential users might get a list of fictitious sources because of “hallucinations”; moreover, some tools might cite some sources incorrectly; for instance, copy-paste long sentences without using quotation marks or provide incomplete publication data.
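The verbatim-copying issue noted above (runs of consecutive words reproduced without quotation marks) can, in principle, be checked mechanically. The following is a minimal sketch under stated assumptions, not the procedure used in this study: the function name, the simple lowercase word tokenization and the two example sentences are hypothetical and were written only for this illustration.

```python
import re


def longest_shared_word_run(answer: str, source: str) -> int:
    """Length, in words, of the longest run of consecutive words
    that appears in both texts (case-insensitive, punctuation ignored)."""
    # Assumed tokenization: lowercase words, keeping internal hyphens/apostrophes.
    a = re.findall(r"[a-z0-9'-]+", answer.lower())
    b = re.findall(r"[a-z0-9'-]+", source.lower())

    best = 0
    # prev[j] = length of the shared run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


# Invented example sentences (not taken from any tool output or cited study).
tool_answer = "Firms may exit and later re-enter foreign markets over time."
cited_source = "Some firms may exit and later re-enter foreign markets, as shown."
run_length = longest_shared_word_run(tool_answer, cited_source)
```

A long shared run only flags a passage for manual inspection; whether it constitutes unmarked quotation still requires reading the answer against the cited source.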
4.2 Definitions of de-internationalization
Appendix 2 contains the answers of the studied AI tools to the following question: “Please define “firms’ de-internationalization” based on scientific sources.”
Of the main GenAI tools, ChatGPT Plus emphasized several important aspects; however, it mostly focused on divestments without clearly mentioning other operation modes. Google Gemini, Microsoft Copilot and Perplexity gave detailed answers that were, in principle, correct; however, they also encompassed issues they were not asked to discuss (reasons for de-internationalization, future research suggestions, and the importance of the research topic, respectively). Claude was briefer and it did not clearly mention export reductions.
All five tools provided information about sources, but its quality varied considerably. ChatGPT Plus provided correct publication information about the only source it cited, but some of its material most probably originated from elsewhere. Google Gemini provided a list of six sources, but all were fictitious: a combination of existing authors’ last names and existing journals, but random article titles. Microsoft Copilot provided a list of four sources but only cited three; moreover, it did not provide key information such as author names or titles about the first source. Perplexity mentioned five sources without indicating which parts of the text originated from which one; from these, the provided information was correct for only three: the other two were not published in these outlets. Finally, Claude cited only one source, but correctly enough.
Of the three open GenAI chatbots, Character.AI focused on divestments without clearly mentioning other operation modes. ChatGPT Online mentioned both divestments and export or import reductions, while ColossalChat did not specifically mention any operation modes. These definitions were not wrong, but, at the same time, they were not comprehensive enough. All chatbots also listed several internal and external causes of de-internationalization, although they were not asked to. These causes can be considered correct enough. Character.AI’s and ColossalChat’s sources were all fictitious. ChatGPT Online cited two sources, but although their publication information was correct, those articles focus on other topics; thus, it is unclear which sources it actually used.
Of the four AI-driven applications specifically designed for academic research, Avidnote’s answer was brief but correct. Elicit and PaperDigest did not mention any specific foreign operations. Scispace was somewhat unclear regarding whether de-internationalization must always be complete. All these tools provided a list of sources, but again, some information was problematic. Avidnote slightly changed the first article’s title and provided wrong page numbers; moreover, it “invented” the remaining three sources. Elicit mentioned three sources in the text but only provided publication information about two. In one case, it only gave some of the necessary information, while in the other, publication information was correct in the list of sources, but in the text, the publication year was wrong, and coauthors were not mentioned. PaperDigest created a list of ten sources, but only cited seven without indicating which part of the text originated from which source. Moreover, from these sources, Nos. 8 and 10 focused on other topics. Scispace provided five sources, but the information was incorrect for the first source (wrong order of authors), partially missing in the second (no authors), and somewhat incorrect in others (regarding which were these authors’ first or last names).
Thus, again the results show that although the studied AI tools were able to provide some useful information about the nature of de-internationalization, they had several problems with revealing the source(s) of such information. While some “cited” fictitious sources, some others did not disclose all sources or provided only partial information about them.
4.3 Definitions of re-internationalization
Appendix 3 encompasses the answers of the studied tools to the following question: “Please define “firms’ re-internationalization” based on scientific sources.”
Of the main GenAI chatbots, none mentioned that reentries can also occur after partial exits. ChatGPT Plus and Claude gave brief but relatively correct answers. Google Gemini gave a very detailed answer but mostly focused on causes of this phenomenon, which it had not been asked to provide. Microsoft Copilot did not focus enough on answering the question, either. Perplexity also listed some causes and consequences, although it was not asked to.
All five tools provided information about sources, but its quality varied. ChatGPT Plus and Claude did not make mistakes this time: the information about the cited source was correct, and the text could have originated from there; still, Claude warned that it does not have access to it. Google Gemini, on the other hand, only listed fictitious sources. Microsoft Copilot cited two existing sources but did not provide fully correct information about one of them (regarding the book’s title). In addition, it creatively “combined” the titles of two existing studies with wrong author names and publication titles. Perplexity provided correct information about four existing sources; however, the last source it listed was fictitious, and it also did not indicate which sentences originated from which source.
The answers of all three open GenAI chatbots can be considered correct enough (but they could have also mentioned reentries after partial exits). On the other hand, again, they did not only explain the nature of re-internationalization but also mentioned its causes without being asked to do that. Both Character.AI and ColossalChat only listed fictitious sources. ChatGPT Online refused to provide any sources for the text it had created. Thus, it is unclear which sources these chatbots used to answer this question.
Of the four AI-driven applications specifically designed for academic research, Avidnote’s answer was short but correct. Elicit, PaperDigest and Scispace were more detailed, but they mostly focused on impact factors instead of explaining the nature of re-internationalization. Moreover, they did not mention that re-internationalization could occur after partial exits.
Avidnote listed four sources. For the first and fourth ones, publication information was correct, but most probably, the text did not originate from there. The second and third sources were “combinations” of existing and fictitious elements. Elicit only provided a list of existing sources, but it did not mention two studies’ coauthors in the main text; moreover, it did not provide publication information for one source. PaperDigest gave links to five studies that seem to be the genuine sources of the provided information. Scispace, on the other hand, mentioned a wrong author in the case of the first source, did not mention any authors in the case of the second and fifth, and got “confused” regarding which were the authors’ first and last names in the case of the third and fourth sources.
Thus, as with the answers to the two previous questions, the AI tools showed limited usefulness: some understanding of the nature of re-internationalization can be gained, but most tools failed to reveal all important aspects. Moreover, again, most of them “hallucinated”: they failed to cite their sources correctly and/or mentioned fictitious sources.
4.4 Assessments of whether nonlinear internationalizers are “successful”
Appendix 4 contains the answers of GenAI tools to the following question: “Please explain based on scientific sources if nonlinear internationalizers (firms that have de- and re-internationalized several times) can be considered successful or not.”
Of the main GenAI chatbots, ChatGPT Plus was not very detailed, but its answer can be considered correct. Google Gemini and Claude provided thorough answers listing several relevant arguments for and against success. Microsoft Copilot was somewhat vague but not wrong, either. Perplexity was vague, and it mentioned several aspects (e.g. cause, importance for research) not directly related to the question.
ChatGPT Plus provided somewhat incomplete (missing volume, issue and page numbers) publication information of one study that could have been the source of information (except for, probably, the last sentence). Google Gemini listed five publications, but all were fictitious. Microsoft Copilot listed two existing articles (for the second one, the issue number was wrong) and one that exists but not in this form (an article with this title was published in that year, but information about the authors and the journal was wrong). It also failed to use quotation marks despite copy-pasting 10 consecutive words from the first study. Perplexity listed six sources but made two mistakes: the volume number was wrong for the third source and one co-author was missing for the last one. Claude mentioned four studies but provided a wrong publication year and did not mention two coauthors of the first study, gave slightly wrong page numbers for the third one, and did not mention volume and issue numbers for the last one. Also, the latter two tools did not indicate which sentences originated from which source(s).
Of the three open GenAI chatbots, all were to some extent correct. ChatGPT Online was relatively brief, but it managed to mention some success measures and emphasize that firms are heterogeneous. Character.AI listed both negative (costs, uncertainty and risks) and positive (new opportunities, increased demand) aspects, as well as some contextual factors. ColossalChat also mentioned some important aspects – agility versus high costs – regarding whether such firms could be considered successful or not.
ChatGPT Online refused to reveal which sources it used; it only stated that its answer was based on “reputable sources.” Both Character.AI and ColossalChat mentioned three fictitious studies. The latter also listed two other sources – a journal (without mentioning any specific articles published there) and the website of a product and service testing service – but it is unclear if any parts of the answer originated from there.
Of the four AI-driven applications specifically designed for academic research, Avidnote’s answer was detailed, and it listed several important aspects. Elicit was relatively brief but correct. PaperDigest and Scispace were somewhat vague – especially regarding how to evaluate whether firms were successful or not – but not wrong, either.
Avidnote refused to provide a list of sources, although it mentioned four in the text. These four were most probably not the real sources for its information, as the most likely “candidates” (based on a Google Scholar search) mostly focused on other issues. Elicit listed four sources and provided correct publication information about them, but it only cited three in the text; it also “forgot” to mention coauthors’ names in the text despite having this information in the list of sources. PaperDigest provided links to ten sources that all focus on various internationalization-related aspects but only cited four. Scispace, on the other hand, listed four sources but cited five in the text, and the order of provided sources did not fully coincide with the order of sentences in the text. For these four sources, publication information was almost fully correct, except for adding two unnecessary initials to the first author’s name in the case of the fourth study.
Consequently, as with the answers to the three previous questions, these AI tools identified some useful aspects. On the other hand, again, most of them failed to provide correct publication information for the sources they cited in the text: some sources were listed but not cited and vice versa; moreover, some information was incorrect, and some studies were fictitious. Thus, again, “hallucinations” occurred.
5. Discussion
This study was designed to assess the contextual reliability of GenAI tools in academic research by evaluating their ability to engage with a specific, definitionally complex phenomenon typical for IB research. The intent was not to draw new empirical conclusions about nonlinear internationalization but to evaluate how these tools perform in a research setting that demands accuracy, source transparency, and domain familiarity.
The findings suggest that, to some extent, GenAI tools have developed to a point where they manage to provide somewhat useful information when conducting IB research. Consequently, scholars might use these tools for “brainstorming” (Khizar et al., 2025) and, as a result, get a few ideas about which additional aspects they could study or which titles they could choose for their articles. On the other hand, all the examined tools also had some shortcomings: they often provided irrelevant sources and ignored some of the key foundational literature on the topic, and sometimes, while ignoring important studies in the field, cited sources that have not actually made such statements about the specific topic at all. At the same time, it was not clear which actual sources were used. This is plagiarism, as the authors of the actual sources did not get credit for their work. Moreover, these tools sometimes provided too short, vague and/or partially incorrect answers. Checking for all these problems took considerable time. Thus, overall, in their current form, the studied GenAI tools should be used with extreme caution for conducting scientific research.
The findings raise questions about the overall impact of GenAI adoption in IB research, e.g. how such tools can affect theory development in the field, as well as the quality of managerial and policy recommendations that are based on AI-generated work. Moreover, this challenge is further heightened in narrow or emerging research domains, such as nonlinear internationalization, where conceptual clarity is still developing and where the corpus of foundational literature is still relatively small. In such cases, the inability of GenAI tools to reliably interpret, cite or build on existing knowledge can potentially distort the scholarly understanding of the topic.
In sum, the findings contribute to the literature on GenAI by illustrating the interplay between current GenAI performance and IB research complexity. This study emphasizes both the potential (e.g. sometimes providing correct and relatively comprehensive responses and some useful ideas) and challenges (e.g. hallucinations and lack of thoroughness) these technologies present for scholars. Using nonlinear internationalization as the focal phenomenon allows us to assess the usefulness of GenAI tools in addressing research topics that are common to the IB domain, and thus the present study contributes to the wider discourse on how AI can assist in advancing scholarship in emerging and complex discussions in IB research. Next, the theoretical and practical implications are considered in more detail.
5.1 Theoretical implications
This study contributes to the discourse on strategy and research methodology in the IB research domain by analyzing how GenAI tools perform when asked to work on a complex but strategically notable phenomenon in the field: nonlinear internationalization. The emergence of GenAI tools has brought about a paradigm shift in the realm of academic research, and this paper contributes to the theoretical understanding of the role of GenAI in academic research by highlighting five critical points that scholars must consider when integrating these tools into their research process and practices.
First, the results suggest that GenAI tools currently demonstrate the ability to identify some useful aspects of domain-specific phenomena such as nonlinear internationalization. These tools can provide preliminary insights that can aid researchers in generating new research ideas, developing important research questions and identifying themes, trends and patterns in existing literature (Dowling and Lucey, 2023; Ivanov and Soliman, 2023; Graf and Bernardi, 2023; Lahat et al., 2023; Tong and Zhang, 2023). However, the present study illustrates that, in IB research specifically, they are not infallible and often provide incomplete or partially incorrect information, necessitating domain-specific knowledge from researchers to ascertain the accuracy of their results.
Second, the citation quality provided by the AI tools is still often problematic: the tools frequently “hallucinate,” citing sources that are either fictitious or incorrectly attributed, which can lead to significant issues of academic integrity and reliability in the IB field. This problem is compounded by the fact that some sources listed in the references are not cited within the text and vice versa, leading to a disjointed and unreliable citation network (Anderson et al., 2023; Dwivedi et al., 2023; Lund et al., 2023; Salvagno et al., 2023). Such discrepancies require additional validation by IB scholars to ensure the accuracy of their citations. This extra work can offset some of the promise of GenAI tools as enhancers of scholarly research in general.
Third, as the tools sometimes provide too vague or off-topic answers, they can mislead researchers and guide them down unproductive paths. For example, while the tools might offer a broad overview of a topic, they often fail to account for the specific details that are necessary for rigorous academic research (Ivanov and Soliman, 2023; Burger et al., 2023; Farrokhnia et al., 2024; Meloni et al., 2023; Vaishya et al., 2023). This limitation points toward using these tools as supplementary aids rather than primary sources of information.
Fourth, the tools have been found to plagiarize content, presenting it as original work without proper attribution. This not only raises ethical concerns but can also undermine the credibility of the research produced using these tools. Plagiarism detection and prevention thus become necessary for researchers using GenAI in their work (Zangrossi et al., 2024; Dash et al., 2024). Researchers must be alert in checking the originality of the content generated by these tools to avoid inadvertent academic misconduct and plagiarism. The findings of the present study indicate that IB research is not exempt from these issues in GenAI use.
Fifth, the study finds the tools sometimes produce texts that are not directly related to the IB topic at hand or are factually inaccurate. This can lead to the dissemination of misinformation and consequently adversely affect the scholarly discourse in the field (cf. Anderson et al., 2023; Ariyaratne et al., 2023; Elali and Rachid, 2023; Emenike and Emenike, 2023; Halloran et al., 2023; Ray, 2023).
In sum, the present study advances the understanding of the usefulness of GenAI tools for IB scholars (and also business scholars more widely). Unlike the wider AI evaluations referred to in the literature review section, this empirical research highlights the specific challenges that emerging IB topics can present for GenAI tools – challenges such as their inability to address definitional ambiguity and their limited capabilities for robust use of literature. These findings contribute to the academic discourse on the role and challenges of GenAI tools in advancing research methodologies, particularly in under-researched IB domains where definitions are still forming or contested, where terminology can be context-dependent, and where theoretical frameworks are still being developed. The findings clearly indicate that such research streams require both methodological precision and domain-specific knowledge and judgment based thereon, which current GenAI tools do not yet seem reliable enough to replicate.
5.2 Practical implications
The practical implications of this study highlight the importance of critical validation and domain-specific expertise when using GenAI tools in IB research. Because of these tools’ tendency to give too vague or somewhat off-topic answers, all their answers should be checked very carefully, as otherwise, serious mistakes might occur.
Thus, for IB researchers, but also those active in other fields, the findings emphasize the importance of being cautious and critical, as well as the necessity of possessing deep domain-specific knowledge on the focal research topic when relying on GenAI tools in any notable way in research. Scholars should search for the literature themselves to get a better understanding of the studied phenomena and to find all relevant studies on the studied topic(s), always verify AI-generated outputs against established literature, and thus see the tools as supplementary research aids rather than primary resources to be trusted without question. They should also carefully check if these tools cited sources correctly, as they all made some mistakes (e.g. plagiarized, provided fictitious sources, cited sources not mentioned in the list of references or vice versa): even if some of them did not make mistakes in answering some questions, they were not flawless in answering others. Thus, some of the assumed productivity increases that GenAI tools have been argued to bring to scholarship are actually offset by the additional need to always re-check, cross-validate and carefully evaluate each and every GenAI tool output, regardless of whether or not the tool in question is open source, licensed or indeed developed for research purposes to begin with.
The above-mentioned risks are even greater in narrowly defined or under-researched areas such as nonlinear internationalization, where accuracy in terminology and citations is critical. The complexity of this phenomenon showcases in practice how IB phenomena can constitute distinct challenges for GenAI tools. This study demonstrated how GenAI tools often fail to provide comprehensive or accurate responses when they are used to make sense of emerging IB topics. These findings can in part also inform research on other similarly complex and underexplored areas in IB. Our findings therefore offer practical guidance not only for the use of GenAI tools in IB research in general but also for their potential application in emerging research domains in particular.
5.3 Limitations and future research
The dynamic nature of GenAI development means that the capabilities and limitations of these tools are continually evolving. As such, the current assessment of their utility may change over time. Future researchers should periodically reevaluate these tools to determine whether advancements have addressed existing shortcomings and how new features might enhance their usefulness (Rossi et al., 2024). This ongoing evaluation will help ensure that scholars are making informed decisions about the use of AI tools.
This study asked 12 GenAI tools to define and explain specific nonlinear internationalization-related phenomena. Future researchers might focus on other topics or ask these tools different questions – for instance, they might ask them to summarize, rewrite or translate some text. They could also try to “consult” several other tools such as Scopus AI or Scite AI. Moreover, as AI tools keep developing, the “verdict” about their usefulness could become more positive in the future. Consequently, it would be interesting to find out how these tools answer the same questions several years later.
6. Conclusion
This study was conducted to find out if GenAI tools could be regarded as useful for conducting IB research, using the concepts related to firms’ nonlinear internationalization as its empirical focus. In sum, the evaluation of GenAI tools revealed distinct strengths and limitations across the three dimensions. First, for correctness, while several tools provided partially accurate insights, significant discrepancies were observed in data consistency and alignment with established IB literature. Second, for hallucinations, several of the tools (especially the open chatbots) frequently cited nonexistent or fabricated sources, raising concerns about their reliability for scholarly use. Third, for thoroughness, tools varied in their ability to provide comprehensive explanations: only a few offered detailed responses that adequately addressed the complex phenomenon of nonlinear internationalization.
Thus, the conclusion is that the 12 studied GenAI tools still have relatively limited usefulness for the purposes of IB scholars: domain-specific knowledge is needed to ascertain the quality of the provided material. Moreover, the information about the sources used should be checked carefully because of these tools’ tendency to “invent” fictitious sources and incorrectly cite existing ones. Thus, while GenAI tools offer promising capabilities for academic research, their limitations necessitate careful and critical use. By understanding and addressing these concerns, IB researchers can better leverage these tools to enhance their work while maintaining the rigor and integrity of their research. As nonlinear internationalization encompasses some of the complexities and ambiguities prevalent in IB scholarship in general, this study also joins the broader conversations on how scholars might responsibly – or irresponsibly – consider integrating GenAI tools into the study of IB and management.
Note
[1] GenAI tools are solutions that can respond to conversations by creating texts that resemble genuine human language (for their technical characteristics, see Lund et al., 2023; Meloni et al., 2023; Ray, 2023; Slavin, 2023; Zangrossi et al., 2024).
The work by Tiia Vissak was supported by the Estonian Research Council’s grant PRG 1418 “Export(ers’) Performance in VUCA and Non-VUCA Environments.”
The author Lasse Torkkeli is also affiliated with Turku School of Economics.
