The unprecedented growth in the volume, variety and velocity with which data has been generated and collected over the last decade has led to the spread of the big data phenomenon. Organizations have become increasingly involved in the collection and analysis of big data to improve their performance. Whereas the focus thus far has mainly been on big data collected from customers, the topic of how to also collect data from those who are not yet customers has been overlooked. A growing means of interacting with non-customers is through crowd-based phenomena, which are therefore examined in this study as a way to further collect big data. Accordingly, this study aims to demonstrate the importance of jointly considering these phenomena under the proposed framework.
This study seeks to demonstrate that organizations can collect big data from a crowd of customers and non-customers through crowd-based phenomena such as crowdsourcing, citizen science and crowdfunding. The conceptual analysis conducted in this study produced an integrated framework through which companies can improve their performance.
Grounded in the resource-based view, this paper argues that non-customers can constitute a valuable resource insofar as they can be an additional source of big data when participating in crowd-based phenomena. Companies can, in this way, further improve their performance.
This study advances scientific knowledge of big data and crowd-based phenomena by providing an overview of how they can be jointly applied to further benefit organizations. Moreover, the framework posited in this study is an endeavour to stimulate further analyses of these topics and provide initial suggestions on how organizations can jointly leverage crowd-based phenomena and big data.
1. Introduction
The term big data has been attracting increasing managerial and academic attention due to the many benefits it can bring to organizations (Ardito et al., 2018; Cappa, Franco and Rosso, 2022a; Elia et al., 2019; Jin et al., 2015; Sestino et al., 2020; Del Vecchio, Di Minin, et al., 2018a; Visconti and Morea, 2019). Although the term was born decades ago with a negative connotation in a National Aeronautics and Space Administration (NASA) project to identify the problem of storing too much information in a limited amount of hard disk drive space, the concept is now mainly considered in positive terms as a potential source of competitive advantage (Cappa et al., 2021; Johnson et al., 2017; Marshall et al., 2015). The proliferation of new technologies and the digitalization of the business environment have contributed considerably to the generation of enormous amounts of customer data available in nearly real time (Ardito et al., 2018; Bharadwaj and Noble, 2017; Johnson et al., 2017; Sestino et al., 2020; Trabucchi et al., 2018). In fact, websites and mobile devices give organizations access to data produced and shared by a vast population. The unprecedented growth in the volume, variety and velocity of data generated and transferred on a daily basis has increasingly led organizations to consider the ways big data can benefit their performance (Ardito et al., 2018; Cappa et al., 2021; Elia et al., 2019; Del Vecchio et al., 2018a, 2018b). Compared to traditional data, big data is characterized by high values of volume, velocity, variety, veracity and value (Cappa et al., 2021; Jin et al., 2015; Tian, 2017). Volume indicates the quantity of data available, velocity is the speed at which data is collected and managed, variety is the number of sources the information originates from, veracity is the trustworthiness of the information and value is the potential to generate benefits for firms.
As is clear from the above definitions, these five Vs have been favoured by advancements in information technologies (IT) that allow organizations to interact with a wider audience more easily, quickly and effectively.
Recent studies have started exploring the positive impact that the use of big data can have on organizations. Corte-Real et al. (2017) have highlighted, through a survey of managers, that the availability of big data can benefit a firm’s financial performance. Müller et al. (2018) have shown that using big data affects productivity positively. In their study, Tiwari et al. (2018) found that the analysis of big data in the health-care sector can reduce costs and improve quality. Other recent studies (Johnson et al., 2017; Tan and Zhan, 2017) have, moreover, provided evidence of the benefits brought about by big data in terms of new product development. In addition, big data can provide information on customer preferences and emerging trends (Dotsika and Watkins, 2017; George et al., 2014; Mazzei and Noble, 2017; Del Vecchio et al., 2018b). More broadly, thanks to the insights collected, big data can help companies create new knowledge, make better business decisions, anticipate new trends and deliver better products and services (Batistič and van der Laken, 2019; Cappa et al., 2021; Khan and Vorley, 2017; Marshall et al., 2015). As companies are increasingly looking to create, acquire, capture and share new knowledge, big data is becoming crucial to achieving these aims (Chierici et al., 2019; Khan and Vorley, 2017; Pauleen and Wang, 2017; Sumbal et al., 2017). As a result, policymakers are also aware of the potential value of big data, and recently several governments, including the USA and China, have granted subsidies to encourage the use of big data by public and private companies (Jeans, 2021; Weiss, 2012; Wu et al., 2014).
Considering the above, it is evident that there is a growing interest in big data. It is possible to collect big data from many sources, ranging from other organizations such as suppliers to internal production and managerial processes, and individuals external to organizational boundaries. With respect to big data from individuals, so far researchers and managers have mainly focussed on the big data organizations can collect from customers (Cappa et al., 2021), overlooking the ways it is possible to also collect big data from non-customers. To this end, this study posits the idea that an effective means to do so can be crowd-based phenomena – that is, the involvement of individuals from outside an organization’s boundaries in the production and sharing of new knowledge thanks to data, ideas or collection of funds (Cammarano et al., 2017a, 2017b; Cappa et al., 2019; Franzoni and Sauermann, 2014; Sun et al., 2020). Indeed, crowd-based phenomena arise from open innovation (OI), which postulates the need for knowledge inflows and outflows among several entities in order to advance knowledge creation and diffusion (Chesbrough, 2003; Ebersberger et al., 2021; Franco et al., 2021). Thanks to the increased digitalization of the business environment and increased digital literacy of society as a whole (Cappa et al., 2022a), many dispersed individuals can be easily involved in collecting ideas, data and funds every day. These technology-mediated interactions are a unique opportunity for companies to also access non-customers, making them a potentially valuable resource. Studies on crowd-based phenomena are mainly focussed on analysing how to maximize the effectiveness of their main aim, i.e. the collection of ideas, data, and funds, without adequately exploring the potential additional positive effects arising from the collection of big data from dispersed individuals who include both customers and non-customers.
Therefore, this study explores the following research question: Can companies leverage crowd-based phenomena to collect big data from customers and non-customers?
Under the resource-based view (Barney, 1991), it is contended that big data collected through crowd-based phenomena from non-customers as well can constitute a distinctive, valuable and non-imitable resource. In particular, such data can provide additional worthy insights that will further improve firm performance. The framework proposed in this study shows how organizations can collect big data from the crowd, which also includes non-customers, while they participate in a crowd-based activity that contributes to the creation of knowledge (Cappa et al., 2019; Franzoni and Sauermann, 2014). This will allow organizations to enrich their understanding of the environment in which they operate and take better business decisions that ultimately positively impact performance. The framework posited in this study aims to advance scientific understanding of big data and crowd-based phenomena by underscoring the possible benefits of their simultaneous implementation. Moreover, the proposed framework is also of managerial and policymaking interest because it indicates how it is possible to jointly leverage big data and crowd-based phenomena to further benefit organizations. More broadly, this study seeks to demonstrate the importance of jointly considering these phenomena under the proposed framework and nurturing further interest in this direction.
The manuscript is organized as follows: in Sections 2 and 3 big data and crowd-based phenomena respectively are described; Section 4 explains how it is possible to implement them jointly and what the positive effects are for organizations; finally, Section 5 reports the implications and conclusions of this study.
2. Big data: the new oil for organizations
Big data is a crucial resource for public and private organizations seeking to gather information about customer preferences, feedback on products and services provided and insights into emerging new trends (Ardito et al., 2018, 2019; George et al., 2014; Mazzei and Noble, 2017; Pauleen and Wang, 2017; Urbinati et al., 2018; Del Vecchio et al., 2018b). Depending on how big data is owned and managed, we can refer to public big data, private big data and open big data. Public big data is data owned and used for research purposes by public entities, open big data offers accessibility to everyone interested, and private big data is created and owned by private organizations to gain a competitive advantage (George et al., 2014). Among the various sources from which all these kinds of big data can be collected, technology-mediated interactions with individuals through mobile applications and Web-based platforms are the most common sources (Cappa et al., 2021; Trabucchi et al., 2017; Yaqoob et al., 2016).
Organizations can benefit from using big data in several ways: optimizing product portfolios to better address current and emerging customer needs; applying dynamic pricing, based on changing conditions in their surroundings; taking data-based business decisions; increasing operational efficiency by improving services and support provided; improving retention rates; and facilitating product and service innovation (Cappa et al., 2021; George et al., 2014; Mazzei and Noble, 2017; Del Vecchio et al., 2018b). Among recent examples of such benefits arising from big data, General Electric increased the efficiency of its gas systems (Wamba et al., 2017), Alibaba developed its own credit score system for non-performing loans (Nonninger, 2018), Ford improved its car design process (Erevelles et al., 2016), Amazon developed predictions about future buying trends, IBM developed a predictive model using the Watson platform (Khoury and Ioannidis, 2014), and various governments have improved the organization of traffic and the management of health institutions (Rogge et al., 2017). As a result, big data is now being widely considered the “new oil” for organizations (Oyer, 2019) and many governments are launching funding schemes to further leverage the value of big data (Jeans, 2021; Weiss, 2012; Wu et al., 2014).
Organizations, scholars and policymakers have so far mainly considered big data from individuals to be information coming from customers. In contrast, this research contends that companies should also examine big data from non-customers because they may well constitute a valuable resource, especially considering this has previously been overlooked. This information may allow firms to further create and capture value, i.e. allow them to gather valuable insights and secure returns from them (Lepak et al., 2007; Urbinati et al., 2018). In turn, this may allow them to collect additional insights and outpace their competitors.
From this perspective, it is argued that organizations should also collect data from non-customers in order to make better business decisions and improve performance. As the advancement of IT has made it possible to overcome social, physical and geographical barriers, companies are increasingly including crowds in their activities, and it might thus be feasible for many organizations to leverage crowd-based phenomena to also collect big data from non-customers.
3. Crowd-based phenomena: tapping into crowd wisdom
In the 1980s, the prevailing idea was that of a clearly defined, linear process of R&D that began with science and research and ended with marketable products and services, entirely conducted inside company boundaries. However, research findings have made it increasingly clear that R&D is a complex social process in which the interactions between multiple parties play a central role. In particular, the concept of OI, which was developed in the 2000s, has underscored how crucial it is for organizations to have porous boundaries to innovate and succeed (Bagherzadeh et al., 2020; Bogers et al., 2019; Cammarano et al., 2017a, 2017b; Walsh et al., 2016). Moving from the “closed” approach to innovation to an “open” one means facilitating knowledge inflows and outflows with many external entities involved, rather than having each organization working just on its own; this can produce enhanced innovative outcomes (Cappa et al., 2019; Enkel et al., 2009; Franco et al., 2021). Since the OI concept was established, it has become increasingly clear that individuals external to an organization’s boundaries are crucial to the production and sharing of knowledge (Cappa et al., 2019; Cappa et al., 2022a; Franzoni and Sauermann, 2014). Among the various possible partners that can be involved in OI, such as suppliers, competitors and universities, the involvement of a large number of dispersed individuals is becoming more common thanks to advancements in IT and the digitalization of the general public. In this way private and public organizations now have the opportunity to exploit crowd wisdom, i.e. the wide variety of expertise and resources people are endowed with (Bayus, 2013; Mollick and Nanda, 2016). Therefore, individuals have become more and more successfully involved in providing ideas for innovation, data for scientific projects and funds for promising new entrepreneurial ventures, which are referred to, respectively, as crowdsourcing, citizen science and crowdfunding.
Although it requires great effort to organize and manage, the inclusion of crowds brings benefits to all parties involved. For organizations, it is possible to collect ideas, data or funds in a cheaper and quicker manner (Cappa et al., 2019; Cappa et al., 2022b; Franzoni and Sauermann, 2014). Moreover, thanks to their participation in scientific projects, individuals can enhance their literacy and can have a pleasant and unconventional experience (Cappa et al., 2020; Paul et al., 2014). In addition to the above, crowd-based phenomena have been recently shown to be beneficial for society as a whole because citizen involvement in scientific aims can also be extremely effective in tackling grand challenges, i.e. the pressing environmental and societal issues that society is facing, such as global warming or environmental pollution (George et al., 2016; Schubert, 2017).
In addition to the above-mentioned benefits available through crowd-based phenomena, this study argues that a further positive effect is that such activities may also allow companies to collect considerable data about the individuals involved, whether customers or non-customers of the organization. Collecting this kind of big data can help organizations better define prospective customers, understand emerging trends, and thus target their actions.
4. Big data and crowd-based phenomena in gathering insights from customers and non-customers
The following subsections outline the way in which big data can be collected from non-customers, in addition to current customers, through each of the crowd-based phenomena identified in this study, i.e. crowdsourcing, citizen science, and crowdfunding, to the benefit of organizational performance. At the end of each of the three subsections a proposition is posited.
4.1 Big data and crowdsourcing
In 2006, Jeff Howe (Howe, 2006) first coined the term crowdsourcing as a portmanteau word of “crowd” and “outsourcing”; it has continuously gained academic and practitioner attention since then and it is now used to identify the inclusion of crowds of individuals in the collection of innovative ideas for problem solving and the creation of new knowledge (Afuah and Tucci, 2012; Cairo et al., 2015; Cappa et al., 2019; Garcia Martinez and Walton, 2014; Natalicchio et al., 2017; Vermicelli et al., 2021). Crowdsourcing has been used for various purposes, from the creation of advertising content for marketing campaigns – for example, Doritos asking the crowd for help in designing their new video commercial (Brabham et al., 2014) – to the resolution of computer bugs and business problems – such as the prizes offered by Google and Microsoft to resolve flaws and errors in Android and Azure (Gibbs, 2015; Rundle, 2019) – to the development of new products and services – as occurred with the Fiat Mio project or with IBM’s Innovation Jam (Saldanha et al., 2014). In crowdsourcing there is a close link between the number of contributions and the quality of extreme outcomes, i.e. exceptionally valuable contributions that companies will actually use (Boudreau et al., 2011; Cappa et al., 2019). As a result, great effort is dedicated to maximizing the number of contributions collected in order to identify exceptional ideas; this translates into reaching the highest number of individuals possible, which offers a great opportunity to collect big data.
This study contends that while they are collecting contributions for their project’s principal aim, companies can also collect big data from a large number of individuals during their technology-mediated interaction. For example, Waze is a crowdsourcing service for reporting accidents and traffic jams via mobile phone applications installed by drivers (“Waze”, 2020), but it can also collect a large amount of other information from users. Beyond tracking the current position of drivers, which is useful for crowdsourcing the best route to their destinations, the application also collects demographics and historical habits to better profile drivers and their behaviour. The big data collected in this way has been shown to constitute an extremely valuable resource that can also be exchanged by Waze with other organizations such as local governments (Olson, 2014). Waze services are also accessible in a reduced form by non-registered users (Winona Santosa, 2017). In this way, Waze can collect information not only from regular customers who have downloaded the mobile application but also from one-time users or individuals who would like to see how the system works by connecting occasionally through the web-based platform. Similarly, companies can exploit crowdsourcing in a structured way as a means to collect data from a multitude of customers and non-customers. Such data can contribute to the creation of richer big data for organizations and better knowledge creation. Based on the above, the first proposition posited is the following:
During crowdsourcing campaigns, organizations can collect big data from (customers and) non-customers from which they can extract valuable outcomes.
4.2 Big data and citizen science
The way in which scientific research is conducted has also changed due to the inclusion of crowds in data collection and analysis, leading to the birth of citizen science (Franzoni and Sauermann, 2014; Wildschut, 2017). Citizen science is another form of crowd-based phenomena whose roots date back to the early 19th century when people were recruited to gather data used, for example, to catalogue birds (Land-Zandstra et al., 2015). However, it has only been in the past decade that citizen science has been expanding more and more thanks to the ease with which individuals can be reached using IT tools, and this has attracted increasing scientific and policymaking attention (Cappa et al., 2022a; Land-Zandstra et al., 2015; Sauermann et al., 2020; Wildschut, 2017; Spasiano et al., 2021). Citizen science seeks to involve citizens, without any specific scientific background requirement, in the collection and analysis of data for research projects through technology-mediated interactions. In fact, organizations are looking for ways to collect and analyse large amounts of data, and citizen science is an effective framework for achieving this goal (Garcia Martinez and Walton, 2014; Lukyanenko et al., 2020). Individuals from all over the globe can participate in citizen science projects by devoting their time, in tasks ranging from data collection to data analysis (Sauermann et al., 2020). In the former case, citizen scientists are sensors, with individuals providing data that can later be used by professional scientists, whereas in the latter case they actively contribute to the analysis of data (Cappa et al., 2020). Recent successful examples of citizen science projects were “ebird”, “Open Air Laboratories”, “Forest Watchers”, and “Brooklyn Atlantis”, where the crowd was involved, respectively, in categorizing bird species, monitoring air pollution, preventing deforestation and minimizing pollution in bodies of water.
As occurs with crowdsourcing, in citizen science, the more data is collected from the crowd for the project’s scientific aim, the better the outcomes are (Cappa et al., 2022a). The audience involved in citizen science comprises both individuals who participate only once and people who often participate in the activities organized by the public or private organizations managing the citizen science projects, or who even belong to these organizations. Consequently, a large number of citizens are involved in such data collection and analysis, giving these organizations the opportunity to collect additional insights that constitute big data. For example, in a recent citizen science project that aimed to get citizens to help reduce electricity consumption in order to mitigate pollutant emissions and energy costs, i.e. B.E. Smart, additional information from recurrent and occasional participating individuals was also collected, beyond the initial data required for the main scientific aim, and this produced big data (Cappa et al., 2020). Therefore, in addition to the environmental aim of minimizing energy pollutant emissions, thanks to citizen contributions, the big data collected offered insights regarding the possible target audience that most likely could be involved in future similar projects.
Such insights make it possible to implement targeted actions so as to increase the efficacy of citizen science projects, which can be crucial for both researchers and policymakers who want to better attract individuals to other organizational activities. Therefore, the second proposition of the study is as follows:
During citizen science projects, organizations can collect big data from (customers and) non-customers from which they can extract valuable outcomes.
4.3 Big data and crowdfunding
The collection of contributions from dispersed individuals is also starting to be applied in the finance domain. Crowdfunding is indeed another form of crowd-based phenomena that seeks to leverage the crowd to collect funds as an alternative source of financing for new entrepreneurial ventures (Nielsen and Binder, 2021; Roma et al., 2017; Troise et al., 2021a, 2021b; Zhang and Chen, 2019). It allows businesses to collect small amounts of money from a number of dispersed individuals, rather than from traditional sources such as banks and venture capitalists, in return for some form of reward, property rights, payment or just for the sake of contributing to a project that backers like (Belleflamme et al., 2014; Cappa et al., 2022b; Mollick, 2014). There are different forms of crowdfunding depending on the type of incentive offered, i.e. donation-based, reward-based, lending-based and equity-based, ranging from the kind that most resembles a charity to the type of crowdfunding most like a traditional financial investment (Giudici et al., 2018). In fact, donation-based crowdfunding is typically used to raise money for a non-profit or a social cause (Belleflamme et al., 2014; Bruton et al., 2015; Presenza et al., 2019; Troise and Tani, 2020) and backers contribute because they like the project and because they want to support a specific cause. In reward-based crowdfunding, backers typically provide small amounts of money in exchange for a reward. This reward can be a prototype of the item that will be produced, or branded merchandise like a unique t-shirt or a discount on the product when it is ready; this is the most widespread form of crowdfunding (Belleflamme et al., 2014; Cappa et al., 2022b; Davis et al., 2017; Kraus et al., 2016; Zhang and Chen, 2019). 
In lending-based crowdfunding, on the other hand, entrepreneurs raise funds in the form of loans that they will pay back to lenders over a pre-determined timeline with a set interest rate (Moysidou and Hausberg, 2019). Finally, equity-based crowdfunding is based on the exchange of shares in a private company for financial capital (Block et al., 2018), in a manner similar to what happens during acquisitions through traditional stock markets.
Through crowdfunding, it may be possible to collect funds in an easier and faster manner than traditional sources of financing. Indeed, crowdfunding involves fewer duties and complexities because it directly asks the general public, connected through web-based platforms, for financial help to support entrepreneurial ideas (Belleflamme et al., 2014; Bi et al., 2017; Mollick, 2014). In particular, proper due diligence of the entrepreneurial project is not required (Cappa, Franco, Ferrucci, et al., 2022b), and the backers involved are mainly non-professional investors who do not scrutinize the entrepreneurial project in the way traditional investors would (Ahlers et al., 2015; Cholakova and Clarysse, 2015; Cumming et al., 2019). Moreover, the nature of the interaction is online only, rather than face-to-face, thus limiting the exchange of information with backers (Cappa et al., 2022b). In fact, in crowdfunding contexts, entrepreneurs can only promote their projects on web-based platforms, using the project’s Web page to convey information about the product, the venture’s business model, objectives, motivations, principles, vision and results achieved, as well as the resources still needed and how the funds will be used. Finally, the definition of a supervisory authority and the definition of clear regulations are still in development and not yet uniform worldwide (Cicchiello et al., 2020; Vismara, 2016).
In the context of crowdfunding, the focus of academics and practitioners is to involve the greatest number of dispersed individuals possible in order to maximize the funds raised. Thus, in this instance, it is possible to collect a large quantity of valuable data from individuals. The backers participating in a crowdfunding call for money include both individuals who would personally like to buy the product or service for which money is being collected, and also people who would like to support the entrepreneurial venture for intrinsic and extrinsic reasons (Deci and Ryan, 2000; Ryan and Deci, 2000). With respect to the former situation, backers may be the first customers to whom the outcomes of the call for money can be offered, as they are aware of the existence of these products before they are actually launched on the market (Paschen, 2017; Petráš et al., 2019). In this view, crowdfunding campaigns are a way to test a product’s market fit or produce a service that is better aligned with a marketable version that meets the needs of prospective customers (Paschen, 2017; Petráš et al., 2019; Taghian and Shaw, 2010). In addition, individuals may contribute to entrepreneurial ventures because they like them, for the intrinsic pleasure of doing so, or for the sake of gaining something other than the final product in return for the financial support provided (Cholakova and Clarysse, 2015; Nielsen and Binder, 2021; Ordanini et al., 2011). In this case, backers can be seen as being constituted by both customers and non-customers, whose data can be extremely useful in allowing a firm to turn them into prospective customers and in allowing it to have a better grasp of prospective client needs. Therefore, information from backers of crowdfunding campaigns includes that of both customers and non-customers, and it can be extremely useful when refining the marketability of products and services.
Based on the above, the third proposition in this study is related to the benefit that can arise from collecting big data through crowdfunding:
During crowdfunding campaigns, organizations can collect big data from (customers and) non-customers from which they can extract valuable outcomes.
5. Discussion
This study has sought to highlight how crowd-based phenomena can also be a valuable source of big data in relation to non-customers and how this can be a crucial resource for organizations. When considering the possible benefits of collecting big data from individuals, scholars and managers have so far mainly considered current customers. What is more, when dealing with crowd-based phenomena, scientific attention has mainly been devoted to the ultimate aims of these phenomena: maximizing valuable ideas from crowdsourcing, data collection and analysis from citizen science, and funds collected from crowdfunding campaigns. Indeed, thus far these two approaches have not been considered jointly. This paper has presented the argument that crowd-based phenomena provide the opportunity to reach a large number of individuals comprising both an organization’s current customers and its non-customers, and that the latter can become a valuable resource if sufficient effort is devoted to collecting big data from them.
Grounded in the resource-based view (Barney, 1991) and applied to the context of crowd-based phenomena and big data, this study stresses the benefits brought about by their synergistic integration. Three propositions have been posited regarding the benefits generated by the collection of big data from customers and non-customers through crowd-based activities to reveal the possible positive results that may come from joining big data and crowd-based phenomena. In this way, a framework, reported in Figure 1, has been developed that aims to highlight the benefits produced by this integrated approach. Organizations can benefit from the knowledge created as a result of the crowd-based initiative (the upper part of the figure), while also collecting big data from customers and non-customers alike (the lower right of the figure). In this way, organizations can jointly tap into crowd-based phenomena and big data to gain insights into non-customers, improving organizational performance (the area contained within the dotted line in the figure). This information can be a valuable resource to be integrated with the big data coming from existing customers (the lower left of the figure), and it could generate a considerable competitive advantage and have a positive impact on performance. Therefore, in addition to advancing scientific knowledge of big data and crowd-based phenomena, this study provides an overview of the way they can be jointly applied, along with useful advice for managers and policymakers trying to improve organizational performance.
Integrated framework for the simultaneous implementation of “big data” and “crowd-based phenomena” to extract valuable insights from (customers and) non-customers
6. Conclusions
Big data and crowd-based phenomena have been increasingly used by organizations in recent years, thanks to the digitalization of the business environment and of society as a whole, and to IT advancements, which have made technology-mediated interactions frequent on a worldwide scale. In this research it has been conceptually argued that an integrated approach that leverages the two jointly could generate additional valuable resources for private and public organizations. In this way, this research enriches scientific knowledge of big data and crowd-based phenomena and contributes to the understanding of their interconnections. Moreover, the integrated framework posited in this study is an endeavour to stimulate further academic analyses of these topics. In addition to scholars, this research is also of interest to managers and policymakers, as it provides initial suggestions on how organizations can leverage crowd-based phenomena and big data to collect valuable insights and benefit their performance. As big data helps to create new knowledge, which is crucial to mitigate uncertainty and risks and to improve performance (Cappa et al., 2021; Intezari and Gressel, 2017; Sumbal et al., 2017), a further means for collecting it can be extremely useful for all organizations. Furthermore, the outcomes of this research, and the further developments that it can spawn, are also relevant in light of the current Horizon Europe programme, which devotes a great deal of attention to big data through the “European Open Science Cloud” (Horizon Europe, 2022a), which allows researchers to store, curate and share data, and through the “Cluster 4: Digital, Industry and Space” part of the funding programme (Horizon Europe, 2022b), in which big data is one of the areas of intervention.
Several future developments can arise from this study, some of which stem from its limitations. Firstly, as this study is conceptual, future research should empirically quantify how much big data from non-customers can benefit a firm’s performance. Secondly, while this study has argued that additional benefits can arise from big data collected from non-customers through crowd-based phenomena, these benefits should be compared with findings deriving from big data collected from actual customers. Moreover, it could be worth exploring whether the big data collected from these two types of individuals should be managed together or separately by firms to maximize the insights that might be extracted. Furthermore, this study has mainly stressed the benefits of big data collected from customers and non-customers through crowd-based phenomena, but future studies should also focus on the drawbacks that could come from the additional data collected. As there is also a dark side to big data, related to information overload (Tian, 2017; Whitler, 2018), privacy and security issues (Brook, 2019; Trabucchi et al., 2017) and costs for storing data (Cappa et al., 2021), these aspects should be properly analysed by future studies in relation to big data collected from non-customers. Finally, this study has focussed on the benefits that big data can have on economic types of performance for organizations, but it is also increasingly important for all kinds of organizations to simultaneously achieve economic, social and environmental sustainability (Dinia et al., 2022; Franco, 2021; Morea et al., 2021; Rosso et al., 2020); thus, another promising research direction is analysing the impact that big data from non-customers may have on society and the environment.
Francesco Cappa would like to gratefully acknowledge Ermenegildo Zegna, which supported this research through the EZ Founder’s Scholarship 2019–2020. The funder had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

