This study introduces CLIMB2-OLIMP, a dual-maturity model designed to facilitate the structured integration of generative artificial intelligence (AI), referred to as GenAI, into New Product Development (NPD) processes. The model gives organizations a comprehensive tool for this integration while ensuring a strong operational foundation.
Developed using the Design Science Research approach, CLIMB2-OLIMP first evaluates an organization’s NPD maturity (CLIMB2) before assessing its readiness for GenAI implementation (OLIMP). The OLIMP component uniquely incorporates a prescriptive element, utilizing large language models (LLMs) to generate tailored improvement pathways with a clear cost-benefit perspective. A systematic literature review was conducted, and the model development involved iterative stages and expert validation.
Case studies in manufacturing organizations demonstrated the model’s effectiveness, revealing moderate NPD maturity but limited GenAI adoption. The research emphasizes structured AI integration, including workforce upskilling, strategic alignment, and ethical considerations.
CLIMB2-OLIMP provides diagnostic insights and actionable recommendations, serving as a comprehensive tool for organizations seeking to integrate GenAI into their NPD processes. It guides organizations in advancing their AI maturity with a clear cost-benefit perspective and supports a bold, entrepreneurial strategy for AI adoption.
CLIMB2-OLIMP is original in its dual-maturity approach, conditioning GenAI maturity assessment on NPD maturity, and its prescriptive component driven by LLMs, which addresses a significant gap in existing descriptive AI maturity models that lack actionable guidance and cost-benefit analysis.
Quick value overview (QVO)
Interesting because: CLIMB2-OLIMP is a novel dual-maturity model with a budget-aware prescriptive tool. It advances GenAI integration in manufacturing by conditioning artificial intelligence (AI) adoption on a minimum New Product Development (NPD) baseline and by providing a tool-supported roadmap that explicitly quantifies the cost, benefit, and risk of transformation pathways.
Theoretical value: The paper advances maturity model theory by (1) establishing a contingency-based relationship in which NPD discipline acts as a prerequisite variable that moderates the efficacy of GenAI adoption, and (2) advancing Design Science Research (DSR) by embedding a quantifiable, prescriptive tool that moves beyond purely descriptive assessment.
Practical value: For manufacturing managers, CLIMB2-OLIMP is a ready-to-use decision-support tool that reduces the risk of GenAI failure. It provides budget-anchored, KPI-linked improvement plans by enforcing a crucial sequence (NPD baseline first, then GenAI scaling). The tool’s ability to present and compare multiple alternative transformation paths allows managers to select the highest-utility strategy tailored to specific resource constraints, accelerating time-to-value while controlling costs.
1. Introduction
The emergence of ChatGPT represents an outstanding example of how sophisticated generative models can transform the way people interact with technology. The application attracted 1 million users within 5 days and 100 million within 2 months of its launch. The groundbreaking nature of this model creates fascinating possibilities for applying generative AI (GenAI) in the field of product development, where interactions between humans and technology are essential.
AI maturity models support organizations in assessing their proficiency with AI technologies, facilitating the implementation journey, and providing benchmarks by comparing AI capabilities with those of leading organizations (Molla et al., 2009). However, in terms of maturity in the utilization of generative AI solutions (GenAI maturity), there is a significant lack of information in the current scientific literature. Another observation from the literature concerned the prescriptive nature of the models – most identified models are descriptive, meaning they depict an organization’s maturity level without offering guidance on advancing from one maturity level to the next. Some models could be considered partially prescriptive as they propose steps for progression; however, they lack alternative pathways or cost-benefit analyses for reaching higher maturity levels (Röglinger et al., 2012) or they use weighting systems to determine the more significant dimensions, instead of utilizing more modern solutions, such as those based on large language models (LLMs). Furthermore, maturity models are often subject to criticism in the literature, sometimes even acknowledged by the authors of these models, yet rarely addressed in subsequent model development. Another aspect is that AI maturity models focus strictly on the levels of implementation of AI solutions, where determining whether an organization is even ready for this implementation is confined to the zero or first maturity level of the model.
Therefore, the aim of this article is to develop a general GenAI maturity model that is as resilient to criticism as possible, which in this study will focus on assessing an organization’s maturity in utilizing generative AI solutions in the NPD process. The authors intend to design the prescriptive part of the model based on LLMs. The authors also propose that an organization should attain a certain maturity level in product development before assessing its maturity in utilizing generative AI within this process. In other words, a certain degree of product development maturity conditions the assessment of generative AI maturity within the process. To achieve this goal, the following research questions were formulated:
What is the current state of maturity models related to generative artificial intelligence maturity in the literature?
How to design a maturity model for the use of generative artificial intelligence in new product development processes (considering the current criticisms of existing models) that conditions an organization’s maturity in utilizing generative AI solutions on the maturity of the product development process?
How to design a prescriptive component of the maturity model based on large language models?
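The conditioning idea stated above, in which a certain degree of NPD maturity gates the assessment of GenAI maturity, can be illustrated with a minimal Python sketch. The threshold value, the class name `DualMaturityResult`, and the function name `assess_dual_maturity` are hypothetical and serve only as an illustration; the text does not specify them at this point.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: the text only states that "a certain maturity level"
# in product development is required; the value 3 is an illustrative assumption.
NPD_BASELINE_LEVEL = 3


@dataclass
class DualMaturityResult:
    npd_level: int
    genai_level: Optional[int]  # None when the GenAI assessment is not yet warranted
    note: str


def assess_dual_maturity(npd_level: int, genai_level: int,
                         baseline: int = NPD_BASELINE_LEVEL) -> DualMaturityResult:
    """Report the GenAI maturity level only if the NPD baseline is met."""
    if npd_level < baseline:
        return DualMaturityResult(
            npd_level, None,
            "Raise NPD maturity before assessing GenAI maturity.")
    return DualMaturityResult(
        npd_level, genai_level,
        "NPD baseline met; GenAI maturity assessed.")


below = assess_dual_maturity(npd_level=2, genai_level=4)
ready = assess_dual_maturity(npd_level=3, genai_level=2)
print(below.note)
print(ready.note)
```

In this sketch, an organization below the assumed baseline receives no GenAI level at all, which mirrors the sequence of NPD assessment first and GenAI assessment second.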
The paper is structured as follows: Section 2 describes the theoretical background of the study. Section 3 presents the research approach, followed by a detailed description of its implementation in Section 4. Section 5 discusses the implications for organizations and researchers, while Section 6 summarizes the paper, indicating its limitations and future research directions.
2. Theoretical background
2.1 Construction and purpose of maturity models
Recent literature on maturity models has expanded significantly (Păunescu and Acatrinei (Pantea), 2012; Tarhan et al., 2016). Typically, maturity models employ five-level, staged, or continuous systems. Design Science Research (DSR) is a popular method for developing maturity models, especially in the information systems domain. In the context of designing artifacts such as maturity models, previous research (Becker et al., 2009; De Bruin et al., 2005; Hevner et al., 2004) presents various approaches (Röglinger and Pöppelbuß, 2011). De Bruin et al. (2005) recommend six developmental stages that support the creation of a maturity model and its adaptation to specific, complex tasks (Röglinger and Pöppelbuß, 2011). In contrast, Becker et al. (2009) apply the design science guidelines proposed by Hevner et al. (2004) to the design of maturity models and define eight key requirements that should be met when creating a useful model. There are two primary approaches: the top-down method, which defines fixed stages and criteria (Becker et al., 2009; De Bruin et al., 2005), and the bottom-up method, which begins with specific characteristics (Lehmkuhl et al., 2013). The top-down approach suits emerging fields, while the bottom-up approach fits mature areas (De Bruin et al., 2005).
Models have different objectives: descriptive models assess an organization’s current state, prescriptive models provide improvement guidance, and comparative models facilitate inter-organizational comparisons (García-Mireles et al., 2012). Röglinger et al. (2012) note that most maturity models analyzed in the literature are descriptive in nature. They highlight that there are three rules for building prescriptive maturity models:
Improvement Measures: Tangible actions for every maturity stage.
Decision Calculus: A framework to help leaders evaluate alternatives through cost-benefit analysis.
Implementation Methodology: A clear, accessible procedure for adapting and applying the model.
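The decision calculus rule listed above can be sketched as a simple cost-benefit comparison of alternative improvement measures. This is a minimal illustration only: the `Measure` type, the `rank_measures` function, the budget filter, the benefit-to-cost ranking criterion, and all figures are assumptions made for the example, not part of Röglinger et al. (2012).

```python
from typing import NamedTuple


class Measure(NamedTuple):
    name: str
    cost: float      # estimated cost, in any consistent currency unit
    benefit: float   # estimated benefit, in the same unit


def rank_measures(measures: list[Measure], budget: float) -> list[Measure]:
    """Keep affordable measures and sort by benefit-to-cost ratio, best first."""
    affordable = [m for m in measures if 0 < m.cost <= budget]
    return sorted(affordable, key=lambda m: m.benefit / m.cost, reverse=True)


# Invented figures, for illustration only.
candidates = [
    Measure("Train staff", cost=10.0, benefit=30.0),
    Measure("Buy platform", cost=50.0, benefit=60.0),
    Measure("Pilot project", cost=20.0, benefit=50.0),
]
ranked = rank_measures(candidates, budget=25.0)
print([m.name for m in ranked])  # ['Train staff', 'Pilot project']
```

A ratio-based ranking is only one possible criterion; a real decision calculus could also weigh risk or time to value.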
While many models fail to meet all prescriptive criteria, those that do offer a significant advantage by actively guiding organizations through change. Applying maturity models in general can enhance long-term process performance (Păunescu and Acatrinei (Pantea), 2012). Increased maturity offers benefits like competitive advantage, improved process management, efficiency, and higher quality (Forstner et al., 2014).
2.2 Criticism of maturity models
Maturity models in the literature face multiple critiques. Firstly, their sheer number is considered overwhelming (Röglinger et al., 2012), and many of them are similar, yet often lack sufficient documentation to allow for replication of studies (Albliwi et al., 2014). Wendler (2012) also states that there is a lack of evaluation and validation of developed maturity models. There is little empirical evidence for their effectiveness (Tarhan et al., 2016) and there are criticisms that they do not consider the financial aspect (Albliwi et al., 2014). Many models emphasize achieving a predetermined final state rather than focusing on continuous improvement, and they often lack step-by-step guidance for maturity advancement (Cronemyr and Danielsson, 2013; Röglinger et al., 2012). There is also criticism regarding the scientific rigor of many models, as they are often based on practitioner experience rather than theoretical foundations (Albliwi et al., 2014). Additionally, models face a dilemma between universality and specificity, where some argue for a universal approach for cross-industry comparisons, while others advocate for customization to specific company contexts (Batenburg et al., 2005; Dayan and Evans, 2006). Specific shortcomings of the most commonly studied models are also noted. CMM/CMMI is frequently criticized for being overly complex and resource-intensive, making it challenging to implement (Kuilboer and Ashrafi, 2000; Reifer, 2000; van de Weerd et al., 2010). It is also seen as inflexible and neglectful of factors like culture and organization, crucial to project success (Höggerl et al., 2006). Similarly, OMG-BPMM is critiqued for overlooking IT support (Röglinger et al., 2012). Despite criticism, some of which is acknowledged by the models’ creators, it is rarely addressed in subsequent iterations of model development.
2.3 NPD process maturity
The concept of process maturity in NPD can be defined as achieving consistent outcomes for key process factors. An NPD process maturity model, therefore, evaluates an organization’s ability to manage this process effectively, by assessing tools and practices. The literature lacks a standardized classification of best practices in NPD. However, researchers like Barczak et al. (2009) and Kahn et al. (2012) categorize practices across dimensions such as strategy, process, and performance measurement. Lean product development offers another classification, emphasizing people, process, and tools (Morgan and Liker, 2006). Rossi and Terzi (2017) synthesized these classifications into a model of best practices in NPD. The study identified 107 best practices commonly cited in the literature and categorized them into an updated framework through consultations with experts. Although all best practices were sourced from existing literature, the practitioners’ insights were essential in refining the framework, as they highlighted certain practices as key to product development even though they were not included in prior classifications.
2.4 AI maturity
AI poses unique implementation challenges, both technical and social, requiring organizations to develop AI maturity to address these hurdles. Maturity assessments help organizations identify gaps in AI capabilities, expedite AI implementation, and ensure alignment with AI goals. To that end, Fukas and Thomas (2023) developed a universal AI management reference model. This framework provides a comprehensive approach to AI maturity, covering economic, technical, and ethical aspects essential for effective AI integration.
The landscape of digital transformation shifted significantly following the public release of ChatGPT, which served as a catalyst for the rapid adoption of generative AI (GenAI) within industrial R&D and innovation sectors (Cooper, 2023). Dr Robert G. Cooper, the creator of the globally renowned Stage-Gate® process and an internationally recognized expert in product management, focuses his latest research on the application of AI solutions within the product development domain. According to Cooper, the utilization of AI is poised to become the next significant milestone in the NPD process and will fundamentally alter the NPD landscape. The scholar asserts that numerous AI applications in the NPD process have already resulted in dramatic transformations, enhancing the speed, efficiency, and quality of NPD for early adopting firms. Furthermore, he emphasizes that this integration within organizations must not be a fragmented, piecemeal endeavor, but rather a holistic transformation guided by a bold, entrepreneurial strategy, which must be created and championed by the organization’s senior management team (Cooper, 2023).
The field of GenAI in business is rapidly advancing, yet current literature lacks comprehensive insights on GenAI maturity. Reznikov (2024) highlights several strategic approaches to GenAI adoption. A gradual integration strategy allows companies to introduce GenAI incrementally into existing processes, minimizing disruptions, and enabling ongoing assessment to ensure alignment with business goals and regulatory standards. Another approach, transformational integration, involves a more radical overhaul, where companies, often in the tech sector, redesign business models and processes around GenAI capabilities to gain a competitive edge through innovative, AI-driven products and services. Finally, partnership-driven collaboration involves forming strategic alliances with AI providers, research institutions, and stakeholders to co-develop and implement GenAI solutions, offering access to advanced technology and expertise for more effective deployment.
Chukhlomin (2024) suggests that insights from AI maturity models can inform the development of GenAI maturity models. By analyzing core components of AI maturity models, key elements can be adapted to address GenAI’s unique challenges and opportunities. However, GenAI introduces additional complexities that existing AI maturity models may not fully address, raising questions about how specific practices and maturity levels should be defined in GenAI maturity models.
3. Research design
3.1 Research approach
The research approach to model development is based on the previously mentioned DSR approach, as proposed by Becker et al. (2009) and grounded in the research of Hevner et al. (2004). The model development process includes the following design stages: problem definition, comparison of existing maturity models, determination of model development strategy, iterative model development, transfer and evaluation concept, media implementation, and model evaluation. Figure 1 illustrates the overall research approach.
The horizontal flowchart shows seven boxes arranged from left to right, connected by rightward arrows. The first box is labeled “Problem definition”. The description inside reads: “The problem was identified based on insights from the literature and interviews and confirmed by experts”. The second box is labeled “Comparison of existing maturity models”. The description reads: “A review of the literature on existing maturity models was conducted, and they were compared with one another”. The third box is labeled “Determination of development strategy”. The description reads: “The choice of strategy for building a new model and addressing the criticism of currently available maturity models”. The fourth box is labeled “Iterative model development”. The description reads: “Verification of the model’s reliability by experts and incorporation of their suggested changes in subsequent iterations”. The fifth box is labeled “Transfer and evaluation concept”. The description reads: “Making the questionnaires available online and publishing the model in a scientific journal”. The sixth box is labeled “Media implementation”. The description reads: “Implementation of the concept prepared in the previous step”. The seventh box is labeled “Model evaluation”. The description reads: “Research using the final version of the model in organizations and evaluation by target groups”.
Figure 1. Research approach. Source: Authors’ own work, based on Becker et al. (2009)
Based on Becker et al. (2009) and currently available maturity models, the problem definition aligns with the premises and research problem outlined in the introduction of this article. The discussion centers on the growing impact of AI, particularly generative AI, at both the micro (organizations and processes) and macro (economy) levels. A maturity model aids companies in determining their current position, providing managers with insights into AI advancements that allow for critical evaluation of organizational performance and actions toward optimization. The next step was the systematic literature review, conducted following a structured search and identification process based on the methodologies proposed by Webster and Watson (2002) and Brocke et al. (2009).
In the initial stage, the keywords “maturity model” and “AI” or “artificial intelligence” were selected to ensure the inclusion of potential maturity models related to the use of generative AI. The search was performed across multiple academic databases, including Scopus, Web of Science, EBSCOhost, ProQuest, ACM Digital Library, Wiley Online Library, and ScienceDirect. These databases were chosen based on the quality standards required for academic search systems, as defined by Gusenbauer and Haddaway (2020). Additionally, the search was extended to IEEE Xplore and SpringerLink, and a backward and forward citation search was conducted (Webster and Watson, 2002). As a supplementary measure, Google Scholar was also utilized. The following inclusion criteria were applied: (1) a primary focus on maturity models related to AI implementation, (2) restriction to publications from the last ten years (2014–2024) to ensure coverage of the most recent research, (3) selection of studies within the field of management sciences, typically classified under categories such as Business & Management, Social Sciences, and Decision Sciences, and (4) limitation to publications written in English. Exclusion criteria were also defined, including: (1) the absence of full-text access, (2) false positives (e.g. cases where the acronym “AI” referred to something other than AI), and (3) articles unrelated to AI maturity, such as technology maturity models, models assessing proficiency in AI-assisted tools, organizational white papers, corporate reports, online tools from various organizations, and undergraduate or master’s theses. The initial keyword search yielded a total of 494 results. After removing duplicates, screening titles and abstracts, and excluding articles that did not meet the predefined criteria, 31 studies were selected for in-depth analysis. Following further refinement, 21 research papers were excluded, resulting in a final selection of 10 articles. 
Additional citation analysis of the identified articles led to the inclusion of two more relevant studies, while an additional publication was retrieved through a Google Scholar search. Based on this selection process, the final dataset consisted of 13 articles. The literature review process is visually depicted in Figure 2.
The vertical flowchart presents the literature search and screening process with labeled stages on the left and sequential boxes in the center. On the left, three vertical stage labels appear from top to bottom: “Definition of keywords and selection of criteria”, “Articles identification”, and “Selection of relevant publications”. In the “Definition of keywords and selection of criteria” stage, a box reads “Search results (n = 494)”. Below it, database boxes are arranged in two rows. The first row reads: “Scopus (n = 108)”, “Wiley Online Library (n = 37)”, “ACM Digital Library (n = 79)”, and “EbscoHost (n = 59)”. The second row reads: “IEEE Xplore (n = 54)”, “SpringerLink (n = 42)”, “ScienceDirect (n = 10)”, “Web of Science (n = 87)”, and “ProQuest (n = 18)”. A downward arrow labeled “Rejected records: 463” leads to the “Articles identification” stage, with a text box labeled “Articles after analyzing titles and abstracts and rejecting those not meeting criteria (n = 31)”. Another downward arrow labeled “Rejection of unrelated articles: 21 articles” leads to the final stage, “Selection of relevant publications”. In this stage, the box reads “Articles after reading the full text (n = 10)”. A downward arrow labeled “Source analysis and Google Scholar: plus 3 articles” leads to the final box at the bottom, which reads “Articles for final analysis (n = 13)”.
Figure 2. Literature review process. Source: Authors’ own work
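The screening counts reported for the literature review can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated in the text and in the flowchart; the variable names are illustrative.

```python
# Hits per database, as reported in the literature review flowchart.
hits_per_database = {
    "Scopus": 108,
    "Wiley Online Library": 37,
    "ACM Digital Library": 79,
    "EBSCOhost": 59,
    "IEEE Xplore": 54,
    "SpringerLink": 42,
    "ScienceDirect": 10,
    "Web of Science": 87,
    "ProQuest": 18,
}

total_hits = sum(hits_per_database.values())   # initial keyword search
after_screening = total_hits - 463              # duplicates, title/abstract screening
after_full_text = after_screening - 21          # unrelated articles rejected
final_set = after_full_text + 2 + 1             # citation search + Google Scholar

print(total_hits, after_screening, after_full_text, final_set)  # 494 31 10 13
```

The per-database counts sum to the reported 494 results, and each subsequent stage matches the reported 31, 10, and 13 articles.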
The next step was defining a model development strategy. Becker et al. (2009) identify key foundational strategies for model development: building a completely new model, enhancing an existing model, combining several models into one, and transferring structures or content from existing models to new application areas. It is not necessary to choose only one strategy. During model development, the authors also sought to address the previously mentioned criticisms of maturity models, which include:
Models that are too complex/time-consuming/resource-intensive (Kuilboer and Ashrafi, 2000; Reifer, 2000; van de Weerd et al., 2010)
An excessive number of models (Röglinger et al., 2012)/Similarity among many existing models and inadequate documentation (Albliwi et al., 2014)
Lack of theoretical foundations or scientific guidelines/Uncritical adoption of the CMM template (Albliwi et al., 2014)
Frequent lack of empirical validation of developed maturity models (Wendler, 2012)
Limited guidance on specific steps for improving maturity levels/Criteria to help users understand methodical progression to the next stage (Cronemyr and Danielsson, 2013; Röglinger et al., 2012)
Limited empirical evidence on the effectiveness and utility of models (Tarhan et al., 2016)/Lack of economic foundation (Albliwi et al., 2014)
Omission of important factors in models (e.g. focus on processes while overlooking people, culture, etc.) (Höggerl et al., 2006; Röglinger et al., 2012)
Models that are too universal or too specific (Batenburg et al., 2005; Dayan and Evans, 2006)
The following step was iterative model development, which is central to the construction of any maturity model and in which the model’s architecture – levels and dimensions – is constructed. Common methods to assist in defining assessment criteria for each level include literature reviews, expert interviews, or surveys to identify success factors and typical use cases. The authors followed the iterative approach of Becker et al. (2009).
Then, during the transfer and evaluation and media implementation phases, it was necessary to establish various methods for disseminating the results to both academic circles and end-users. In addition to providing the model in the form of documents (e.g. a questionnaire), it is suggested to use digital tools (e.g. via the Internet). The authors of this article, in addition to prepared questionnaires, also decided to share an open repository with the model, including the prescriptive tool.
The aim of the final phase was to verify whether the maturity model provides the expected benefits. It is suggested to conduct case studies, where a small, selected group tests the new maturity model or, alternatively, the model can be made available online for free, allowing users to perform self-assessment. The authors of this article made the model available online, but the main evaluation method involved conducting case studies.
3.2 Model development
Below are the results of the literature review on AI maturity models. The thirteen identified models have been classified and presented in Table 1. The table includes the number and names of maturity levels and the “elements” of the model – since authors use various terms to describe model components, the term “elements” has been selected by the authors to describe these parts of the model. The table also contains the domain for which the models were developed, the design approach chosen by the authors, the detail level of maturity levels and model elements, the suggested maturity assessment method, the sample size, and type of entities involved (if a study was conducted), and the model type (descriptive or prescriptive).
Table 1. Reviewed AI maturity models
| Authors | Name/Acronym | Domain | Levels | Elements | Description of elements/Levels | Assessment method | Study group | Design approach | Model type |
|---|---|---|---|---|---|---|---|---|---|
| Alsheiabni et al. (2019) | None | General | 5 (Initial, Assessing, Determined, Managed, Optimized) | 4 (AI Functions, Data Structure, People, Organizational) | Brief | None | None | Bottom-up | Descriptive |
| Yablonsky (2019) | Data-Driven AI Innovation Maturity Framework | Innovation Management | 5 (Human Led; Human Led, Machine Supported; Machine Led, Human Supported; Machine Led, Human Governed; Machine Led and Machine Governed) | 4 (Who produces insight, Who decides and how, Who acts based on decision, Advanced analytics) | Detailed | None | None | Bottom-up | Descriptive |
| Ellefsen et al. (2019) | None | Logistic Processes | 4 (AI Novice; AI Ready; AI Proficient; AI Advanced) | 5 (Strategy, Organization, Data, Technology, Operations) | Brief | Questionnaire | Four logistics sector companies | Adopted organizational model | Descriptive |
| Lichtenthaler (2020) | None | General (AI Management) | 5 (Initial Intent, Independent Initiative, Interactive Implementation, Interdependent Innovation, Integrated Intelligence) or 7 (adding Level 0: Isolated Ignorance and Level +: Intuitive Ingenuity) | 3 (different types of AI, different types of human intelligence, meta-intelligence combining different types of intelligence) | Brief | None | None | Bottom-up | Descriptive |
| Fukas et al. (2021) | Auditing AI Maturity Model (A-AIMM) | Auditing Firms | 5 (Initial, Assessing, Determined, Managed, Optimized) | 8 (Technologies, Data, People and Competencies, Organization and Processes, Strategy and Management, Budget, Products and Services, Ethics and Regulations) | Detailed | None | None | Bottom-up | Descriptive |
| Yams et al. (2021) | AI Innovation Maturity Index (AIMI) | Innovation Management | 5 (Foundational, Experimenting, Operational, Inquiring, Integrated) | 7 (Data, Technology, Organization, Strategy, Ecosystems, Mindsets, Trustworthiness) | Brief | None | None | Top-down | Descriptive |
| Holmström (2021) | AI Readiness Framework | General | 5 (None, Low, Moderate, High, Excellent) | 4 (Technology, Activities, Boundaries, Goals) | Brief | Questionnaire | Insurance sector company | Bottom-up | Descriptive |
| Schuster et al. (2021) | SME-focused AIMM | Small and Medium Enterprises | 5 (Novice, Explorer, User, Translator, Pioneer) | 7 (Culture, Data, Ethics, Organization, Privacy, Strategy, Technology) | Brief | None | None | Bottom-up | Descriptive |
| Chen et al. (2021) | I-AI Maturity Model | Intelligent Manufacturing Systems and Industrial Processes | 5 (Planning Level, Specification Level, Integration Level, Optimization Level, Leading Level) | 2 (Industry, AI) | Very detailed | Questionnaire | None | Bottom-up | Partially prescriptive |
| Noymanee et al. (2022) | AI MM for Government Administration and Service | Government and Public Sector Organizations | 5 (Rookie Level, Beginner Level, Operational Level, Expert Level, Mastery Level) | 4 (Strategy, Organization, Information, Technology) | Brief | None | None | Bottom-up | Descriptive |
| Das et al. (2023) | Trustworthy AI System MM (TAS-MM) | General | 4 (Level 0, Level 1, Level 2, Level 3) | 4 (Auditability, Explainability, Fairness, Safety) | Detailed | None | None | Bottom-up | Descriptive |
| Sonntag et al. (2024) | SMMT Maturity Model | Manufacturing Sector Companies | 5 (Initial, Experimental, Practicing, Integrated, Transformed) | 5 (Culture and Competencies, Strategy, Data, Organization and Processes, Technology) | Detailed | Questionnaire | Three divisions of one manufacturing company | Bottom-up | Partially prescriptive |
| Chukhlomin (2024) | EMERALD-GenAI-CMM-OAL | Online Education and Adult Learning | 5 (Pre-generative AI Level, Foundational Level, Intermediate Level, Advanced Level, Expert Level) | 5 ((E) External Environment and Graduate Skill Expectations (M) Technological Means, Tools and Platforms (examples) (E) Essential Tasks and Use Cases (RA) Required Abilities, Skills, and Competencies (LD) Learning Designs and Strategies) | Detailed | None | None | Bottom-up | Descriptive |
3.2.1 Determination of development strategy
Rather than creating a new model from scratch, the authors propose an integrated approach that combines the structures and content of existing models while addressing current criticisms. The model aims to be simple and resource-efficient, well-documented for reuse, theoretically grounded, empirically validated, prescriptive, financially aware, comprehensive yet universal, and hierarchically structured.
3.2.2 Iterative model development
3.2.2.1 NPD maturity model
The proposed approach measures the maturity of the product development process before assessing the maturity of generative AI utilization within that process. Given the criticism regarding the number of existing models, the authors see no need to develop another maturity model for the product development process.
To avoid the proliferation of redundant maturity models, this study adopts the validated CLIMB model, favored for its theoretical grounding and suitability for SMEs (a 2.5-hour assessment vs. CMMI’s year-long process). This study proposes CLIMB2, an extension that introduces two key refinements. The first concerns scoring logic: CLIMB2 replaces the original percentage-based scoring (0–100%) with a stricter level-locking requirement, under which all practices within a dimension must reach the same level for that dimension to qualify for that maturity stage. The second concerns operationalization: since the original model lacked a full rubric, the authors developed a comprehensive 107-question questionnaire (Supplementary Material 1) that uses a five-point scale (A–E) to represent maturity levels.
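The level-locking rule can be illustrated with a minimal Python sketch. It rests on two assumptions that are not stated in the text: that answers A–E map to levels 1–5 in order, and that "all practices must reach the same level" can be read as taking the lowest level among the practices in a dimension. The practice identifiers are hypothetical.

```python
from typing import Dict

# Assumption: the five-point answer scale (A-E) maps to maturity levels 1-5.
ANSWER_TO_LEVEL = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}


def dimension_level(answers: Dict[str, str]) -> int:
    """Return the level of one dimension under level-locking.

    Assumed reading of the rule: a dimension qualifies for level k only if
    every practice in it has reached at least level k, so the dimension
    level is the minimum over its practices.
    """
    if not answers:
        raise ValueError("a dimension needs at least one answered practice")
    return min(ANSWER_TO_LEVEL[answer] for answer in answers.values())


# Hypothetical practice identifiers, for illustration only.
example = {"practice_1": "C", "practice_2": "D", "practice_3": "B"}
print(dimension_level(example))  # the weakest practice (B) locks the dimension at level 2
```

Under this reading, a single lagging practice holds the whole dimension back, which is what makes the rule stricter than a percentage-based score.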
3.2.2.2 Generative AI maturity model – descriptive part
Fukas et al. (2021) prepared an AI maturity model tailored for audit firms based on the previously discussed AI management reference model (Fukas and Thomas, 2023). As mentioned earlier, this model includes eight dimensions: technology, data, people and competencies, organization and processes, strategy and management, budget, products and services, and ethics and regulations. According to the authors of this article, all these dimensions are also relevant from the generative AI perspective, but the specific elements within them needed to be adapted based on the literature review.
The next step involved defining the levels. A five-level approach, the most commonly used in the literature, was chosen. The model is named OLIMP, after the first letters of the level names:
Level 1 – Organizing Level: The organization focuses on structuring the NPD process and resources to facilitate the integration of generative AI solutions.
Level 2 – Learning Level: The organization effectively utilizes resources and processes supported by generative AI to improve the outcomes of the NPD process.
Level 3 – Implementing Level: The organization begins to intensively integrate generative AI into existing NPD processes, creating coherent and harmonized operations.
Level 4 – Managing Level: The organization manages and controls NPD processes using generative AI, continuously monitoring progress, and making necessary adjustments.
Level 5 – Pioneering Level: The organization fully harnesses the potential of generative AI in the NPD process, achieving predictable and high-quality results.
It is also essential to define best practices for each dimension to enable the creation of a questionnaire. These practices were identified based on the literature review and interviews conducted by the authors in previous studies, alongside other maturity models identified in this chapter. For elements found in other models, certain items were mapped to a domain associated with generative AI solutions. While some elements could be assigned to more than one category, their position was determined by the authors as the dimension with the greatest influence. Table 2 presents a description of the dimensions with the sources used for their construction. The next step was to prepare the questionnaire. Based on the elements identified above, a questionnaire with responses for each level was prepared (Supplementary Material 2). The authors emphasize that the need for the OLIMP model arises from the lack of an available framework in the literature (at the time of writing this article) that is general but complex or specifically focused on NPD and organizational maturity in generative AI solutions.
Table 2. Identified elements with sources
| Dimension | Elements | Source |
|---|---|---|
| Technology | Building scalable IT infrastructure supporting AI | Model analysis (Chen et al., 2021) |
| | Integration of generative AI technology with other systems used in new product development (ERP, CRM, etc.) | Model analysis (Sonntag et al., 2024) |
| | Automation of generative AI model deployment | Model analysis (Sonntag et al., 2024) |
| | Use of cloud for data storage and processing for generative AI | Model analysis (Chen et al., 2021; Sonntag et al., 2024) |
| | Use of tools for managing the lifecycle of generative AI models | Reviewers’ suggestion |
| | Infrastructure supporting the handling of large datasets | Model analysis (Chen et al., 2021; Sonntag et al., 2024) |
| | Implementation of real-time processing technologies | Model analysis (Chen et al., 2021) |
| | Computational power required for deployment and maintenance of generative AI solutions | Interviews, Model analysis (Chen et al., 2021) |
| | Use of tools (internal or external) based on generative AI in daily work (e.g. ChatGPT, MS Copilot) | Model analysis (Chukhlomin, 2024) |
| | Scalability of generative AI solutions in use | Model analysis (Sonntag et al., 2024) |
| Data | Building high-quality data sets | Interviews, Model analysis (Chen et al., 2021; Sonntag et al., 2024) |
| | Automation of data analysis and processing | Interviews, Model analysis (Chen et al., 2021; Sonntag et al., 2024; Chukhlomin, 2024) |
| | Centralization of data sets (a single data dictionary in the organization) | Interviews, Model analysis (Chen et al., 2021) |
| | Use of advanced tools for data quality assessment | Model analysis (Chen et al., 2021; Sonntag et al., 2024) |
| | Developing a data management strategy | Model analysis (Chen et al., 2021; Sonntag et al., 2024) |
| | Automation of data collection and cleaning processes | Model analysis (Chen et al., 2021) |
| | Identification and integration of data from internal and external sources with current datasets | Model analysis (Chen et al., 2021; Sonntag et al., 2024) |
| | Existence of a standard data model and standard metadata set | Model analysis (Sonntag et al., 2024) |
| | Using generative AI to support data visualization | Model analysis (Chen et al., 2021; Chukhlomin, 2024) |
| People and competencies | Developing awareness and understanding of generative AI solutions | Literature review, Interviews, Model analysis (Sonntag et al., 2024) |
| | Training teams in programming (including prompt engineering) and data analysis | Model analysis (Chen et al., 2021; Sonntag et al., 2024; Chukhlomin, 2024) |
| | Creating interdisciplinary AI teams | Interviews, Model analysis (Sonntag et al., 2024) |
| | Introduction of external generative AI consultants to teams | Literature review |
| | Training in project management using generative AI | Literature review, Model analysis (Chukhlomin, 2024) |
| | Enhancing knowledge transfer in generative AI through knowledge management processes | Model analysis (Sonntag et al., 2024) |
| Organization and processes | Integrating AI with existing processes (AI solutions freeing specific process’ managers from non-core tasks) | Interviews |
| | Automating (product development) processes using generative AI | Model analysis (Sonntag et al., 2024; Chukhlomin, 2024) |
| | Using AI to support decision-making | Interviews, Model analysis (Chen et al., 2021) |
| | Implementing tools supporting AI teamwork (daily tools aiding product managers) | Interviews |
| | Introducing continuous improvement cycles in implementing generative AI solutions (learning from past and present implementations for future use) | Model analysis (Sonntag et al., 2024) |
| | Defined lifecycle management process for software delivering generative AI solutions | Model analysis (Sonntag et al., 2024) |
| | Generative AI-based product development guide | Interviews |
| Strategy and management | Developing a long-term strategy for investing in generative AI | Model analysis (Sonntag et al., 2024; Chukhlomin, 2024) |
| | Implementing an AI strategy in the (product development) process (implementation methodology) | Literature review, Interviews |
| | Defining strategic goals for generative AI | Model analysis (Sonntag et al., 2024) |
| | Assessing the business impact and feasibility of generative AI-based solutions in advance | Interviews |
| | Developing a system for monitoring AI implementation outcomes (metrics for assessing the impact of the solution on the process) | Literature review, Interviews |
| | Competitive analysis in terms of generative AI deployment capabilities | Model analysis (Sonntag et al., 2024) |
| Budget | Long-term budget planning for development of solutions and infrastructure supporting generative AI | Literature review, Model analysis (Sonntag et al., 2024) |
| | Allocating funds for developing employee competencies in generative AI | Literature review, Model analysis (Sonntag et al., 2024) |
| | Funding pilot and innovative projects related to generative AI solutions | Model analysis (Sonntag et al., 2024) |
| | Allocating funds for external generative AI consultations | Reviewers’ suggestion |
| | Prioritizing projects generating high added value through generative AI | Reviewers’ suggestion |
| Products and services | Supporting/automating product design and manufacturing processes | Literature review |
| | Using generative AI for product personalization | Reviewers’ suggestion |
| | Generating product-related ideas (including sentiment analysis/review analysis) | Literature review, Model analysis (Chukhlomin, 2024) |
| | Supporting information reduction (e.g. text summarization) | Literature review, Model analysis (Chukhlomin, 2024) |
| | Supporting and improving concept evaluation processes (requirement analysis, risk assessment, task planning) | Literature review |
| | Reducing product testing time | Literature review |
| | Supporting product marketing (creating advertisements, keywords, promotional videos) | Literature review |
| | Enhancing product recommendation systems | Literature review |
| | Improving the functionality of product databases | Literature review, Model analysis (Chukhlomin, 2024) |
| | Using generative AI solutions as a component in professional applications | Literature review, Model analysis (Chukhlomin, 2024) |
| | Identifying new use cases for generative AI solutions | Model analysis (Sonntag et al., 2024) |
| Ethics and regulations | Adhering to ethical principles in generative AI design | Literature review |
| | Data security and privacy in the product development process | Literature review, Interviews, Model analysis (Das et al., 2023; Sonntag et al., 2024) |
| | Trust in data and explainability | Literature review, Interviews, Model analysis (Das et al., 2023; Sonntag et al., 2024) |
| | Implementing data protection standards and backup mechanisms | Model analysis (Chen et al., 2021) |
| | Counteracting biases and unfairness in generative AI algorithms | Literature review, Model analysis (Das et al., 2023) |
| | Regularly evaluating compliance of generative AI algorithms and tools with legal regulations | Reviewers’ suggestion |
| | Creating audit systems for decisions made by generative AI (accountability for decisions made by generative AI) | Literature review |
| | Increasing awareness of data protection among employees | Model analysis (Chen et al., 2021) |
| | Using cybersecurity-related technologies | Model analysis (Chen et al., 2021) |
| | Maintaining documentation (generative AI use cases, system logs, tests, version control, etc.) to monitor the use of generative AI solutions (inconsistencies, unusual events) | Model analysis (Das et al., 2023) |
The next step was the discussion and validation of the model. The OLIMP model was reviewed by nine experts specializing in product management and generative AI solutions. The experts confirmed that the model was well-structured and did not require the addition of new dimensions or the removal of existing practices. However, they suggested incorporating several additional practices to enhance its comprehensiveness: “Utilization of generative AI lifecycle management tools” in the Technology dimension, and two practices in the Budget dimension, “Allocation of funds for external generative AI consultancy” and “Prioritization of projects generating high added value through generative AI”. A final suggestion was the inclusion of the practice “Application of generative AI for product personalization” in the Products and Services dimension. The experts highlighted that generative AI offers extensive opportunities for product customization, warranting its recognition as a separate practice within the model.
The authors incorporated these four additional practices into the OLIMP model as per expert recommendations. Following these revisions, discussions were conducted with five additional experts. The only further recommendation concerned the inclusion of the practice “Regular assessment of generative AI algorithms and tools for legal compliance” in the Ethics and Regulations dimension. This practice was subsequently integrated into the model. Table 2 and the OLIMP model questionnaire, available in Supplementary Material 2, have been updated to reflect the expert-suggested practices. The subsequent step was to extend the model with a prescriptive part.
3.2.2.3 Generative AI maturity model – prescriptive part
As previously mentioned, Röglinger et al. (2012) identified three key characteristics of prescriptive maturity models. Beyond outlining best practices for each maturity level, it is crucial to present alternative pathways for progressing to the next level while considering cost-benefit trade-offs and providing a clear implementation methodology. A particularly compelling avenue for exploration is the integration of generative AI solutions. Given prior research on the use of generative technologies in innovation teams and the limitations associated with ChatGPT, a custom tool was developed based on LLMs. This tool is designed to propose alternative pathways for organizations to achieve higher maturity levels while factoring in cost-benefit considerations. The tool takes the questionnaire responses as input and uses three AI agents, based on LLMs from OpenAI, Anthropic, and Google. Each agent evaluates the organization’s current state from the questionnaire responses; an evaluator model, based on Gemini 2.5 Pro, then generates tailored recommendations aimed at raising the organization’s maturity level. The tool is available in the GitHub repository at https://github.com/vitkovski/CLIMB2OLIMP.
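The agent-plus-evaluator arrangement described above can be sketched in Python as follows. This is only an illustration and not the code from the repository: the agents are local stand-in functions rather than calls to any LLM provider, the question identifier format ("Dimension.n") is hypothetical, and taking the median of the agents' assessments is a placeholder for whatever reconciliation the evaluator model actually performs.

```python
from statistics import median
from typing import Callable, Dict, List

Responses = Dict[str, str]                      # question id -> answer letter (A-E)
Assessment = Dict[str, int]                     # dimension -> assessed level (1-5)
Agent = Callable[[Responses], Assessment]

SCALE = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}  # assumed A-E to 1-5 mapping


def make_stub_agent(bias: int) -> Agent:
    """Build a stand-in agent; a real agent would query an LLM instead."""
    def agent(responses: Responses) -> Assessment:
        levels: Assessment = {}
        for question_id, answer in responses.items():
            dimension = question_id.split(".")[0]   # hypothetical id format
            level = min(5, max(1, SCALE[answer] + bias))
            # Keep the lowest level seen so far for the dimension.
            levels[dimension] = min(levels.get(dimension, 5), level)
        return levels
    return agent


def evaluator(assessments: List[Assessment]) -> Assessment:
    """Placeholder evaluator: reconcile the agents by taking the median."""
    dimensions = assessments[0].keys()
    return {d: int(median(a[d] for a in assessments)) for d in dimensions}


responses = {"Data.1": "B", "Data.2": "C", "Budget.1": "A"}
agents = [make_stub_agent(-1), make_stub_agent(0), make_stub_agent(1)]
consensus = evaluator([agent(responses) for agent in agents])
print(consensus)  # {'Data': 2, 'Budget': 1}
```

Running several independent assessors and reconciling them in a separate step is one way to dampen the variability of any single model; whether the tool reconciles scores, text, or both is not specified here.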
The next step was the discussion and validation of the developed tool. The overall evaluation of the prepared report was very positive, with experts noting that it matched the quality of reports from major auditing firms such as EY. Key expert feedback included: (1) AI can now provide organizational improvement recommendations that were previously crowdsourced from employees; (2) the report suits large companies with dedicated teams and long-term planning but may be too detailed for smaller organizations, which requires verification through additional research; (3) trackable transformation metrics and an executive summary highlighting initial focus areas would be beneficial; (4) organizations should be offered a limited selection of 2–3 priority improvement areas to avoid decision paralysis; and (5) organizational context varies significantly, so companies may need to prioritize different areas based on available resources and management support, though the report serves as a comprehensive roadmap for future initiatives.
Based on this feedback, summaries and metrics were incorporated into the reports. The current report scope was maintained pending research on smaller organizations. The suggestion that organizations choose three key areas in which they want to reach a higher maturity level was also incorporated. Four organizations were subsequently examined, with final reports available in Supplementary Materials 3–6.
3.2.2.4 Procedure – CLIMB2-OLIMP
Considering previous critiques of models, the authors proposed the CLIMB2-OLIMP model, which integrates the extended CLIMB model (CLIMB2) with the authors’ own OLIMP model. The combination of these two models offers a more comprehensive approach to assessing organizational maturity in leveraging generative AI solutions in the NPD process.
The maturity assessment procedure consists of three steps. The first step is measuring the maturity of the NPD process. Achieving the second maturity level in this assessment is a prerequisite for evaluating organizational maturity in utilizing generative AI solutions within the product development process. The authors selected the second level arbitrarily, assuming that an organization should have at least a basic, foundational structure for its product development process before considering the use of generative AI solutions in that process and measuring its maturity. Cases in which the organization lacks a couple of practices at the second level are nevertheless accepted. The second step is the OLIMP maturity assessment. In the final step, organizations determine their target maturity level for each model dimension and use the prescriptive part of the model to propose alternatives for moving from the current level to the target level in each dimension, taking a cost-benefit analysis into account. The CLIMB2-OLIMP model is shown in Figure 3.
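The three-step procedure can be summarized in a short Python sketch. Several details are assumptions made for illustration: the tolerance for "a couple" of missing practices is read as at most two, practices and dimensions are represented as plain dictionaries of levels, and the function and variable names are hypothetical.

```python
from typing import Dict, Optional

CLIMB2_GATE = 2        # minimum NPD maturity level named in the text
MISSING_TOLERANCE = 2  # assumption: "a couple" is read as at most two practices


def passes_gate(climb2_practices: Dict[str, int]) -> bool:
    """Step 1: check whether NPD maturity is sufficient to proceed to OLIMP."""
    below = [p for p, level in climb2_practices.items() if level < CLIMB2_GATE]
    return len(below) <= MISSING_TOLERANCE


def maturity_gaps(current: Dict[str, int], target: Dict[str, int]) -> Dict[str, int]:
    """Step 3 input: levels still to climb in each dimension below its target."""
    return {d: target[d] - current[d] for d in current if target[d] > current[d]}


def run_procedure(
    climb2_practices: Dict[str, int],
    olimp_current: Dict[str, int],
    olimp_target: Dict[str, int],
) -> Optional[Dict[str, int]]:
    """Return per-dimension gaps, or None if the CLIMB2 gate is not passed."""
    if not passes_gate(climb2_practices):
        return None  # improve the NPD process before assessing GenAI maturity
    # Step 2 (the OLIMP assessment) is represented here by olimp_current.
    return maturity_gaps(olimp_current, olimp_target)


gaps = run_procedure(
    climb2_practices={"practice_1": 2, "practice_2": 3, "practice_3": 1},
    olimp_current={"Data": 2, "Budget": 1, "Technology": 3},
    olimp_target={"Data": 3, "Budget": 3, "Technology": 3},
)
print(gaps)  # {'Data': 1, 'Budget': 2}
```

The resulting gaps are what the prescriptive part would then turn into alternative pathways with their associated costs and benefits.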
Figure 3. CLIMB2-OLIMP model. The diagram shows three vertically stacked panels numbered 1 to 3. Panel 1, “CLIMB2 measurement”: the eight CLIMB2 dimensions (Activities and flow, Decision-making, KM process, KM techniques, Computerization and software, Methods, Roles and collaboration, Training) lead to a radar chart of the same dimensions on a 0–5 scale with one plotted line. Panel 2, “If CLIMB2 level is greater than 2, then OLIMP measurement”: the eight OLIMP dimensions (Budget and Investment, Strategy and Management, Organization and Processes, Ethics and Regulations, Data, Technology and Infrastructure, Products and Services, People and Competencies) lead to a radar chart of the same dimensions on a 0–5 scale with one plotted line. Panel 3, “OLIMP prescriptive part”: a radar chart of the OLIMP dimensions on a 0–5 scale with two plotted lines (one solid, one dashed) leads to a decision structure in which a “Level X” box connects through “Alternative 1”, “Alternative 2”, and “Alternative N” to a “Level Y” box, joined by a plus sign to a “Cost or benefit analysis” box. Source: Authors’ own work
In response to previous critiques, the developed model is theory-based, incorporates significant factors identified in interviews and the literature, includes maturity levels, and features both prescriptive and financial components. The authors consider the model comprehensive because of its combined approach, which examines the NPD process from two different perspectives. They also consider it universal, believing that it can be applied to assess the use of generative AI solutions in NPD processes across various organizations and industries. Additionally, the model is designed to be straightforward and comes with documentation for reuse. It consists of two questionnaires and is significantly less complex than previously discussed models, such as that by Chen et al. (2021) or the CMMI model, which the literature criticizes for requiring substantial resources to assess maturity. The remaining two aspects, empirical validation of the model and its usability, are discussed in subsequent sections of the article.
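The staged logic of the model (CLIMB2 is measured first, and OLIMP only if the CLIMB2 level is greater than 2, as stated in the model figure) can be illustrated with a minimal Python sketch. The aggregation of the eight dimension scores into a single level by an unweighted mean, the function names, and the example scores are assumptions made for illustration; they are not taken from the published questionnaires or the repository tool.

```python
from statistics import fmean

# Gate taken from the model figure: OLIMP is measured only if the
# CLIMB2 level is greater than 2.
CLIMB2_GATE = 2

# Dimension names as they appear in the model figure.
CLIMB2_DIMENSIONS = (
    "Activities and flow", "Decision-making", "KM process", "KM techniques",
    "Computerization and software", "Methods", "Roles and collaboration",
    "Training",
)


def climb2_level(scores: dict[str, float]) -> float:
    """Aggregate the eight dimension scores (0-5 scale) into one level.

    Assumption: an unweighted mean. The actual aggregation rule of the
    model may differ.
    """
    missing = [d for d in CLIMB2_DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing CLIMB2 dimensions: {missing}")
    return fmean(scores[d] for d in CLIMB2_DIMENSIONS)


def next_step(scores: dict[str, float]) -> str:
    """Return the next assessment stage for a set of CLIMB2 scores."""
    if climb2_level(scores) > CLIMB2_GATE:
        return "proceed to OLIMP measurement"
    return "strengthen the NPD baseline first"


# Hypothetical scores, not case data.
example_scores = dict.fromkeys(CLIMB2_DIMENSIONS, 3.0)
print(next_step(example_scores))
```

Because the gate is a strict inequality, an organization sitting exactly at level 2 would, under this reading, still be directed to strengthen its NPD baseline before the OLIMP questionnaire is applied.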
3.2.3 Transfer and evaluation concept, media implementation, and model evaluation
As mentioned earlier, in addition to the prepared questionnaires, the authors shared a GitHub repository containing the model, including the tool for the prescriptive part: https://github.com/vitkovski/CLIMB2OLIMP. To evaluate the model, case studies of four organizations were conducted within the manufacturing sector. Case selection followed a purposeful sampling strategy. All organizations had to be firmly situated within the manufacturing sector and had to deal with the challenges of NPD and industrial digital transformation. In addition, firms of varying scale and ownership structure were included to test the model’s robustness.
Organization A is a leading American manufacturing enterprise with a global presence in over 30 countries, specializing in industrial electronic equipment and employing over 1,000 personnel worldwide. Three experts (two product managers and one technology implementation engineer) completed maturity assessments, with results verified by the Chief Technology Officer. Organization B is a Polish engineering firm with nearly 40 years of experience in automation solutions and digital transformation for industry, employing over 200 personnel. The company offers comprehensive products, services, consulting, and training. The Chief Executive Officer completed the questionnaire and participated in an interview. Organization C is a Polish technology manufacturing company and a leading electronic equipment producer in Poland, operating for over 30 years with over 150 employees. A strategic technology consultant completed the questionnaire and participated in a semi-structured interview. Organization D is the Polish division of a German industrial electronics manufacturer with local production facilities; it employs over 1,000 personnel domestically and 15,000 globally. A product manager completed the questionnaire and evaluated the assessment tool.
Respondents’ answers were recorded in the questionnaires provided in Supplementary Materials 1 and 2, where each organization is labeled “OrgX” (for example, Organization B appears as OrgB). The authors acknowledge that the geographic concentration limits the external statistical validity of the findings and that future validations could include more diverse geographies to enhance it.
4. Findings
The literature review reveals that AI maturity models are relatively new, emerging around 2018–2019. Early models from Microsoft (four-level, 2018), Ovum (four-level), and Gartner (five-level, 2019) approached AI maturity from various perspectives, often relying on industry literature. Some authors criticized these organizational models for their lack of scientific grounding, insufficient documentation, and non-replicable procedures (Alsheiabni et al., 2019; Schuster et al., 2021). Only Yams et al. (2021) explicitly chose a top-down approach based on De Bruin et al. (2005), arguing that it is appropriate for new fields where maturity concepts remain unclear. Seven of the thirteen models partially used the DSR approach, drawing on Hevner et al. (2004) or Becker et al. (2009), with authors employing various iterative approaches for model building.
Schuster et al. (2021) identified AIMM 1, AIMM 6, and AIMM 7 as the most holistic and scientifically grounded models. However, AIMM 1 relies on outdated definitions (three pre-1995 publications), AIMM 6 uses organizational models, and AIMM 7 lacks construction methodology. While criticizing earlier models for missing cost-benefit analyses, prescriptiveness, and practical testing, Schuster et al. failed to provide these elements in their own work.
Only Chen et al. (2021) directly address prescriptiveness in the sense of Röglinger et al. (2012), though their tool only partially meets prescriptive objectives: it lacks detailed improvement specifications and a decision calculus for advancing maturity levels. Sonntag et al. (2024), while not explicitly addressing prescriptiveness, could be considered partially prescriptive through their weighted dimension approach, which indicates priority elements but lacks cost-benefit analysis and complete improvement procedures.
Most models use five maturity levels, following the CMMI standard. Only Ellefsen et al. (2019) and Das et al. (2023) use four levels. Models contain 2–8 elements, averaging 4–5 elements (mode: 4). The most frequently appearing elements are:
Technology (in 9 models; also referred to as Technological means, tools and platforms, or AI functions)
Strategy (in 8 models; also referred to as Strategy and management, Learning designs and strategies, or Goals)
Data (in 7 models; also referred to as Information or Data structure)
Organization (in 7 models; sometimes as Organization and processes)
People/Culture/Competencies (in 6 models; sometimes as People and competencies, Culture and competencies, Required abilities, skills and competencies, or Mindsets)
Ethics (in 3 models; also as Ethics and regulations or Trustworthiness)
AI Use Cases (in 4 models as Operations, Products and services, Essential tasks and use cases, or Activities)
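For orientation, the frequencies listed above can be expressed as shares of the thirteen reviewed models. The short sketch below only restates the counts from the list; the variable and function names are illustrative.

```python
TOTAL_MODELS = 13  # number of reviewed AI maturity models

# Counts restated from the list of most frequently appearing elements.
ELEMENT_COUNTS = {
    "Technology": 9,
    "Strategy": 8,
    "Data": 7,
    "Organization": 7,
    "People/Culture/Competencies": 6,
    "AI Use Cases": 4,
    "Ethics": 3,
}


def element_shares(counts: dict[str, int], total: int) -> dict[str, float]:
    """Return the fraction of reviewed models that contain each element."""
    return {name: count / total for name, count in counts.items()}


shares = element_shares(ELEMENT_COUNTS, TOTAL_MODELS)

# Print the elements from most to least frequent.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<28} {share:.0%}")
```

Read this way, Technology appears in roughly two thirds of the reviewed models, whereas Ethics appears in fewer than a quarter.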
Identifying recurring model elements is challenging for two reasons. First, the models specify their elements in very different ways. Four examples illustrate this diversity:
Yablonsky (2019): Four elements covering who produces insights, decides, and acts, plus advanced analytics including data work and infrastructure.
Lichtenthaler (2020): Three elements (AI types, human intelligence, meta-intelligence) with implicit components like technology building and AI strategy.
Chen et al. (2021): Most complex model with Industry and AI dimensions, 35 second-level indicators covering R&D, production, data quality, and cloud storage – essentially a hybrid combining multiple domains.
Das et al. (2023): Ethics-focused with four dimensions: Auditability, Explainability, Fairness, and Safety.
Second, the names of elements do not always reflect what actually resides within a given element. For instance, the Organization element in the model by Yams et al. (2021) also includes human resources and skills, which many models list as separate items. Similarly, ethical components are sometimes encompassed within the Data or Organization dimensions. Other elements that have not been listed above include Budget (sometimes included in the Strategy dimension), Privacy (a separate dimension in the model by Schuster et al. (2021)), Ecosystems (focused on collaboration and communication), Boundaries (which a company can push using AI solutions), and the External environment and expectations concerning graduates’ skills (a strictly domain-specific educational dimension that focuses on students’ skills).
As for the method of evaluating an organization based on a constructed maturity model, only four articles offer such a method. In each case, this takes the form of a questionnaire with a list of questions or a table where specific criteria are marked if met. The questionnaires are prepared for organizations as a form of self-assessment, or interviews are conducted to obtain more detailed information. Only three of the four articles that developed questionnaires (and, consequently, only three of the thirteen articles overall) describe empirical research. In one case, this involved four companies from the logistics sector (AIMM 3), in the second, a single company from the insurance sector (AIMM 7), and in the third, three divisions within a single company from the manufacturing sector (AIMM 12). All studies indicate that although the companies (or divisions) performed better in some dimensions than in others (demonstrating higher maturity levels in certain dimensions), when viewed across all dimensions, all surveyed organizations would still be at the first level.
Using CLIMB2 and OLIMP, we assessed four manufacturing organizations (A–D). Across all cases, NPD process maturity exceeded GenAI maturity; the grand means indicate a sizeable gap (≈2.86 vs. 1.93). The rank order was consistent across both models, with OrgB scoring highest, which points to shared enablers such as culture, governance, and competencies. Notably, several firms demonstrated advanced knowledge use despite lacking formal knowledge-management plans, suggesting that full formalization is not a strict prerequisite for effective practice.
Experts evaluated the LLM-enabled prescriptive reports positively (mean 7.25/10). Larger firms tended to favor multi-year transformation programs, whereas mid-sized firms progressed more effectively through stepwise starts focused on two to three priority areas – most often People and Competencies and Organization and Processes. Respondents highlighted the usefulness of budget-anchored trade-offs, trackable KPIs, and concise executive summaries for decision support. Finally, we observed that purely hierarchical scoring can conceal strong practices when a single low-scoring item binds; augmenting thresholds with weighted averages improved diagnostic clarity and aligned more closely with observed performance.
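The contrast between purely hierarchical scoring and the hybrid approach can be made concrete with a short sketch. It assumes that hierarchical scoring assigns the level of the lowest-scoring item (the binding item), that the weighted average uses assessor-chosen weights, and that the threshold is set by the assessor. The item scores and equal weights below are hypothetical and are not case data.

```python
def hierarchical_level(scores: dict[str, float]) -> float:
    """Assumed hierarchical rule: the lowest-scoring item binds the level."""
    return min(scores.values())


def weighted_average(scores: dict[str, float],
                     weights: dict[str, float]) -> float:
    """Weighted mean of the item scores."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def hybrid_report(scores: dict[str, float],
                  weights: dict[str, float],
                  threshold: float) -> dict:
    """Combine both views and list the items that fall below the threshold."""
    return {
        "hierarchical_level": hierarchical_level(scores),
        "weighted_average": weighted_average(scores, weights),
        "below_threshold": sorted(
            name for name, score in scores.items() if score < threshold
        ),
    }


# Hypothetical OLIMP item scores on the 0-5 scale, with equal weights.
scores = {
    "Data": 4,
    "Technology and Infrastructure": 4,
    "People and Competencies": 3,
    "Ethics and Regulations": 1,
}
weights = dict.fromkeys(scores, 1.0)

report = hybrid_report(scores, weights, threshold=2)
print(report)
```

In this hypothetical case the hierarchical rule reports level 1 because of a single weak item, while the weighted average of 3.0 shows that the remaining practices are considerably stronger. The threshold list keeps the weak item visible, so the risk signal is not averaged away.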
5. Discussion
This study set out to understand how the maturity of product-development routines relates to the maturity of generative-AI adoption in manufacturing, and how a prescriptive maturity model can support decision-making. Across four cases, product-development maturity consistently exceeded AI-adoption maturity. This pattern is consistent with manufacturing and product development research that emphasizes disciplined processes and governance as foundations for technology leverage (Cooper, 2023).
Compared to prior work, CLIMB2-OLIMP represents a significant shift from the dominant descriptive orientation of existing maturity models. As Röglinger et al. (2012) and Wendler (2012) have noted, most models serve primarily to assess an organization’s status quo, offering limited guidance for advancement. While some authors, such as Chen et al. (2021), have proposed models with prescriptive components, these are often limited to general improvement directions and lack mechanisms for contextualized decision-making. In contrast, CLIMB2-OLIMP embeds a prescriptive tool based on organizational self-assessment and tailored AI-generated pathways. This approach not only personalizes the transformation journey but also introduces a scalable mechanism for evaluating cost–benefit trade-offs, described as a gap in the literature (Albliwi et al., 2014).
The model also contrasts with existing AI maturity frameworks in its dual-maturity structure, which conditions GenAI readiness on a minimum threshold of NPD maturity. This design draws on product development research that highlights the foundational role of repeatable, disciplined processes for enabling innovation (Cooper, 2023). While most AI maturity models treat dimensions such as data, technology, or people as parallel constructs (e.g. Fukas and Thomas, 2023; Schuster et al., 2021), CLIMB2-OLIMP posits a sequential logic: GenAI should not be introduced into unstable or ad hoc development environments. This theoretical proposition – that some capabilities require others to be in place first – has not been systematically explored in prior AI maturity literature and offers a new lens for modeling capability evolution.
Additionally, the model responds directly to methodological and theoretical critiques leveled against many existing frameworks. Albliwi et al. (2014) criticized the proliferation of models lacking scientific rigor, theoretical grounding, and empirical validation. Röglinger et al. (2012) pointed to the lack of hierarchical structure and improvement guidance in many maturity models. CLIMB2-OLIMP incorporates these critiques into its design logic: it is theory-informed (drawing on DSR principles), empirically evaluated across four manufacturing firms, and structured hierarchically across defined maturity levels.
Managerially, this implies re-sequencing and portfolio thinking: stage AI initiatives after core routines are at least defined and repeatable. The value of this sequence was clearly demonstrated in the case findings via the prescriptive tool reports. These reports provided the case companies with quantified transformation pathways, allowing managers to visualize how closing specific NPD gaps would directly reduce the risk of GenAI implementation failure. By presenting alternative pathways based on resource constraints, the tool reports moved the discussion from abstract AI enthusiasm to concrete, KPI-linked investment strategies.
Beyond firm-level efficiency, this structured approach yields broader societal benefits by facilitating responsible and sustainable manufacturing innovation. By embedding ethical guardrails and governance maturity into the prescriptive pathways, CLIMB2-OLIMP ensures that GenAI adoption is not merely a race for speed, but a transition toward “Ethics by Design”. This disciplined integration prevents the automation of wasteful or biased legacy processes, promoting resource efficiency and long-term environmental sustainability. In this sense, a mature NPD baseline serves as the necessary foundation for Responsible AI, ensuring that technological leaps contribute to a more resilient and ethically grounded industrial ecosystem.
The current application and validation of the CLIMB2-OLIMP model are rooted in manufacturing contexts. However, the dual-maturity structure is theoretically portable to service and software-native sectors. As argued by Morgan and Liker (2006) regarding the Toyota Product Development System, the principles of disciplined process and people-centric systems are sector-agnostic. In a service or software context, the “product” is the flow of information and knowledge; therefore, the CLIMB2 baseline can be effectively substituted with Service Design and Delivery (SD&D) or the Software Development Lifecycle (SDLC). The GenAI-centric OLIMP component remains largely consistent, as the underlying technology is sector-agnostic. However, the application focus shifts from optimizing physical supply chains to optimizing the Digital Value Chain, ensuring that GenAI automates a robust, rather than a broken, service architecture.
6. Conclusions
This study advances CLIMB2-OLIMP, a dual maturity model that conditions GenAI assessment on a minimum NPD baseline and embeds a prescriptive, budget-aware tool. Across four manufacturing organizations, NPD discipline consistently outpaced GenAI adoption, and the introduction of tailored pathways together with hybrid scoring improved managerial actionability.
Recommendations for practicing managers, derived from the case evidence, suggest a clear sequence: organizations may benefit from securing a defined CLIMB2 baseline first, before scaling GenAI. In parallel, organizations should consider concentrating on people and processes by instituting prompting standards, lifecycle controls, and change-management routines tied to NPD gates, as observed in high-performing cases. Effective decision-making, we suggest, can be supported by an explicit tool that compares two or three alternative paths for each target level – each with costs, benefits, and risks – to select the highest-utility option. Governance may track progress more effectively when hybrid scoring is used, combining threshold checks for risk with weighted averages for momentum. Transformation cadence appeared to be right-sized to context in our cases: often manifesting as multi-year programs in large firms and quarter-by-quarter compounding wins in mid-sized firms. Finally, we recommend that ethics be treated as a capability system: pairing compliance audits with concrete investments in data quality, model monitoring, and incident response.
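The pathway comparison recommended above can be sketched as a simple selection rule. The utility function (risk-adjusted benefit minus cost), the budget filter, and all figures are illustrative assumptions. The published tool generates its pathways with an LLM, and its internal decision logic is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pathway:
    """One alternative route to a target maturity level (hypothetical fields)."""
    name: str
    cost: float     # budget units, hypothetical
    benefit: float  # expected benefit in the same units, hypothetical
    risk: float     # assumed probability (0-1) that the benefit is not realized


def utility(pathway: Pathway) -> float:
    """Assumed utility: risk-adjusted benefit minus cost."""
    return pathway.benefit * (1.0 - pathway.risk) - pathway.cost


def select_pathway(alternatives: list[Pathway],
                   budget: float) -> Optional[Pathway]:
    """Return the highest-utility alternative that fits the budget, if any."""
    affordable = [p for p in alternatives if p.cost <= budget]
    if not affordable:
        return None
    return max(affordable, key=utility)


# Hypothetical alternatives for one target level.
alternatives = [
    Pathway("Alternative 1", cost=100, benefit=300, risk=0.5),
    Pathway("Alternative 2", cost=60, benefit=150, risk=0.2),
    Pathway("Alternative 3", cost=200, benefit=600, risk=0.3),
]

choice = select_pathway(alternatives, budget=120)
print(choice.name if choice else "no affordable pathway")
```

With a budget of 120, only the first two alternatives are affordable, and the second is selected because its lower risk outweighs its smaller nominal benefit. Raising the budget changes the set of affordable pathways, which is the kind of budget-anchored trade-off the respondents found useful.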
The study has limitations. Findings derive from four manufacturing organizations and may not generalize to services or software-native settings; budgets in the prescriptive outputs are indicative and require local calibration; and broader psychometric testing of CLIMB2 and OLIMP (reliability and validity) remains to be completed. Future research should quantify the NPD baseline thresholds that support the sequencing proposition across industries, evaluate the causal impact of the prescriptive tool on time-to-value, refine hybrid scoring weights and relate them to innovation outcomes, and track how institutional pressures shape the observed ethics–capability gap over time.
The supplementary material for this article can be found online.

