Purpose

This study aims to investigate how different layers of artificial intelligence (AI) technologies jointly shape urban innovation, addressing the puzzle of why cities with high AI adoption often display uneven innovation outcomes. Rather than treating AI as a single technology, the study conceptualises AI as a three-tier capability structure consisting of foundational, core and general-purpose technologies, and examines whether urban innovation benefits more from isolated technological investments or from coordinated development across these layers in an emerging-economy context.

Design/methodology/approach

Using patent-based indicators, this study constructs city-level measures of foundational, core and general-purpose AI technologies for a balanced panel of 120 Chinese cities from 2010 to 2021. Fixed-effects models with interaction terms are used to estimate both direct and complementary effects across AI layers. An instrumental-variable strategy based on historical communication infrastructure is further applied to address potential endogeneity concerns, and heterogeneity analyses are conducted across different urban contexts.

Findings

The results show that the innovation effects of AI are layered and uneven. Foundational and core AI technologies are more strongly associated with urban innovation capacity, whereas the contribution of general-purpose AI technologies is more contingent on local development conditions. The findings further suggest that complementarities across AI layers are conditional and that sustainable urban innovation depends on both upstream capability accumulation and downstream application diffusion.

Research limitations/implications

The analysis is based on city-level data from 2010 to 2021 and therefore does not fully capture the post-2022 diffusion of generative AI. Future research could extend the framework to examine whether large-model technologies reshape the relationships among different AI layers and urban innovation outcomes.

Practical implications

The study suggests that policymakers should adopt differentiated AI development strategies, balancing application expansion with sustained investment in foundational infrastructure, core technologies and local innovation capabilities.

Originality/value

This study advances innovation research by reconceptualising AI as a layered technology stack rather than a monolithic general-purpose technology. It reveals a hierarchical complementarity mechanism through which AI capabilities translate into urban innovation, offering new insights into why application-led AI strategies often fail. The findings provide actionable implications for urban and regional innovation policy in emerging economies by highlighting the importance of balanced, context-sensitive AI capability development.

Artificial intelligence (AI) has become a central enabling technology in contemporary smart-city initiatives and urban governance, reshaping how cities innovate, allocate resources and pursue sustainable development (Mei et al., 2024). As core spatial units of economic activity and knowledge creation, cities increasingly rely on AI to enhance productivity, improve public services and support long-term innovation performance (Son et al., 2023; Yue et al., 2025). In China, national strategies such as the New Generation Artificial Intelligence Development Plan and the 14th Five-Year Plan for the Digital Economy have positioned AI as a key driver of the transition from factor-driven growth to sustainable, innovation-oriented urban development (Song et al., 2025; Wu et al., 2020). Leading cities including Hangzhou, Shenzhen and Chengdu illustrate how AI is being deployed in areas such as intelligent transportation, health-care analytics and digital governance, reinforcing the role of AI in shaping urban innovation systems.

Despite the rapid diffusion of AI-related applications, urban innovation outcomes remain highly uneven across cities (Mei et al., 2024). Many cities display strong downstream adoption of AI applications while lacking upstream capabilities in computing infrastructure, semiconductor hardware and algorithmic development. This imbalance has been described as a pattern of “upstream gaps and downstream overheating” in China’s AI innovation chain (Yu et al., 2022). Such configurations often lead to fragmented pilot projects and short-lived innovation gains, leaving cities dependent on external technologies and undermining their long-term innovation resilience. By contrast, cities with more balanced AI capabilities across infrastructure, algorithms and applications tend to achieve sustained and cumulative innovation trajectories. This paradox challenges the notion that the expansion of AI applications alone is sufficient to generate sustainable urban innovation.

From a practical governance perspective, it is particularly important for urban policymakers to distinguish between AI infrastructure, algorithms and applications. In many Chinese cities, AI development is often evaluated on the basis of visible downstream applications such as smart government, intelligent transportation, AI-enabled manufacturing and digital public services (Wei et al., 2025). However, if these applications are not supported by adequate upstream capacities, such as computing infrastructure, semiconductor support, data resources, algorithmic platforms and local research and development (R&D) organisations, AI deployment may become dependent on the procurement of external technology or project-based system integration (Zhou et al., 2025). In such cases, cities may achieve rapid application expansion yet fail to translate AI adoption into sustained local innovation, knowledge accumulation or industrial upgrading. Conversely, cities with a balanced configuration across foundational, core and general-purpose AI layers are more likely to convert AI investment into endogenous innovation, entrepreneurial dynamism and broader knowledge spillovers. A layered perspective is therefore not only theoretically meaningful but also practically necessary: without distinguishing between upstream and downstream AI capabilities, urban policy may over-reward short-term application visibility while under-investing in the structural capacities that sustain long-term innovation resilience.

This pattern can be observed in various forms across Chinese cities. Shenzhen and Hangzhou, for example, more closely reflect relatively integrated AI development paths, where industrial digitalisation, platform capabilities, algorithmic applications and innovation ecosystems act as a mutually reinforcing system. Chengdu also demonstrates a comparatively coordinated trajectory, supported by research institutions and the electronics and information industry base, as well as the diffusion of AI-enabled applications. By contrast, some cities have expanded AI-related application scenarios or digital infrastructure more rapidly than their local upstream technological capabilities have matured. Guiyang, for instance, has been prominent in data-centre construction and big-data-oriented infrastructure deployment, yet its endogenous strength in high-end algorithmic ecosystems and advanced AI innovation capabilities remains comparatively less developed. These examples are intended as illustrative city-level observations rather than full case studies. Nevertheless, they highlight the policy relevance of the “upstream gap versus downstream overheating” phenomenon: cities may appear highly active in AI adoption, but without stronger foundational and core capacities, such expansion may not generate equally strong long-term innovation effects.

A growing literature has highlighted the importance of AI and digital technologies for smart and sustainable urban development (Son et al., 2023; Yigitcanlar et al., 2020; Yue et al., 2025). However, most existing studies implicitly treat AI as a homogeneous or monolithic technology, or focus on isolated components such as digital infrastructure, algorithms or application scenarios (Chen et al., 2025; Ullah et al., 2025). While these studies provide valuable insights, they rarely examine how different layers of AI technology interact along an innovation chain or why similar levels of AI adoption can lead to markedly different innovation outcomes across cities. As a result, current research offers limited guidance on how AI-enabled smart-city strategies can be designed to support long-term, system-wide and sustainable innovation.

To address this gap, this study reconceptualises AI not as a single general-purpose technology, but as a layered technology stack comprising three interrelated tiers: foundational technologies (FTs), core technologies (CTs) and general-purpose technologies (GPTs). FTs include computing infrastructure, semiconductor hardware and data platforms that provide the basic operational capacity for AI. CTs consist of algorithms, models and software frameworks that translate infrastructural resources into reusable knowledge modules. GPTs refer to cross-sectoral AI applications embedded in urban domains such as transportation, health care and manufacturing. This layered perspective highlights that weaknesses at any tier may constrain the innovative returns of the entire system, and that complementarities across tiers are critical for sustainable urban innovation.

Drawing on theories of technological hierarchy and complementarities (Brynjolfsson and Milgrom, 2013; Milgrom and Roberts, 1990), this study argues that AI-driven urban innovation depends not only on the presence of individual technologies but also on their coordinated development and hierarchical alignment. In particular, complementarities between upstream infrastructure and midstream algorithmic capabilities may condition whether downstream AI applications generate substantive innovation gains or remain isolated demonstrations. Moreover, only when all three tiers are jointly developed can cities realise system-wide, self-reinforcing innovation dynamics that support long-term sustainability and resilience.

This study empirically examines a balanced panel of 120 Chinese cities from 2010 to 2021. During this period, digital infrastructure expanded rapidly, smart-city experimentation spread, AI-related technological capabilities matured and the layered structure of AI began to emerge across cities. Because these years capture a critical stage in China’s smart-city transformation, the period is well suited to identifying the structural foundations through which foundational, core and application-oriented AI technologies shape urban innovation. Using patent-based measures to capture FTs, CTs and GPTs, we estimate two-way fixed-effects models with interaction terms and implement an instrumental-variable strategy based on historical communication infrastructure to address endogeneity concerns. This design allows us to identify both the direct effects of each AI tier on urban innovation and the complementary effects arising from their interaction, while accounting for place-specific conditions such as coastal location, policy resources, transportation accessibility and city size.
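As a rough illustration of the estimation approach described above, the following sketch fits a two-way fixed-effects regression with pairwise and three-way interaction terms on synthetic data. It is not the study's code or data: the helper names (`two_way_demean`, `twfe_with_interactions`), the simulated coefficients and the noise level are all assumptions chosen for illustration, and the instrumental-variable step, control variables and clustered standard errors are omitted. The within transformation used here is exact only for a balanced panel such as the 120-city, 12-year design.

```python
import numpy as np

def two_way_demean(a, n_cities, n_years):
    """Remove city and year means from a balanced panel stored city-major."""
    m = a.reshape(n_cities, n_years)
    return (m
            - m.mean(axis=1, keepdims=True)   # city means
            - m.mean(axis=0, keepdims=True)   # year means
            + m.mean()).ravel()               # add back the grand mean

def twfe_with_interactions(y, ft, ct, gpt, n_cities, n_years):
    """OLS on two-way-demeaned data; interaction terms are built before demeaning."""
    regressors = {
        "FT": ft, "CT": ct, "GPT": gpt,
        "FTxCT": ft * ct, "FTxGPT": ft * gpt, "CTxGPT": ct * gpt,
        "FTxCTxGPT": ft * ct * gpt,
    }
    X = np.column_stack(
        [two_way_demean(v, n_cities, n_years) for v in regressors.values()]
    )
    y_dm = two_way_demean(y, n_cities, n_years)
    beta, *_ = np.linalg.lstsq(X, y_dm, rcond=None)
    return dict(zip(regressors, beta))

# Synthetic balanced panel: 120 cities observed over 12 years (hypothetical values).
rng = np.random.default_rng(0)
n_cities, n_years = 120, 12
n = n_cities * n_years
ft, ct, gpt = rng.normal(size=(3, n))
city_fe = np.repeat(rng.normal(size=n_cities), n_years)  # time-invariant city effect
year_fe = np.tile(rng.normal(size=n_years), n_cities)    # common year shock
y = (0.5 * ft + 0.3 * ct + 0.1 * gpt + 0.2 * ft * ct
     + city_fe + year_fe + rng.normal(scale=0.1, size=n))

coefs = twfe_with_interactions(y, ft, ct, gpt, n_cities, n_years)
for name, value in coefs.items():
    print(f"{name:>10s}: {value: .3f}")
```

In applied work a dedicated panel-data package would normally be preferred, since it also provides cluster-robust standard errors and two-stage least squares; the manual demeaning above is only meant to make the mechanics of the fixed effects visible.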

This study makes three main contributions. Firstly, it advances the smart and sustainable cities literature by providing a structured, empirically operationalised framework that conceptualises AI as a layered technology stack rather than a single input. Secondly, it contributes to research on AI and innovation by demonstrating that complementarities among AI layers are hierarchical rather than purely pairwise, with robust synergies emerging only when FTs and CTs are sufficiently developed. Thirdly, it offers policy-relevant insights into why application-oriented smart-city strategies may fail to deliver sustained innovation, highlighting the importance of balanced, place-sensitive investment across the entire AI technology chain.

The remainder of this paper is organised as follows. Section 2 develops the theoretical framework and hypotheses. Section 3 describes the data and empirical strategy. Section 4 presents the results and robustness checks. Finally, Section 5 discusses the implications for sustainable urban innovation and smart-city policy.

Drawing on the extant literature (Wang et al., 2024; Zhu et al., 2020), industrial technological capabilities are commonly understood as hierarchically structured, comprising FTs, CTs and GPTs. FTs constitute the upstream basis of industrial development by providing essential scientific and engineering principles upon which more advanced applications are built (Daimi et al., 2023). These technologies are typically cross-cutting in nature and can be deployed across multiple industries, thereby supporting a broad range of productive activities. Building on this foundation, CTs translate foundational principles into industry-specific or cross-industry solutions, often through the development of functional systems, machinery and operational processes (Miller, 2020; Zhu et al., 2020). Located downstream, GPTs reflect application-oriented innovations that are tailored to improve efficiency, product quality and commercialisation outcomes at the firm level, usually through substantial R&D investments based on CTs.

In this study, we extend this multi-tier hierarchical perspective to the AI context and the urban scale, thereby constructing a three-tier AI technology framework suited to the analysis of urban innovation. Drawing on industrial chain theory, we conceptualise city-level AI capabilities as an interconnected system comprising FTs, CTs and GPTs. At the upstream level, AI-related FTs include semiconductor hardware, computing power, data centres and digital infrastructure platforms, which underpin the scalability, reliability and security of AI deployment. At the midstream level, CTs encompass algorithmic innovations, machine-learning models and data-processing frameworks that enable functions such as pattern recognition, natural language processing and optimisation. At the downstream level, GPTs refer to versatile AI applications embedded in concrete urban domains, including smart transportation, intelligent health care, digital public services and AI-enabled manufacturing. Through these applications, AI-related technological advances are translated into tangible economic and societal value. This layered perspective implies that urban AI capability should not be viewed as a single stock of technology, but rather as an interdependent innovation chain in which deficiencies at any tier may constrain overall innovation performance.

To translate this conceptualisation into an analytical structure, we develop a conceptual framework (Figure 1) that links the three-tier AI architecture to urban innovation capability. The framework captures both the direct effects of each AI tier and the interaction mechanisms arising from their joint deployment. Specifically, we posit that foundational, core and general-purpose AI technologies each contribute to urban innovation, while their full innovation potential is realised only when technological complementarities are present. For instance, complementarities between FTs and CTs reflect the extent to which algorithmic advancements depend on robust digital infrastructure, whereas interaction effects involving the downstream tier (FTs × GPTs and CTs × GPTs) indicate whether AI applications are constrained or enabled by the maturity of underlying technologies. Moreover, the framework incorporates a three-way interaction mechanism (FTs × CTs × GPTs), highlighting that system-wide and sustained innovation effects are most likely to emerge when the entire AI technology chain is coherently developed and aligned.
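One way to write the framework described above as a single estimating equation is sketched below. The notation is an assumption made for illustration rather than the study's own specification: $\mathit{Innov}_{it}$ denotes the innovation capability of city $i$ in year $t$, $X_{it}$ a vector of controls, and $\mu_i$ and $\lambda_t$ city and year fixed effects.

```latex
\[
\begin{aligned}
\mathit{Innov}_{it} ={}& \beta_0 + \beta_1 FT_{it} + \beta_2 CT_{it} + \beta_3 GPT_{it} \\
&+ \beta_4 \,(FT_{it} \times CT_{it}) + \beta_5 \,(FT_{it} \times GPT_{it}) + \beta_6 \,(CT_{it} \times GPT_{it}) \\
&+ \beta_7 \,(FT_{it} \times CT_{it} \times GPT_{it}) + \gamma^{\top} X_{it} + \mu_i + \lambda_t + \varepsilon_{it}
\end{aligned}
\]
```

Under this reading, $\beta_1$ to $\beta_3$ correspond to the direct effects, $\beta_4$ to $\beta_6$ to the pairwise interaction mechanisms and $\beta_7$ to the three-way mechanism.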

Figure 1. Research framework. The flowchart links three groups of effects to urban innovation capability: direct effects (FTs, CTs and GPTs each raise urban innovation); interaction mechanisms (FTs × CTs: complementary effect; FTs × GPTs: conditional on CTs; CTs × GPTs: conditional on FTs; FTs × CTs × GPTs: synergistic effect); and heterogeneity effects (FTs and CTs are stronger in coastal cities, while GPTs are weaker inland; CTs and GPTs are stronger in policy-rich cities, while FTs are weaker; FTs and GPTs are weaker in high-accessibility cities, while CTs are unchanged; FTs are stronger in small and medium-sized cities, while CTs and GPTs are insignificant).

Based on this framework, two main analytical pathways guide our hypotheses. The first pathway focuses on the direct effects of each AI technology tier on urban innovation (H1a–H1c), capturing their differentiated roles within the urban innovation system. The second pathway examines interaction mechanisms among FTs, CTs and GPTs (H2a–H2d), reflecting complementary and synergistic relationships along the AI innovation chain. In addition, we recognise that these effects are likely to be shaped by city-specific characteristics, such as coastal location, policy resources, transport accessibility and city size. Accordingly, we further explore heterogeneity in the impacts of three-tier AI technologies across different urban contexts. The framework thus provides a coherent logic model for understanding how multi-tier AI technologies jointly shape urban innovation trajectories through direct, interactive and context-dependent mechanisms.

A layered perspective on AI is important not only for theoretical explanation but also for urban innovation governance. For city governments, the key challenge is not simply whether AI is present, but whether different layers of AI capability are built and coordinated in a complementary way. FTs provide the infrastructural and technical base for digital computation and connectivity. CTs shape the depth of local algorithmic and platform capabilities, while GPTs determine the breadth of downstream deployment across sectors and application scenarios. When urban AI strategies focus predominantly on application rollout while neglecting upstream technological capacity-building, cities may experience a form of “downstream overheating” characterised by rapid adoption but limited endogenous innovation upgrading. Conversely, when foundational, core and application layers evolve in a more aligned manner, AI is more likely to strengthen knowledge production, industrial transformation and sustainable urban innovation performance. The layered framework proposed in this study therefore serves both as an analytical device for understanding innovation mechanisms and as a practical guide for diagnosing city-specific development bottlenecks.

2.1.1 Foundational technologies and sustainable urban innovation.

FTs, encompassing computing hardware, semiconductor chips, data infrastructure and network systems, constitute the technological backbone of urban AI innovation ecosystems (Bibri et al., 2024). By providing the essential operational and material foundations, FTs shape the scale, speed and scope of AI-related innovation activities at the urban level. In particular, robust foundational infrastructure enables large-scale data collection, storage and processing, thereby supporting experimentation, model training and deployment across a wide range of urban domains (Li et al., 2024; Nie et al., 2023).

From an innovation systems perspective, a mature FTs layer reduces the costs and technical complexities associated with data processing and algorithm development, allowing cities to accommodate large-scale AI experimentation and industrial diffusion (Huang et al., 2025). By broadening the available resource base and lowering entry thresholds, especially for small- and medium-sized enterprises and start-ups, FTs facilitate more inclusive and distributed patterns of innovation participation (Schwaeke et al., 2025). Such inclusiveness enhances the diversity of actors and ideas within the urban innovation system, strengthening its overall creative potential.

Moreover, FTs generate reinforcing dynamics that support sustained urban innovation capacity. On the one hand, scalable computational resources enable iterative development, rapid prototyping and experimentation with increasingly sophisticated models, thereby accelerating innovation cycles (Li et al., 2024; Nie et al., 2023). On the other hand, when computing power and data resources are made accessible through municipal computing centres or open data platforms, FTs function as a quasi-public good, substantially reducing barriers to AI-driven innovation (Hussain et al., 2025; Nambisan et al., 2019). This environment fosters a more dynamic and interconnected innovation ecology, amplifying opportunities for recombination and knowledge spillovers (Yoo et al., 2012).

In addition, sustained investment in FTs contributes to long-term absorptive and accumulative effects. By establishing stable technological trajectories, cities with advanced FTs are better positioned to assimilate external knowledge, adapt to rapid AI advances and maintain innovation momentum under uncertainty (Aghion et al., 2021). Such stability enhances the predictability of innovation returns and strengthens urban innovation resilience over time (Gregory et al., 2021). Beyond their technical role, FTs also facilitate collaborative benefits by enabling shared infrastructures and public data environments that reduce redundant investments, mitigate systemic risks and promote inter-organisational collaboration (Tilson et al., 2010). In this sense, FTs function not only as a technological enabler but also as an institutional catalyst for sustainable urban innovation. Accordingly, we propose the following hypothesis:

H1a.

A higher level of foundational technologies (FTs) is associated with stronger urban innovation.

2.1.2 Core technologies and sustainable urban innovation.

CTs, embodied in algorithms, model architectures and software frameworks, occupy a pivotal midstream position within the AI innovation chain. CTs play a central role in transforming upstream computational power and data resources into structured, reusable knowledge assets, thereby linking foundational infrastructure with downstream applications. Through this integrative function, CTs enable cities to operationalise AI research into industrial and economic value, strengthening urban innovation capacity (Czarnitzki et al., 2023; Rammer et al., 2022).

The contribution of CTs to urban innovation operates through several interrelated mechanisms. A key mechanism is knowledge codification and modularisation. Standardised algorithmic interfaces and model frameworks convert foundational resources into explicit and replicable innovation modules, reducing technological uncertainty and facilitating the transition from experimental prototypes to scalable applications (Brusoni et al., 2023). By lowering development costs and shortening innovation cycles, CTs enhance the efficiency with which foundational inputs are translated into innovation outcomes. In addition, platform-based tools and application programming interfaces lower entry barriers for a broad range of urban actors, including start-ups, universities and incumbent firms, thereby fostering a more open and dynamic innovation environment (Gawer, 2014). Recent research suggests that such algorithmic capabilities can generate nonlinear synergies, allowing cities to deploy innovations across diverse industrial contexts (Babina et al., 2024).

CTs also induce agglomeration and network externalities within urban innovation systems. The emergence of algorithm hubs and AI middleware attracts high-skilled talent and supports the formation of localised innovation networks, which amplify knowledge spillovers and collective learning processes (Cohen and Levinthal, 1990). Over time, these dynamics give rise to clustering effects that enhance cities’ absorptive capacity and adaptive efficiency in the face of rapid technological change (Boschma, 2017). Consequently, cities with more advanced CTs capabilities tend to exhibit stronger innovation resilience and sustained innovation performance. Accordingly, we propose the following hypothesis:

H1b.

A higher level of core technologies (CTs) is associated with stronger urban innovation.

2.1.3 General-purpose technologies and sustainable urban innovation.

GPTs in the AI context, such as applications in smart transportation, health-care analytics and industrial automation, occupy the downstream end of the innovation chain, where upstream computational and algorithmic capabilities are translated into pervasive, value-generating solutions. As the application layer of the urban AI innovation system, GPTs embody the diffusion of AI across sectors, consistent with the classical view of GPTs as engines of economic growth (Bresnahan and Trajtenberg, 1995; Helpman, 1998).

Through cross-sectoral deployment, GPTs enable cities to reconfigure existing processes, optimise resource allocation and stimulate new business models in areas such as health care, mobility, manufacturing and public services. This process reflects “innovation by application”, whereby the recombination of AI tools across diverse urban contexts generates both incremental and transformative innovations (Bresnahan and Trajtenberg, 1995; Czarnitzki et al., 2023). By reducing coordination costs and enhancing interoperability across sectors, GPTs expand the overall scope of urban innovation and stimulate demand for complementary technological development (Helpman, 1998).

Moreover, widespread GPTs adoption contributes to the formation of localised learning environments. Real-world implementation produces rich operational data and user feedback, which inform local R&D priorities and align technological trajectories with societal needs (Von Hippel, 2006). This context-sensitive learning strengthens cities’ adaptive capacity and reinforces path-dependent innovation advantages over time (Rammer et al., 2022). Through diffusion, integration and evolutionary learning, GPTs thus act as a critical catalyst that transforms upstream technological capacities into scalable economic and societal impacts. Accordingly, we propose the following hypothesis:

H1c.

Broader application of general-purpose technologies (GPTs) is associated with stronger urban innovation.

The complementarity logic among FTs, CTs and GPTs is closely related to the practical problem of “upstream gaps and downstream overheating” in urban AI development. Cities that possess strong downstream AI applications but weak upstream infrastructures or limited core technological capabilities may attract attention through demonstration projects, yet such gains are often difficult to sustain in the absence of local absorptive capacity and technological depth. In contrast, cities with stronger alignment among FTs, CTs and GPTs are better positioned to transform AI deployment into cumulative innovation advantages. This helps to explain why similar levels of visible AI application may produce very different innovation outcomes across cities, and why policy design must distinguish between short-term diffusion effects and deeper capacity-building effects across technological layers.

2.2.1 Complementarity between foundational technologies and core technologies.

The interaction between FTs and CTs is central to enhancing urban innovation capacity. FTs provide the computational infrastructure and data platforms necessary for large-scale data processing and model training, while CTs build upon these resources to develop and refine algorithmic frameworks and learning models. This complementarity ensures that foundational resources are effectively transformed into actionable intelligence, thereby increasing the efficiency and productivity of urban innovation processes (Li et al., 2024; Rammer et al., 2022).

This interaction generates a co-evolutionary dynamic: robust FTs reduce the costs and risks of algorithmic experimentation, whereas advanced CTs increase the marginal returns to infrastructural investments by improving system-wide efficiency and adaptability (Ennen and Richter, 2010; Milgrom and Roberts, 1990). Together, they form an integrated infrastructure–algorithm nexus that accelerates knowledge recombination, supports rapid prototyping and amplifies urban innovation outputs (Brynjolfsson and Milgrom, 2013; Hussain et al., 2025). In this regard, we propose the following hypothesis:

H2a.

Foundational technologies (FTs) and core technologies (CTs) exhibit a positive complementary effect on urban innovation.
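One way to read H2a, assuming a linear specification with a single FT × CT product term, is sketched below. The notation is illustrative and not taken from the study's empirical section: $\beta_1$ and $\beta_2$ are the coefficients on FTs and CTs, and $\beta_4$ is the coefficient on their product.

```latex
\[
\frac{\partial\, \mathit{Innov}_{it}}{\partial\, FT_{it}} = \beta_1 + \beta_4\, CT_{it},
\qquad
\frac{\partial^2\, \mathit{Innov}_{it}}{\partial\, FT_{it}\, \partial\, CT_{it}} = \beta_4 .
\]
```

Complementarity in the sense of Milgrom and Roberts (1990) corresponds to a positive cross-partial, so under this sketch H2a amounts to $\beta_4 > 0$: the marginal return to foundational investment rises with the level of core technological capability, and vice versa.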

2.2.2 Complementarity between foundational technologies and general-purpose technologies.

The interaction between FTs and GPTs generates a dynamic synergy that reconfigures fundamental pathways for urban innovation. FTs, as reflected in computational infrastructure, data platforms and hardware systems, provide the essential technological substrate underpinning scalable and reproducible innovation processes. GPTs, as versatile applications deployed across different urban domains, use this substrate to convert abstract computational capacity into local, context-sensitive solutions. This interplay initiates a recursive cycle: robust FTs reduce entry barriers and experimentation costs, thereby facilitating broader GPTs adoption, while GPT-generated feedback from real-world applications refines and redirects subsequent FTs investments (Li et al., 2024; Nie et al., 2023).

This complementarity strengthens urban innovation through two key mechanisms. Firstly, it accelerates knowledge recombination by merging domain-specific insights derived from GPTs applications with the generic capabilities of FTs, spawning novel solutions at sectoral intersections (Carnabuci and Operti, 2013). Secondly, it enhances systemic learning, whereby GPTs deployments yield rich, localised data that inform and optimise FTs architectures, establishing a co-evolutionary loop between infrastructure and application (Li et al., 2024; Lyytinen et al., 2016). This bidirectional reinforcement not only increases the scale and diversity of urban innovation but also strengthens its adaptive capacity and long-term sustainability. In this regard, we propose the following hypothesis:

H2b.

Foundational technologies (FTs) and general-purpose technologies (GPTs) exhibit a positive and conditional complementary effect on urban innovation.

2.2.3 Complementarity between core technologies and general-purpose technologies.

The interaction between CTs and GPTs establishes a critical conduit within the AI innovation chain, marrying algorithmic depth with applicational breadth. CTs, manifested in machine learning models, pattern recognition frameworks and domain-specific algorithms, provide the technical sophistication required to convert raw computational power into actionable intelligence. GPTs, on the other hand, implement these capabilities across a range of urban sectors, integrating CTs-derived insights into industrial processes, public services and commercial platforms. This creates a feedback loop: as GPTs applications increase, they reveal domain-specific challenges and data patterns that refine CTs development. In turn, CTs advancements expand the functional scope and reliability of GPTs deployments (Czarnitzki et al., 2023; Rammer et al., 2022).

This complementarity augments urban innovation via two interrelated mechanisms. Firstly, it amplifies absorptive capacity: cities endowed with stronger CTs can more effectively tailor and scale GPTs tools, nurturing localised learning and cross-sectoral knowledge recombination (Grillitsch et al., 2018; Hussain et al., 2025). Secondly, it accelerates the diffusion of innovation by mitigating integration frictions: standardised CTs frameworks enable GPTs to permeate new industries more rapidly, generating positive spillovers that further develop the city’s innovation ecosystem (Boschma, 2017; Teece, 2018). These dynamics are expected to shape both the intensity and the adaptability of urban innovation. In this regard, we propose the following hypothesis:

H2c.

Core technologies (CTs) and general-purpose technologies (GPTs) exhibit a positive and conditional complementary effect on urban innovation.

2.2.4 Three-way interaction among foundational, core and general-purpose technologies.

The conditional interactive effects among FTs, CTs and GPTs arise from their integration into a cohesive urban AI innovation system. FTs provide the essential computational and infrastructural resources that facilitate large-scale experimentation and data processing. CTs then translate these raw inputs into structured, reusable algorithmic modules that enhance model efficiency and adaptability. GPTs then leverage these modules to deploy context-aware applications across urban domains, generating real-time feedback that refines the upstream and midstream layers (Li et al., 2024; Rammer et al., 2022). This dynamic learning cycle, driven by recursive interactions, enhances the utilisation efficiency of FTs, leading to the optimisation of CTs. This creates a self-reinforcing feedback mechanism that drives sustainable urban innovation.

Such tripartite integration is expected to yield systemic gains that surpass the sum of isolated effects. The synergy reduces innovation frictions, accelerates knowledge recombination across sectors and amplifies spillover effects through standardised interfaces and interoperable architectures (Ennen and Richter, 2010; Hussain et al., 2025). Cities that achieve this full-chain alignment exhibit not only heightened innovation outputs but also enhanced adaptive capacity and technological resilience, which are indispensable attributes in swiftly evolving AI landscapes. This systemic rationale underlies our expectation that the three-way interaction exerts a significant positive influence on sustainable urban innovation. In this regard, we propose the following hypothesis:

H2d.

The three-way interaction among foundational technologies (FTs), core technologies (CTs) and general-purpose technologies (GPTs) is positively associated with urban innovation.

This study uses a balanced panel of 120 Chinese cities at or above the prefecture level to examine how the three-tier AI technology framework affects urban innovation. The panel covers the period from 2010 to 2021, which captures a formative stage of digital infrastructure expansion and the early but systematic diffusion of AI technologies across Chinese cities. Although advances in generative AI have accelerated technological change since 2022, the aim of this study is to identify the underlying mechanisms through which the different layers of AI capabilities influence sustainable urban innovation. For this purpose, the 2010–2021 period remains particularly suitable, as it witnessed the rapid deployment of broadband, cloud computing and AI-related infrastructure, as well as China’s transition towards an innovation-driven, AI-enabled growth model (Li et al., 2024; Zeng et al., 2023).

As shown in Table 1, the sample includes major cities in the eastern coastal region (e.g. Shenzhen, Hangzhou and Qingdao), central China (e.g. Wuhan, Changsha and Zhengzhou) and western China (e.g. Chengdu, Guiyang and Lanzhou). This broad geographical coverage captures substantial variation in development stage, industrial structure and institutional context, and enables us to construct a consistent city-level panel based on patent data, urban statistics and manually verified AI technology categories. Similar multi-regional city panels have been widely used in previous studies of digital infrastructure, smart-city development and urban innovation dynamics in China (Li et al., 2024; Zeng et al., 2023).

Table 1.

City sample

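| No. | City | Province | Region |
| --- | --- | --- | --- |
| 1 | Sanya | Hainan | Eastern |
| 2 | Daqing | Heilongjiang | Eastern |
| 3 | Nanping | Fujian | Eastern |
| 4 | Yanbian | Jilin | Eastern |
| 5 | Zhoushan | Zhejiang | Eastern |
| 6 | Yingkou | Liaoning | Eastern |
| 7 | Qiqihar | Heilongjiang | Eastern |
| 8 | Jinhua | Zhejiang | Eastern |
| 9 | Zhangzhou | Fujian | Eastern |
| 10 | Jining | Shandong | Eastern |
| 11 | Dongying | Shandong | Eastern |
| 12 | Tangshan | Hebei | Eastern |
| 13 | Jiangmen | Guangdong | Eastern |
| 14 | Tai’an | Shandong | Eastern |
| 15 | Taizhou | Zhejiang | Eastern |
| 16 | Huzhou | Zhejiang | Eastern |
| 17 | Yancheng | Jiangsu | Eastern |
| 18 | Taizhou | Jiangsu | Eastern |
| 19 | Putian | Fujian | Eastern |
| 20 | Langfang | Hebei | Eastern |
| 21 | Zibo | Shandong | Eastern |
| 22 | Zhanjiang | Guangdong | Eastern |
| 23 | Lianyungang | Jiangsu | Eastern |
| 24 | Yangzhou | Jiangsu | Eastern |
| 25 | Yantai | Shandong | Eastern |
| 26 | Huizhou | Guangdong | Eastern |
| 27 | Anshan | Liaoning | Eastern |
| 28 | Huai’an | Jiangsu | Eastern |
| 29 | Weifang | Shandong | Eastern |
| 30 | Lishui | Zhejiang | Eastern |
| 31 | Qinhuangdao | Hebei | Eastern |
| 32 | Weihai | Shandong | Eastern |
| 33 | Quanzhou | Fujian | Eastern |
| 34 | Shaoxing | Zhejiang | Eastern |
| 35 | Dandong | Liaoning | Eastern |
| 36 | Jiaxing | Zhejiang | Eastern |
| 37 | Wenzhou | Zhejiang | Eastern |
| 38 | Zhuhai | Guangdong | Eastern |
| 39 | Baoding | Hebei | Eastern |
| 40 | Shijiazhuang | Hebei | Eastern |
| 41 | Dongguan | Guangdong | Eastern |
| 42 | Xuzhou | Jiangsu | Eastern |
| 43 | Handan | Hebei | Eastern |
| 44 | Nantong | Jiangsu | Eastern |
| 45 | Foshan | Guangdong | Eastern |
| 46 | Changzhou | Jiangsu | Eastern |
| 47 | Fuzhou | Fujian | Eastern |
| 48 | Xiamen | Fujian | Eastern |
| 49 | Zhenjiang | Jiangsu | Eastern |
| 50 | Wuxi | Jiangsu | Eastern |
| 51 | Jilin | Jilin | Eastern |
| 52 | Changchun | Jilin | Eastern |
| 53 | Dalian | Liaoning | Eastern |
| 54 | Ningbo | Zhejiang | Eastern |
| 55 | Zhongshan | Guangdong | Eastern |
| 56 | Shenyang | Liaoning | Eastern |
| 57 | Jinan | Shandong | Eastern |
| 58 | Harbin | Heilongjiang | Eastern |
| 59 | Suzhou | Jiangsu | Eastern |
| 60 | Qingdao | Shandong | Eastern |
| 61 | Tianjin | Tianjin | Eastern |
| 62 | Guangzhou | Guangdong | Eastern |
| 63 | Hangzhou | Zhejiang | Eastern |
| 64 | Nanjing | Jiangsu | Eastern |
| 65 | Shanghai | Shanghai | Eastern |
| 66 | Shenzhen | Guangdong | Eastern |
| 67 | Beijing | Beijing | Eastern |
| 68 | Haikou | Hainan | Eastern |
| 69 | Linyi | Shandong | Eastern |
| 70 | Shantou | Guangdong | Eastern |
| 71 | Zhaoqing | Guangdong | Eastern |
| 72 | Ningde | Fujian | Eastern |
| 73 | Jieyang | Guangdong | Eastern |
| 74 | Longyan | Fujian | Eastern |
| 75 | Cangzhou | Hebei | Eastern |
| 76 | Fushun | Liaoning | Eastern |
| 77 | Sanming | Fujian | Eastern |
| 78 | Qingyuan | Guangdong | Eastern |
| 79 | Panjin | Liaoning | Eastern |
| 80 | Baotou | Inner Mongolia | Western |
| 81 | Xianyang | Shaanxi | Western |
| 82 | Nanchong | Sichuan | Western |
| 83 | Hohhot | Inner Mongolia | Western |
| 84 | Liuzhou | Guangxi | Western |
| 85 | Nanning | Guangxi | Western |
| 86 | Ürümqi | Xinjiang | Western |
| 87 | Xining | Qinghai | Western |
| 88 | Lanzhou | Gansu | Western |
| 89 | Mianyang | Sichuan | Western |
| 90 | Guiyang | Guizhou | Western |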

Patent data are obtained from the China National Intellectual Property Administration (CNIPA) and supplemented by the IncoPat (PatSnap) database, which provide city-level patent information. City-level economic, demographic and innovation-related covariates are primarily drawn from the China City Statistical Yearbook and other official statistical yearbooks, which are standard sources in empirical studies of Chinese urban innovation. To ensure comparability, we restrict the sample to cities with continuous data coverage and non-trivial AI-related patenting over the study period. This yields a balanced panel that captures the main nodes of China’s AI and urban innovation landscape.

To test the hypothesised effects of AI’s three-layer technological structure on urban innovation, we introduce variables to capture innovation outputs, AI technology inputs and relevant city characteristics. The dependent variable, urban innovation, is measured using a composite patent-based index intended to reflect several dimensions of innovation, such as quality and impact. This follows city-level innovation studies that treat invention and utility model patents as key indicators of regional innovation performance (Zeng et al., 2023). Detailed definitions are reported in Table 2.

Table 2.

Variable measurement

| Variable type | Variable name | Symbol | Definition and measurement | Data source |
| --- | --- | --- | --- | --- |
| Dependent variable | Urban innovation | Innovation | City innovation index, calculated as the weighted sum of invention patents (0.5), utility model patents (0.3) and design patents (0.2) per 10,000 population. Log-transformed and normalised | CNIPA; China City Statistical Yearbook |
| Explanatory variables | Foundational technologies | FTs | Number of patents classified as foundational technologies according to the AI technology taxonomy defined in Table 3. Log-transformed and normalised | CNIPA Patent Database |
| | Core technologies | CTs | Number of patents classified as core technologies according to the AI technology taxonomy defined in Table 3. Log-transformed and normalised | CNIPA Patent Database |
| | General-purpose technologies | GPTs | Number of patents classified as general-purpose technologies according to the AI technology taxonomy defined in Table 3. Log-transformed and normalised | CNIPA Patent Database |
| Control variables | Population density | popdensity | Number of permanent residents per square kilometre | China City Statistical Yearbook |
| | Economic development | ecodevelop | GDP per capita (log-transformed) | China City Statistical Yearbook |
| | Foreign direct investment | fdi | Ratio of actually used FDI to GDP (%) | Ministry of Education Statistics |
| | Trade openness | open | Ratio of total imports and exports to GDP | China Statistical Yearbook on Science and Technology |
| | Education expenditure | eduexp | Ratio of government education expenditure to total fiscal expenditure | China City Statistical Yearbook |
| | Industrial rationalisation | indstru | Measured by the Theil index reflecting the coordination among industrial sectors (lower value = higher rationalisation) | China City Statistical Yearbook |
| | Technological level | techlevel | Ratio of R&D expenditure to GDP (%) | China City Statistical Yearbook |
| Heterogeneity variables | City type | citytype | Binary variable: 1 for coastal cities, 0 otherwise | China City Statistical Yearbook |
| | Administrative level | capital | Binary variable: 1 if provincial capital or direct-administered municipality, 0 otherwise | National Bureau of Statistics Classification |
| | Spatial location | transport | Binary variable: 1 if the city is a national/regional transportation hub per the medium and long-term railway network plan, 0 otherwise | State Council Designation |
| | City scale | cityscale | Binary variable: 1 for large/medium cities, 0 for small cities based on the State Council’s city size classification | Ministry of Science and Technology |
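
The construction of the innovation index in Table 2 can be illustrated with a minimal sketch. The weights come from Table 2; the use of log(1 + x), the min-max form of normalisation and the sample figures are assumptions made for illustration only.

```python
import math

# Weights as listed in Table 2; everything else is an illustrative assumption.
WEIGHTS = {"invention": 0.5, "utility": 0.3, "design": 0.2}


def raw_index(patents, population):
    """Weighted patent count per 10,000 residents for one city-year."""
    weighted = sum(WEIGHTS[kind] * patents[kind] for kind in WEIGHTS)
    return weighted / (population / 10_000)


def log_minmax(values):
    """Apply log(1 + x), then rescale to [0, 1] (assumed normalisation)."""
    logged = [math.log1p(v) for v in values]
    lo, hi = min(logged), max(logged)
    if hi == lo:
        return [0.0 for _ in logged]
    return [(v - lo) / (hi - lo) for v in logged]


# Hypothetical city-year observations (not taken from the study's data).
sample = [
    ({"invention": 0, "utility": 0, "design": 0}, 500_000),
    ({"invention": 200, "utility": 300, "design": 100}, 1_000_000),
    ({"invention": 5_000, "utility": 4_000, "design": 1_000}, 2_000_000),
]
raw = [raw_index(p, pop) for p, pop in sample]
innovation = log_minmax(raw)
```

A min-max rescaling is assumed here because Table 4 reports the index on a 0 to 1 range; a different normalisation would change the scale but not the ordering of cities.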

The core explanatory variables are city-level stocks of AI-related patents, disaggregated into FTs, CTs and GPTs. Drawing on recent studies that use digital and AI patents as a proxy for the intensity and structure of urban digital technologies (Li et al., 2024; Nie et al., 2023), we develop a transparent multi-step classification framework. Specifically, the identification and classification of AI patents follow a six-step procedure (data collection, keyword construction, automated matching, manual validation, tier assignment and city-level aggregation), which is elaborated as follows.

A hierarchical AI keyword dictionary is first constructed on the basis of multiple authoritative sources, including the CNIPA classification system for key digital technology patents, the 14th Five-Year Plan for Digital Economy Development issued by the State Council and WIPO’s Artificial Intelligence: Classification and Terminology framework. These sources provide the initial keyword pool, which is then refined through text-based screening, synonym identification and iterative review so that it also captures related and emerging technical terms. AI-related patents are then identified by matching the dictionary keywords against patent titles, abstracts and IPC/CPC classification codes. This multi-dimensional matching strategy improves identification accuracy and reduces potential omission bias.
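
The matching step can be sketched as follows. This is a simplified illustration, not the dictionary or rule set used in the study: the keyword list borrows a few terms from Table 3, the IPC prefix `G06N` is an assumed example, and flagging a patent when any field matches is also an assumption.

```python
import re

# Hypothetical mini-dictionary; the study's full dictionary is not reproduced here.
AI_KEYWORDS = ["speech recognition", "image recognition", "knowledge graph", "deep learning"]
# Assumed example of an AI-related classification prefix.
AI_IPC_PREFIXES = ("G06N",)


def is_ai_patent(title, abstract, ipc_codes):
    """Flag a patent if a keyword appears in the title/abstract or a code matches."""
    text = f"{title} {abstract}".lower()
    keyword_hit = any(
        re.search(r"\b" + re.escape(keyword) + r"\b", text) for keyword in AI_KEYWORDS
    )
    code_hit = any(code.upper().startswith(AI_IPC_PREFIXES) for code in ipc_codes)
    return keyword_hit or code_hit
```

Matching on text and classification codes together means a patent with an uninformative title can still be caught through its code, which is one way the multi-dimensional strategy could reduce omissions.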

To ensure the reliability of the classification process, we implement a manual validation procedure for a random sample of patents, especially those with ambiguous or overlapping technical descriptions. Two researchers independently code the sampled patents and then cross-check each other’s results to assess inter-rater consistency. Any discrepancies are resolved through discussion until consensus is reached. This procedure helps ensure consistency between automated classification and expert judgement and improves the robustness of the measurement strategy.
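
One way to quantify agreement between the two coders is Cohen's kappa, sketched below. The text does not name the agreement statistic used, so the choice of kappa and the example labels are assumptions.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two coders over the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if both coders labelled independently at their own rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical tier labels assigned by two coders to four sampled patents.
coder_1 = ["FTs", "FTs", "CTs", "CTs"]
coder_2 = ["FTs", "FTs", "CTs", "GPTs"]
kappa = cohen_kappa(coder_1, coder_2)
```

Kappa discounts the agreement that would arise by chance, so it is more conservative than the raw share of matching labels.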

The validated AI patents are then assigned to three technological tiers according to predefined classification rules. FTs refer to AI-enabling infrastructure and technical foundations, including AI chips, computing hardware, storage systems and data infrastructure. CTs capture a range of capabilities, including machine-learning algorithms, model architectures, training methods and technical frameworks. GPTs refer to downstream application-oriented technologies, including cross-industry intelligent solutions, embedded AI systems and intelligent products. For patents spanning multiple categories, classification is based on the dominant technological function, with boundary cases subject to additional manual verification.
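
The dominant-function rule can be approximated as in the sketch below. Counting keyword hits per tier is an assumed proxy for the dominant technological function, the keyword lists are hypothetical, and ties or empty matches are flagged for the manual verification described above.

```python
# Hypothetical tier keywords loosely based on the definitions in the text.
TIER_KEYWORDS = {
    "FTs": ["ai chip", "gpu", "fpga", "storage system"],
    "CTs": ["training method", "model architecture", "machine learning"],
    "GPTs": ["embedded ai system", "intelligent product", "service robot"],
}


def assign_tier(text):
    """Return (tier, needs_review) using keyword-hit counts as a dominance proxy."""
    text = text.lower()
    hits = {
        tier: sum(text.count(keyword) for keyword in keywords)
        for tier, keywords in TIER_KEYWORDS.items()
    }
    best = max(hits.values())
    if best == 0:
        return None, True  # no tier keyword found: send to manual review
    leaders = [tier for tier, count in hits.items() if count == best]
    # A tie is a boundary case; the first leader is only a provisional label.
    return leaders[0], len(leaders) > 1
```

For example, a description mentioning a GPU and an FPGA alongside one reference to machine learning would be provisionally assigned to FTs, while an even split between tiers would be routed to manual verification.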

At the final stage, classified patents are aggregated to the city–year level for each technological tier. The resulting variables are transformed using log(1 + x) to mitigate skewness and then standardised for empirical analysis. The taxonomy of AI technologies and representative keyword categories are summarised in Table 3, while additional details on matching rules, boundary cases and validation procedures are provided in the Appendix.
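
The aggregation step can be sketched as follows, assuming each classified patent is a record with a city, a year and a tier. Empty city-year cells are filled with zeros to keep the panel balanced; the min-max form of standardisation is an assumption, and the records are hypothetical.

```python
import math
from collections import Counter
from itertools import product

TIERS = ("FTs", "CTs", "GPTs")


def aggregate(patents, cities, years):
    """Return {(city, year): {tier: log(1 + count)}}, with zeros for empty cells."""
    counts = Counter((p["city"], p["year"], p["tier"]) for p in patents)
    panel = {}
    for city, year in product(cities, years):
        panel[(city, year)] = {
            tier: math.log1p(counts[(city, year, tier)]) for tier in TIERS
        }
    return panel


def rescale(panel, tier):
    """Min-max scale one tier across all city-years (assumed standardisation)."""
    values = [row[tier] for row in panel.values()]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return {key: (row[tier] - lo) / span for key, row in panel.items()}


# Hypothetical classified patents.
patents = [
    {"city": "A", "year": 2010, "tier": "FTs"},
    {"city": "A", "year": 2010, "tier": "FTs"},
    {"city": "A", "year": 2011, "tier": "CTs"},
    {"city": "B", "year": 2011, "tier": "FTs"},
]
panel = aggregate(patents, cities=["A", "B"], years=[2010, 2011])
fts = rescale(panel, "FTs")
```

The log(1 + x) transform keeps city-years with zero patents in the sample, which matters because many smaller cities record no patents in a given tier and year.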

Table 3.

Measurement of artificial intelligence technology application

| Primary technology branch | Secondary technology branch | Tertiary technology branch |
| --- | --- | --- |
| Foundational technologies | Intelligent chips | GPU, FPGA, ASIC, brain-like chips, NPU, neural networks and AI chips |
| | Intelligent algorithms | Logic programming, programmable logic gate arrays, fuzzy logic, swarm intelligence and support vector machines |
| | Intelligent computing | Intelligent data analysis, brain-inspired intelligent computing and quantum intelligent computing |
| | Natural language processing | Machine translation, semantic understanding, intelligent speech, speech recognition, speech synthesis and semantic analysis |
| Core technologies | Computer vision | Image recognition, image generation, image enhancement and image classification |
| | Biometric recognition | Fingerprint recognition, face recognition, iris recognition, voiceprint recognition, DNA recognition and behavioural feature recognition |
| | Human–computer interaction | Voice interaction, somatosensory interaction, gesture interaction, brain–computer interaction, augmented reality (AR) and virtual reality (VR) |
| General-purpose technologies | Knowledge graph | Knowledge extraction, knowledge processing, knowledge fusion and ontology |
| | Machine learning | Multi-task learning, reinforcement learning, deep learning and supervised learning |
| | Intelligent robots | Industrial robots, service robots, collaborative robots and machine intelligence |
| | Pattern recognition | Object detection, object tracking, intelligent recommendation, intelligent agents and intelligent terminals |
| | Hybrid intelligence | Intelligent manufacturing, expert systems, adaptive systems and bio-intelligence |

Control variables follow established determinants of urban innovation capacity and digital-economy development. We include indicators for population density and economic development to capture agglomeration and demand-side effects. We also include trade openness and foreign direct investment to proxy external linkages, education expenditure as a measure of human-capital investment and industrial structure and technological level to reflect the maturity of the local production and knowledge system (Li et al., 2024; Zeng et al., 2023). To explore contextual heterogeneity, we construct dummy variables such as coastal versus inland location, administrative status (provincial capitals and municipalities versus other cities), transport hub designation and city size based on the State Council’s classification. The interaction and heterogeneity terms used in the empirical analysis are constructed directly from these variables and are reported in Table 2.

To identify the impact of the three-tier AI technology framework on urban innovation, we estimate a set of two-way fixed-effects models using a balanced panel of 120 cities from 2010 to 2021. Let $\mathit{Innovation}_{it}$ denote the innovation index of city $i$ in year $t$, and let $T_{it} \in \{\mathit{FTs}_{it}, \mathit{CTs}_{it}, \mathit{GPTs}_{it}\}$ denote the city-level stock of AI patents in foundational, core and general-purpose technologies, respectively. The baseline specification is as follows:

$$\mathit{Innovation}_{it} = \alpha + \beta\, T_{it} + \gamma' X_{it} + \mu_i + \lambda_t + \varepsilon_{it} \qquad (1)$$

where $\alpha$ is a constant, $\beta$ is the coefficient of interest, $X_{it}$ is a vector of control variables capturing economic development, openness, human capital, industrial structure and technological level, with coefficient vector $\gamma$, $\mu_i$ and $\lambda_t$ are city and year fixed effects, and $\varepsilon_{it}$ is an idiosyncratic error term. We estimate equation (1) separately for $\mathit{FTs}_{it}$, $\mathit{CTs}_{it}$ and $\mathit{GPTs}_{it}$ to recover their direct effects on urban innovation. Standard errors are clustered at the city level.
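
The mechanics of the two-way fixed-effects estimator can be illustrated with a minimal sketch for a balanced panel and a single regressor. Control variables and city-clustered standard errors are omitted for brevity, and the data are synthetic, so this shows the within transformation rather than the full estimation used in the study.

```python
def twoway_demean(panel):
    """Remove city and year means from {(city, year): value} in a balanced panel."""
    cities = sorted({city for city, _ in panel})
    years = sorted({year for _, year in panel})
    grand = sum(panel.values()) / len(panel)
    city_mean = {c: sum(panel[(c, t)] for t in years) / len(years) for c in cities}
    year_mean = {t: sum(panel[(c, t)] for c in cities) / len(cities) for t in years}
    return {
        (c, t): value - city_mean[c] - year_mean[t] + grand
        for (c, t), value in panel.items()
    }


def fe_slope(y, x):
    """OLS slope of demeaned y on demeaned x (two-way within estimator)."""
    y_w, x_w = twoway_demean(y), twoway_demean(x)
    numerator = sum(x_w[key] * y_w[key] for key in x_w)
    denominator = sum(value * value for value in x_w.values())
    return numerator / denominator


# Synthetic balanced panel: y = 2 * x + city effect + year effect, no noise.
city_effect = {"A": 1.0, "B": 5.0, "C": -2.0}
year_effect = {2010: 0.0, 2011: 3.0, 2012: -1.0}
x = {
    ("A", 2010): 0.1, ("A", 2011): 0.5, ("A", 2012): 0.2,
    ("B", 2010): 0.9, ("B", 2011): 0.3, ("B", 2012): 0.7,
    ("C", 2010): 0.4, ("C", 2011): 0.8, ("C", 2012): 0.6,
}
y = {(c, t): 2.0 * v + city_effect[c] + year_effect[t] for (c, t), v in x.items()}
beta_hat = fe_slope(y, x)
```

Because the city and year effects are removed by demeaning, the estimator recovers the slope of 2 used to generate the synthetic data even though those effects are large relative to the regressor.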

To examine whether AI technologies at different layers are mutually reinforcing, we augment the baseline specification with interaction terms between the three tiers. Specifically, we include the pairwise terms $\mathit{FTs}_{it} \times \mathit{CTs}_{it}$, $\mathit{FTs}_{it} \times \mathit{GPTs}_{it}$ and $\mathit{CTs}_{it} \times \mathit{GPTs}_{it}$, as well as the three-way interaction $\mathit{FTs}_{it} \times \mathit{CTs}_{it} \times \mathit{GPTs}_{it}$. These terms capture complementary and conditional effects between any two layers, as well as the systemic synergy arising from the joint deployment of the full three-tier AI architecture. The coefficients on these interaction terms provide direct evidence as to whether cities that simultaneously accumulate upstream infrastructure, midstream algorithmic capabilities and downstream applications achieve disproportionate innovation gains relative to cities that specialise in only one or two tiers.
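
Written out, the augmented models could take the following form. This is a sketch in illustrative notation: the coefficient symbols $\theta$ and $\delta$ are assumptions, and Table 5 suggests that each interaction term enters a separate model rather than one saturated specification. Whether lower-order pairwise products accompany the three-way term is not stated, so they are left out here.

```latex
% Pairwise complementarity, shown for FTs and CTs (the other pairs are analogous)
\mathit{Innovation}_{it} = \alpha + \beta_1 \mathit{FTs}_{it} + \beta_2 \mathit{CTs}_{it}
  + \theta \,(\mathit{FTs}_{it} \times \mathit{CTs}_{it})
  + \gamma' X_{it} + \mu_i + \lambda_t + \varepsilon_{it}

% Implied marginal effect of FTs, which varies with the level of CTs
\frac{\partial\, \mathit{Innovation}_{it}}{\partial\, \mathit{FTs}_{it}}
  = \beta_1 + \theta \, \mathit{CTs}_{it}

% Three-way synergy across the full stack
\mathit{Innovation}_{it} = \alpha + \beta_1 \mathit{FTs}_{it} + \beta_2 \mathit{CTs}_{it}
  + \beta_3 \mathit{GPTs}_{it}
  + \delta \,(\mathit{FTs}_{it} \times \mathit{CTs}_{it} \times \mathit{GPTs}_{it})
  + \gamma' X_{it} + \mu_i + \lambda_t + \varepsilon_{it}
```

Under this form, a positive $\theta$ means that the return to FTs rises with the level of CTs, which is the sense in which H2a describes the two tiers as complementary; a positive $\delta$ plays the same role for the joint presence of all three tiers in H2d.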

We further explore heterogeneity in these effects across urban contexts by interacting the AI technology variables with time-invariant city characteristics, including coastal versus inland location, administrative status (provincial capitals and municipalities), transport-hub designation and city size, as defined in Section 3.2 and summarised in Table 2. This strategy allows us to test whether the marginal contribution of FTs, CTs and GPTs to urban innovation is systematically stronger in cities with greater openness, higher administrative rank, better connectivity or larger agglomerations, while maintaining a unified estimation framework.

A key concern is the potential endogeneity arising from reverse causality, whereby more innovative cities may invest more in AI, and from omitted city-specific factors that evolve slowly over time. City and year fixed effects, together with a rich set of time-varying controls, mitigate part of this concern. In addition, we implement an instrumental-variable (2SLS) strategy using historical communication infrastructure, measured by pre-AI-era telecommunications and related infrastructure indicators at the city level, as an instrument for contemporary AI patent measures. These historical variables are closely related to a city’s later capacity to build digital and AI infrastructure, while being less likely to respond directly to current AI-driven innovation shocks. The instrumental variable (IV) specification mirrors equation (1), with the AI technology variables replaced by their fitted values from the first stage. The corresponding first-stage and over-identification statistics are reported alongside the main results.
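
The logic of the two-stage procedure can be sketched for the simplest case of one endogenous regressor and one instrument. The study reports over-identification statistics, which implies more than one instrument, so this just-identified version and its synthetic data are simplifying assumptions; in the panel setting the variables would also be within-transformed first.

```python
def _centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def ols_slope(y, x):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    dy, dx = _centered(y), _centered(x)
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


def two_stage_slope(y, x, z):
    """2SLS: regress x on z, then regress y on the first-stage fitted values."""
    pi = ols_slope(x, z)  # first stage
    x_mean, z_mean = sum(x) / len(x), sum(z) / len(z)
    x_hat = [x_mean + pi * (zi - z_mean) for zi in z]
    return ols_slope(y, x_hat)  # second stage


# Synthetic data: u is an unobserved confounder, uncorrelated with z by construction.
z = [1.0, 2.0, 3.0, 4.0]          # instrument (e.g. a historical infrastructure proxy)
u = [1.0, -1.0, -1.0, 1.0]        # confounder driving both x and y
x = [zi + ui for zi, ui in zip(z, u)]
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]

beta_ols = ols_slope(y, x)
beta_iv = two_stage_slope(y, x, z)
```

In this constructed example the naive OLS slope is pulled away from the true value of 2 by the confounder, while the second-stage slope recovers it because only the instrument-driven variation in the regressor is used.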

Table 4 reports the descriptive statistics for the variables in this study. Urban innovation, the dependent variable, is measured as a scaled composite index between 0 and 1, with a mean value of 0.02 and a standard deviation of 0.06. Only a small number of observations lie close to the upper bound. This indicates that most Chinese cities still operate at relatively low innovation levels over the sample period, while a few frontier cities (e.g. leading coastal and provincial capital cities) exhibit markedly higher performance. This skewed distribution is consistent with the uneven geography of innovation in China and further justifies our focus on heterogeneity and complementarity across different AI technology layers.

Table 4.

Descriptive statistics

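| Variables | Obs. | Mean | SD | Min. | Max. |
| --- | --- | --- | --- | --- | --- |
| Innovation | 1,404 | 0.02 | 0.06 | 0 | 1 |
| FTs | 1,404 | 0.013 | 0.05 | 0 | 0.931 |
| CTs | 1,404 | 0.018 | 0.065 | 0 | 0.941 |
| GPTs | 1,404 | 0.01 | 0.043 | 0 | 0.795 |
| popdensity | 1,404 | 6.173 | 0.63 | 4.043 | 8.136 |
| ecodevelop | 1,404 | 11.055 | 0.525 | 9.366 | 13.056 |
| fdi | 1,404 | 0.023 | 0.02 | 0 | 0.132 |
| open | 1,404 | 0.3 | 0.362 | 0.005 | 2.45 |
| eduexp | 1,404 | 0.177 | 0.04 | 0.018 | 0.356 |
| indstru | 1,404 | 0.199 | 0.157 | −0.045 | 1.05 |
| techlevel | 1,404 | 0.026 | 0.021 | 0.001 | 0.163 |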

The three AI technology variables, namely FTs, CTs and GPTs, also vary substantially across cities and over time. FTs have a mean of 0.013 and a standard deviation of 0.050, with a maximum of 0.931. CTs show a slightly higher mean (0.018) and dispersion (standard deviation 0.065, maximum 0.941). This suggests that, on average, cities patent more intensively in algorithmic and model-related capabilities than in pure computing infrastructure. GPTs have a mean of 0.010 and a standard deviation of 0.043, with a maximum of 0.795. This indicates that, while downstream, application-oriented AI patents are less ubiquitous, they reach high levels in a subset of cities. The wide range of FTs, CTs and GPTs highlights marked differences in how cities position themselves within the AI technology chain. At a descriptive level, this also suggests that Chinese cities do not follow a uniform AI development path; rather, they differ substantially in the extent to which they emphasise upstream infrastructure, core algorithms or downstream applications.

Control variables such as population density, economic development, trade openness, foreign direct investment, education expenditure, industrial structure and technological level exhibit reasonable variation across cities and over time, which is consistent with previous studies on Chinese urban innovation and digital transformation (Nie et al., 2023). The distributions of the innovation index, AI technology tiers and control variables suggest that there is sufficient dispersion in the panel to identify the direct and interactive effects of the three-layer technological structure of AI on urban innovation. Furthermore, the absence of excessive concentration in any single explanatory variable reduces concerns that the baseline estimates are mechanically driven by a few extreme observations.

Table 5 reports the baseline fixed-effects estimates for the three-tier AI technology model. Models (1) to (3) demonstrate that FTs, CTs and GPTs each exert a significantly positive influence on urban innovation, with all three coefficients statistically different from zero at the 1% level. This provides clear empirical support for H1a–H1c: upstream infrastructure, midstream algorithmic capabilities and downstream applications all make independent contributions to city-level innovation capacity. These findings are in line with recent firm- and region-level studies (Czarnitzki et al., 2023; Rammer et al., 2022), which indicate that the adoption of AI and digital technologies is associated with increased innovation output and productivity, rather than merely incremental efficiency gains. More importantly, our city-level evidence shows that these positive effects are not confined to one single segment of the AI chain but are distributed across the foundational, core and application layers.

Table 5.

Baseline regression and interactive effects analysis

Variables: Model 1; Model 2; Model 3; Model 4; Model 5; Model 6; Model 7; Model 8
Dependent variable: Innovation; Innovation; Innovation; Innovation; Innovation; Innovation; Innovation; Innovation
FTs: 0.855*** (31.82); 0.489*** (8.26); 0.697*** (9.95); 0.526*** (7.39); 0.500*** (7.04)
CTs: 0.766*** (13.09); 0.259*** (5.01); 0.491*** (6.60); 0.250*** (4.78); 0.270*** (4.95)
GPTs: 0.901*** (9.85); 0.176** (2.02); 0.414*** (2.92); 0.112 (1.28); 0.050 (0.45)
FTs × CTs: 0.164*** (3.49)
FTs × GPTs: 0.035 (0.50)
CTs × GPTs: −0.029 (−0.20)
FTs × CTs × GPTs: 0.125** (2.17)
Constant: 0.030 (0.33); 0.196* (1.67); 0.322** (2.54); 0.081 (1.03); 0.105 (1.63); 0.317*** (2.92); 0.162** (2.45); 0.123* (1.92)
Observations: 1,404; 1,404; 1,404; 1,404; 1,404; 1,404; 1,404; 1,404
R-squared: 0.974; 0.959; 0.947; 0.980; 0.976; 0.969; 0.980; 0.980
Controls: Yes; Yes; Yes; Yes; Yes; Yes; Yes; Yes
City FE: Yes; Yes; Yes; Yes; Yes; Yes; Yes; Yes
Year FE: Yes; Yes; Yes; Yes; Yes; Yes; Yes; Yes
Note(s):

***p < 0.01, **p < 0.05 and * p < 0.1

The relative magnitudes of the coefficients highlight the different roles of the three tiers in the AI innovation chain. FTs have a large and robust coefficient, consistent with their role as a quasi-infrastructural input that reduces the marginal cost of experimentation and supports cumulative knowledge production over time. CTs also show a substantial positive association with innovation, suggesting that algorithmic frameworks, models and platforms are critical in translating raw computing capacity into structured, codified knowledge that can be reused across projects and sectors. GPTs exhibit the largest point estimate, but also greater sensitivity across specifications. This finding is consistent with the existing GPTs literature, which suggests that broad application-layer gains tend to be highly context-dependent and contingent on the supporting technological and institutional environment (Bresnahan and Trajtenberg, 1995). In other words, cities have the potential to witness substantial advancements in innovation as a consequence of the implementation of GPTs, but only when the necessary upstream layers and complementary factors are sufficiently developed. Therefore, the coefficient pattern should not be interpreted as evidence that application-layer AI is universally more powerful than FTs or CTs; rather, it indicates that GPTs may generate high returns in some cities while remaining shallow or fragmented in others.

Models (4)–(8) add interaction terms to examine whether the three AI tiers are mutually reinforcing. The interaction between FTs and CTs is positive and statistically significant, lending support to H2a and indicating that cities combining strong infrastructures with sophisticated algorithmic capabilities obtain more than additive innovation returns. In contrast, the two-way interactions FTs × GPTs and CTs × GPTs, corresponding to H2b and H2c, are generally small and not statistically significant once fixed effects and controls are included. It is also worth noting that the three-way interaction FTs × CTs × GPTs is positive and significant, supporting H2d. These findings indicate that system-level complementarities emerge primarily when the full three-tier architecture is in place, suggesting that pairwise couplings alone are not sufficient to generate robust additional gains beyond the direct effects of each tier. This pattern is particularly important because it reveals that AI-driven urban innovation follows a layered logic of capability accumulation, rather than a simple linear logic in which adding more AI in any form necessarily yields higher innovation.

The absence of strong stand-alone FTs–GPTs and CTs–GPTs complementarities can be interpreted through the lens of GPTs theory and complementarity theory. As GPTs research highlights, GPTs generate significant innovation only when supported by suitable complementary technologies and organisational structures (Bresnahan and Trajtenberg, 1995). In our setting, CTs act as a threshold layer that conditions the FTs–GPTs relationship. CTs integrate domain-specific modelling skills, engineering routines and problem-framing capabilities, enabling cities to convert raw computing resources and application concepts into high-quality patents, breakthrough inventions and frontier innovations. If CTs are underdeveloped, the joint presence of FTs and GPTs may produce only fragmented or shallow applications, which do not fully exploit the infrastructural potential of FTs or the cross-sectoral scope of GPTs. This finding suggests that the benefits of innovation depend on the coordinated alignment of underlying capabilities rather than on any single asset in isolation. Accordingly, GPTs should be understood less as an autonomous innovation engine and more as an outcome-sensitive layer whose effectiveness depends on the maturity of the rest of the AI stack.

A similar argument can be made for the CTs–GPTs relationship: the realisation of CTs–GPTs complementarities is limited by the level of FTs development. Without robust digital infrastructure, such as high-capacity networks, reliable data platforms and scalable computational resources, algorithmic advances and application pilots remain “local experiments” whose benefits do not diffuse widely across sectors or cities. In such settings, even sophisticated CTs projects and ambitious GPTs initiatives encounter bottlenecks relating to data availability, interoperability and computational limitations. This results in limited technological depth and weak spillovers. Conversely, when FTs are well developed, CTs can be deployed on a large scale and GPTs can propagate through platform interfaces and shared standards, producing innovation gains across the entire system. This is consistent with the idea that GPTs require layered, co-evolving infrastructures to generate economy-wide productivity improvements (Bresnahan and Trajtenberg, 1995), and with the broader view that the complementarities of complex technologies are often multi-layered and hierarchical, rather than purely pairwise (Brynjolfsson and Milgrom, 2013). These interaction results suggest that the missing pairwise complementarities are not contradictory to GPTs theory; instead, they reveal the conditional and system-dependent nature of GPTs realisation in urban innovation systems.

The interaction estimates also correspond with the heterogeneity results reported later. We find that FTs and CTs have a stronger effect in coastal and policy-favoured cities, whereas GPTs have a relatively weaker effect in non-coastal and less connected cities. This pattern indicates that cities with substantial FTs and CTs capabilities, particularly those with advanced infrastructure, robust policy support and dense innovation networks, are better positioned to translate GPTs deployments into system-level innovation outcomes. For smaller or less well-resourced cities, investments that focus exclusively on GPTs may result in isolated demonstration projects rather than ongoing, high-intensity, cross-sectoral innovation. This is because the necessary infrastructural and algorithmic components are either lacking or underdeveloped. From this perspective, our results provide further evidence on the relationship between AI and innovation. They confirm that AI adoption is generally beneficial (Czarnitzki et al., 2023; Rammer et al., 2022), but they also show that the configuration and alignment of AI capabilities along the technology chain critically determine the size and nature of these benefits. Thus, the empirical evidence supports a configuration-based rather than intensity-based interpretation of AI’s innovation effects.

Overall, the above results support a layered and conditional view of AI-driven urban innovation. H1a–H1c are robustly supported, with each tier (FTs, CTs and GPTs) demonstrating its own significant effect on urban innovation. H2a and H2d are also supported, highlighting the importance of FTs–CTs complementarity and the additional innovation gains unlocked when the three tiers are developed together. Conversely, the lack of robust evidence for H2b and H2c indicates that two-way complementarities involving GPTs do not occur automatically but depend on threshold conditions in the remaining tier. Theoretically, this goes beyond simple “more AI is better” narratives, contributing to the GPTs and complementarity literatures by demonstrating that sustained innovation capacity in complex urban systems is driven by the orchestrated integration of foundational, core and general-purpose technologies, rather than by any single technology or pair of technologies. This result also provides an important empirical bridge between urban innovation studies and the literature on layered digital infrastructures, showing that the internal structure of AI matters for understanding regional innovation outcomes.
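One way to make this conditional reading concrete is to write out a fully interacted two-way fixed-effects model and differentiate it with respect to GPTs. The notation below is an illustrative assumption, not the paper's own specification: the coefficients $\beta$, the control vector $X_{it}$, the city effect $\mu_i$ and the year effect $\lambda_t$ are introduced here only for exposition.

```latex
\begin{aligned}
\text{Innovation}_{it} &= \beta_1 FT_{it} + \beta_2 CT_{it} + \beta_3 GPT_{it}
  + \beta_{12}\, FT_{it}\, CT_{it} + \beta_{13}\, FT_{it}\, GPT_{it} + \beta_{23}\, CT_{it}\, GPT_{it} \\
&\quad + \beta_{123}\, FT_{it}\, CT_{it}\, GPT_{it} + \gamma' X_{it} + \mu_i + \lambda_t + \varepsilon_{it}, \\[4pt]
\frac{\partial\, \text{Innovation}_{it}}{\partial\, GPT_{it}} &= \beta_3 + \beta_{13}\, FT_{it} + \beta_{23}\, CT_{it} + \beta_{123}\, FT_{it}\, CT_{it}.
\end{aligned}
```

Under this sketch, if $\beta_{13}$ and $\beta_{23}$ are statistically indistinguishable from zero while $\beta_{123} > 0$, which is the pattern described for H2b, H2c and H2d, the return to GPTs rises only where FTs and CTs are jointly high. Neither upstream tier raises that return on its own, which is what a threshold-type, three-way complementarity would look like.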

We conduct a series of robustness checks to verify that the main findings are not influenced by measurement choices, functional-form assumptions or the composition of the sample. Firstly, we replace the composite innovation index with the following alternative dependent variables:

  • per capita invention patents;

  • the proportion of invention patents within the total number of patents; and

  • a high-quality patent index focusing on invention and PCT patents.

Across these alternative measures, the coefficients on FTs, CTs and GPTs remain positive and statistically significant, and the relative ordering of their magnitudes is preserved. This confirms that the three-tier AI framework is associated with both higher innovation intensity and improvements in innovation quality. These results are particularly important because they suggest that AI-related technological accumulation is linked not only to more innovation outputs but also to more sophisticated and internationally oriented innovation activities.

Secondly, we test the robustness of the AI technology measures and model specification. We use alternative constructions of FTs, CTs and GPTs based on granted rather than applied patents and on within-city patent shares rather than absolute stocks, as well as specifications that include lagged AI variables to capture dynamic effects [see Table 6]. In all cases, the key coefficients remain positive and significant, and the FTs–CTs interaction and three-way FTs × CTs × GPTs term continue to exhibit strong complementarities. Furthermore, the absence of robust FTs–GPTs and CTs–GPTs interactions persists under these alternative formulations, supporting our assertion that GPTs-related complementarities are dependent on the maturity of the remaining tier, rather than being purely mechanical. The stability of this pattern across alternative specifications strengthens confidence that the identified complementarity structure reflects an underlying technological mechanism rather than an artefact of variable construction.

Table 6.

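Robustness checks: one-period lag effect

| Variables | Model 1: Innovation | Model 2: Innovation | Model 3: Innovation |
|---|---|---|---|
| LFTs | 1.025*** (25.27) | | |
| LCTs | | 0.833*** (11.89) | |
| LGPTs | | | 1.211*** (5.82) |
| Constant | 0.158 (1.61) | 0.527*** (3.25) | 0.449*** (2.65) |
| Observations | 1,287 | 1,287 | 1,287 |
| R-squared | 0.973 | 0.963 | 0.945 |
| Control | Yes | Yes | Yes |
| City FE | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes |

Robustness checks: two-period lag effect

| Variables | Model 1: Innovation | Model 2: Innovation | Model 3: Innovation |
|---|---|---|---|
| L2FTs | 1.285*** (9.50) | | |
| L2CTs | | 0.910*** (5.54) | |
| L2GPTs | | | 2.020*** (5.17) |
| Constant | 0.124 (0.69) | 0.569* (1.71) | 0.533 (1.63) |
| Observations | 1,170 | 1,170 | 1,170 |
| R-squared | 0.969 | 0.964 | 0.966 |
| Control | Yes | Yes | Yes |
| City FE | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes |

Robustness checks: three-period lag effect

| Variables | Model 1: Innovation | Model 2: Innovation | Model 3: Innovation |
|---|---|---|---|
| L3FTs | 1.800*** (11.32) | | |
| L3CTs | | 1.021*** (5.08) | |
| L3GPTs | | | 2.883*** (7.06) |
| Constant | −0.010 (−0.07) | 0.593 (1.51) | 0.539* (1.83) |
| Observations | 1,053 | 1,053 | 1,053 |
| R-squared | 0.970 | 0.965 | 0.978 |
| Control | Yes | Yes | Yes |
| City FE | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes |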
Note(s):

***p < 0.01 and *p < 0.1
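The observation counts in Table 6 fall by 117 with each additional lag (1,287, 1,170 and 1,053), which is what one would expect if each lag drops one year per panel unit. The following minimal sketch shows how within-city lags could be built so that a lagged value never crosses from one city into another. The panel contents, the `city_i` labels and the unit count of 117 are assumptions chosen only to reproduce the reported counts; they are not the study's data.

```python
from typing import Dict, List, Tuple

Panel = Dict[str, Dict[int, float]]  # city -> {year: value}


def lag_within_city(panel: Panel, k: int) -> List[Tuple[str, int, float]]:
    """Return (city, year, value at year - k) rows.

    Rows for which the city has no observation k years earlier are dropped,
    so lags are always taken within a city, never across cities.
    """
    rows = []
    for city, series in panel.items():
        for year in sorted(series):
            if year - k in series:
                rows.append((city, year, series[year - k]))
    return rows


# Hypothetical balanced panel: 117 units observed 2010-2021.
# The unit count is an assumption, not a figure stated in the text.
years = range(2010, 2022)
panel = {f"city_{i}": {y: float(i + y) for y in years} for i in range(117)}

for k in (1, 2, 3):
    print(f"lag {k}: {len(lag_within_city(panel, k))} usable observations")
```

With these assumed inputs the three counts are 1,287, 1,170 and 1,053, matching the table.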

Thirdly, we consider the robustness of the sample and the set of controls. When we exclude municipalities that are directly under the central government and the largest cities, and add further controls for industrial diversification and financial development, the estimated results remain highly consistent with the baseline. FTs, CTs and GPTs all have significant positive effects, and the pattern of interaction terms remains unchanged. This suggests that our findings are not driven by a few superstar cities or omitted macro-structural characteristics. In addition, placebo regressions using pre-AI period values and alternative timing assumptions yield no systematic spurious effects, which mitigates concerns that the results merely capture pre-existing innovation trajectories. Hence, the core conclusions are unlikely to be explained by sample selection bias, city-size dominance or simple trend continuation.

To address potential endogeneity, particularly reverse causality from innovation to AI investment and slowly evolving, unobserved city characteristics, we implement an IV strategy; the results are summarised in Table 7. Historical communication infrastructure, proxied by city-level indicators of fixed-line telephony and early internet penetration prior to the deployment of modern AI technologies, is used as an instrument for contemporary FTs, CTs and GPTs. These historical variables are strongly correlated with the capacity to build digital and AI infrastructure, yet are likely to be independent of current AI-specific innovation shocks. The first-stage results show high F-statistics, alleviating concerns about weak instruments. The signs and significance levels of the second-stage coefficients are also consistent with the fixed-effects estimates. In particular, the estimated effects of FTs, CTs and GPTs on urban innovation remain strongly positive, suggesting robustness to potential endogeneity. Although no IV strategy can fully eliminate all identification concerns, the consistency between the baseline and IV results considerably strengthens the credibility of the causal interpretation.

Table 7.

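Instrumental variable (2SLS)

| Variables | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| Stage | First | Second | First | Second | First | Second |
| Dependent variable | FTs | Innovation | CTs | Innovation | GPTs | Innovation |
| Post office | 37.657*** (6.473) | | | | | |
| FTs | | 0.935*** (0.040) | | | | |
| Telephones | | | 115.187*** (13.179) | | | |
| CTs | | | | 0.997*** (0.032) | | |
| Post output | | | | | 88.500*** (17.780) | |
| GPTs | | | | | | 1.415*** (0.151) |
| Observations | 1,284 | 1,284 | 1,284 | 1,284 | 1,284 | 1,284 |
| Control | Yes | Yes | Yes | Yes | Yes | Yes |
| City FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes | Yes | Yes | Yes |

R-squared (three values reported): 0.909, 0.817, 0.635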
Note(s):

***p < 0.01
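The logic of the two-stage procedure can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the estimation used in the study: the data are synthetic, there is one instrument and one endogenous regressor, controls and fixed effects are omitted, and the function names `ols_slope`, `first_stage_f` and `two_stage_ls` are hypothetical helpers written for this example. An unobserved factor `u` drives both the regressor and the outcome, so OLS is biased upwards, while the instrument `z` affects the outcome only through the regressor.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)


def first_stage_f(z, x):
    """F-statistic for the single excluded instrument (squared t-ratio)."""
    n = len(z)
    pi = ols_slope(z, x)
    mz, mx = sum(z) / n, sum(x) / n
    rss = sum((xi - mx - pi * (zi - mz)) ** 2 for zi, xi in zip(z, x))
    szz = sum((zi - mz) ** 2 for zi in z)
    return pi ** 2 * szz / (rss / (n - 2))


def two_stage_ls(z, x, y):
    """2SLS slope: regress x on z, then y on the fitted values of x."""
    pi = ols_slope(z, x)
    x_hat = [pi * zi for zi in z]  # the intercept does not affect the slope
    return ols_slope(x_hat, y)


# Synthetic data: the true effect of x on y is 1.0 (an assumed value).
rng = random.Random(0)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]   # instrument
u = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
x = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 * xi + 2.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

print("OLS slope    :", round(ols_slope(x, y), 3))
print("2SLS slope   :", round(two_stage_ls(z, x, y), 3))
print("First-stage F:", round(first_stage_f(z, x), 1))
```

In this setting the OLS slope overstates the true effect, whereas the 2SLS slope lies close to it and the first-stage F-statistic is far above conventional weak-instrument thresholds. Note that standard errors taken from a manually run second stage are not valid; a dedicated IV routine would be needed for inference.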

In summary, the robustness checks and IV results support three main conclusions. (1) The positive effects of FTs, CTs and GPTs on urban innovation are stable across a wide range of outcome measures, AI indicators, model specifications and samples. This provides strong empirical backing for H1a–H1c. (2) The evidence for FTs–CTs complementarity and the three-way FTs–CTs–GPTs synergy is robust, while the absence of standalone FTs–GPTs and CTs–GPTs complementarities survives all robustness and endogeneity corrections. This suggests that H2a–H2d should be interpreted as conditional rather than unconditional complementarity hypotheses. (3) The consistency between fixed-effects and IV estimates suggests that, while theoretically important, endogeneity concerns are unlikely to overturn the study’s major finding: in complex urban systems, urban innovation capacity is sustained by the coordinated development of foundational, core and general-purpose AI technologies, rather than by isolated investment in any single layer. Overall, the robustness analysis confirms that the main empirical message of this paper is both statistically stable and substantively meaningful.

To examine whether the three tiers of AI technology have different effects in various urban contexts, we analyse the interaction between FTs, CTs and GPTs and four time-invariant city characteristics: coastal location, policy resources, transport accessibility and city size. The results are reported in Table 8. This analysis further tests whether the returns to AI depend not only on technological configuration but also on broader urban development conditions.

Table 8.

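Heterogeneity of coastal cities

| Variables | Model 1: Innovation | Model 2: Innovation | Model 3: Innovation |
|---|---|---|---|
| Coastal × FTs | 0.282*** (6.67) | | |
| FTs | 0.626*** (20.05) | | |
| Coastal × CTs | | 0.115* (1.65) | |
| CTs | | 0.629*** (11.32) | |
| Coastal × GPTs | | | −0.137 (−1.50) |
| GPTs | | | 1.049*** (20.21) |
| Constant | 0.009*** (21.03) | 0.006*** (7.67) | 0.011*** (13.58) |
| Observations | 1,404 | 1,404 | 1,404 |
| R-squared | 0.971 | 0.958 | 0.942 |
| Control | Yes | Yes | Yes |
| City FE | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes |

Heterogeneity of policy resources

| Variables | Model 1: Innovation | Model 2: Innovation | Model 3: Innovation |
|---|---|---|---|
| Policyresource × FTs | −0.182* (−1.72) | | |
| FTs | 1.052*** (10.19) | | |
| Policyresource × CTs | | 0.338*** (5.15) | |
| CTs | | 0.528*** (9.65) | |
| Policyresource × GPTs | | | 0.517*** (5.44) |
| GPTs | | | 0.619*** (10.98) |
| Constant | 0.008*** (14.57) | 0.006*** (11.11) | 0.011*** (18.02) |
| Observations | 1,404 | 1,404 | 1,404 |
| R-squared | 0.971 | 0.971 | 0.963 |
| Control | Yes | Yes | Yes |
| City FE | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes |

Heterogeneity of transportation accessibility

| Variables | Model 1: Innovation | Model 2: Innovation | Model 3: Innovation |
|---|---|---|---|
| Transportation × FTs | −0.226** (−2.45) | | |
| FTs | 1.117*** (12.49) | | |
| Transportation × CTs | | 0.028 (0.35) | |
| CTs | | 0.713*** (9.59) | |
| Transportation × GPTs | | | −0.467*** (−3.13) |
| GPTs | | | 1.381*** (9.60) |
| Constant | 0.008*** (14.90) | 0.006*** (7.30) | 0.010*** (11.71) |
| Observations | 1,404 | 1,404 | 1,404 |
| R-squared | 0.970 | 0.957 | 0.943 |
| Control | Yes | Yes | Yes |
| City FE | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes |

Heterogeneity of city size

| Variables | Model 1: Innovation | Model 2: Innovation | Model 3: Innovation |
|---|---|---|---|
| Cityscale × FTs | −0.313* (−1.84) | | |
| FTs | 1.205*** (6.94) | | |
| Cityscale × CTs | | 0.104 (1.20) | |
| CTs | | 0.637*** (5.07) | |
| Cityscale × GPTs | | | 0.223 (0.76) |
| GPTs | | | 0.690** (1.96) |
| Constant | 0.008*** (16.49) | 0.006*** (6.16) | 0.011*** (11.70) |
| Observations | 1,404 | 1,404 | 1,404 |
| R-squared | 0.969 | 0.957 | 0.942 |
| Control | Yes | Yes | Yes |
| City FE | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes |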
Note(s):

***p < 0.01, **p < 0.05 and *p < 0.1
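The interaction coefficients in Table 8 can be read as shifts in the slope of each AI tier. As a minimal sketch, and assuming that the coastal moderator is a 0/1 indicator (the text does not state its coding explicitly), the implied slope of a tier in a coastal city is the main coefficient plus the interaction coefficient. The helper `implied_effect` is a hypothetical function introduced for this illustration; the numbers are copied from the coastal panel of Table 8.

```python
# (main effect, Coastal x tier interaction), copied from Table 8.
coastal = {
    "FTs": (0.626, 0.282),
    "CTs": (0.629, 0.115),
    "GPTs": (1.049, -0.137),
}


def implied_effect(main: float, interaction: float, moderator: float) -> float:
    """Point estimate of a tier's slope at a given moderator value."""
    return round(main + interaction * moderator, 3)


for tier, (main, inter) in coastal.items():
    non_coastal = implied_effect(main, inter, 0.0)  # moderator assumed 0
    in_coastal = implied_effect(main, inter, 1.0)   # moderator assumed 1
    print(f"{tier}: non-coastal {non_coastal}, coastal {in_coastal}")
```

Under that coding assumption, the implied slopes are 0.626 versus 0.908 for FTs, 0.629 versus 0.744 for CTs and 1.049 versus 0.912 for GPTs. These are point estimates only: judging whether a summed slope differs from zero would require the covariance between the two coefficients, which the table does not report, and the GPTs difference is not statistically significant.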

  • Coastal vs non-coastal cities

Table 8 shows that the interaction terms of the coastal indicator with FTs and CTs are positive and significant (the CTs interaction only at the 10% level), whereas the interaction term for GPTs is negative and insignificant. This suggests that the innovation-enhancing effects of FTs and CTs are stronger in coastal cities, where greater openness, more developed resource markets and denser innovation networks facilitate the deployment of infrastructure and algorithmic capabilities. However, the innovation effect of GPTs does not differ significantly between coastal and non-coastal cities. One interpretation is that application-oriented AI patents alone do not supply the complementary capabilities and absorptive capacity required for urban innovation, so that regional openness and coastal advantages amplify the innovation returns to FTs and CTs, but not necessarily to GPTs. In other words, coastal advantages appear to strengthen the accumulation and coordination of upstream and midstream capabilities more than the isolated expansion of downstream applications.

  • Policy resources

As reported in Table 8, the interaction between policy resources and FTs is negative and weakly significant, whereas the interactions with CTs and GPTs are positive and strongly significant. In regions with abundant policy resources, innovation benefits thus tend to concentrate in higher-tier AI technologies, with CTs and GPTs being more responsive to institutional support, fiscal subsidies and targeted policy initiatives in areas such as smart manufacturing and the digital economy. The negative FTs interaction suggests diminishing marginal returns to further foundational investment where basic infrastructure is already abundant, and where policy instruments increasingly favour frontier or application-oriented projects. These findings highlight that policy resources selectively amplify the innovation effects of CTs and GPTs, while crowding out or flattening additional returns to FTs. This also implies that policy support does not uniformly enhance all AI layers; rather, it may reallocate innovation incentives towards technologies that are closer to commercialisation and visible policy performance.

  • Transportation accessibility

As shown in Table 8, the interactions of transportation accessibility with FTs and GPTs are negative and significant, while the interaction with CTs is not statistically significant. High accessibility appears to weaken the local innovation impact of FTs and GPTs, likely because greater factor and knowledge mobility increases reliance on external diffusion and reduces the marginal contribution of local infrastructure and application patents. In highly connected cities, GPTs may also diffuse so widely that local differentiation becomes harder to sustain, leading to more homogeneous innovation outcomes. In contrast, CTs maintain a consistent effect across different accessibility levels, indicating their deeper integration within specialised industrial and R&D structures that are less susceptible to external substitution. These findings suggest that transportation accessibility plays a moderating role in the relationship between AI technologies and urban innovation by reshaping the balance between local accumulation and inter-city spillovers. This result further indicates that some AI capabilities, especially FTs and GPTs, may generate innovation through both local production and cross-city diffusion channels, whereas CTs remain more territorially embedded.

  • City size

As shown in Table 8, the interaction between city size and FTs is negative and marginally significant, while the interactions with CTs and GPTs are insignificant. This suggests that FTs exert a relatively stronger innovation effect in smaller and medium-sized cities, while the impacts of CTs and GPTs are broadly similar across city sizes. Smaller cities may derive greater marginal benefits from foundational AI investments because governance is more flexible, coordination chains are shorter and incremental infrastructure improvements can more easily be translated into production and process upgrades. In large cities, by contrast, congestion, higher costs and more complex coordination may reduce the additional benefits of FTs. These results suggest that FTs are more sensitive to local governance and resource constraints, whereas higher-tier AI technologies display greater universality across the urban hierarchy. This finding also implies that latecomer or non-superstar cities may still improve innovation performance through targeted strengthening of foundational AI capabilities, even if they do not yet possess strong advantages in advanced algorithms or frontier applications.

When considered as a whole, the heterogeneity analysis shows that the innovation effects of AI’s three tiers are strongly context-dependent. FTs and CTs are particularly powerful in coastal cities, CTs and GPTs respond most strongly to policy resources, and FTs yield relatively larger gains in smaller cities that can flexibly embed foundational systems. GPTs, by contrast, deliver robust innovation gains only where complementary infrastructural, institutional and network conditions are in place. Therefore, urban AI development should be understood as a place-sensitive process in which similar technologies can yield very different innovation outcomes depending on local capability structures and contextual conditions.

Consistent with the heterogeneity results reported above, the innovation effects of layered AI development vary not only across broad city categories but also across specific urban development pathways. To make the heterogeneity results more interpretable, we complement the broad group-based analysis with several representative city illustrations. These examples are not intended as formal case studies, but as contextual interpretations of how different configurations of FTs, CTs and GPTs shape urban innovation pathways in practice. In this sense, the observed heterogeneity across city groups can be understood not merely as statistical variation, but as reflecting differences in the internal alignment of AI-related technological layers under distinct local industrial and institutional conditions. They therefore serve as qualitative illustrations of the mechanisms implied by the econometric results, rather than independent sources of causal evidence.

A first illustrative case is Shenzhen, which represents a relatively mature and highly integrated AI development pathway. As one of China’s leading innovation-oriented cities, Shenzhen has strong advantages in electronic information manufacturing, communication equipment, intelligent terminals and other hardware-related sectors that support the development of FTs. At the same time, the city hosts a dense innovation ecosystem composed of platform firms, AI enterprises, software developers, research institutes and venture capital, which contributes to the strengthening of CTs. Importantly, these upstream and midstream capabilities have been connected to a wide range of downstream GPTs applications, including smart manufacturing, logistics, security, finance and urban governance. In such a context, the complementarity across FTs, CTs and GPTs is more likely to be realised, allowing AI investment to translate into broader and more sustained urban innovation outcomes. Shenzhen therefore helps illustrate why cities with stronger cross-layer coordination tend to display higher innovation returns from AI development. It represents the type of city in which the significant three-way complementarity identified in the regressions is most likely to materialise.

A second representative case is Hangzhou, where the AI development trajectory appears more strongly driven by the interaction between CTs and GPTs. Compared with Shenzhen, Hangzhou’s comparative advantage lies less in hardware manufacturing and more in digital platforms, cloud computing, data services, algorithm development and software ecosystems. Supported by leading platform-based firms and a vibrant digital economy, the city has achieved extensive AI application in e-commerce, fintech, intelligent services and urban governance. This pattern suggests that strong CTs can be rapidly translated into GPTs-oriented innovation when supported by rich data resources, market demand and application scenarios. At the same time, Hangzhou also illustrates a potential structural risk emphasised in our framework: if downstream applications expand more quickly than upstream technological capabilities, the city may become more dependent on external supply in critical foundational segments. This example helps explain why application prosperity alone does not necessarily eliminate the strategic importance of strengthening FTs. In this sense, Hangzhou captures both the opportunities and vulnerabilities of an application-intensive AI development pathway.

A third example is Chengdu, which provides a useful illustration of how an inland core city may cultivate a more balanced AI innovation pathway. Chengdu benefits from a relatively solid scientific and educational base, a growing electronics and information industry and a rising software and digital services sector. Although its external visibility in AI may differ from that of leading coastal cities, it has developed a comparatively coordinated structure across FTs, CTs and GPTs. On the one hand, its industrial and research capabilities support the accumulation of FTs and CTs; on the other hand, AI applications have gradually expanded into transportation, manufacturing, public services and regional digital governance. Chengdu therefore reflects a development path in which urban innovation is not solely dependent on frontier platform giants, but can also emerge from the gradual alignment of research resources, industrial capabilities and local application needs. This case helps interpret why some inland or non-frontier cities may still achieve meaningful AI-enabled innovation when cross-layer coordination is sufficiently strong. It therefore provides a concrete illustration of how inland cities can reduce location disadvantages through balanced capability-building across layers.

A contrasting case is Guiyang, which helps illustrate the consequences of structural imbalance across AI layers. Guiyang has gained national attention for its early development of big data infrastructure, data centres and computing-related facilities, giving it visible strengths in some aspects of FTs. However, compared with cities such as Shenzhen or Hangzhou, its local ecosystem for advanced algorithm development, frontier AI firms and high-level innovation commercialisation has been relatively less developed. As a result, strengths in infrastructure and data-related capacity do not automatically translate into equally strong CTs accumulation or broad GPTs-driven innovation spillovers. This case is particularly helpful for understanding one of the central arguments of this paper: leadership in a single AI layer, even in an important upstream domain, does not by itself guarantee stronger urban innovation outcomes. Without effective coupling to core technological capabilities and downstream applications, the innovation effects of AI investment may remain partial or uneven. Guiyang therefore illustrates why our empirical results emphasise coordination failure, rather than simple resource shortage, as a central constraint on AI-driven urban innovation.

These city illustrations reinforce the broader implication of our empirical findings: AI-driven urban innovation is not determined simply by the scale of AI deployment, nor by whether a city belongs to a broad category such as coastal, inland, large or small. Rather, it depends more fundamentally on whether the three technological layers are mutually aligned and embedded within supportive local institutional, industrial and market conditions. In this sense, the heterogeneity identified in the empirical analysis should be understood not merely as statistical variation across city groups, but as reflecting different urban development pathways in the organisation of AI-related innovation systems. This interpretation also helps bridge the quantitative and contextual dimensions of the analysis by linking econometric patterns to plausible real-world configurations of AI development.

These examples also help clarify the policy meaning of the upstream gaps and downstream overheating phenomenon discussed in this study. Cities with strong application demand but weak foundational support may experience rapid short-term expansion in GPTs without forming durable innovation advantages. Conversely, cities with visible investments in infrastructure or computing capacity may fail to generate broader innovation gains if CTs and application ecosystems remain underdeveloped. The key policy lesson, therefore, is that urban AI strategy should not focus on isolated technological segments. More effective urban innovation policy requires coordinated investment across FTs, CTs and GPTs, while adapting such coordination to each city’s existing industrial structure, research base and market environment. This is fully consistent with the regression results showing that the most robust additional gains arise from FTs–CTs complementarity and full three-tier alignment, rather than from isolated development at the application layer alone.

This study shows that the three tiers of AI technologies (i.e. FTs, CTs and GPTs) are all positively associated with sustainable urban innovation, though in systematically different ways. FTs and CTs exhibit consistently strong effects, whereas the standalone effect of GPTs weakens once all three tiers are included in the model. This suggests that application-oriented AI does not drive urban innovation independently; rather, its effectiveness depends on the prior accumulation of infrastructural (FTs) and algorithmic (CTs) capabilities. While this finding is broadly consistent with firm-level evidence that AI adoption can enhance innovation and productivity, our study extends the literature by showing that the configuration of AI within a city’s technological architecture is at least as important as the overall intensity of AI deployment (Rammer et al., 2022). In this sense, our results move the discussion from whether AI matters for innovation to how different layers of AI matter jointly and conditionally across urban systems.

The significant impact of FTs and CTs highlights the ongoing importance of upstream capabilities in the AI era. Foundational infrastructures, such as computing capacity, data systems and network connectivity, reduce the marginal costs of experimentation and facilitate the production of cumulative knowledge, reflecting well-established ideas about digital infrastructure as a key enabler of long-term innovation. At the same time, CTs translate these infrastructural inputs into codified and recombinable knowledge, performing a role analogous to modular platform technologies that organise and accelerate innovation in complex systems. This dual mechanism is also consistent with recent evidence that different AI methods and use cases have heterogeneous innovation effects, with algorithm-intensive and data-rich deployments proving especially effective when embedded in robust infrastructures and complementary organisational capabilities (Rammer et al., 2022). Therefore, the superior performance of FTs and CTs in our models should be interpreted not as a diminishing role for applications, but as evidence that upstream and midstream capabilities constitute the enabling conditions for application-layer innovation to become durable, scalable and system-wide.

The interaction results provide further theoretical insight. The strong complementarity between FTs and CTs, together with the significant three-way FTs–CTs–GPTs interaction and the absence of robust two-way FTs–GPTs and CTs–GPTs effects, points to a hierarchically structured pattern of complementarity rather than simple pairwise synergies. Classical theory suggests that broad innovation effects emerge only when GPTs are combined with appropriate complementary technologies and organisational change (Bresnahan and Trajtenberg, 1995). Our results indicate that CTs function as a critical middle layer that activates the innovation potential of GPTs: without sufficient algorithmic and engineering depth, cities struggle to convert the co-presence of FTs and GPTs into high-quality innovation outcomes.

Similarly, FTs serve as the infrastructural substrate that enables both CTs and GPTs to scale and diffuse. The absence of strong FTs–GPTs and CTs–GPTs complementarities can be attributed to multiple interrelated constraints. Institutional limitations, such as fragmented governance structures, insufficient coordination across agencies and restricted access to key data, often prevent GPTs applications from evolving beyond isolated pilots. Market constraints arise because many GPTs deployments depend on external suppliers or short-term contracts, making it difficult to internalise knowledge locally and generate sustainable innovation outcomes. Moreover, organisational factors, including limited absorptive capacity and insufficient technical and managerial capabilities within firms and public institutions, further hinder the translation of GPTs applications into patents, products or transformative innovations. Consequently, GPTs’ innovation returns are highly conditional: only when FTs and CTs are mature, and when institutional, market and organisational environments are supportive, can GPTs be leveraged to achieve systemic, cross-sectoral urban innovation. This interpretation is consistent with the broader complementarity literature, which emphasises that superior performance arises from coherent systems of mutually reinforcing elements rather than isolated investments (Brynjolfsson and Milgrom, 2013). Therefore, these findings suggest that sustained AI-driven urban innovation requires coordinated alignment across foundational, core and application layers, rather than piecemeal development of any single tier. Accordingly, our evidence supports a layered complementarity view of AI, in which some complements are enabling and threshold-setting, while others are performance-enhancing only after those thresholds have been met.

The heterogeneity analysis further highlights the strong context dependence of these layered mechanisms. FTs and CTs exert particularly pronounced effects in coastal cities with favourable policy environments and high levels of connectivity, where dense innovation networks, deeper factor markets and more capable governance structures amplify the returns to infrastructural and algorithmic capacities. By contrast, GPTs generate robust innovation gains only where such complementary conditions are already in place. In less well-endowed or more peripheral cities, GPTs deployments risk remaining isolated pilot projects with limited systemic impact. This pattern challenges simplistic narratives of AI as a universally applicable smart city solution, instead showing that AI’s contribution to urban innovation depends on both vertical alignment within the technology chain and horizontal alignment with local institutional and network conditions. Overall, the findings support a view of AI-driven urban innovation as a layered, conditional and place-sensitive process in which the configuration of AI capabilities across tiers and territories is more informative than AI intensity alone. This also explains why cities with similar levels of AI investment may experience very different innovation outcomes: what matters is not only how much AI is deployed, but whether the relevant layers are properly sequenced, coordinated and embedded in supportive local environments.

To further contextualise these heterogeneous effects, we complement the group-based analysis with several representative city illustrations. These examples are not intended as formal case studies, but as interpretive extensions of the econometric findings that show how different configurations of foundational, core and application-layer AI capabilities shape urban innovation pathways in practice. In this sense, the heterogeneity observed across broad city categories can be understood not merely as statistical variation, but as reflecting differences in the internal alignment of AI-related technological layers under distinct local industrial, institutional and market conditions. This interpretive step is useful because it links abstract interaction effects to concrete urban development trajectories.

Shenzhen exemplifies a relatively mature and integrated pathway in which FTs, CTs and GPTs reinforce one another. Its strengths in electronic information manufacturing, communication equipment and intelligent hardware provide a strong FTs base, while dense innovation networks involving AI firms, software developers, research institutions and venture capital support CTs upgrading. These capacities are further linked to extensive GPTs applications in manufacturing, logistics, security, finance and urban governance. By contrast, Hangzhou represents a pathway more strongly driven by the interaction between CTs and GPTs. Supported by platform firms, cloud infrastructure, data ecosystems and digital services, the city has translated algorithmic and software capabilities into wide-ranging applications in e-commerce, fintech and intelligent governance. At the same time, this pattern also reveals a potential structural risk in our framework: when downstream application expansion outpaces upstream capability formation, cities may remain dependent on external supply in critical foundational segments. These two cities illustrate that successful AI-driven innovation may arise from different capability mixes, but long-term resilience still requires stronger coupling across all three layers.

A different trajectory is visible in Chengdu, which illustrates how an inland core city can develop a comparatively balanced AI innovation system. Drawing on its scientific and educational resources, electronics and information industries and expanding software sector, Chengdu has gradually built linkages across FTs, CTs and GPTs, with applications extending into transportation, manufacturing, public services and regional digital governance. In contrast, Guiyang helps illustrate the consequences of layer-specific imbalance. Although the city has developed visible strengths in big data infrastructure, data centres and computing-related facilities, these advantages have not been equally matched by a strong local ecosystem in advanced algorithms, frontier AI firms or broad innovation commercialisation. This case helps explain why leadership in one layer, particularly infrastructure, does not automatically generate stronger urban innovation without sufficient coupling to CTs and downstream applications. The contrast between Chengdu and Guiyang therefore reinforces our core argument that balance and coordination across layers matter more than isolated strengths in any single dimension.

In summary, these city illustrations reinforce the central implication of our empirical analysis: AI-driven urban innovation depends less on the scale of AI deployment alone than on the coordinated development of FTs, CTs and GPTs within a city’s broader industrial, institutional and market environment. They also clarify the practical meaning of upstream gaps and downstream overheating. Cities with strong application demand but weak foundational support may achieve rapid but less durable innovation gains, whereas cities with substantial infrastructure investment but underdeveloped core and application ecosystems may struggle to convert technological capacity into innovation performance. From this perspective, effective urban AI strategies should focus on cross-layer coordination rather than isolated technological advancement. This interpretation directly echoes the econometric evidence that the strongest additional innovation gains arise not from stand-alone GPTs expansion, but from the joint development of FTs, CTs and GPTs.

Firstly, this study contributes to the literature by demonstrating that FTs, CTs and GPTs are each important for sustainable urban innovation, yet fundamentally interdependent. This finding suggests that AI should be conceptualised not as a single general-purpose technology, but rather as a stack of partially specialised and interrelated technologies. While recent studies emphasise the diversity of AI methods, use cases and organisational configurations, AI is still often treated as an aggregate or generic digital capability (Mariani et al., 2023). In contrast, research on digital stacks and layered digital sovereignty highlights how infrastructures, platforms and applications jointly shape technological capacity at multiple levels (Sheikh, 2022). By integrating these perspectives, our results show that the internal architecture of AI itself, from upstream computing and data (FTs), through algorithms and models (CTs), to downstream cross-sectoral applications (GPTs), critically conditions its innovation effects at the urban scale. Future research could therefore move beyond binary “AI versus non-AI” distinctions and explicitly model the co-evolution of different AI layers within sectoral and regional innovation systems. One promising avenue would be to theorise what constitutes a “minimum viable AI stack” for sustained innovation across different types of cities and industries. This is an important shift because it reframes AI not as a monolithic shock, but as a structured technological system whose internal composition shapes its external economic consequences.

Secondly, the observed interaction patterns suggest refinements to complementarity and general-purpose technology theories in the context of complex technological systems. Existing work typically treats AI as a new general-purpose technology whose effects depend on complementary investments in skills, data and organisational change (Mariani and Dwivedi, 2024; Rammer et al., 2022), echoing classic arguments on co-invention and complementarities (Bresnahan and Trajtenberg, 1995). Our findings, however, indicate that these complementarities are themselves hierarchically organised. Robust FTs–CTs complementarities and the significant three-way FTs–CTs–GPTs interaction contrast with the absence of stable two-way GPT-related complementarities. This suggests that FTs and CTs may function as threshold complements, which must reach a sufficient level of maturity before GPTs can generate system-wide innovation effects. This interpretation aligns with broader work on organisational complementarities, which emphasises that high performance emerges from coherent configurations of mutually reinforcing elements rather than isolated practices (Brynjolfsson and Milgrom, 2013). Future research could build on this insight by distinguishing analytically between “activating” and “activated” complements, examining how threshold conditions evolve over time (particularly with the diffusion of generative technologies) and tracing how cities and firms sequence investments across layers to move from fragmented pilots to integrated AI innovation trajectories (Banh and Strobel, 2023; Mariani and Dwivedi, 2024). In this respect, our study extends complementarity theory by showing that complementarities in AI are not merely additive, but ordered, conditional and nested within a layered technological structure.

Thirdly, the heterogeneous effects observed across coastal and inland cities, policy-rich and policy-poor regions, and cities of different sizes underscore the importance of place-sensitive theories of AI-enabled innovation. Prior research on urban AI and smart cities has shown that AI adoption is uneven and may reinforce existing socio-spatial inequalities (Yigitcanlar et al., 2020), with capabilities and applications disproportionately concentrated in a limited set of globally connected cities (Yue et al., 2025). Our results extend this literature by demonstrating that AI diffuses not as a single, neutral technology, but as a combination of FTs, CTs and GPTs capabilities whose effects depend on local infrastructures, institutions and network structures. This perspective suggests that AI may act as a magnifier of regional innovation advantages and disadvantages rather than as a simple equaliser. Future research could therefore explore how different governance regimes, planning traditions and policy mixes interact with layered AI architectures, and whether latecomer regions can strategically combine FTs, CTs and GPTs to leapfrog or whether they remain constrained by cumulative disadvantages along the entire technology chain. Thus, the study contributes to urban and regional innovation theory by demonstrating that the developmental consequences of AI are simultaneously technological and spatial.

The findings of this study offer several implications for urban governance and AI-enabled innovation strategies. Firstly, cities should avoid application-oriented smart-city strategies that prioritise visible AI projects without sufficient investment in upstream capabilities. Although downstream AI applications in transportation, health care, public services and urban management may generate short-term efficiency gains, our evidence suggests that such applications are more likely to produce sustained innovation benefits when supported by robust foundational infrastructure and core algorithmic capacities. Urban governments should therefore shift from project-based evaluation metrics towards capability-oriented governance frameworks that assess long-term readiness in computing infrastructure, data platforms, algorithmic development and absorptive capacity. This is especially important in light of our finding that standalone GPT-related complementarities are weak, whereas stronger gains emerge when application deployment is embedded in a more complete AI capability structure.

Secondly, the complementarity between FTs and CTs highlights the need for coordinated governance mechanisms. In many cities, investments in digital infrastructure, industrial development and smart-city applications are managed by different departments, which may result in fragmented policy implementation and weak technological alignment. Establishing integrated governance structures, such as cross-sectoral digital transformation offices, municipal AI steering committees or joint public–private innovation platforms, can help align infrastructure deployment with algorithmic innovation and application development. From the perspective of our empirical results, the central policy challenge is therefore not simply to increase AI spending, but to improve the coherence of AI-related investment across layers and administrative domains.

Thirdly, the heterogeneity results call for differentiated and place-based AI development pathways. Rather than adopting a “one-size-fits-all” approach, urban governments should align AI investment priorities with local technological endowments, institutional environments and innovation constraints. Based on the layered framework of FTs, CTs and GPTs, we propose four representative policy pathways.

For application-led but foundation-weak cities, where smart-city applications and general-purpose AI projects are expanding rapidly but foundational and core technological capacities remain insufficient, the priority should be strengthening FTs and CTs. These cities should avoid substituting visible application projects for long-term capability building. Policy measures may include investment in computing infrastructure, data platforms, public research platforms, local algorithmic capacity, regional computing-resource sharing and joint laboratories involving universities, firms and public agencies.

For platform- and algorithm-driven digital cities, where platform ecosystems, software capabilities and AI applications are relatively advanced but foundational technological dependence remains a constraint, policy should focus on reinforcing FTs autonomy and bottom-layer technological support. These cities should promote computing infrastructure, basic software, hardware support, open-source ecosystems, data governance institutions and industry alliances, thereby converting platform and algorithmic advantages into more resilient innovation capabilities. Similarly, inland and non-coastal cities may need to prioritise the gradual strengthening of FTs and CTs before expecting large innovation returns from ambitious GPTs expansion. In practical terms, this means that urban AI policy should pay close attention to sequencing: foundational and core capabilities should not be treated as optional background conditions, but as strategic prerequisites for durable innovation outcomes.

For comprehensive leading cities, where foundational and core AI capabilities are relatively strong and general-purpose AI applications have greater potential for wide diffusion, the policy focus should shift from single-project deployment to system-level coordination and cross-sectoral integration. These cities should develop regulatory sandboxes, scenario-opening mechanisms, public-sector data-sharing rules and cross-departmental application testbeds. They should also promote the integration of AI with advanced manufacturing, health care, transportation, education and public services, while supporting SMEs and entrepreneurial ecosystems in adopting AI technologies.

For regional catch-up innovation hubs, where universities, research institutes or regional policy support provide a foundation for AI development but commercialisation and industrial transformation remain relatively weak, the priority should be strengthening technology-transfer and innovation-entrepreneurship mechanisms. These cities should promote collaboration among industries, universities and research institutions, build pilot demonstration platforms and testing environments, improve technology-transfer services and provide targeted innovation finance for AI start-ups and SMEs.

Finally, urban AI governance should be embedded in a long-term and adaptive policy framework. AI strategies should not be evaluated only by the number of smart-city projects or the speed of application deployment. Instead, policymakers should pay greater attention to whether a city is accumulating the foundational, algorithmic and organisational capabilities required for sustained innovation. Effective AI-enabled urban transformation therefore requires local governments to act as system integrators rather than merely technology adopters, orchestrating interactions among infrastructure providers, algorithm developers, application firms, universities, financial institutions and public agencies. By adopting governance approaches that emphasise coordination, learning and adaptability, cities can better harness AI as a catalyst for sustainable urban innovation and entrepreneurial development. In short, the policy message of this paper is that cities should govern AI as a layered innovation system, not as a collection of isolated smart applications.

This study has several limitations that suggest avenues for further research. Firstly, the three-tier AI indicators are based on patent data, which tends to underrepresent tacit, service-oriented and organisational innovations. Furthermore, patent data captures successful applications rather than the full spectrum of experimentation and failure. Future work could combine patent data with firm-level surveys, production and productivity data or digital trace data (e.g. cloud usage, AI recruitment and open-source activity) to gain a fuller understanding of AI capabilities and their non-patented manifestations. This would be particularly useful for capturing AI diffusion in services, platform ecosystems and public-sector applications, where innovation is often only partially reflected in patent outputs. Secondly, while our IV strategy mitigates some endogeneity concerns, it cannot fully rule out residual bias from unobserved policy shocks, informal institutions or parallel digital reforms. More granular designs, such as policy experiments, staggered roll-outs of AI-related programmes or matched firm–city panels, would allow stronger causal identification and closer examination of sequencing and timing along the AI technology chain. Such approaches would also help clarify whether the complementarities identified here emerge simultaneously or unfold through dynamic cumulative processes. Thirdly, the focus on Chinese cities and on a single three-tier taxonomy limits external validity and conceptual generality. Comparative studies across developed economies and other emerging markets, as well as alternative decompositions of the AI stack (e.g. distinguishing between data governance, model services and sector-specific AI modules), would help to test the robustness of the layered perspective proposed here. Future comparative research could also examine whether the threshold role of FTs and CTs is universal or varies across institutional and industrial contexts.
Fourthly, our macro urban perspective does not explicitly model spatial and networked interdependencies, even though AI capabilities and innovation clearly spill over across firm, sectoral and city boundaries. Future research could use spatial econometric models, multi-level network analysis and in-depth case studies to reveal the mechanisms at the micro level that underpin the technology coupling, knowledge flows and governance arrangements documented in this paper. This is especially relevant given our heterogeneity results, which suggest that connectivity and accessibility may alter the balance between local capability formation and inter-city diffusion. Finally, as the data set ends in 2021, it does not fully capture the commercialisation and diffusion of large language models and generative AI applications after 2022. These recent developments may further strengthen the strategic importance of FTs such as computing infrastructure, chips and cloud systems, as well as CTs related to model architectures and algorithmic frameworks. At the same time, they could accelerate the spread of application-layer AI across urban sectors. However, such diffusion is likely to depend on prior accumulation of upstream capabilities. Accordingly, the findings of this study are best understood as identifying the structural basis of AI-enabled urban innovation prior to the generative AI boom. Future research could extend the analysis to the post-2022 period to examine whether complementarities between FTs, CTs and GPTs are reinforced, delayed or reconfigured under the large-model paradigm. The rise of generative AI makes it particularly important to investigate whether application-layer diffusion is becoming easier while dependence on foundational and core capabilities simultaneously deepens.

The first author is responsible for the acquisition, analysis and interpretation of the data; the second author is responsible for the conception and design of the work; and the third author is responsible for drafting and revising the work. The authors contributed equally to this work and are co-first authors.

Aghion, P., Antonin, C. and Bunel, S. (2021), The Power of Creative Destruction: Economic Upheaval and the Wealth of Nations, Harvard University Press, Cambridge, MA.
Babina, T., Fedyk, A., He, A. and Hodson, J. (2024), “Artificial intelligence, firm growth, and product innovation”, Journal of Financial Economics, Vol. 151, p. 103745.
Banh, L. and Strobel, G. (2023), “Generative artificial intelligence”, Electronic Markets, Vol. 33 No. 1, p. 63.
Bibri, S.E., Huang, J., Jagatheesaperumal, S.K. and Krogstie, J. (2024), “The synergistic interplay of artificial intelligence and digital twin in environmentally planning sustainable smart cities: a comprehensive systematic review”, Environmental Science and Ecotechnology, Vol. 20, p. 100433.
Boschma, R. (2017), “Relatedness as driver of regional diversification: a research agenda”, Regional Studies, Vol. 51 No. 3, pp. 351-364.
Bresnahan, T.F. and Trajtenberg, M. (1995), “General purpose technologies ‘engines of growth’?”, Journal of Econometrics, Vol. 65 No. 1, pp. 83-108.
Brusoni, S., Henkel, J., Jacobides, M.G., Karim, S., MacCormack, A., Puranam, P. and Schilling, M. (2023), “The power of modularity today: 20 years of ‘design rules’”, in Brusoni, S., Henkel, J., Jacobides, M.G., Karim, S., MacCormack, A., Puranam, P. and Schilling, M. (Eds), The Power of Modularity Today: 20 Years of ‘Design Rules’, Oxford University Press, Oxford, pp. 1-10.
Brynjolfsson, E. and Milgrom, P. (2013), “Complementarity in organizations”, in Gibbons, R. and Roberts, J. (Eds), The Handbook of Organizational Economics, Princeton University Press, Princeton, NJ, pp. 11-55.
Carnabuci, G. and Operti, E. (2013), “Where do firms’ recombinant capabilities come from? Intraorganizational networks, knowledge, and firms’ ability to innovate through technological recombination”, Strategic Management Journal, Vol. 34 No. 13, pp. 1591-1613.
Chen, K., Zhou, X., Bao, Z., Skibniewski, M.J. and Fang, W. (2025), “Artificial intelligence in infrastructure construction: a critical review”, Frontiers of Engineering Management, Vol. 12 No. 1, pp. 24-38.
Cohen, W.M. and Levinthal, D.A. (1990), “Absorptive capacity: a new perspective on learning and innovation”, Administrative Science Quarterly, Vol. 35 No. 1, pp. 128-152.
Czarnitzki, D., Fernández, G.P. and Rammer, C. (2023), “Artificial intelligence and firm-level productivity”, Journal of Economic Behavior and Organization, Vol. 211, pp. 188-205.
Daimi, K., Alsadoon, A. and Coelho, L. (2023), Cutting Edge Applications of Computational Intelligence Tools and Techniques, Springer, Cham.
Ennen, E. and Richter, A. (2010), “The whole is more than the sum of its parts—or is it? A review of the empirical literature on complementarities in organizations”, Journal of Management, Vol. 36 No. 1, pp. 207-233.
Gawer, A. (2014), “Bridging differing perspectives on technological platforms: toward an integrative framework”, Research Policy, Vol. 43 No. 7, pp. 1239-1249.
Gregory, R.W., Henfridsson, O., Kaganer, E. and Kyriakou, H. (2021), “The role of artificial intelligence and data network effects for creating user value”, Academy of Management Review, Vol. 46 No. 3, pp. 534-551.
Grillitsch, M., Asheim, B. and Trippl, M. (2018), “Unrelated knowledge combinations: the unexplored potential for regional industrial path development”, Cambridge Journal of Regions, Economy and Society, Vol. 11 No. 2, pp. 257-274.
Helpman, E. (1998), General Purpose Technologies and Economic Growth, MIT Press, Cambridge, MA.
Huang, J., Bibri, S.E. and Keel, P. (2025), “Generative spatial artificial intelligence for sustainable smart cities: a pioneering large flow model for urban digital twin”, Environmental Science and Ecotechnology, Vol. 24, p. 100526.
Hussain, H., Jun, W. and Radulescu, M. (2025), “Innovation performance in the digital divide context: nexus of digital infrastructure, digital innovation, and e-knowledge”, Journal of the Knowledge Economy, Vol. 16 No. 1, pp. 3772-3792.
Li, C., Wen, M., Jiang, S. and Wang, H. (2024), “Assessing the effect of urban digital infrastructure on green innovation: mechanism identification and spatial-temporal characteristics”, Humanities and Social Sciences Communications, Vol. 11 No. 1, pp. 1-14.
Lyytinen, K., Yoo, Y. and Boland, R.J. Jr (2016), “Digital product innovation within four classes of innovation networks”, Information Systems Journal, Vol. 26 No. 1, pp. 47-75.
Mariani, M. and Dwivedi, Y.K. (2024), “Generative artificial intelligence in innovation management: a preview of future research developments”, Journal of Business Research, Vol. 175, p. 114542.
Mariani, M.M., Machado, I., Magrelli, V. and Dwivedi, Y.K. (2023), “Artificial intelligence in innovation research: a systematic review, conceptual framework, and future research directions”, Technovation, Vol. 122, p. 102623.
Mei, Y., Xu, X. and Zhang, X. (2024), “Study on the urban digital transformation gyroscope model”, Asia Pacific Journal of Innovation and Entrepreneurship, Vol. 18 No. 2, pp. 156-171.
Milgrom, P. and Roberts, J. (1990), “The economics of modern manufacturing: technology, strategy, and organization”, The American Economic Review, Vol. 80 No. 3, pp. 511-528.
Miller, A. (2020), “Developing production technologies in the context of global technological challenges”, Нови Економист, Vol. 13 No. 25, pp. 14-21.
Nambisan, S., Wright, M. and Feldman, M. (2019), “The digital transformation of innovation and entrepreneurship: progress, challenges and key themes”, Research Policy, Vol. 48 No. 8, p. 103773.
Nie, C., Zhong, Z. and Feng, Y. (2023), “Can digital infrastructure induce urban green innovation? New insights from China”, Clean Technologies and Environmental Policy, Vol. 25 No. 10, pp. 3419-3436.
Rammer, C., Fernández, G.P. and Czarnitzki, D. (2022), “Artificial intelligence and industrial innovation: evidence from German firm-level data”, Research Policy, Vol. 51 No. 7, p. 104555.
Schwaeke, J., Peters, A., Kanbach, D.K., Kraus, S. and Jones, P. (2025), “The new normal: the status quo of AI adoption in SMEs”, Journal of Small Business Management, Vol. 63 No. 3, pp. 1297-1331.
Sheikh, H. (2022), “European digital sovereignty: a layered approach”, Digital Society, Vol. 1 No. 3, p. 25.
Son, T.H., Weedon, Z., Yigitcanlar, T., Sanchez, T., Corchado, J.M. and Mehmood, R. (2023), “Algorithmic urban planning for smart and sustainable development: systematic review of the literature”, Sustainable Cities and Society, Vol. 94, p. 104562.
Song, M., Yu, M., Chen, X.-L., Lobonț, O.-R. and Du, J. (2025), “Made in China 2025: artificial intelligence intervention and urban green economy development”, Journal of Environmental Management, Vol. 391, p. 126411.
Teece, D.J. (2018), “Profiting from innovation in the digital economy: enabling technologies, standards, and licensing models in the wireless world”, Research Policy, Vol. 47 No. 8, pp. 1367-1387.
Tilson, D., Lyytinen, K. and Sørensen, C. (2010), “Research commentary—digital infrastructures: the missing IS research agenda”, Information Systems Research, Vol. 21 No. 4, pp. 748-759.
Ullah, I., Adhikari, D., Su, X., Palmieri, F., Wu, C. and Choi, C. (2025), “Integration of data science with the intelligent IoT (IIoT): current challenges and future perspectives”, Digital Communications and Networks, Vol. 11 No. 2, pp. 280-298.
Von Hippel, E. (2006), Democratizing Innovation, MIT Press, Cambridge, MA.
Wang, B., Ma, M., Zhang, Z. and Li, C. (2024), “How do the key capabilities of the industrial internet platform support its growth? A longitudinal case study based on the resource orchestration perspective”, Technological Forecasting and Social Change, Vol. 200, p. 123186.
Wei, Y., Meng, Z., Liu, N. and Mao, J. (2025), “Research on the impact of hard technology innovation on the high-quality development of SRDI enterprises: based on the moderating role of digital transformation”, Asia Pacific Journal of Innovation and Entrepreneurship, Vol. 19 No. 1, pp. 24-41.
Wu, F., Lu, C., Zhu, M., Chen, H., Zhu, J., Yu, K., Li, L., Li, M., Chen, Q., Li, X., Cao, X., Wang, Z., Zha, Z., Zhuang, Y. and Pan, Y. (2020), “Towards a new generation of artificial intelligence in China”, Nature Machine Intelligence, Vol. 2 No. 6, pp. 312-316.
Yigitcanlar, T., Desouza, K.C., Butler, L. and Roozkhosh, F. (2020), “Contributions and risks of artificial intelligence (AI) in building smarter cities: insights from a systematic review of the literature”, Energies, Vol. 13 No. 6, p. 1473.
Yoo, Y., Boland, R.J. Jr, Lyytinen, K. and Majchrzak, A. (2012), “Organizing for innovation in the digitized world”, Organization Science, Vol. 23 No. 5, pp. 1398-1408.
Yu, Z., Liang, Z. and Xue, L. (2022), “A data-driven global innovation system approach and the rise of China’s artificial intelligence industry”, Regional Studies, Vol. 56 No. 4, pp. 619-629.
Yue, Y., Yan, G., Lan, T., Cao, R., Gao, Q., Gao, W., Huang, B., Huang, G., Huang, Z. and Kan, Z. (2025), “Shaping future sustainable cities with AI-powered urban informatics: toward human-AI symbiosis”, Computational Urban Science, Vol. 5 No. 1, p. 31.
Zeng, Y., Zhang, Z., Ye, Z. and Li, L. (2023), “Regional innovation effect of smart city construction in China”, PLoS One, Vol. 18 No. 2, p. e0281862.
Zhou, L., Li, L. and Cheng, J. (2025), “The growth code of SMEs in the digital wave: how innovation, initiative, and management ability drive growth hacking capabilities”, Asia Pacific Journal of Innovation and Entrepreneurship, Vol. 19 No. 4, pp. 352-377.
Zhu, G., Wen, M., Fan, X. and Zhou, M. (2020), “A case study on the mechanism of university-industry collaboration to improve enterprise technological capabilities from the perspective of capability structure”, Innovation and Development Policy, Vol. 2 No. 2, pp. 99-125.

This Appendix provides additional details on the identification, classification and validation of AI-related patents used to construct the three-tier technology variables in this study. Its purpose is to improve methodological transparency and reproducibility by documenting the process by which raw patent records are transformed into city-level measures of foundational, core and general-purpose technologies.

A1. Data sources and sample scope

Patent data are primarily obtained from the CNIPA and supplemented by records retrieved from the IncoPat (PatSnap) database. The raw patent data set includes patent titles, abstracts, application years, applicant locations and patent classification codes, including IPC and, where available, CPC codes. Patents are assigned to cities according to applicant location information and then aggregated to the city–year level. The patent sample covers the same period as the main analysis, namely, 2010–2021.

To ensure consistency with the empirical sample, the patent records are matched to the balanced panel of 120 Chinese cities used in the main analysis. During preprocessing, patents with missing location information, duplicate records or insufficient text information for classification are excluded.

A2. Construction of the artificial intelligence keyword dictionary

The identification of AI-related patents begins with the construction of a hierarchical AI keyword dictionary. The initial keyword pool is compiled from multiple authoritative policy and classification sources, including: the classification system for key digital technology patents published by CNIPA; the 14th Five-Year Plan for Digital Economy Development issued by the State Council; and WIPO’s Artificial Intelligence: Classification and Terminology framework.

These sources provide a structured basis for identifying AI-related technical domains and representative terminology. The initial keyword pool is then refined through a review of academic literature on AI technology measurement, digital innovation and patent-based technology classification. To improve coverage, the dictionary is expanded further using natural language processing methods, such as TF-IDF extraction and semantic-similarity analysis. These methods help to identify synonymous, closely related and emerging technical terms that may not be fully captured in policy-based taxonomies.
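
The TF-IDF step of the dictionary expansion described above can be illustrated with a minimal sketch. It is not the procedure actually used in this study: the whitespace tokenisation, the function name tfidf_top_terms and the rule of mining only abstracts already matched by a seed term are all assumptions made for illustration, and the semantic-similarity step is not covered.

```python
import math
from collections import Counter


def tfidf_top_terms(docs, seed_terms, top_k=5):
    """Rank candidate dictionary terms by summed TF-IDF.

    Hypothetical helper: only documents that already contain a seed
    keyword are mined, and seed keywords themselves are skipped.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))

    scores = Counter()
    for tokens in tokenised:
        if not seed_terms & set(tokens):
            continue  # skip documents with no seed keyword
        counts = Counter(tokens)
        for term, count in counts.items():
            if term in seed_terms:
                continue
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf

    # Terms that occur in every document have idf = 0 and are dropped.
    ranked = [term for term, score in scores.most_common() if score > 0]
    return ranked[:top_k]


abstracts = [
    "neural network training method for image recognition",
    "neural network accelerator chip for convolution",
    "method for packaging a mechanical valve",
]
candidates = tfidf_top_terms(abstracts, {"neural"}, top_k=3)
```

In this toy example the highest-ranked candidates come from the AI-related abstracts only, while words shared by every abstract receive a zero weight and are discarded. Candidate terms produced this way would still need to be screened before entering the dictionary.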

The resulting dictionary is organised hierarchically, with broad AI domains at the top level and more specific technical terms at lower levels. This hierarchical structure is used to support both patent identification and subsequent tier assignment.

A3. Patent identification and matching rules

A patent is identified as AI-related if its textual and/or classification code information matches the AI dictionary and its associated technical categories. To minimise the risk of omission and misclassification errors, the identification process uses a combined matching strategy rather than relying on a single text field. Specifically, matching is conducted using the following information: the patent title, the patent abstract and the IPC/CPC classification codes.

Text matching is based on the occurrence of AI-related keywords and their expanded variants in patent titles and abstracts. Classification code matching is used as an additional filter to improve precision, particularly for patents with broad, abbreviated or highly technical text descriptions. A patent is classified as AI-related if it satisfies the predefined matching criteria based on this combined text-code screening process.

Where patents contain only weak textual matches or involve highly general digital terms, classification codes and contextual descriptions are used to determine whether they should be retained. This combined strategy improves measurement accuracy compared to approaches based solely on keyword matching in titles or abstracts.
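
The combined text-code screening logic can be sketched as follows. The exact matching criteria are not spelled out in this Appendix, so the rule below is an assumption: a strong keyword match in the title or abstract is treated as sufficient, whereas a weak match on a general digital term is retained only when supported by an AI-related classification code. The keyword sets and the function name is_ai_patent are hypothetical, and IPC subclass G06N is used purely as an illustrative code prefix.

```python
# Hypothetical dictionary fragments; the study's actual keyword lists differ.
STRONG_KEYWORDS = {"neural network", "machine learning", "deep learning"}
WEAK_KEYWORDS = {"intelligent", "big data"}  # general digital terms
AI_CODE_PREFIXES = ("G06N",)  # illustrative IPC prefix only


def is_ai_patent(title, abstract, codes):
    """Return True if a patent passes the assumed text-code screening rule."""
    text = f"{title} {abstract}".lower()
    has_strong = any(keyword in text for keyword in STRONG_KEYWORDS)
    has_weak = any(keyword in text for keyword in WEAK_KEYWORDS)
    has_code = any(code.upper().startswith(AI_CODE_PREFIXES) for code in codes)

    if has_strong:
        return True  # unambiguous AI terminology in the text
    # Weak textual evidence is kept only with classification-code support.
    return has_weak and has_code
```

Under this assumed rule, a patent titled "Intelligent valve" with a mechanical classification code is dropped, whereas a patent with the same weak wording and a G06N code is retained.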

A4. Tier-specific classification criteria

After AI-related patents are identified, they are assigned to one of three technological tiers: FTs, CTs or GPTs. These tiers are defined based on the conceptual framework developed in the main text and translated into operational coding rules as follows.

FTs refer to AI-enabling infrastructure and technical foundations. This category includes technologies related to AI chips, processors, computing hardware, data storage, cloud and edge infrastructure, communication support systems and other enabling components that provide the physical or digital basis for AI development and deployment.

CTs refer to the algorithmic and engineering capabilities that directly support the design, training, optimisation and implementation of AI systems. This category includes machine-learning algorithms, neural-network architectures, model training methods, data-processing frameworks, technical software modules and related system-engineering components.

GPTs refer to downstream application-oriented AI technologies that embed AI capabilities into products, services or cross-sector solutions. This category includes intelligent manufacturing applications, smart health-care systems, AI-enabled finance and logistics tools, intelligent service platforms and other application-layer technologies that adapt AI to broad economic and social use cases.

For patents that involve more than one tier, classification is based on the patent’s dominant technological function rather than on simple keyword frequency. In such cases, the title, abstract and classification codes are jointly reviewed to determine whether the patent primarily reflects infrastructure support, core algorithmic capability or downstream application use.

A5. Treatment of ambiguous and overlapping cases

Some patents contain terms that may correspond to more than one tier or may describe technologies located at the boundary between digital infrastructure, algorithmic systems and applied intelligent solutions. To improve coding consistency, the following rules are applied.

Firstly, where a patent contains keywords from multiple categories, the classification is based on the main technological contribution indicated by the patent description. Secondly, when the technological contribution remains unclear after automated screening, the patent is flagged for manual review. Thirdly, broad digital technologies that do not specifically reflect AI functionality are excluded unless their text or classification-code information clearly indicates an AI-related purpose. This procedure helps to avoid the over-inclusion of general digital patents, while also reducing the risk of excluding AI patents with specialised technical wording.
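
The three rules above can be sketched in code. The following is a minimal illustration only, not the authors' procedure: the term lists (loosely based on Table A1), the use of IPC subclass G06N as an AI signal, the title weight, the decision margin and the names `Patent`, `tier_scores` and `screen` are all hypothetical assumptions introduced here. Title matches are weighted more heavily as a rough proxy for the main technological contribution.

```python
from dataclasses import dataclass, field

# Hypothetical term lists, loosely based on Table A1; not the authors' dictionary.
TIER_TERMS = {
    "FT": ["chip", "accelerator", "processor", "cloud computing",
           "edge computing", "storage system"],
    "CT": ["machine learning", "deep learning", "neural network",
           "reinforcement learning", "model training"],
    "GPT": ["intelligent manufacturing", "smart health care",
            "intelligent finance", "smart logistics", "recommendation system"],
}
# Hypothetical terms taken to signal an AI-related purpose in the text.
AI_SPECIFIC_TERMS = ["artificial intelligence", "machine learning", "deep learning",
                     "neural network", "reinforcement learning", "intelligent"]
AI_CODE_PREFIXES = ("G06N",)  # assumption: IPC subclass G06N treated as an AI signal
TITLE_WEIGHT = 2              # assumption: title hits proxy the main contribution
MIN_MARGIN = 2                # assumption: smaller leads are treated as unclear


@dataclass
class Patent:
    title: str
    abstract: str
    codes: list = field(default_factory=list)


def tier_scores(patent):
    """Score each tier from weighted keyword hits in the title and abstract."""
    title, abstract = patent.title.lower(), patent.abstract.lower()
    scores = {}
    for tier, terms in TIER_TERMS.items():
        title_hits = sum(term in title for term in terms)
        abstract_hits = sum(term in abstract for term in terms)
        scores[tier] = TITLE_WEIGHT * title_hits + abstract_hits
    return scores


def screen(patent):
    """Return 'FT', 'CT', 'GPT', 'manual_review' or 'exclude'."""
    text = f"{patent.title} {patent.abstract}".lower()
    has_ai_code = any(code.startswith(AI_CODE_PREFIXES) for code in patent.codes)
    has_ai_term = any(term in text for term in AI_SPECIFIC_TERMS)
    # Rule 3: drop broad digital patents with no AI signal in text or codes.
    if not (has_ai_code or has_ai_term):
        return "exclude"
    ranked = sorted(tier_scores(patent).items(), key=lambda kv: kv[1], reverse=True)
    (top_tier, top_score), (_, second_score) = ranked[0], ranked[1]
    # Rule 2: flag the patent when no tier clearly dominates.
    if top_score == 0 or top_score - second_score < MIN_MARGIN:
        return "manual_review"
    # Rule 1: otherwise assign the dominant tier.
    return top_tier


chip = Patent("AI accelerator chip for neural network computation",
              "A processor architecture that speeds up inference.", ["G06N 3/063"])
backup = Patent("Distributed storage system for enterprise file backup",
                "Cloud computing resources are allocated to store files.", ["G06F 3/06"])
mixed = Patent("Neural network scheduling method for an intelligent manufacturing line",
               "The method assigns jobs to machines.", ["G06N 3/08"])
for patent in (chip, backup, mixed):
    print(patent.title, "->", screen(patent))
```

In this sketch the margin threshold decides how many patents are routed to manual review: a larger margin sends more borderline cases to the reviewers, at the cost of more manual work.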

A6. Manual validation and reliability check

To assess the reliability of the automated identification and classification procedure, a random sample of patents is manually reviewed. Particular attention is given to patents with ambiguous wording, overlapping technical categories or borderline relevance to AI. Two researchers independently examine the sampled patents based on the patent title, abstract and classification-code information. Discrepancies between the two reviewers are discussed and resolved through consensus, and the results of this manual review are used to refine the coding rules and improve the keyword dictionary where necessary. This process ensures that the final classification is not driven solely by automated text matching but also incorporates expert judgement in a systematic and replicable way.
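
The appendix does not report an agreement statistic for the two reviewers. As an illustration of how such a check could be summarised, the sketch below computes the raw agreement rate and Cohen's kappa for two sets of tier labels and lists the patents that would go to consensus discussion. The function name `compare_reviewers` and the example labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def compare_reviewers(labels_a, labels_b):
    """Return (agreement rate, Cohen's kappa, indices where the reviewers differ)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both reviewers must label the same non-empty sample")
    n = len(labels_a)
    disagreements = [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]
    observed = 1 - len(disagreements) / n
    # Agreement expected by chance, from each reviewer's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    # Kappa is undefined when chance agreement is 1; report 1.0 in that case.
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)
    return observed, kappa, disagreements


# Made-up labels for four sampled patents.
reviewer_1 = ["FT", "CT", "GPT", "CT"]
reviewer_2 = ["FT", "CT", "CT", "CT"]
rate, kappa, to_discuss = compare_reviewers(reviewer_1, reviewer_2)
print(rate, round(kappa, 3), to_discuss)
```

The list of differing indices corresponds to the cases that the two reviewers would resolve through discussion before the coding rules are refined.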

A7. Aggregation to city-level variables

After validation and tier assignment, classified patents are aggregated to the city–year level for each of the three technological tiers. This produces three city-level AI patent variables corresponding to FTs, CTs and GPTs. In the main analysis, these variables are transformed using log(1 + x) to reduce skewness and then standardised to facilitate coefficient comparison across model specifications. The final city-level patent variables are used as the core explanatory variables in the empirical analysis reported in the main text.
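
The aggregation and transformation steps can be sketched as follows. This is a minimal illustration under stated assumptions: patents are represented as hypothetical `(city, year, tier)` tuples, city-years without patents in a tier are counted as zero so that the panel stays balanced, and standardisation is taken to be a pooled z-score per tier using the population standard deviation. The source does not specify the standardisation details, and the function name `city_year_tier_variables` is introduced here for illustration.

```python
import math
from collections import Counter
from itertools import product
from statistics import mean, pstdev

TIERS = ("FT", "CT", "GPT")


def city_year_tier_variables(patents, cities, years):
    """Aggregate (city, year, tier) records into standardised log(1 + x) variables."""
    counts = Counter(patents)  # missing (city, year, tier) keys count as zero
    panel = {}
    for tier in TIERS:
        keys = list(product(cities, years))
        # log(1 + x) reduces skewness and is defined for zero counts.
        logged = [math.log1p(counts[(city, year, tier)]) for city, year in keys]
        mu, sigma = mean(logged), pstdev(logged)
        # Assumption: pooled z-score per tier; constant series are set to 0.
        panel[tier] = {
            key: (value - mu) / sigma if sigma > 0 else 0.0
            for key, value in zip(keys, logged)
        }
    return panel


# Made-up records: three FT patents in city A and one in city B, all in 2010.
records = [("A", 2010, "FT")] * 3 + [("B", 2010, "FT")]
panel = city_year_tier_variables(records, cities=["A", "B"], years=[2010])
print(panel["FT"])
```

Because each tier is standardised separately, a one-unit change in any of the three variables corresponds to one standard deviation of that tier's logged patent count, which is what makes the coefficients comparable across tiers.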

A8. Illustrative keyword categories

Table A1 reports the main keyword categories used in the AI patent dictionary. Table A2 summarises the operational coding rules used to distinguish FTs, CTs and GPTs. Table A3 provides representative examples of patent terms or descriptions corresponding to each tier. These tables are intended to improve transparency and to facilitate future replication or extension of the classification framework.

Table A1.

Main keyword categories for AI patent identification

| Category | Subcategory | Representative terms/examples |
| --- | --- | --- |
| AI infrastructure | Chips and hardware | AI chip, accelerator, processor, GPU, edge device and intelligent sensor |
| Data and computing support | Storage and computing | Cloud computing, edge computing, distributed computing, storage system and computing platform |
| Algorithms and models | Machine learning | Machine learning, deep learning, supervised learning and reinforcement learning |
| Algorithms and models | Neural architectures | Neural network, convolutional neural network, recurrent neural network and transformer |
| Technical frameworks | AI engineering | Model training, parameter optimisation, feature extraction and inference engine |
| Intelligent applications | Manufacturing and industry | Intelligent manufacturing, machine vision, predictive maintenance and industrial intelligence |
| Intelligent applications | Services and platforms | Smart health care, smart transportation, intelligent finance and recommendation system |

Table A2.

Operational coding rules for FT, CT and GPT

| Tier | Definition | Coding rule | Typical examples |
| --- | --- | --- | --- |
| FTs | AI-enabling infrastructure and foundational support technologies | Patent primarily relates to hardware, computing, storage, communication or data infrastructure supporting AI development/deployment | AI chips, computing devices, data platforms and cloud/edge infrastructure |
| CTs | Core algorithmic and engineering capabilities | Patent primarily relates to algorithms, model architectures, learning methods, training, optimisation or AI software frameworks | Machine learning algorithms, neural networks and model training methods |
| GPTs | Downstream application-oriented AI technologies | Patent primarily embeds AI into products, services or cross-sector solutions | Smart diagnosis, intelligent logistics and autonomous control systems |

Table A3.

Examples of tier assignment

| Patent description/term | Assigned tier | Reason |
| --- | --- | --- |
| AI accelerator chip for neural network computation | FTs | Primarily reflects hardware infrastructure for AI computing |
| Deep-learning model optimisation method | CTs | Primarily reflects algorithmic and model-engineering capability |
| Intelligent medical image diagnosis system | GPTs | Primarily reflects downstream application of AI in health care |
| Edge-computing platform for real-time AI deployment | FTs | Focuses on computing infrastructure enabling AI applications |
| Reinforcement learning-based scheduling method | CTs | Focuses on AI method and optimisation logic |
| Smart logistics recommendation platform | GPTs | Focuses on applied intelligent solution in logistics |

Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen in the CC BY 4.0 licence terms.
