This systematic review examines how within-gender heterogeneity in board gender diversity, that is, variation in female directors' observable and underlying attributes, relates to corporate sustainability outcomes, captured as ESG performance and ESG reporting.
Guided by stakeholder-agency theory and an adapted Milliken and Martins' (1996) framework, we systematically searched EBSCO Business Source Complete and Web of Science. We included 70 English-language, peer-reviewed archival studies that operationalise board gender diversity through multiple female-director attributes and report associations with ESG performance or ESG reporting. We synthesise findings using vote counting and report simple summary statistics for the most frequently studied attributes.
Evidence is most consistent for female non-executive directors, who are robustly associated with stronger ESG performance across governance systems and regulatory settings. Results for female executive directors are more context-dependent and frequently mixed. For other heterogeneity dimensions, the evidence base remains comparatively thin and inconclusive, particularly for ESG reporting, reflecting both limited study volume and measurement heterogeneity.
Boards and regulators should treat gender diversity as a multi-attributed governance resource rather than a headcount metric. Emphasis on the composition and authority of female directors appears most consequential for sustainability performance, while disclosure-related effects require more consistent measurement and stronger identification.
This review's contribution lies in systematically operationalising and coding within-gender heterogeneity by separating observable from underlying attributes and mapping them distinctly to ESG performance versus ESG reporting, thereby clarifying where evidence is comparatively consistent and where key gaps remain.
1. Introduction
Corporate boards face growing pressure to demonstrate credible progress on sustainability, reflected in both improvements in environmental, social, and governance (ESG) performance and more transparent ESG reporting. In response, board gender diversity (BGD) has become a central construct in corporate governance and accounting research (Al-Shaer et al., 2025; Wu et al., 2022). This growing attention is also supported by meta-analytic evidence showing that BGD is positively associated with accounting-based financial performance (Post and Byron, 2015), alongside other firm outcomes, such as lower bank risk and improved stock liquidity (Gangi et al., 2025; Li et al., 2024).
Within the sustainability domain, higher female board representation has been linked to higher overall sustainability performance (Khatri, 2023), as well as to environmental factors such as stronger carbon disclosure, lower emissions, and greater environmental innovation (Ben-Amar et al., 2017; Kyaw et al., 2022; Nadeem et al., 2020). Moreover, a meta-analysis covering both financial reporting and ESG reporting suggests that the positive effects of BGD are more consistently evident for environmental and social disclosure than for financial and governance disclosure (AlJanadi, 2025). These associations are often explained through improved monitoring, ethical sensitivity, and enhanced stakeholder orientation (Alfi et al., 2025; Nguyen et al., 2020; Sultana et al., 2024). At the same time, recent systematic reviews and meta-analyses confirm an overall positive association between BGD and ESG outcomes while emphasising persistent inconsistencies across findings, settings, and measurement choices (Alfi et al., 2025; Fernández-Torres et al., 2024).
A plausible explanation for these inconsistencies is that much of the empirical literature still treats gender diversity as a singular dimension. This approach neglects the fact that female directors differ meaningfully in observable demographic attributes (e.g. age, nationality, ethnicity) and underlying characteristics (e.g. board role, independence, education, expertise, tenure, and organisational embeddedness), which may shape how and when BGD translates into sustainability outcomes (Amorelli and García-Sánchez, 2021; Bannò et al., 2023; Campopiano et al., 2023). Boards typically appoint directors based on multiple attributes rather than gender alone. Consequently, the sustainability relevance of BGD is likely to depend on the configuration of female directors' authority, incentives, and resources within the boardroom. For instance, female non-executive directors may influence ESG primarily through monitoring, whereas female executive directors may affect ESG through strategic authority and implementation (Bannò et al., 2023; Velte, 2017). Likewise, directors' expertise, education, or embeddedness through interlocks or family ties may expand access to ESG-relevant resources while simultaneously constraining independence and objectivity (Amorelli and García-Sánchez, 2021).
To address this gap, we focus on BGD heterogeneity, defined as within-gender variation among female directors captured by intersections between gender and additional board attributes. We draw on Milliken and Martins' (1996) distinction between observable and underlying attributes to structure the evidence base. Observable attributes include visible demographic factors such as age, nationality, and ethnicity, whereas underlying attributes include less visible characteristics such as skills, education, tenure, independence, executive versus non-executive roles, and organisational embeddedness. This classification is conceptually aligned with stakeholder-agency theory, which suggests that boards can mitigate conflicts between managers and heterogeneous stakeholders by improving monitoring and aligning managerial decisions with broader stakeholder interests (Hill and Jones, 1992). Under this lens, heterogeneity among female directors matters because it alters the channels through which directors can affect stakeholder alignment and ESG outcomes, including monitoring capacity, decision authority, and ESG-relevant knowledge (Byron and Post, 2016; Nguyen et al., 2020).
A further source of ambiguity in the literature concerns the outcome space itself. ESG performance and ESG reporting are conceptually distinct outputs, yet they are not always consistently separated in empirical work and reviews (Fernández-Torres et al., 2024). Conflating reporting with performance risks blurring disclosure practices with realised sustainability outcomes, particularly when studies rely on different proxies, databases, or aggregation methods. Accordingly, we maintain a strict separation between ESG reporting and ESG performance throughout our synthesis.
This review contributes by systematically operationalising and coding BGD heterogeneity along two dimensions, observable versus underlying attributes and ESG performance versus ESG reporting, thereby clarifying where evidence is comparatively robust and where it remains thin. The literature concentrates heavily on underlying heterogeneity, especially the distinction between female executive and non-executive directors, whereas observable heterogeneity dimensions such as age and nationality remain underexamined (Campopiano et al., 2023). In addition, methodological and contextual differences, including governance structures, quota regimes, and endogeneity concerns, plausibly contribute to divergent findings. These issues are increasingly salient as reform-based and quasi-natural experimental studies leverage exogenous changes in female representation to strengthen causal inference and broaden the outcomes set beyond disclosure to strategic decisions such as investment and innovation (Baik et al., 2024; Barroso et al., 2024; Wang et al., 2025; Xin et al., 2025).
Against this background, we address two research questions. First, which attributes of BGD heterogeneity are employed in archival research to investigate associations with ESG performance and reporting? Second, what empirical evidence exists regarding the direction and consistency of these associations across contexts and research designs? To maintain comparability and align with our theoretical focus on board composition, we restrict attention to board-level studies that link female directors to at least one additional observable or underlying attribute, and we exclude measures that solely re-express female representation (e.g. Blau-type indices or critical mass measures) without intersecting attributes. We also focus on ESG outputs rather than ESG inputs, as input-output relationships are not necessarily direct or comparable across settings.
The remainder of this paper is structured as follows. Section 2 develops the theoretical and research framework. Section 3 outlines the systematic review methodology. Section 4 synthesises findings by heterogeneity dimensions while distinguishing ESG performance from ESG reporting. Section 5 discusses contributions, implications, limitations, and directions for future research.
2. Theoretical and research framework
2.1 Stakeholder-agency theory and board gender diversity heterogeneity
This review draws on stakeholder-agency theory to explain why heterogeneity within BGD may shape ESG performance and reporting differently (Hill and Jones, 1992). Stakeholder-agency theory extends agency logic by recognising multiple principals and the risk that managers prioritise short-term or private benefits over longer-term stakeholder value (Freeman, 1984; Jensen and Meckling, 1976; Ross, 1973). Boards are therefore expected to reduce information asymmetries and align corporate decisions with broader stakeholder interests, including sustainability-related outcomes.
Within this framework, BGD is not expected to operate uniformly. Female directors differ in observable attributes (e.g. age, nationality) and underlying attributes (e.g. board role, independence, expertise, tenure, and embeddedness), which condition their authority, incentives, and resources (Amorelli and García-Sánchez, 2021). Consequently, the channels through which women can influence ESG outcomes vary across types of female directors rather than reflecting gender representation alone (Byron and Post, 2016).
A first implication concerns role and authority. Female executive directors may influence ESG through strategic decision-making and implementation, whereas female non-executive directors are positioned to affect ESG via monitoring and transparency (Bannò et al., 2023). This distinction is particularly relevant where ESG outcomes are vulnerable to managerial discretion or symbolic compliance, as credible oversight can constrain opportunism and strengthen stakeholder alignment (Hill and Jones, 1992).
A second implication concerns independence and embeddedness. Independence enhances monitoring credibility and reduces conflicts of interest, yet non-executive status does not automatically imply independence, especially in one-tier systems (Alfi et al., 2025). Conversely, embeddedness through interlocks or family affiliation may increase access to resources and legitimacy relevant for ESG, while potentially weakening objectivity and facilitating entrenchment. Stakeholder-agency theory therefore predicts that some heterogeneity attributes may have ambivalent effects, helping to explain the mixed empirical evidence.
A third implication concerns human capital. Variation in education, expertise, and tenure shapes boards' ability to identify ESG risks, evaluate trade-offs, and scrutinise disclosures, thus affecting both ESG performance and ESG reporting (Velte, 2017). Tenure, in particular, may generate nonlinear effects as firm-specific knowledge increases with time, while independence may erode (Campopiano et al., 2023).
Finally, observable attributes such as age and nationality can proxy differences in experience, institutional exposure, and information-processing capacity. These attributes may strengthen stakeholder responsiveness but can also introduce coordination costs, suggesting context-dependent effects that remain underexplored in archival research.
In summary, stakeholder-agency theory suggests that BGD can improve ESG outcomes, but the effects depend on heterogeneity among female directors. Differences in role, independence, human capital, and embeddedness shape the mechanisms through which female directors mitigate stakeholder-agency conflicts and influence ESG performance and reporting (Hill and Jones, 1992).
2.2 Research framework
Figure 1 summarises our research framework, which adapts Milliken and Martins' (1996) distinction between observable and underlying diversity attributes in the context of BGD heterogeneity and ESG outcomes. Observable attributes refer to visible demographic characteristics of female directors, such as age, nationality, and ethnicity. Underlying attributes capture less visible characteristics that shape directors' authority, incentives, and resources, including executive versus non-executive role, independence, education, expertise, tenure, and organisational embeddedness (Milliken and Martins, 1996).
Figure 1. Research framework. Under stakeholder-agency theory, female board attributes are divided into observable attributes (age and nationality) and underlying attributes ((non-)executive directors, education, expertise, (family) interlinks, and tenure). Arrows from these attributes point towards sustainability outputs, which are divided into ESG performance (environmental, social, and governance performance) and ESG reporting (environmental, social, and governance reporting).
We distinguish two ESG output dimensions. ESG reporting refers to standalone disclosure on material ESG issues and explicitly excludes integrated reporting and assurance to maintain conceptual clarity (Alfi et al., 2025). ESG performance refers to third-party ESG scores or ratings that aggregate ESG indicators and typically rely, at least partly, on disclosed information (Fernández-Torres et al., 2024). Maintaining this separation is important because reporting and performance proxies differ in measurement quality and comparability across studies and data providers (Berg et al., 2022).
Building on stakeholder-agency theory, the framework proposes that female directors may enhance ESG outcomes by strengthening monitoring, stakeholder responsiveness, and transparency, thereby reducing stakeholder-agency conflicts (Hill and Jones, 1992; Nguyen et al., 2020). However, these effects are expected to vary with female directors' intersecting attributes (Byron and Post, 2016; Velte, 2017). Accordingly, we use the observable/underlying distinction to organise how the literature operationalises BGD heterogeneity and to structure the synthesis across ESG reporting and performance.
For observable attributes, prior research has mainly focused on age and nationality (Amorelli and García-Sánchez, 2021). Age may proxy experience and risk orientation, potentially strengthening oversight and long-term ESG commitment, but it may reflect lower openness to change and innovation, implying ambiguous effects. Nationality diversity may bring exposure to different regulatory regimes and ESG norms, potentially strengthening stakeholder orientation and disclosure practices, yet it can also introduce coordination costs and reduce board cohesion, which may weaken monitoring effectiveness (Bannò et al., 2023; Campopiano et al., 2023).
For underlying attributes, role and independence are central. Executive roles imply greater decision authority and implementation capacity, while non-executive roles are primarily associated with monitoring. Importantly, non-executive status does not necessarily imply independence, particularly in one-tier systems; independence is therefore treated as a more stringent condition that may strengthen monitoring credibility (Velte, 2017). Human capital attributes, education and expertise, are expected to enhance directors' capacity to evaluate ESG risks, challenge managerial narratives, and oversee disclosure quality, but operationalisations differ across studies, limiting comparability (Campopiano et al., 2023). Social capital and embeddedness, captured through interlocks and family affiliations, may provide access to resources and legitimacy relevant to ESG while also creating conflicts of interest that impair objectivity, implying potentially non-linear or context-dependent effects. Tenure similarly reflects a trade-off between firm-specific knowledge and declining independence, suggesting possible non-linear relationships with ESG outcomes (Fernández-Torres et al., 2024).
Finally, the framework highlights that governance systems and regulatory environments condition how heterogeneity attributes translate into ESG outcomes. One-tier boards combine executive and monitoring functions within a single board, whereas two-tier systems separate management and supervisory boards, shaping the meaning of “executive”, “non-executive”, and “independent” across contexts (Velte, 2017). Similarly, mandatory gender quota regimes may affect not only the number of female directors but also their role allocation and selection criteria, which can shift the composition of observable and underlying attributes and thereby influence ESG outcomes. These contextual factors provide an organising lens for interpreting cross-study heterogeneity in results.
Overall, Figure 1 structures the review by linking BGD heterogeneity to ESG outcomes under stakeholder-agency theory. This framework guides our coding scheme and synthesis, supporting the identification of where evidence is comparatively robust and where important gaps remain.
3. Methodology
We follow established guidance for structured and systematic reviews, including PRISMA 2020 (Page et al., 2021; Tranfield et al., 2003), to enhance transparency and reproducibility [1]. Research on BGD and ESG outcomes spans diverse datasets, contexts, and empirical specifications, and it operationalises both BGD and ESG in multiple, partly non-comparable ways (Wu et al., 2022). While meta-analyses are prevalent in this research area and well-suited to estimating an average effect size for a narrowly defined relationship and to testing moderator variables under relatively comparable operationalisations (Alfi et al., 2025; AlJanadi, 2025; Byron and Post, 2016; Wu et al., 2022), our objective differs. We focus on within-gender heterogeneity in BGD, that is, intersections between female directors and additional board attributes (e.g. executive versus non-executive role, independence, education, expertise). These heterogeneity attributes are measured in conceptually related but empirically heterogeneous ways across studies, which limits the feasibility of a single, pooled effect-size meta-analysis. Accordingly, we adopt a systematic review with structured narrative synthesis, which allows us to (1) map how BGD heterogeneity is operationalised, (2) distinguish outcomes consistently into ESG performance versus reporting, and (3) identify where evidence is comparatively robust versus thin, including methodological limitations that inform future research agendas.
We systematically searched two databases, EBSCO Business Source Complete and Web of Science. Searches were conducted in May 2025 using database-specific field settings. We searched titles, abstracts, and keywords. The complete Boolean search strings for each database, the fields searched, and all search dates are reported in Appendix A.
The search combined BGD terms (“board gender diversity”, “female directors”, “women on boards”) with sustainability-outcome terms (“ESG performance”, “ESG reporting”, “ESG disclosure”, “ESG”, “CSR”, “CSR performance”, “CSR reporting”, “CSR disclosure”, “sustainability”, “sustainability performance”, “sustainability reporting” and related terms). The initial search identified 532 records from EBSCO and 424 from the Web of Science. Duplicates were removed using DOI and title matching, resulting in the removal of 421 duplicate records. We excluded 118 records that did not examine the relationship between BGD and ESG outcomes.
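The duplicate-removal step can be illustrated with a minimal sketch. It assumes records exported as dictionaries with `doi` and `title` fields; the field names, the title-normalisation rule, and the example records are hypothetical and are not taken from the review's actual workflow.

```python
import re


def normalise_title(title: str) -> str:
    """Lower-case a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record per DOI or normalised title; drop later matches."""
    seen_dois, seen_titles, unique = set(), set(), []
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        title = normalise_title(rec.get("title") or "")
        # A record is a duplicate if either its DOI or its title was seen before.
        if (doi and doi in seen_dois) or (title and title in seen_titles):
            continue
        if doi:
            seen_dois.add(doi)
        if title:
            seen_titles.add(title)
        unique.append(rec)
    return unique


# Hypothetical records, for illustration only (not from the review sample).
records = [
    {"source": "EBSCO", "doi": "10.1000/abc", "title": "Women on Boards and ESG Disclosure"},
    {"source": "WoS", "doi": "10.1000/ABC", "title": "Women on boards and ESG disclosure."},
    {"source": "WoS", "doi": None, "title": "Women on Boards and ESG Disclosure"},
    {"source": "WoS", "doi": "10.1000/xyz", "title": "Female directors and CSR reporting"},
]
unique_records = deduplicate(records)
print(len(unique_records))
```

Matching on either identifier is a deliberately conservative choice in this sketch: records without a DOI can still be caught by their title, at the cost of occasionally merging distinct papers with identical titles.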
This review examines associations between BGD heterogeneity and ESG outputs and, therefore, restricts the sample to English-language, peer-reviewed, archival quantitative studies published in double-blind reviewed international journals. We did not impose a temporal restriction. We excluded 29 non-empirical and qualitative studies and 62 studies that did not meet the language and outlet criteria.
To operationalise BGD heterogeneity consistently, we included only studies that measure BGD through multiple female-director attributes, meaning that female directors are linked to at least one additional observable or underlying attribute (Figure 1). Observable attributes include demographic characteristics such as age and nationality, while underlying attributes include executive versus non-executive role, independence, education, expertise, tenure, and organisational embeddedness. We excluded studies that only re-express female representation (e.g. critical mass or Blau-type indices) without intersecting attributes, because such measures do not capture within-gender heterogeneity as defined here. We also excluded studies focussing on individual-level roles outside the board (e.g. CEO gender), diversity at other organisational levels (e.g. committees, management teams, workforce), and outcomes outside ESG outputs. In particular, we do not include ESG inputs (e.g. CSR spending) or innovation inputs (e.g. R&D expenditure), because these inputs do not directly capture ESG performance or ESG reporting and are not necessarily comparable across settings.
After applying these criteria, we excluded 256 studies that did not operationalise BGD heterogeneity as defined above, leaving a final sample of 70 studies. Figure 2 reports the PRISMA flow diagram with PRISMA 2020 headings and corresponding counts.
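As a simple check, the counts reported in this section can be chained to confirm that they reconcile with the final sample. The sketch below uses only the numbers stated in the text; the function and variable names are illustrative assumptions.

```python
# Records identified per database, as reported in the text.
identified = {"EBSCO Business Source Complete": 532, "Web of Science": 424}

# Exclusions in the order they are reported in the text.
exclusions = [
    ("duplicate records removed", 421),
    ("no BGD-ESG relationship examined", 118),
    ("non-empirical or qualitative studies", 29),
    ("language or outlet criteria not met", 62),
    ("BGD heterogeneity not operationalised", 256),
]


def prisma_flow(identified, exclusions):
    """Return (label, records remaining) after each selection step."""
    remaining = sum(identified.values())
    steps = [("records identified", remaining)]
    for reason, n_excluded in exclusions:
        remaining -= n_excluded
        steps.append((reason, remaining))
    return steps


flow = prisma_flow(identified, exclusions)
for label, remaining in flow:
    print(f"{label:<40} remaining: {remaining}")

final_sample = flow[-1][1]
```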
Figure 2. Sample selection. PRISMA flowchart of the sample selection process: records identified from EBSCO and Web of Science; duplicate records removed; records screened, excluded, sought for retrieval, assessed for eligibility, and included in the review.
We acknowledge that restricting the review to peer-reviewed archival studies may introduce selection concerns, including publication bias and exclusion of “grey literature” (e.g. working papers). To assess whether this restriction materially affects inference, we screened excluded records close to eligibility and compared their broad directional patterns with included studies; we did not identify systematic differences at this level.
Screening proceeded in three stages: title screening, abstract screening, and full-text screening. One reviewer conducted all screening and eligibility decisions. To enhance procedural transparency, we used a pre-specified codebook that maps each included study to (1) the heterogeneity attribute(s) examined (observable versus underlying) and (2) the ESG output dimension (ESG performance versus ESG reporting). An overview of all included studies is available in Appendix B.
Because the evidence base consists of archival studies, we assessed study quality and risk-of-bias using a structured appraisal adapted to corporate governance research. The appraisal considers the credibility of identification (e.g. whether the study addresses endogeneity), the validity and transparency of ESG proxy construction, sample selection, model specification and controls, and reporting clarity. We use these assessments to qualify confidence in patterns and to interpret whether more consistent evidence clusters in studies with stronger identification strategies.
We synthesised findings using vote counting (Hedges and Olkin, 1980), coding the reported association for each heterogeneity attribute-outcome pairing as positive, negative, or statistically insignificant. We recognise the limitations of vote counting, including its reliance on statistical significance rather than effect magnitude and precision (Light and Smith, 1971). To partially mitigate this limitation, we recorded the significance threshold reported in each study and whether the study implemented endogeneity checks.
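The coding and tallying logic can be sketched as follows. This is a minimal illustration rather than the coding procedure actually used: the record structure, the single 0.05 cut-off, and the example entries are assumptions introduced here, whereas the review records each study's own reported threshold.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedResult:
    """One coded association (hypothetical structure for illustration)."""
    study: str
    attribute: str           # e.g. "non-executive", "executive", "age"
    outcome: str             # "ESG performance" or "ESG reporting"
    coefficient_sign: int    # +1 or -1, as reported by the study
    p_value: float
    endogeneity_check: bool


def vote(result, alpha=0.05):
    """Classify a result as positive, negative, or insignificant."""
    if result.p_value > alpha:
        return "insignificant"
    return "positive" if result.coefficient_sign > 0 else "negative"


def vote_count(results, alpha=0.05):
    """Tally votes for each (attribute, outcome) pairing."""
    tallies = {}
    for r in results:
        key = (r.attribute, r.outcome)
        tallies.setdefault(key, Counter())[vote(r, alpha)] += 1
    return tallies


# Hypothetical coded results, not drawn from the included studies.
results = [
    CodedResult("Study A", "non-executive", "ESG performance", +1, 0.01, True),
    CodedResult("Study B", "non-executive", "ESG performance", +1, 0.20, False),
    CodedResult("Study C", "executive", "ESG reporting", -1, 0.03, True),
    CodedResult("Study D", "non-executive", "ESG performance", +1, 0.04, True),
]
tallies = vote_count(results)
for pairing, counts in tallies.items():
    print(pairing, dict(counts))
```

Keeping the endogeneity flag on each record makes it straightforward to repeat the tally on the subset of studies with stronger identification.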
Table 1 summarises the included studies by publication year (Panel A), region (Panel B), journal outlet (Panel C), ESG outcome type (Panel D), heterogeneity attribute type (Panel E), and whether endogeneity checks are applied (Panel F). The topic remains relatively young: archival studies began in 2013 and peaked in 2024 with 16 studies, followed by 14 in 2025. Country-specific evidence is concentrated in China (14) and the United States (11), alongside cross-country designs (18). To improve interpretability across settings, we categorise governance contexts by one-tier versus two-tier systems and by mandatory versus voluntary gender quota regimes. Most country-specific studies focus on one-tier systems (29) rather than two-tier systems (19), and voluntary settings (36) rather than mandatory quota regimes (16). ESG performance is more frequently examined (44 studies) than ESG reporting (27), which may reflect the availability of data and measurement constraints.
Table 1. Count of cited published papers

| Panel and total | Breakdown |
|---|---|
| Panel A: by publication year (total: 70) | 2025: 14; 2024: 16; 2023: 12; 2022: 5; 2021: 5; 2020: 3; 2019: 8; 2018: 2; 2016: 4; 2013: 1 |
| Panel B: by region (total: 70) | Cross-country: 18. Corporate governance systems in country-specific studies (52): one-tier system (29): Bangladesh (1), India (2), Pakistan (1), Spain (8), Turkey (2), UK (4), USA (11); two-tier system (19): China (14), Germany (1), Indonesia (1), Poland (2), South Africa (1); voting right (4): France (1), Italy (3). Gender quota regimes in country-specific studies: mandatory (16): France (1), Germany (1), India (2), Italy (3), Pakistan (1), Spain (8); voluntary (36): Bangladesh (1), China (14), Indonesia (1), Poland (2), South Africa (1), Turkey (2), UK (4), USA (11) |
| Panel C: by journal (total: 70) | Accounting, Finance and Corporate Governance Journals (23): Accounting and Finance: 2; Accounting Forum: 1; Applied Economics: 1; Applied Economics Letters: 1; China Accounting and Finance Review: 1; Corporate Governance: 3; Emerging Markets, Finance and Trade: 1; International Journal of Finance and Economics: 1; International Journal of Financial Analysis: 2; International Review of Economics and Finance: 1; Journal of Accounting in Emerging Economics: 1; Journal of Accounting Literature: 1; Journal of Contemporary Accounting and Economics: 1; Journal of Corporate Accounting and Finance: 1; Journal of Corporate Finance: 2; Kybernetes: 1; Meditari Accountancy Research: 1; The British Accounting Review: 1. Management and sustainability journals: Business Ethics: 1; Business Ethics, the Environment and Responsibility: 2; Business Strategy and the Environment: 7; Cogent Business and Management: 1; Corporate Social Responsibility and Environmental Management: 9; Entrepreneurship Theory and Practice: 1; European Management Review: 1; Human Relations: 1; International Journal of Energy Sector Management: 1; International Journal of Productivity and Performance Management: 1; Journal of Business Ethics: 2; Journal of Business Research: 1; Journal of Cleaner Production: 2; Journal of Environmental Management: 1; Journal of Global Responsibility: 2; Journal of Innovation and Knowledge: 1; Management Research Review: 1; Review of Managerial Science: 2; Social Responsibility Journal: 3; Sustainable Development: 2; Sustainability Accounting, Management and Policy Journal: 2; Technological Forecasting and Social Change: 2 |
| Panel D: by ESG variable (total: 71 [a]) | ESG performance: 44; ESG reporting: 27 |
| Panel E: by multiple BGD variables (total: 82 [a]) | Diversity in observable attributes (8): Age: 6; Nationality: 2. Diversity in underlying attributes: Non-executive directors: 31; Executive directors: 34; Education: 8; Expertise: 6; (Family) interlinks: 8; Tenure: 2 |
| Panel F: by inclusion of endogeneity checks (total: 70) | Yes: 40; No: 30 |

Note: [a] Some studies include more than one variable.
The included studies appear in a wide range of journals, which we group into accounting, finance, and corporate governance journals (23 studies) and management and sustainability journals (47). The most frequent outlets are Corporate Social Responsibility and Environmental Management (9 studies) and Business Strategy and the Environment (7). The concentration in management and sustainability journals may reflect the prominence of the BGD-ESG relationship in sustainable corporate governance research.
Given plausible endogeneity concerns in the BGD-ESG literature, we include studies using a range of econometric approaches, including OLS, panel models, Tobit, logit/probit, propensity score matching, 2SLS/IV, and difference-in-differences. Many studies apply at least one endogeneity check (Panel F). We retain earlier studies without such checks to avoid an overly restrictive sample that would understate the development of the literature, but we interpret patterns with appropriate caution.
4. Findings
4.1 BGD heterogeneity based on diversity in observable attributes
4.1.1 Female board age and female board nationality
Although gender is the most frequently examined observable demographic attribute, archival research has largely neglected the intersection of female directors' gender with other observable characteristics such as age, nationality, or ethnic background. We identified only eight studies in this area, so any inferences remain tentative.
Evidence on female director age suggests some positive associations with ESG performance in international settings (Haque et al., 2024) and in the Chinese two-tier system with voluntary gender quotas (Amadi et al., 2023; Elmagrhi et al., 2019; Jia and Zhang, 2013; Yang et al., 2019). With one exception (Elmagrhi et al., 2019), these studies do not report endogeneity tests, and reported effects typically reach conventional thresholds (≤0.05).
Colakoglu et al. (2021) find no significant association between female director age and ESG reporting. Female foreign directors are even less frequently studied. Two studies report a positive impact, one for ESG reporting (Dobija et al., 2023) and one for ESG performance (Hussain et al., 2025), both in two-tier systems with voluntary gender quotas and with endogeneity tests and significance levels of 0.05 and 0.01.
Overall, stakeholder-agency theory suggests an ambiguous relationship between these observable attributes and ESG outcomes because age and nationality may proxy both monitoring-relevant resources and potential coordination costs. Given the small evidence base, we treat these patterns as suggestive rather than conclusive. Where positive links emerge, they are consistent with the idea that higher average age among female directors may reflect greater experience, and that foreign female directors may strengthen monitoring and stakeholder responsiveness through international exposure and outsider status.
4.2 BGD heterogeneity based on diversity in underlying attributes
Most studies in our sample focus on underlying heterogeneity attributes, particularly female non-executive directors (31 studies) and executive directors (34 studies). In this literature, board independence is often used as a proxy for non-executive status, reflecting the assumed monitoring role of non-executives and formal independence requirements in many one-tier settings. However, equating “non-executive” with “independent” is problematic, because impaired independence remains a central concern for non-executives, especially in one-tier boards. Accordingly, we treat independence as a stricter condition that overlaps with, but is not equivalent to, non-executive status.
4.2.1 Female non-executive directors and ESG performance
Across studies, evidence is most consistent for female non-executive directors and ESG performance. We observe clear tendencies towards a positive association (15 studies), with comparatively few insignificant results (5). Positive associations are reported in international samples (Agustina and Barokah, 2024; Atif et al., 2021; Fernández-Méndez et al., 2025; García-Sánchez et al., 2023; Gull et al., 2025), in one-tier systems (Au et al., 2023; Biswas et al., 2023; Kaur et al., 2025; Konadu et al., 2022; Liu, 2018; Tunyi et al., 2023) and in two-tier settings (Liao et al., 2019). Positive results also appear in both mandatory quota regimes (e.g. Kaur et al., 2025) and voluntary regimes (e.g. Liu, 2018). Several studies operationalise heterogeneity through female independent directors (Kaur et al., 2025). Where reported, studies typically use conventional significance thresholds (0.05 and 0.01) and frequently apply endogeneity checks.
No study reports a negative association between female non-executives and ESG performance. Insignificant findings occur in some voluntary regimes (Al-Shaer et al., 2024; Chu, 2024; Yang et al., 2019) and some mandatory regimes (Fleitas-Castillo et al., 2024; Cambrea et al., 2023). Overall, this pattern aligns with stakeholder-agency theory: where women occupy non-executive roles, heterogeneity in board composition may strengthen monitoring and reduce stakeholder-agency conflicts, thereby supporting stronger ESG performance.
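The vote counts reported above for female non-executive directors and ESG performance (15 positive, 5 insignificant, no negative results) can be tallied as in the following minimal sketch. It is illustrative only and not part of the review's reported analysis: the function names `vote_count` and `sign_test_p` are hypothetical, and the exact sign test is an assumed supplementary check that drops insignificant results and treats studies as independent.

```python
from math import comb


def vote_count(positive, negative, insignificant):
    """Return the total number of results and the share in each direction."""
    total = positive + negative + insignificant
    return {
        "total": total,
        "share_positive": positive / total,
        "share_negative": negative / total,
        "share_insignificant": insignificant / total,
    }


def sign_test_p(positive, negative):
    """Two-sided exact sign test on significant results only.

    Assumption: insignificant results are excluded and each study counts
    as one independent vote, which is a strong simplification.
    """
    n = positive + negative
    k = max(positive, negative)
    # Probability of k or more votes in one direction under a fair coin.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Counts taken from Section 4.2.1 (non-executives and ESG performance).
tally = vote_count(positive=15, negative=0, insignificant=5)
p_value = sign_test_p(positive=15, negative=0)

print(tally)
print(f"sign test p-value: {p_value:.6f}")
```

Such a tally summarises direction only. It says nothing about effect magnitude or precision, which is why the counts should be read alongside the study-level design information.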
4.2.2 Female non-executive directors and ESG reporting
Nine studies report a positive association between female non-executive directors and ESG reporting across both mandatory (Cabeza-García et al., 2018; Martínez et al., 2020; Rahman et al., 2024; Singhania et al., 2024) and voluntary settings (Al-Shaer and Zaman, 2016; Atif et al., 2020; Benjamin et al., 2020; Liu et al., 2023), spanning one-tier and two-tier systems. With one exception (Singhania et al., 2024), reported significance typically falls at 0.05 and 0.01, although endogeneity checks are less consistently applied (e.g. Khidmat et al., 2022). One study reports a negative association in the Indonesian two-tier system (Taufik and Oh, 2023). Other studies find insignificant results in voluntary regimes (Furtuna et al., 2024; Matuszak et al., 2019) and in mandatory contexts (Martínez et al., 2019).
Compared with ESG performance, the reporting evidence is less consistent, plausibly reflecting greater measurement heterogeneity in ESG reporting proxies and weaker comparability across settings. This also reinforces the need to maintain a strict distinction between reporting and performance outcomes in synthesis.
4.2.3 Female executive directors and ESG performance
For female executive directors, the evidence is more context-dependent. Eighteen studies report positive associations across international samples (Agustina and Barokah, 2024; Atif et al., 2020; Tahat and Hassanein, 2024; García-Meca and Martinez-Ferrero, 2025; Peng et al., 2025), one-tier systems (Al-Shaer et al., 2024; Issa and In'airat, 2025; Konadu et al., 2022), two-tier systems (Chu, 2024; Velte, 2016; Wu et al., 2024), and the French setting with mixed governance features (Paolone et al., 2025). Most studies report conventional significance thresholds (0.05/0.01) and frequently apply endogeneity checks (Peng et al., 2025). In contrast, a small number of studies report negative associations (Abdelkader et al., 2024; García-Sánchez et al., 2023) and eleven studies report insignificant findings (Biswas et al., 2023; Cambrea et al., 2023; Fernández-Méndez et al., 2025).
A plausible explanation for these mixed results is that quota regimes and appointment practices often expand female representation primarily in non-executive seats, while female representation at the executive level remains comparatively limited. As a result, estimated effects for female executives may be more sensitive to setting-specific conditions (e.g. authority distribution, internal role assignment, or the institutional salience of ESG), and to differences in how ESG performance is operationalised.
4.2.4 Female executive directors and ESG reporting
Only a small set of studies directly examines female executives and ESG reporting, and results are mixed. Five studies report positive associations (e.g. Córdova Román et al., 2021; Nepal et al., 2025), with significance levels ranging from 0.10 to 0.01 and only partial use of endogeneity checks. One study reports a negative association (Aladwan et al., 2025). Other studies find non-linear (Meng and Zhu, 2024) or insignificant effects (Atif et al., 2020; Córdova Román et al., 2021). Overall, this limited and heterogeneous evidence is consistent with two interpretation challenges: (1) lower comparability of ESG reporting proxies across studies, and (2) limited clarity in some studies regarding the precise executive role(s) held by female directors and their proximity to ESG disclosure processes.
4.2.5 Female board education
Only seven studies examine female directors' education. Two studies report positive associations with ESG performance (Haque et al., 2024; Amadi et al., 2023), both without endogeneity tests; two studies in China report insignificant results (Elmagrhi et al., 2019; Yang et al., 2019). For ESG reporting, two studies report positive associations in one-tier systems with mandatory gender quotas (Sánchez‐Hernández et al., 2025; Rahman et al., 2024), whereas Colakoglu et al. (2021) report insignificant results in Turkey. Overall, the evidence remains inconclusive, and limited comparability of education proxies likely contributes to divergence across studies.
4.2.6 Female board expertise
Evidence on female directors' expertise is limited and heterogeneous (six studies). One Spanish study in a one-tier mandatory quota setting reports a positive association between female business expertise and ESG performance (García-Meca et al., 2024), while other studies find insignificant effects (Haque et al., 2024; Ren et al., 2024). For ESG reporting, Abd et al. (2025) report a positive association in a cross-country setting, but Spanish studies show that results depend on how expertise is defined and disaggregated (Sánchez‐Hernández et al., 2025; Ramon-Llorens et al., 2021). Across these studies, the direction of effects varies across expertise categories (e.g. advisor roles versus community influence versus industry expertise), underscoring that “expertise” is not a uniform construct and that operational differences reduce cross-study comparability.
4.2.7 Female board (family) interlinks
Research on female directors' interlocks and family affiliation is underrepresented and yields mixed results. Yang et al. (2019) report a weak positive association between female directors' other board memberships and ESG performance in China (0.10 significance level, without endogeneity tests). US evidence indicates positive associations between female directors' connections and specific sustainability-related outcomes (Glass et al., 2016; Cook and Glass, 2016). Five studies examine female family affiliations. Findings vary across family versus non-family status, executive versus non-executive role, and interaction patterns (e.g. tenure and interlocks), with evidence of both positive and negative associations depending on configuration (Cambrea et al., 2024; Cruz et al., 2019; Gavana et al., 2023). For ESG reporting, evidence suggests that affiliated female directors may be negatively associated with CSR reporting in some contexts (Biswas et al., 2022), while unaffiliated female directors are more consistently positively associated with CSR reporting (Biswas et al., 2022; Campopiano et al., 2019). Overall, this pattern is consistent with a trade-off highlighted by stakeholder-agency theory: interlocks and family ties may provide resources and legitimacy, but can also introduce conflicts of interest or “busyness” that weakens monitoring.
4.2.8 Female board tenure
Female board tenure is rarely examined. Rahman et al. (2024) report a positive association between female independent director tenure and ESG reporting in Pakistan (0.05 significance level). Fleitas-Castillo et al. (2024) report a non-linear inverted U-shaped relationship between female director tenure (below versus above three years) and CSR (ir)responsibility in Spain. These findings are consistent with a learning-versus-capture interpretation: firm-specific knowledge may strengthen monitoring up to a point, while extended tenure may erode independence and reduce effectiveness.
4.3 Summary of the results
Overall, the evidence base is highly unbalanced across heterogeneity dimensions. Observable attributes (e.g. age and nationality) remain underexamined relative to underlying attributes, and this asymmetry constitutes a substantive gap in the literature.
Within underlying attributes, the most robust pattern concerns female non-executive directors and ESG performance, where evidence is predominantly positive, with no negative findings, across governance systems (one-tier and two-tier) and across both mandatory and voluntary quota settings. Evidence on ESG reporting is less consistent, even for non-executive directors, reflecting both fewer studies and greater heterogeneity in ESG reporting practices.
For female executive directors, results for ESG performance and reporting are more mixed and appear more sensitive to contextual and measurement differences, as well as to the limited representation and variable role assignment of women at the executive level. For other attributes (education, expertise, interlocks/family affiliation, and tenure), the evidence remains comparatively thin and inconclusive, limiting the strength of any general conclusions.
Table 2 summarises the (in)significant results of the included studies.
Summary of results of the literature review
| Multiple BGD characteristics | ESG performance and ESG reporting | Total |
|---|---|---|
| Diversity in observable attributes | | |
| Age | (+) | 5 (+) |
| Nationality | (+) | 2 (+) |
| Diversity in underlying attributes | | |
| Non-executives | (+) | 24+ (6+ independence; 14***; 8**; 2*; 14e) |
| Executives | (+) | 23 (+; 4*; 5**; 14***; 16e) |
| Education | (+) | 4 (+; 3**; 1***) |
| Expertise | (+) | 4 (+; 1***; 2**; 1*; 3e) |
| (Family) interlinks | (+) | 4 (+; 2***; 1**; 1*) |
| Tenure | (+) | 1 (+**) |
Note(s): *: significance level 0.1; **: significance level 0.05; ***: significance level 0.01; e: endogeneity tests included (such as 2SLS/IV, PSM, or a difference-in-differences approach)
5. Discussion and conclusion
5.1 Summary
Corporate boards increasingly face stakeholder demands to demonstrate credible sustainability commitments through both improved ESG performance and more transparent ESG reporting (Alfi et al., 2025; Byron and Post, 2016; Wu et al., 2022). In this context, BGD has become a prominent construct in corporate governance and accounting research and is frequently associated with stronger sustainability outcomes (AlJanadi, 2025; Khatri, 2023; Velte, 2017). At the same time, prior evidence remains inconsistent across studies, settings, and measurement choices, suggesting that treating female directors as a homogeneous group may mask meaningful within-gender variation (Amorelli and García-Sánchez, 2021; Bannò et al., 2023).
Addressing this gap, our systematic literature review examines BGD heterogeneity, defined as within-gender variation among female directors captured by intersections between gender and additional observable and underlying attributes. Synthesising evidence from 70 archival studies, we show that the literature concentrates heavily on underlying heterogeneity, especially women's board role (executive versus non-executive) and, in many studies, independence, whereas observable heterogeneity (e.g. age, nationality) remains comparatively underexamined. This asymmetry is itself a central insight: the field has developed a relatively mature evidence base around authority- and monitoring-related heterogeneity, but far less around demographic intersections that may shape perspectives, information processing, or institutional exposure.
The most consistent finding concerns female non-executive directors and ESG performance. Across governance systems and regulatory settings (García-Sánchez et al., 2023; Konadu et al., 2022; Singhania et al., 2024), female non-executive representation is robustly associated with stronger ESG performance, with few studies reporting null results and no studies reporting consistently negative effects. Interpreted through stakeholder-agency theory, this pattern is consistent with a monitoring and stakeholder-alignment channel: where women occupy non-executive roles, they may strengthen oversight of managerial discretion, increase attention to stakeholder claims, and reduce stakeholder-agency conflicts that otherwise weaken sustainability performance (Hill and Jones, 1992).
Evidence is less stable when the focus shifts from non-executive to executive directors and when the outcome shifts from ESG performance to ESG reporting. For executives, a plausible interpretation is that executive roles provide greater decision authority and implementation capacity, but also embed directors more tightly in organisational incentives and constraints. These constraints vary by institutional setting, governance architecture, and the internal allocation of sustainability responsibilities, which may explain why positive associations cluster in some contexts while null or negative findings appear in others. In addition, several studies provide limited detail on which executive positions women hold and how directly these roles connect to ESG reporting processes, which likely contributes to mixed evidence on disclosure outcomes.
For other heterogeneity dimensions (education, expertise, interlocks/family affiliation, and tenure), the evidence base remains thin and operationalisations vary substantially. As a result, the appropriate conclusion is not that these attributes are irrelevant, but that the literature has not yet converged on sufficiently comparable measures and designs to support strong generalisations. Taken together, the review suggests that BGD heterogeneity matters most consistently where it maps clearly onto monitoring credibility and authority (non-executive roles), whereas evidence remains underdeveloped for observable attributes and for underlying attributes that operate through more complex and potentially ambivalent resource and constraint mechanisms.
5.2 Research recommendations
To provide a clearer roadmap, we organise future research directions into methodological gaps, theoretical gaps, and contextual gaps.
A central limitation of much of the archival evidence is identification. Selection, reverse causality, and omitted-variable concerns remain salient, particularly because firms may appoint women with specific profiles in response to stakeholder pressure, legitimacy concerns, or anticipated regulatory change. Future studies should more systematically employ stronger designs, including quasi-natural experiments and reform-based settings that introduce plausibly exogenous variation in female representation and role allocation. Recent reform-based studies examining outcomes such as investment efficiency, innovation, corporate carbon emissions, and CSR provide an important complement to the ESG-focused evidence base and illustrate how stronger identification can broaden inference beyond disclosure to strategic outcomes (Baik et al., 2024; Barroso et al., 2024; Wang et al., 2025; Xin et al., 2025). A key opportunity is to integrate heterogeneity explicitly into these designs by testing whether reform effects differ by women's role (executive versus non-executive), independence, or other attributes, and whether such differences explain cross-country variation in estimated reform impacts.
In addition, future research should improve outcome comparability by specifying more precisely which constructs are captured by ESG reporting versus ESG performance. Given that many performance scores partly reflect disclosure, studies should be explicit about measurement overlap and conduct robustness tests across alternative data providers or proxy constructions where feasible (Berg et al., 2022).
The evidence indicates that heterogeneity attributes matter through distinct channels: monitoring credibility, decision authority, and access to ESG-relevant resources. Stakeholder-agency theory can be applied more sharply by deriving attribute-specific predictions. For example, non-executive status and independence should strengthen monitoring-based mechanisms, whereas sustainability-relevant expertise should affect ESG outcomes through problem recognition, evaluation of trade-offs, and scrutiny of disclosure quality. Social capital attributes (e.g. interlocks, family ties) are likely to generate ambivalent effects: they may facilitate information and legitimacy spillovers, while also increasing conflicts of interest or director busyness, implying non-linearities and interaction effects that remain under-tested.
A second theoretical priority concerns the meaning of “non-executive” versus “independent” across governance systems. Many studies treat independence as interchangeable with non-executive status, yet these constructs do not coincide across one-tier and two-tier systems and may be further complicated by codetermination and the presence of employee representatives (Scholz and Vitols, 2019). Future work should model governance architecture explicitly and theorise how role definitions condition the effectiveness of different female director profiles, rather than assuming equivalence across settings.
Observable heterogeneity attributes (especially age and nationality) remain underexamined despite their potential to proxy experience, institutional exposure, and information-processing differences. Future research should treat this as a priority gap and test whether observable intersections matter more in certain institutional environments, such as emerging markets, stakeholder-oriented governance systems, or settings with strong sustainability regulation. More broadly, studies should examine how heterogeneity interacts with organisational structures that allocate sustainability responsibility (e.g. sustainability committees, chief sustainability roles, internal control over non-financial reporting) and with policy environments (quota regimes, enforcement, disclosure mandates). These contextual moderators are likely to be central for explaining mixed evidence for executives and for ESG reporting outcomes.
Finally, the literature would benefit from more systematic evidence on thresholds and non-linearities. Rather than treating critical mass effects as separate from heterogeneity, future research could examine whether thresholds differ depending on which types of women (e.g. independent non-executives, executives with sustainability-relevant expertise) reach critical positions, and whether such combinations produce complementary or substitutive effects (Fleitas-Castillo et al., 2024; Meng and Zhu, 2024; Velte, 2025).
5.3 Practical and regulatory implications
Our findings suggest that boards and regulators should treat BGD as a multi-attributed governance resource rather than a headcount metric. The most consistent evidence relates to women in non-executive roles: studies frequently associate female non-executive representation with stronger ESG performance across governance systems and regulatory settings, which is consistent with a monitoring-based channel emphasised in stakeholder-agency theory (García-Sánchez et al., 2023; Konadu et al., 2022; Singhania et al., 2024). From a practice perspective, this implies that firms aiming to strengthen sustainability outcomes should pay particular attention to whether female directors hold positions that enable credible oversight of sustainability-related decisions, risk management, and managerial discretion, rather than assuming that representation alone is sufficient.
At the same time, the literature cautions against equating non-executive status with independence. Several studies operationalise “independent” directors as a proxy for non-executives, yet these constructs do not always coincide across one-tier and two-tier systems and can be further complicated by board architecture and appointment practices. In practical terms, firms should be transparent about how independence is defined and enforced and should avoid overstating monitoring credibility when non-executive directors may be economically or socially embedded in ways that weaken independence.
For ESG reporting, practical implications should be framed more cautiously. While there is supportive evidence that female non-executives can be associated with stronger sustainability reporting quality in some settings, the review also indicates more mixed results for reporting than for performance, reflecting differences in disclosure proxies and limited comparability across studies (Al-Shaer and Zaman, 2016; Cabeza-García et al., 2018; Martínez et al., 2020). This pattern implies that improvements in reporting are unlikely to follow automatically from board composition changes without complementary organisational arrangements, such as clear responsibility allocation for sustainability disclosure, stronger internal controls over ESG information, and more consistent reporting practices.
From a regulatory standpoint, the evidence base supports policies that move beyond representation targets and enable assessment of who the appointed women directors are in terms of role and attributes. Quota regimes and disclosure requirements may be more effective when they increase women's presence in monitoring-relevant positions and when firms are required to report multidimensional board characteristics (Cabeza-García et al., 2018; Martínez et al., 2020). Regulators could therefore strengthen transparency by requiring more granular disclosure of board composition, including executive and non-executive status, independence classification (with clear definitional guidance), and selected human-capital indicators such as education and expertise that are repeatedly discussed in archival literature (Sánchez‐Hernández et al., 2025; Haque et al., 2024). Such requirements would not only improve market transparency but also enhance research comparability and the evaluability of policy interventions across countries and governance systems.
5.4 Limitations and outlook
This review synthesises a heterogeneous archival literature and therefore inherits limitations that should be interpreted as directions for improvement. First, the evidence base is uneven across heterogeneity dimensions: underlying attributes are heavily studied, whereas evidence on observable attributes remains thin. Second, reliance on vote counting may overweight statistical significance and understate uncertainty in magnitude and precision, particularly where study designs and measures vary. Third, comparability is constrained by heterogeneity in ESG proxies and by the partial overlap between disclosure-based performance scores and reporting measures. Finally, the frequent use of independence as a proxy for non-executive status can be problematic in governance settings where these constructs diverge.
Looking ahead, expanding sustainability regulation, increasing disclosure requirements, and the growth of reform-based and quasi-natural experimental studies create favourable conditions for strengthening causal inference and clarifying mechanisms. Applying the heterogeneity framework to these designs can explain variation in estimated reform effects across countries and identify which female director profiles matter most for sustainability outcomes. A more mature literature will likely be one that (1) derives attribute-specific predictions grounded in theory, (2) maintains a strict separation between ESG performance and reporting, and (3) employs research designs and measurement strategies that improve identification and comparability across settings.
Note
The PRISMA 2020 Checklist can be found at: https://www.prisma-statement.org/.
The supplementary material for this article can be found online.

