This study determines the existence of regional heterogeneities in returns to education in Zambia. It seeks to analyse the differential impact of education on wages across different geographical regions of Zambia. Adopting a regional approach, the study employs novel empirical methods to understand the pecuniary benefits of education at the sub-national level.
The study analyses data from Zambia’s 2021 Labour Force Survey. It employs modified human capital models and quantile regression techniques to examine how returns to education differ across regions and the wage distribution.
Results show a distinctive heterogeneous pattern in which returns to education are higher in urban regions, followed by peri-urban, and lowest in the rural region. Results further show that returns to education increase with the level of education and are highest at the tertiary level. Quantile regression estimates reveal that returns to education vary across income levels within regions, suggesting a necessity for tailored policy interventions. Additionally, individuals’ health conditions display direct effects on their earnings potential.
This study adopts a modified human capital model to account for the health dimension, which has received less attention in past returns to education research. Furthermore, a regional approach is employed to capture within-country variations necessary for designing policy interventions aimed at addressing region-specific challenges.
The peer review history for this article is available at: https://publons.com/publon/10.1108/IJSE-03-2024-0262
1. Introduction
This study aims to analyse regional heterogeneities in returns to education in Zambia. The study is motivated by the fact that education is widely considered a fundamental driver of sustainable development, with governments advocating for it as an essential tool in combating economic and social inequality. Education improves individual well-being by boosting earning potential and providing access to prestigious careers (Gunderson and Oreopolous, 2020). While education is essential, its impact on individuals and various regions of the country is heterogeneous (Gao and Li, 2022). This heterogeneous effect of education on different regions of a country is largely neglected in existing literature (Mphuka and Simumba, 2012; Patrinos and Psacharopoulos, 2020; Blau et al., 2024), resulting in limited evidence regarding its heterogeneous effects across regions within a country.
The returns to education are expected to vary across regions due to differences in labour market opportunities and educational attainment (Zwane, 2020). For example, Zambia’s urban regions, with more job opportunities and better access to education, are expected to have higher returns to education. Economic theories (Marshall, 1890; Arrow, 1962; Romer, 1986; Champion, 1986) suggest that differences in regional population growth and economic development drive wage increases in urban regions that attract more migrants. Despite these theories, there is limited empirical evidence on regional heterogeneities in returns to education.
Moreover, the effect of an individual’s health condition on their wage earnings has largely been overlooked in the literature on returns to human capital investment. By neglecting to incorporate individuals’ health conditions into the human capital model, previous studies (Patrinos and Psacharopoulos, 2020) have limited the primary determinants of wages to education and work experience alone. Consequently, this study highlights the significance of health conditions in influencing individuals’ labour income and productivity in the labour market. It incorporates individuals’ health status to analyse its impact on educational returns across regions of Zambia. The inclusion of health in Mincer’s (1974) definition of human capital is motivated by the updated World Bank’s (2021) Human Capital Index, which acknowledges health as an important component of human capital.
Furthermore, the existing literature (Blau et al., 2024) on gender wage gaps has mainly focused on identifying these gaps at the national level, with limited exploration at the sub-national level. Previous literature (Sanchez et al., 2022) has also largely focussed on examining gender wage gaps at the mean of the wage distribution, giving less attention to how these gaps vary across different income levels. To address these gaps, this study takes a novel approach by investigating the existence of gender income gaps at the sub-national level and different points of the regional wage distribution.
To achieve this study’s objectives, answers to the following questions are sought: 1) Do returns to education vary by region in Zambia? 2) Do regional returns to education differ across the wage distribution? 3) Do individuals in bad or worse health conditions receive lower wages in Zambia? 4) Are there gender differences in returns to education across Zambia’s regions? By answering these questions, the study provides new insights into the economic outcomes of education at sub-national levels.
The structure of the rest of this study is as follows: Section 2 discusses the theoretical model and reviews the empirical literature. Section 3 describes the data and variables, while Section 4 presents the empirical methods. Descriptive statistics are provided in Section 5, and the findings are discussed in Section 6.
2. Theoretical model and empirical literature review
2.1 Theoretical model
The human capital theory (Mincer, 1974) is paramount in understanding the link between human capital development and individual economic well-being. The theory posits that individuals invest in human capital development, such as education and work experience, because they anticipate higher future returns. Consequently, education, work experience, and other individual-level attributes are frequently used as predictors of workers’ earnings. However, this classical approach tends to overlook the influence of regional or locational factors on wage heterogeneities. Variations in regional endowments and sorting processes among individuals and firms can lead to regional wage heterogeneities, a phenomenon expounded by Krugman’s (1991) concept of agglomeration economies.
The principle of agglomeration economies was first credited to Marshall (1890), and later revised by Arrow (1962) and Romer (1986). This theory, known as MAR, argues that regional disparities in economic outcomes may arise due to local knowledge spillovers among individuals and firms, as well as variations in regional labour markets. Despite its theoretical foundations, the MAR theory has not been fully tested, especially in developing countries. Therefore, this study integrates the human capital theory with the MAR theory to understand regional heterogeneities in returns to education.
Furthermore, the traditional human capital theory assumes that only education and work experience contribute to the stock of human capital, specifically:
$$HCE_i = f(ED_i, EX_i) \qquad (1)$$
where $HCE_i$ is the stock of human capital endowment and $ED_i$ and $EX_i$ are the education and work experience of individual $i$, respectively. As such, the human capital stock in Equation (1) neglects health, which is one of the key components of the human capital stock as posited by the World Bank (2021). This assertion is supported by the discrimination theory (Johnson and Lambrinos, 1985), which argues that lower wages for unhealthy workers are the result of employers’ discrimination against them, regardless of their productivity. Johnson and Lambrinos (1985) further contend that unhealthy workers may be willing to accept lower wages in exchange for other work-related benefits, such as flexible work arrangements.
Despite both empirical and theoretical evidence suggesting that a worker’s health condition significantly influences their income, previous studies have not fully accounted for health. In this context, this study extends the definition of the human capital theory by including health as a key determinant of wages. Thus, the redefined human capital model takes the following form:
$$HCE_i = f(ED_i, EX_i, HC_i) \qquad (2)$$
where $HC_i$ is the health condition of individual $i$.
In addition, the statistical discrimination theory (Phelps, 1972) suggests that women are underpaid because employers perceive them as less productive. This perception supports the glass ceiling hypothesis, which refers to the invisible barriers that prevent qualified women from advancing to higher-paying positions (Ozturk and Simsek, 2019). Recently, some studies (Martin-Roman et al., 2024) have attempted to test this hypothesis in developed countries; however, its application in African countries remains limited.
2.2 Empirical literature review
In Zambia, there are no recent studies on this subject. However, Mphuka and Simumba (2012) estimated returns to education using Mincerian ordinary least squares (OLS) methods and found a mean earnings premium of 15.1% per additional year of education. Despite the authors presenting national averages and assuming uniform returns across the country, their study provides evidence of positive returns from educational investment in the Zambian labour market.
At the global level, regional heterogeneities in returns to education have been relatively underexplored, with a notable gap in research within the African context. Mafie et al. (2021) analyse differential returns to education between rural and urban regions of Tanzania and find higher returns in urban regions due to higher educational attainment. However, their use of OLS regression and Blinder–Oaxaca decomposition captures only regional averages and overlooks intra-regional variations. The authors also did not address the problem of endogeneity, where unobserved factors such as inherent ability or motivation may influence both educational attainment and wage outcomes, which, if not addressed, can lead to biased and inconsistent estimates (Kroupova et al., 2024).
Zwane (2020) examined returns to education in South Africa and found higher returns in urban regions. The author attributed the findings to lower education attainment in rural regions due to inadequate educational infrastructure. However, Zwane (2020) relied on Mincerian OLS methods, which provide average estimates and assume homogeneous individual ability, and accounted neither for the endogeneity of education nor for regional variation beyond the rural and urban categories, such as peri-urban regions. Similarly, Gao and Li (2022) found that urbanised regions in China with developed economies attracted skilled workers, leading to higher educational returns. However, the analysis did not address endogeneity and did not capture intra-regional variations.
Given the existing gaps in the theoretical and empirical literature, this study tested the following hypotheses:
Returns to education in Zambia are higher in urban regions,
Returns to education are higher for higher educational levels across all regions,
Individuals in bad or worse health receive lower wages across regions of Zambia,
Women in Zambia receive lower wages across regions and the wage distribution, and
The regional gender wage gaps are wider at the top of the wage distribution against women.
This study makes substantial contributions to the literature. Firstly, it adopts a regional approach, departing from the conventional methods of analysing average national aggregate returns to education. Secondly, the study incorporates the health dimension into the human capital model to provide insights into how health affects labour income at both individual and regional levels. Thirdly, the study provides empirical evidence of regional gender wage gaps. Lastly, it tests the glass ceiling hypothesis to determine if the regional gender wage gaps against women are wider at the top of the wage distribution.
3. Data, variables and classification of regions
3.1 Data
The study analysed data from the 2021 Labour Force Survey (LFS) provided by the Zambia Statistics Agency. The dataset included individual-level information, such as the region of residence and work. The analyses focussed on individuals aged 15 years and above, in line with the minimum contractual age for work outlined in Zambia’s Employment Code Act of 2019 (Ministry of Labour and Social Security, 2019). Additionally, the statistical analyses were limited to observations with no missing data on variables of interest (Cameron and Trivedi, 2022). The final number of individuals included in the dataset was 3,387.
3.2 Variables
This study used an individual’s gross monthly wage, which was log-transformed, as the dependent variable. Key explanatory variables included education, age, work experience, health condition, union membership, marital status, monthly hours worked, and childcare responsibilities. Education was categorised into “primary or none” (reference), “secondary education”, and “tertiary education”. Work experience was measured as the number of years between the start of employment and the survey year (2021), and its square was included to capture diminishing returns. Health condition was classified as “good health” (reference), “bad health” (two or three instances of illness), and “worse health” (more frequent illnesses). Other explanatory variables included a dummy for the Head of the Household, representing individuals who headed a household of at least two members.
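As an illustration of how a squared experience term captures diminishing returns, consider a log-wage equation in which experience enters both linearly and quadratically. The symbols $b_1$ and $b_2$ are illustrative labels for the two experience coefficients and are not the study’s own notation.

```latex
\begin{align*}
\log(Y) &= \dots + b_1\, EX + b_2\, EX^2 + \dots \\
\frac{\partial \log(Y)}{\partial EX} &= b_1 + 2\, b_2\, EX \\
EX^{*} &= -\frac{b_1}{2\, b_2}
  \quad \text{(turning point when } b_1 > 0 \text{ and } b_2 < 0\text{)}
\end{align*}
```

With $b_1 > 0$ and $b_2 < 0$, each additional year of experience adds less to the log wage than the previous one. Using the urban estimates reported in Table 2 (0.072 and −0.001), the turning point would fall at roughly 36 years, although this figure is only indicative because the squared-term coefficient is reported to three decimal places.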
3.3 Classification of regions
Zambia is categorised into three main regions based on the Zambia Statistics Agency’s (2023) classification, which considers poverty rates, infrastructure deprivation, education access, and health accessibility. Thus, the three regions are urban, peri-urban and rural. Urban regions include Lusaka, Copperbelt, and Southern Provinces. These regions are characterised by high population density, developed labour markets, and significant industrial and economic activities. Peri-urban regions include Central, Eastern, and North-Western Provinces. These regions share similarities with urban regions but to a lesser extent. Rural regions consist of Luapula, Muchinga, Northern, and Western Provinces and are characterised by limited infrastructure, health services, education access, and higher poverty rates.
4. Empirical methods
The study estimated average returns to education across regions using a modified Mincer (1974) OLS human capital model, incorporating health variables and their interactions with education. To account for individual heterogeneity and to capture the variation in returns to education across the wage distribution, the study employed a quantile regression model (Koenker, 2017). This approach allows for a more comprehensive understanding of how education impacts wages at different points in the wage distribution, rather than just the average effect.
To address the potential endogeneity of education, the study applied a quantile regression instrumental variable two-stage least squares (QR-IV 2SLS) approach. Endogeneity can arise when there are omitted variables, measurement errors, or reverse causality, leading to biased and inconsistent estimates in OLS models. By using instrumental variables in the QR-IV 2SLS approach, the study corrects for endogeneity, ensuring that the estimated effects of education on wages are more reliable and not distorted by unobserved confounders (Tran, 2023). These methods were adopted because of their ability to capture the heterogeneous effects of education on wages while addressing the endogeneity problem.
Therefore, the modified Mincerian OLS human capital model takes the form:
$$\log(Y_{ij}) = \beta_0 + \beta_1 EL_{ij} + \beta_2 HC_{ij} + \beta_3 (EL_{ij} \times HC_{ij}) + \beta_4 X_{ij} + \varepsilon_{ij} \qquad (3)$$
where $\log(Y_{ij})$ is the natural logarithm of earnings for individual $i$ in region $j$, and $\beta_1$, $\beta_2$ and $\beta_3$ are the coefficients on education level, health condition and their interaction, respectively. The term $EL_{ij}$ is the dummy variable for the attained education level of individual $i$ in region $j$. The study utilised dummy variables for attained education to estimate and compare returns across different education levels. This approach avoids the assumption inherent in using years of schooling, which presumes uniform returns per additional year of education (Raudenska and Mysikova, 2023).
The term $X_{ij}$ is the vector of regressors other than attained education and health condition ($HC_{ij}$). It represents other individual-level attributes such as age and gender, alongside their coefficients ($\beta_4$).
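To assess regional returns to education across the wage distribution, the study implemented the following modified quantile regression model:
$$\begin{aligned} \log(Y_{ij}) &= \beta_{0\theta} + \beta_{1\theta} EL_{ij} + \beta_{2\theta} HC_{ij} + \beta_{3\theta} (EL_{ij} \times HC_{ij}) + \beta_{4\theta} X_{ij} + \varepsilon_{\theta ij}, \\ \mathrm{Quant}_{\theta}[\log(Y_{ij}) \mid EL_{ij}, HC_{ij}, X_{ij}] &= \beta_{0\theta} + \beta_{1\theta} EL_{ij} + \beta_{2\theta} HC_{ij} + \beta_{3\theta} (EL_{ij} \times HC_{ij}) + \beta_{4\theta} X_{ij} \end{aligned} \qquad (4)$$
where $\beta_{1\theta}$, $\beta_{2\theta}$ and $\beta_{3\theta}$ are the coefficients on education levels, health conditions and the interaction term, respectively, at the $\theta$th quantile. $X_{ij}$ is the vector of regressors other than attained education $EL_{ij}$ and health condition $HC_{ij}$.
The term $\mathrm{Quant}_{\theta}[\log(Y_{ij}) \mid EL_{ij}, HC_{ij}, X_{ij}]$ denotes the $\theta$th conditional quantile of $\log(Y_{ij})$ given $EL_{ij}$, $HC_{ij}$ and $X_{ij}$, with $0 < \theta < 1$. The entire distribution of the dependent variable, conditional on the covariates, is traced as $\theta$ is varied from 0 to 1 (Koenker, 2017).
The error term ($\varepsilon$) in Equations (3) and (4) is assumed to have a zero mean, meaning that its expected value is zero for all observations ($E[\varepsilon_{ij}] = 0$). It is also assumed to be homoscedastic, with a constant variance ($\sigma^2$) across all observations. Additionally, the error terms for different observations are assumed to be independent, with no autocorrelation ($Cov[\varepsilon_{ij}, \varepsilon_{kj}] = 0$ for $i \neq k$).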
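To make the quantile objective concrete, the sketch below minimises the check (pinball) loss that underlies quantile regression in the simplest case of an intercept-only model, where the minimiser is a sample quantile. The log-wage values and function names are hypothetical, and this is an illustration of the loss function rather than the estimation code used in the study.

```python
def check_loss(residual, theta):
    """Check (pinball) loss for a single residual at quantile theta."""
    if residual >= 0:
        return theta * residual
    return (theta - 1.0) * residual


def total_loss(values, candidate, theta):
    """Sum of check losses when every value is predicted by `candidate`."""
    return sum(check_loss(v - candidate, theta) for v in values)


def best_constant(values, theta):
    """Observed value that minimises the total check loss.

    For an intercept-only model a minimiser always lies on an observed
    data point, so searching over the observations is sufficient.
    """
    return min(sorted(values), key=lambda c: total_loss(values, c, theta))


# Hypothetical log monthly wages for nine workers (not survey data).
LOG_WAGES = [5.1, 5.9, 6.4, 6.8, 7.0, 7.3, 7.7, 8.2, 9.0]

if __name__ == "__main__":
    for theta in (0.25, 0.50, 0.75, 0.90):
        print(theta, best_constant(LOG_WAGES, theta))
```

Lower values of theta penalise over-prediction more heavily, which pulls the fitted value towards the bottom of the distribution; higher values do the opposite. Adding regressors replaces the constant with a linear index but leaves the loss unchanged.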
Endogeneity arises because education is likely correlated with the error term due to factors such as innate ability, motivation, and family background, which directly impact both education and wages. Failure to account for this correlation can lead to biased and inconsistent parameter estimates. To mitigate endogeneity, a control function instrumental variable method was applied, using location as an instrument for education attainment.
Location was chosen as an instrument for this study because individuals with better access to educational institutions are more likely to complete higher levels of education. The binary variable used for location assigns a value of 1 to individuals in urban areas and 0 to those in rural areas. Proximity to educational resources, typically associated with urban areas, affects educational attainment and provides exogenous variation to correct for endogeneity (Sengupta and Puri, 2022).
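The role of a binary instrument can be illustrated with the simple Wald estimator, which divides the urban-rural gap in log wages by the urban-rural gap in educational attainment. This is a deliberately simpler technique than the QR-IV control function approach used in the study, and the records and names below are hypothetical. The ratio can be read as a return to education only under the assumption that location affects wages solely through education.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def wald_estimate(records):
    """Wald (binary-instrument) estimate of the return to education.

    Each record is a tuple (urban, has_secondary, log_wage), where
    `urban` is the 0/1 instrument and `has_secondary` the 0/1 education
    indicator. All names here are illustrative assumptions.
    """
    urban = [r for r in records if r[0] == 1]
    rural = [r for r in records if r[0] == 0]
    wage_gap = mean([r[2] for r in urban]) - mean([r[2] for r in rural])
    education_gap = mean([r[1] for r in urban]) - mean([r[1] for r in rural])
    return wage_gap / education_gap


# Hypothetical records, not taken from the Labour Force Survey.
SAMPLE = [
    (1, 1, 7.6), (1, 1, 7.4), (1, 1, 7.5), (1, 0, 6.5),
    (0, 1, 7.1), (0, 0, 6.4), (0, 0, 6.2), (0, 0, 6.3),
]

if __name__ == "__main__":
    print(round(wald_estimate(SAMPLE), 3))
```

The estimate is only defined when the instrument shifts education, that is, when the share with secondary education differs between urban and rural areas.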
The study implemented the QR-IV 2SLS estimator through a control function approach. The first-stage probit model took the form:
$$\Pr(EL_{ij} = 1 \mid Z_i, X_{ij}) = \Phi(\gamma_0 + \gamma_1 Z_i + \gamma_2 X_{ij}) \qquad (5)$$
where $EL_{ij}$ represents the binary education outcome for individual $i$ and $\Phi$ is the standard normal cumulative distribution function. In Equation (5), education is treated as a dependent variable. The location instrumental variable ($Z_i$) and exogenous covariates ($X_{ij}$) are predictors in this probit model. The predicted values of education from Equation (5) are used in the second-stage quantile regression model to correct for endogeneity in Equation (4).
Dummy variables representing different health conditions (bad and worse) are included in the model to capture their impact on earnings. The interaction terms between education and health ($EL_{ij} \times HC_{ij}$) assess how health conditions affect the returns to education. It is expected that poor health conditions would reduce the positive impact of education on income.
Finally, Equations (3) and (4) were estimated separately for each region to account for regional variations in economic and labour market structures, thereby minimising aggregation bias (Dunyo et al., 2024).
4.1 Heckman (1979) sample selection model
The study tested for potential sample selection problems using the Heckman (1979) sample selection model. Given that the Zambia Statistics Agency (2021) employed a uniform sampling method across regions, applying the Heckman model to the entire sample was considered appropriate. Sample selection bias occurs when non-random samples are utilised, potentially leading to biased and inconsistent estimations in wage function analysis (Blau et al., 2024).
Heckman’s two-stage empirical model consists of a participation equation and an outcome equation, designed to detect and correct for selection bias. The participation equation estimates the likelihood that individuals participate in paid work, taking into account factors such as education level, health condition, age, and marital status—key variables that significantly influence labour market participation in Zambia. This stage helps account for the fact that not all individuals are part of the labour force, and those who are may differ systematically from those who are not.
The second stage, the outcome equation, focuses on individuals who are employed in paid work, estimating their earnings. By restricting the analysis to those who are employed, this stage aims to examine the wage determinants among participants in the labour market. Without correcting for selection bias, estimates from this stage might be skewed, as they would only reflect the wages of those already working, potentially ignoring the characteristics of individuals not engaged in paid labour. To detect and correct for selection bias, the study computed the inverse Mills ratio (IMR), termed lambda (λ), from the participation equation’s parameter estimates. The IMR is then included as an additional regressor in the outcome equation. If the coefficient of lambda is statistically significant, it indicates that sample selection bias is present, implying that the characteristics influencing labour market participation also affect wage outcomes. This correction ensures that the wage equation provides unbiased and consistent estimates by accounting for any non-random selection into the labour market.
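The inverse Mills ratio described above can be computed directly from the fitted index of the participation probit. The sketch below uses only the Python standard library; the index values are hypothetical stand-ins for first-stage predictions and are not estimates from the study.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def inverse_mills_ratio(index):
    """Return lambda = pdf(index) / cdf(index) for a probit index.

    `index` is the linear prediction from the participation equation
    for a person observed in paid work.
    """
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)


# Hypothetical participation indices for three employed individuals.
EXAMPLE_INDICES = [-1.0, 0.0, 1.0]

if __name__ == "__main__":
    for index in EXAMPLE_INDICES:
        print(index, round(inverse_mills_ratio(index), 4))
```

The ratio falls as the index rises, so individuals who were very likely to participate receive a small correction term. In the second stage, lambda enters the wage equation as one additional regressor.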
5. Descriptive statistics and normality tests
Descriptive analyses and normality tests were conducted to examine the distributional properties of the data. A skewness and kurtosis test on the log of wage and age variables returned probability values of zero for both. Because the null hypothesis of this test is normality, such values indicate that these variables depart from a normal distribution (Demir, 2022).
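How a skewness and kurtosis test reaches its probability value can be sketched with the Jarque-Bera statistic, which combines sample skewness and excess kurtosis and is compared against a chi-square distribution with two degrees of freedom. This is an assumption made for illustration: the study does not state which skewness and kurtosis test it used, and the samples below are hypothetical. The null hypothesis of the test is that the data are normally distributed.

```python
import math


def jarque_bera(values):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    statistic = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square variable with 2 degrees of freedom.
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical samples: one symmetric, one with a single extreme value.
SYMMETRIC = [-2.0, -1.0, 0.0, 1.0, 2.0]
SKEWED = [1.0] * 50 + [100.0]

if __name__ == "__main__":
    print(jarque_bera(SYMMETRIC))
    print(jarque_bera(SKEWED))
```

The p-value is asymptotic, so it is only a rough guide for samples as small as these.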
The descriptive statistics presented in Table 1 are based on a sample of 3,387 observations, with 69% male (2,343) and 31% female (1,044). The sample is distributed across urban (62.1%), peri-urban (22%), and rural (15.9%) regions. Average monthly wages in Zambian Kwacha (ZMW) are ZMW 3063.6 in urban regions and ZMW 3240.9 in peri-urban regions. Rural regions have the lowest average wages at ZMW 2203.8 per month.
Table 1. Descriptive statistics
| Variable | Urban (n = 2,103) Mean | Urban SD | Peri-urban (n = 745) Mean | Peri-urban SD | Rural (n = 539) Mean | Rural SD |
|---|---|---|---|---|---|---|
| Continuous variables | | | | | | |
| Wages (ZMW) | 3063.6 | 4570.4 | 3240.9 | 4098.8 | 2203.8 | 3,956 |
| Log wage | 7.3 | 1.4 | 7.2 | 1.5 | 5.9 | 2.6 |
| Age | 35.6 | 10.8 | 34.5 | 10.7 | 33.7 | 11.7 |
| Hours | 189.1 | 61.1 | 177.3 | 78.6 | 171.6 | 77.2 |
| Experience | 5.9 | 7 | 5.9 | 6.5 | 5.2 | 7.4 |
| Average monthly wages by education level (ZMW) | | | | | | |
| Primary | 967.8 | 884.2 | 1706.6 | 3603.5 | 620.9 | 1392.3 |
| Secondary | 2414.3 | 3539.1 | 2298.3 | 3050.3 | 1802.9 | 2491.8 |
| Tertiary | 7061.9 | 6643.9 | 6784.6 | 4477.6 | 5674.5 | 6446.7 |

| Categorical variable | Urban No. | Urban % | Peri-urban No. | Peri-urban % | Rural No. | Rural % |
|---|---|---|---|---|---|---|
| Married | 1,261 | 60 | 486 | 65.2 | 326 | 60.5 |
| Union membership | 186 | 26.4 | 186 | 26.5 | 138 | 28 |
| Head of house | 1,422 | 67.6 | 528 | 70.9 | 341 | 63.3 |
| Child care | 408 | 19.5 | 288 | 40.1 | 223 | 44.3 |
| Bad health | 134 | 6.4 | 80 | 11.1 | 121 | 24 |
| Worse health | 86 | 4.1 | 68 | 9.5 | 58 | 11.5 |
| Level of education | | | | | | |
| Primary | 343 | 16.7 | 173 | 23.7 | 161 | 31.6 |
| Secondary | 1,290 | 63 | 372 | 51 | 234 | 46 |
| Tertiary | 419 | 20.3 | 184 | 25.3 | 115 | 22.4 |
| Gender distribution | | | | | | |
| Women | 653 | 31.1 | 217 | 29.1 | 174 | 32.3 |
| Men | 1,450 | 69 | 528 | 70.9 | 365 | 67.7 |
Source(s): Authors’ own work
An analysis of wages by education level reveals that, in every region, average wages are highest for individuals with tertiary education, followed by secondary, and lowest for primary education. Average wages at the secondary and tertiary levels are highest in urban regions, suggesting more competitive labour markets, whereas workers with primary education earn more on average in peri-urban regions (ZMW 1706.6) than in urban regions (ZMW 967.8).
Educational attainment also varies by region: 63% of individuals in urban regions had completed secondary education, compared to 51% in peri-urban and 46% in rural regions. Similarly, 25.3% in peri-urban regions have tertiary education, compared to 22.4% in rural regions. These disparities are consistent with literature highlighting greater educational inequality between urban and rural regions (Van Maarseveen, 2021). Health disparities are also observed, with worse health conditions more prevalent in rural regions due to limited healthcare access compared to urban regions.
6. Results
The findings from the Heckman (1979) sample selection test reveal that the coefficient of the inverse Mills ratio (lambda) was not statistically significant (p-value of 0.23), indicating no apparent sample selection bias in the data. As such, no sample selection bias correction procedures were undertaken. A sample selection problem would be indicated by a statistically significant lambda coefficient (Garlick and Hyman, 2022).
6.1 Regional heterogeneities in returns to education in Zambia
Results presented in Table 2 show that returns to education vary by region, with the highest in urban regions, followed by peri-urban, and lowest in rural regions, where education had an insignificant effect on wages. In urban regions, secondary education yields a 59.6% return, with tertiary education yielding 159.8%, compared to individuals with primary or no education. These findings align with Groot et al. (2014), who found that individuals in urban regions, due to access to better job information, secure higher-paying jobs and are more productive.
Table 2. Modified human capital model: regional returns to education in Zambia (dependent variable = log of wage)
| Variable | Urban (n = 2,103) Coef | Urban SD | Peri-urban (n = 745) Coef | Peri-urban SD | Rural (n = 539) Coef | Rural SD |
|---|---|---|---|---|---|---|
| Secondary | 0.596*** | 0.083 | 0.519*** | 0.119 | 0.422 | 0.357 |
| Tertiary | 1.598*** | 0.105 | 1.379*** | 0.159 | −0.454 | 0.478 |
| Age | 0.007* | 0.003 | 0.025*** | 0.005 | 0.0169 | 0.014 |
| Experience | 0.072*** | 0.01 | 0.024 | 0.02 | −0.011 | 0.044 |
| Gender dummy | −0.042 | 0.069 | 0.274* | 0.11 | 0.317 | 0.287 |
| Bad health | −0.05563 | 0.251 | −0.215 | 0.239 | −0.295 | 0.518 |
| Worse health | 0.152 | 0.485 | 0.205 | 0.457 | −0.5931 | 0.571 |
| Secondary × Bad health | 0.223 | 0.295 | 0.059 | 0.311 | 0.124 | 0.656 |
| Secondary × Worse health | −0.011 | 0.515 | 0.095 | 0.493 | 0.074 | 0.795 |
| Tertiary × Bad health | 0.202 | 0.326 | 0.382 | 0.357 | 2.009** | 0.725 |
| Tertiary × Worse health | −1.103* | 0.556 | −0.191 | 0.539 | 3.132* | 1.256 |
| Married | −0.07 | 0.055 | −0.232* | 0.093 | −0.179 | 0.238 |
| Hours | 0.001** | 0.001 | 0.005*** | 0.001 | 0.005*** | 0.002 |
| Child care | 0.088 | 0.071 | −0.036 | 0.088 | 0.379 | 0.263 |
| Union | 0.643*** | 0.073 | 1.251*** | 0.118 | 1.438*** | 0.353 |
| Head of house | 0.229** | 0.071 | 0.416*** | 0.121 | 0.413 | 0.293 |
| Experience squared | −0.001*** | 0.0003 | −0.001 | 0.001 | 0.001 | 0.001 |
| Constant | 5.539*** | 0.158 | 4.417*** | 0.216 | 3.367*** | 0.566 |
Note(s): *p < 0.05; **p < 0.01; ***p < 0.001
Source(s): Authors’ own work
In peri-urban regions, individuals with secondary education earn 51.9% more, while those with tertiary education earn 137.9% more than those with primary or no education. In rural regions, education had an insignificant effect on wages, implying that education is not a major determinant of income.
The study further reveals that returns to secondary education in urban regions are 7.7 percentage points higher than in peri-urban regions (59.6% versus 51.9%), with this difference rising to 21.9 percentage points for tertiary education (159.8% versus 137.9%) (Kamdjou, 2023). These comparisons are not applicable to rural regions, where returns to both levels of education are insignificant.
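The percentages quoted above read each coefficient from Table 2 as 100 times the log-point estimate. For dummy variables in a log-wage equation, one common alternative convention is the exact transformation 100 × (exp(b) − 1), which diverges from the log-point reading as coefficients grow. The sketch below applies both readings to the reported urban and peri-urban coefficients; the function and variable names are illustrative, and the transformation is offered as a complementary reading rather than as the study's own method.

```python
import math


def log_point_percent(coefficient):
    """Reading used in the text: 100 times the log-point coefficient."""
    return coefficient * 100.0


def exact_percent(coefficient):
    """Exact percentage premium implied by a dummy in a log-wage model."""
    return (math.exp(coefficient) - 1.0) * 100.0


# Education coefficients as reported in Table 2.
URBAN = {"secondary": 0.596, "tertiary": 1.598}
PERI_URBAN = {"secondary": 0.519, "tertiary": 1.379}

if __name__ == "__main__":
    for level in ("secondary", "tertiary"):
        gap = log_point_percent(URBAN[level]) - log_point_percent(PERI_URBAN[level])
        print(
            level,
            "urban-minus-peri-urban gap (percentage points):", round(gap, 1),
            "| exact urban premium (%):", round(exact_percent(URBAN[level]), 1),
            "| exact peri-urban premium (%):", round(exact_percent(PERI_URBAN[level]), 1),
        )
```

The two readings are close for small coefficients and separate noticeably at the tertiary level, so it helps to state which convention a quoted percentage follows.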
6.2 Health conditions and returns to education
Incorporating health conditions into the modified Mincer model reveals that poor health significantly reduces earnings. Interaction terms between education and health conditions highlight a substantial reduction in returns to education for individuals in poor health. For example, in urban regions, returns to tertiary education drop from 159.8% to 49.5% for individuals in worse health. This finding emphasises the impact of health on earning potential.
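The 49.5% figure follows from adding the urban tertiary coefficient and the urban tertiary-by-worse-health interaction reported in Table 2. A worked version is given below, with $b_T$ and $b_{T \times W}$ used as illustrative labels for those two coefficients rather than the study’s own notation.

```latex
\begin{align*}
\text{premium}(\text{tertiary} \mid \text{worse health})
  &= b_T + b_{T \times W} \\
  &= 1.598 + (-1.103) \\
  &= 0.495 \ \text{log points}
\end{align*}
```

This is the tertiary premium relative to primary or no education among workers who are themselves in worse health, read on the same log-point scale as the 159.8% figure for workers in good health.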
6.3 Quantile regression analysis results
Quantile regression results in Table 3 reveal that returns to education vary not only across regions but also across the wage distribution. In urban regions, returns to secondary education rise from 37.2% at the 25th percentile to 85.4% at the 90th percentile. In peri-urban regions, returns increase from 44.6% at the 25th percentile to 74.5% at the 75th, but then decline to 68.6% at the 90th percentile, reflecting the less competitive labour markets in peri-urban regions.
Table 3. Modified quantile regression model: regional returns to education at different points of the wage distribution
| Variable | Quantile | Urban (n = 2,103) Coef | Urban SD | Peri-urban (n = 745) Coef | Peri-urban SD | Rural (n = 539) Coef | Rural SD |
|---|---|---|---|---|---|---|---|
| Secondary | q25 | 0.372*** | 0.088 | 0.446* | 0.197 | 0.716 | 0.399 |
| | q50 | 0.483*** | 0.067 | 0.624*** | 0.131 | 0.606* | 0.267 |
| | q75 | 0.657*** | 0.059 | 0.745*** | 0.127 | 0.98*** | 0.233 |
| | q90 | 0.854*** | 0.084 | 0.686** | 0.245 | 0.766* | 0.344 |
| Tertiary | q25 | 1.37*** | 0.101 | 1.234*** | 0.321 | −4.689 | 2.535 |
| | q50 | 1.399*** | 0.106 | 1.425*** | 0.161 | 0.848* | 0.392 |
| | q75 | 1.584*** | 0.086 | 1.483*** | 0.153 | 1.468*** | 0.352 |
| | q90 | 1.689*** | 0.103 | 1.242*** | 0.227 | 1.653*** | 0.448 |
| Age | q25 | 0.007 | 0.004 | 0.024*** | 0.006 | 0.016 | 0.015 |
| | q50 | 0.008* | 0.003 | 0.022*** | 0.004 | 0.022 | 0.012 |
| | q75 | 0.009** | 0.003 | 0.026*** | 0.008 | 0.014 | 0.009 |
| | q90 | 0.018*** | 0.005 | 0.037*** | 0.008 | 0.022 | 0.012 |
| Experience | q25 | 0.054*** | 0.0116 | 0.011 | 0.032 | −0.042 | 0.045 |
| | q50 | 0.057*** | 0.011 | 0.034 | 0.024 | 0.009 | 0.034 |
| | q75 | 0.068*** | 0.009 | 0.009 | 0.024 | −0.022 | 0.029 |
| | q90 | 0.071*** | 0.015 | 0.011 | 0.03 | 0.038 | 0.037 |
| Gender dummy | q25 | −0.001 | 0.053 | 0.124 | 0.131 | 0.092 | 0.289 |
| | q50 | 0.161** | 0.057 | 0.371*** | 0.099 | 0.2 | 0.201 |
| | q75 | 0.155** | 0.057 | 0.569*** | 0.115 | 0.098 | 0.204 |
| | q90 | 0.2328** | 0.081 | 0.631*** | 0.161 | 0.116 | 0.165 |
Source(s): Authors’ own work
In rural regions, despite the overall insignificant returns in the OLS model, quantile regression reveals significant returns at higher percentiles. For instance, secondary education returns are 60.6 and 98% at the 50th and 75th percentiles, respectively, and tertiary education returns are 146.8 and 165.3% at the 75th and 90th percentiles.
Similarly, returns to tertiary education show an upward trend across income levels in urban regions, starting at 137% at the 25th percentile and reaching 168.9% at the 90th percentile. In contrast, peri-urban regions exhibit lower returns to tertiary education, ranging between 123.4 and 148.3%. The lower returns to tertiary education in peri-urban regions are attributed to the homogeneous and less competitive labour markets, which limit wage growth for highly educated individuals.
These results reveal that despite having the same qualifications, individuals do not receive uniform wage premiums for their education across regions. The results show that the heterogeneous effect of education across income levels is larger for tertiary education and modest for secondary education. These findings are consistent with those of Issofou and Aicha (2021) who concluded that returns to education increase as a worker moves towards the upper tail of the wage distribution.
Quantile regression also uncovers significant gender wage disparities, especially in peri-urban regions. At the 50th and 75th percentiles, respectively, men earn 37.1 and 56.9% more than women, with the gap widening to 63.1% at the 90th percentile. This pattern, consistent with the glass ceiling effect, highlights persistent gender inequalities, particularly at higher wage levels (Blau et al., 2024; Lee and Choi, 2024).
Both OLS and quantile regression analyses show that returns to education are highest in urban regions, followed by peri-urban, and lowest in rural regions. These regional differences are attributed to a lack of social amenities, limited access to education, and the presence of smaller and homogeneous labour markets in rural and peri-urban regions (Backman, 2014; Van Maarseveen, 2021). Moreover, the concentration of industries in urban regions and diverse labour markets attract skilled individuals, further explaining the higher returns to education in these regions (Van Maarseveen, 2021).
Moreover, lower returns to education in peri-urban and rural regions are associated with lower levels of educational attainment. Educational institutions are concentrated in Zambia’s urban regions, while rural regions have limited access to schools and tertiary institutions. The distance to schools in rural regions exacerbates school absenteeism, resulting in low enrolment and high dropout rates (Kabanga and Mulauzi, 2020). Additionally, malnutrition and poor health in rural regions negatively impact academic performance, with frequent illness and limited healthcare access contributing to absenteeism, further reducing educational attainment and returns to education (World Bank, 2021).
6.4 Quantile regression instrumental variable two-stage-least squares results
This section discusses the findings from the Quantile Regression Instrumental Variable Two-Stage Least Squares (QR-IV 2SLS) analysis, which addresses the endogeneity of education. The Anderson-Rubin Wald test confirmed the validity and strength of the chosen instrument: the instrument is strongly correlated with education levels (F-statistic = 11.45), exceeding the conventional threshold of 10 for a strong instrument (Sirisankanan, 2023).
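The logic of an instrument-strength check and an instrumental variable estimate can be sketched on simulated data. This is a simplified illustration under stated assumptions, not the authors' procedure: it uses the common first-stage F rule of thumb rather than the Anderson-Rubin test, a mean (not quantile) IV estimator, one instrument, and no other controls. The variable names, the data-generating process, and the value of `true_return` are all hypothetical.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(a, b):
    """Sample covariance (cov(a, a) is the sample variance)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def first_stage_f(x, z):
    """F-statistic of the first stage x = a + b*z + e.

    With one instrument and no other controls this equals the squared
    t-statistic of the slope.
    """
    n = len(x)
    slope = cov(z, x) / cov(z, z)
    intercept = mean(x) - slope * mean(z)
    rss = sum((xi - intercept - slope * zi) ** 2 for xi, zi in zip(x, z))
    slope_var = (rss / (n - 2)) / (cov(z, z) * (n - 1))
    return slope ** 2 / slope_var

# Simulated data: u is an unobserved factor that raises education (x)
# but lowers wages (y), so OLS understates the true return.
random.seed(1)
n = 5000
true_return = 0.5
z = [random.gauss(0, 1) for _ in range(n)]   # instrument
u = [random.gauss(0, 1) for _ in range(n)]   # unobserved confounder
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [true_return * xi - ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

f_stat = first_stage_f(x, z)
ols = cov(x, y) / cov(x, x)   # biased by the confounder
iv = cov(z, y) / cov(z, x)    # just-identified IV (equals 2SLS here)

print(f"First-stage F: {f_stat:.1f} (rule of thumb: above 10)")
print(f"OLS estimate: {ols:.3f}, IV estimate: {iv:.3f}, true value: {true_return}")
```

In this simulated setting the confounder pushes the OLS estimate below the true return, while the IV estimate stays close to it. The direction of the OLS bias depends entirely on the assumed data-generating process.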
The QR-IV 2SLS results in Table 4 confirm that the impact of education on income varies across Zambia’s regions, with secondary education having a stronger effect on labour income than tertiary education. The larger estimates are expected, because failing to account for the endogeneity of education tends to underestimate its true impact. This is consistent with Bhatti et al. (2013), who compared estimates from their OLS model with those from an instrumental variable approach. For instance, in urban regions, returns to secondary education rise sharply from 142.9% at the median to 358.5% at the 90th percentile, far exceeding the estimates from models that did not control for endogeneity.
Table 4. Results from the modified quantile regression instrumental variable two-stage least squares model
| Variables | Quantile | Urban regions (n = 2,103): Coef | Urban regions: SD | Peri-urban regions (n = 745): Coef | Peri-urban regions: SD | Rural regions (n = 539): Coef | Rural regions: SD |
|---|---|---|---|---|---|---|---|
| Secondary | q25 | 0.955 | 1.012 | 0.47 | 1.12 | 3.59 | 2.394 |
|  | q50 | 1.429* | 0.63 | 0.964 | 0.677 | 2.836 | 1.637 |
|  | q75 | 2.841*** | 0.645 | 1.82* | 0.858 | 2.369 | 2.212 |
|  | q90 | 3.585*** | 0.967 | 0.847 | 1.321 | 3.048 | 2.743 |
| Tertiary | q25 | 1.557 | 0.971 | 1.436 | 1.718 | 6.518 | 3.439 |
|  | q50 | 0.485 | 0.861 | 0.468 | 1.175 | 4.946* | 2.022 |
|  | q75 | −1.39* | 0.681 | −1.89 | 1.57 | 3.631 | 2.132 |
|  | q90 | −2.675* | 1.26 | −1.139 | 2.588 | −1.071 | 3.098 |
| Age | q25 | 0.006 | 0.007 | 0.018 | 0.009 | −0.007 | 0.02 |
|  | q50 | 0.015** | 0.005 | 0.029*** | 0.006 | 0.01 | 0.014 |
|  | q75 | 0.028*** | 0.005 | 0.036*** | 0.006 | 0.02 | 0.016 |
|  | q90 | 0.044*** | 0.006 | 0.033** | 0.011 | 0.034 | 0.023 |
| Experience | q25 | 0.052 | 0.027 | 0.026 | 0.041 | −0.082 | 0.072 |
|  | q50 | 0.07*** | 0.016 | 0.038 | 0.035 | −0.047 | 0.047 |
|  | q75 | 0.121*** | 0.017 | 0.106*** | 0.031 | −0.039 | 0.055 |
|  | q90 | 0.16*** | 0.03 | 0.07 | 0.053 | 0.067 | 0.083 |
| Gender dummy | q25 | −0.084 | 0.166 | 0.253 | 0.273 | 0.55 | 0.537 |
|  | q50 | −0.044 | 0.132 | 0.198 | 0.18 | 0.462 | 0.313 |
|  | q75 | −0.227 | 0.127 | −0.144 | 0.245 | 0.273 | 0.362 |
|  | q90 | −0.459* | 0.178 | 0.161 | 0.467 | −0.639 | 0.509 |
Source(s): Authors’ own work
In contrast, in peri-urban and rural regions, secondary and tertiary education are largely insignificant determinants of income: the coefficients are statistically insignificant at most quantiles, the exceptions being secondary education at the 75th percentile in peri-urban regions and tertiary education at the median in rural regions. This is consistent with the economic structures of these regions, which are dominated by low-skilled, labour-intensive sectors like agriculture and mining. These sectors offer limited opportunities for skilled workers to leverage their education, resulting in lower returns for individuals in higher wage percentiles compared to urban regions, where skill-intensive sectors such as finance, technology, and healthcare drive higher returns.
An unexpected finding from the QR-IV 2SLS analysis is that returns to tertiary education are lower than returns to secondary education from the median upwards in urban and peri-urban regions, and at the 90th percentile in rural regions. For example, in urban regions, workers with tertiary education earn significantly less than those with secondary education at the 75th and 90th percentiles. This finding contrasts with studies from developed economies (Blau et al., 2024; Lee and Choi, 2024), where tertiary education typically results in higher returns. However, this difference is expected, as the context in developed countries cannot be directly applied to Zambia. In Zambia, factors such as nepotism and social connections often influence access to high-paying jobs and subsequently reduce returns for tertiary-educated workers. This suggests that merit and skills are not the primary determinants of employment in Zambia.
The QR-IV 2SLS results also suggest that secondary education is sufficient for individuals in Zambia to realise the full economic benefits of education, as returns to tertiary education do not exceed those of secondary education. This observation also contrasts with findings from developed countries, where tertiary education consistently yields higher returns (Backman, 2014; Van Maarseveen, 2021). The lower returns to tertiary education in Zambia can be attributed to high unemployment rates and opaque recruitment processes, where individuals with higher qualifications often accept lower-paying jobs, thus diminishing the returns to their education.
6.5 Test of homogeneity in returns to education
Wald tests were conducted to assess whether regional differences in returns to education significantly differ from zero, using estimates from both the modified OLS human capital model and quantile regression. The OLS results showed statistically significant differences in returns to tertiary education between urban, peri-urban, and rural regions, revealing significant regional variations in returns to education in Zambia.
Wald test results from quantile regression estimates revealed significant variations in returns to education across quantiles, particularly for individuals with tertiary education, and were more pronounced between urban and rural regions. This finding highlights the diverse impact of education across income levels and is consistent with Issofou and Aicha (2021), who found similar Wald test results in Cameroon.
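A Wald test of equal returns across two regions can be sketched as follows. This is an illustration under stated assumptions, not the authors' reported test: it treats the regional samples as independent, reads the "SD" columns of Table 3 as standard errors, and compares only one pair of coefficients (tertiary education at the 90th percentile, urban versus peri-urban). The function name is hypothetical.

```python
import math

def wald_equal_coefs(b1, se1, b2, se2):
    """Wald test of H0: b1 == b2 for estimates from independent samples.

    Returns the Wald statistic (chi-squared with 1 degree of freedom
    under H0) and its p-value.
    """
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    z = (b1 - b2) / se_diff
    wald = z ** 2
    # Survival function of chi-squared(1): P(|Z| > |z|) for standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return wald, p_value

# Table 3, tertiary education at q90: urban 1.689 (0.103), peri-urban 1.242 (0.227).
wald, p = wald_equal_coefs(1.689, 0.103, 1.242, 0.227)
print(f"Wald statistic: {wald:.3f}, p-value: {p:.3f}")
```

A joint test across several quantiles or all three regions would need the full covariance matrix of the estimates, which the tables do not report.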
6.6 Robustness checks
To ensure the robustness of the findings and examine the potential impact of model specifications on results, two alternative models were implemented. Firstly, to verify that variations in regional returns to education were not influenced by the classification of education levels, the study employed a modified Mincerian OLS model using four sub-educational levels: primary, junior secondary, upper secondary, and Bachelor’s degree or higher. This approach, similar to Backman (2014), aimed to validate the findings and mitigate potential biases.
The results from this alternative specification reinforced the initial conclusions, indicating that higher sub-educational levels consistently yield higher returns across Zambia’s regions. Notably, urban regions had the highest returns for all sub-educational levels, followed by peri-urban regions. In contrast, the coefficients for all sub-educational levels were statistically insignificant in rural regions, emphasising the limited influence of education on income in these regions.
Secondly, to ensure that the study’s findings were not skewed by gender distribution, a modified human capital model was implemented by gender. The results confirmed the main conclusions: returns to education were highest in urban regions for both men and women, followed by peri-urban regions, and lowest in rural regions. These findings further reinforce the validity of both the OLS and quantile regression models used in this study, demonstrating the robustness and reliability of the study’s primary results.
7. Conclusion and policy recommendations
This study aimed to analyse the existence of regional heterogeneities in returns to education in Zambia. Specifically, it examined whether regional variations, health conditions, educational attainment, gender, and wage distributions influence returns to education. The findings indicate significant regional differences, with the highest returns observed in urban regions, followed by peri-urban, and the lowest in rural regions. Quantile regression analysis further highlights the heterogeneity in returns across income distributions within regions, suggesting the need for tailored policy interventions.
OLS estimates from the modified Mincerian model show pronounced heterogeneities across regions, education levels, and gender. The quantile regression results also demonstrate that returns to education generally increase from the lower to the upper wage percentiles across regions, with the largest disparities observed at the tertiary education level. Additionally, poor health was found to significantly reduce both earnings and returns to education, underlining the importance of integrating health considerations into human capital investment studies. Gender wage differentials were also notable, particularly widening at the top of the wage distribution, reinforcing the presence of glass ceiling effects.
Addressing regional disparities in returns to education in Zambia, and in similar developing economies, requires a multifaceted approach that goes beyond educational policy. In rural regions, where educational infrastructure is limited, improving access to schools and tertiary institutions is critical. This includes increasing funding, enhancing teacher training, and providing educational resources to underserved regions. These measures are important for narrowing the educational attainment gap and raising economic returns in these regions.
In urban regions, where returns to education are highest, policies should focus on aligning education and training programs with market demands. Developing tailored skills programs in high-growth sectors such as mining, manufacturing, and financial services will boost graduates’ employability and income potential, ensuring that education continues to provide strong economic returns.
The uneven distribution of economic opportunities across Zambia highlights the need for investment in industries beyond the urban core. Industrialisation in less developed regions will generate higher-paying jobs, increase returns to education, and support regional economic diversification, particularly in peri-urban and rural regions.
Enhancing access to quality healthcare, especially in rural regions, is vital for mitigating the negative effects of poor health on labour income. Health improvements can boost productivity and the overall returns to human capital investments.
Addressing gender wage disparities requires ensuring equitable promotion, hiring, and performance evaluation practices. Tackling the barriers faced by women, particularly in higher wage percentiles, is essential for breaking down the glass ceiling effect and achieving gender pay equity.
The QR-IV 2SLS analysis indicates diminishing returns to tertiary education, suggesting a need for policy interventions. The government should create a conducive business environment to promote alternative labour market participation, such as entrepreneurship, rather than further education, which appears to yield limited economic returns. Secondary education, in many cases, may be sufficient for maximising labour market outcomes.
In regions where education has a weaker influence on income, particularly peri-urban and rural regions, policy should shift toward investing in skills training centres. These centres can facilitate employment in sectors such as agriculture and mining, which offer higher income potential in these regions.
The authors would like to thank the African Economic Research Consortium (AERC) for the financial support rendered towards the publication of this paper.
