This paper aims to demonstrate how the level of data aggregation affects the estimation of the income elasticity of demand and potentially influences the distinction between luxury and necessary goods. We explore the estimation of the income elasticity of outpatient medicine demand in Brazil. Medicines may compete with expenditures on essential goods, which makes them a frequent target of public policies worldwide.
We used a single data set (the Brazilian Household Budget Survey – BHBS, 2017–2018) with varying levels of data aggregation and an identical method of estimation to assess how the level of aggregation affects elasticity coefficient estimations.
The elasticity estimated based on per person microdata significantly differed from that estimated on a per capita basis by primary sampling unit and by State, but was similar to per capita estimates by household.
Although many academic studies still use aggregate data to estimate associations between variables that represent individual characteristics or choices, this procedure is incorrect and can lead to inconsistent results. The level of data aggregation matters when it comes to understanding and estimating demand elasticity. Elasticities based on aggregate data are overestimated and are unfit both for crafting public policies and for projecting future demand.
Introduction
There is a large body of international literature on the demand for health goods and services (Dubey, 2020) and on determinants of health expenditures (Andersen & Newman, 1973; Astolfi, Lorenzoni, & Oderkirk, 2012). Part of this literature specifically addresses the elasticity of health product demand. Demand elasticity measures how demand responds to variations in the costs of accessing products (the so-called price elasticity) or to variations in the buyer/user’s income (the income elasticity).
Conceptually, income elasticity is the percentage change in the demand for a good or service associated with a percentage change in the income of people in a given region. An income elasticity below 1 indicates that health expenditures are “inelastic” with respect to income, i.e. they are “necessary” goods, with little dependence on income levels. Elasticities above 1 would indicate that the demand is strongly influenced by income (Di Matteo, 2003), i.e. that increases in income lead to more than proportional increases in the product’s demand, which leads products with this profile to be called “luxury” goods.
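The definition above can be illustrated with a minimal sketch. The function names and the income and consumption figures are hypothetical, chosen only so that the arithmetic is easy to follow; the classification rule simply mirrors the threshold of 1 described in the text.

```python
def income_elasticity(q_before, q_after, income_before, income_after):
    """Ratio of the percentage change in demand to the percentage change in income."""
    pct_change_demand = (q_after - q_before) / q_before
    pct_change_income = (income_after - income_before) / income_before
    return pct_change_demand / pct_change_income


def classify(elasticity):
    """Label a good using the threshold of 1 (values at or below 1 are treated as 'necessary' here)."""
    return "luxury" if elasticity > 1 else "necessary"


# Hypothetical example: income rises by 10% and consumption rises by 4.1%.
e = income_elasticity(100.0, 104.1, 1000.0, 1100.0)
print(round(e, 2), classify(e))
```

Under these assumed figures a 10% rise in income is accompanied by a 4.1% rise in consumption, so demand grows less than proportionally and the good would be labeled "necessary".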
Elasticity estimates are used in expenditure projections for different public policy scenarios (Duarte, 2012; Cruz Rivero, Luna Ruiz, Morales Barrera, & Coello Levet, 2006). Whenever health goods and services prove to be “necessary” rather than “luxury” goods, greater involvement of governments to provide these goods to the entire population is warranted (Costa-Font, Gemmill, & Rubert, 2011). On the other hand, if health goods and services are found to be “luxury” products as some studies suggest (Barati & Fariditavana, 2020), a point is made against universal public access, since “luxury” goods, in theory, should be financed by household income.
Whether healthcare is a luxury good or a necessity has long been considered an unsettled issue (Astolfi et al., 2012). The first reference work on income elasticity of demand in healthcare was published by Newhouse (1977). Elasticities estimated with aggregate data for health goods and services are generally higher than those estimated with microdata (Getzen, 2000). Despite the great heterogeneity in estimates found in international literature, the trend since then has favored income elasticities above 1 (Costa-Font et al., 2011), implying that health goods and services are luxury products (Newhouse, 1977; Gerdtham, Sogaard, Andersson, & Jonsson, 1992).
In Brazil, the first work on healthcare income elasticity was authored by Musgrove (1984). In a comprehensive study of family expenditure on healthcare in Latin America using aggregate data, Musgrove estimated an income elasticity of 1.2 for Brazil. Long-term health expenditure projections to support public policies (Rocha, Furtado, & Spinola, 2021) were based on this and other more recent income elasticity estimates, with evident implications for the demand projections.
As dissenting voices, Costa-Font et al. (2011) used disaggregate data for several health products and found income elasticities below 1. These authors point out that, in view of the potential implications of elasticity estimates for resource allocation, it is essential to discuss their precision and to strengthen the debate around their methodological robustness (Costa-Font et al., 2011). They specifically question the use of aggregate data in elasticity estimation.
The relevance of aggregation biases in healthcare research cannot be overstated. Geissbühler et al. (2021) carried out a meta-epidemiological study of 81 medical studies, including aggregate data meta-regression analysis for the years 2002 to 2012. They found that 65% of them were susceptible to aggregation bias, making it the most frequent methodological pitfall for health sector studies.
The aim of this paper is to illustrate how the level of data aggregation affects estimates of the income elasticity of demand. As our findings demonstrate, the more aggregate the data, the greater the overestimation. To this end, we use the example of the outpatient consumption of medicines in Brazil, of which over 80% is financed by direct out-of-pocket (OOP) spending by households (Moraes, Santos, Vieira, & Almeida, 2022). A smaller consumption share comes from public resources (non-monetary consumption) and there is near-zero coverage by private health insurance (Moraes et al., 2022).
Globally, OOP expenditures for medicines exceed 40% of total pharmaceutical expenditure in Europe and reach much higher levels in low- and middle-income countries. Such expenditures may be unexpected and high, or needed on a continuous basis for chronic conditions. They may therefore compete with expenditures on essential goods, such as food and housing, and are a frequent target of public policies (Żółtaszek, 2022). In Brazil, previous work has highlighted the importance of free dispensing of medicines for chronic conditions in reducing avoidable hospitalizations and mortality (Almeida, Sá, Vieira, & Benevides, 2019). Previous studies also highlight the burden of medicine expenditure for groups such as the elderly (Lima, Ribeiro, de Assis Acurcio, Rozenfeld, & Klein, 2007) and low-income families, for whom medicines are the main health-related expense (Menezes, Campolina, Silveira, Servo, & Piola, 2007).
Detailed data on the outpatient consumption of medicines are available in the Brazilian Household Budget Survey (BHBS) (IBGE, 2019). In addition to describing how much each resident spends on medicines, the BHBS also collects very detailed data on household income. The simultaneous availability of expenditure and income data in the same database makes it an ideal source for demand estimations.
Our estimation with per person data using the latest BHBS resulted in an income elasticity of 0.41 for medicines (with a 95% confidence interval between 0.39 and 0.43). This is significantly different from the 0.48 (0.46–0.50) and 0.68 (0.60–0.76) elasticities estimated with per capita data by Primary Sampling Unit and by State, respectively. Per capita data by household led to a 0.42 (0.40–0.44) elasticity, not significantly different from the individual data estimate.
Revisiting the theoretical background
Aggregate data are often used to estimate income elasticities of demand. This approach to estimation assumes that the demand for a product in a given period of time equals its consumption in the immediately preceding period multiplied by the variation in the region’s income and by the elasticity – an average relation between the variation in aggregate income and the variation in total product consumption over time (or between regions) – as shown in Equation (1).
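To make the projection logic concrete, the sketch below shows one common first-order reading of this relation: next-period demand equals previous consumption scaled by one plus the elasticity times the proportional change in income. This functional form, the function name and the base values are assumptions made for illustration and are not a reproduction of Equation (1); the two elasticities are the per person and State-level estimates reported in the Introduction.

```python
def project_demand(previous_consumption, income_growth, elasticity):
    """First-order projection: demand grows by elasticity * proportional income growth.

    income_growth is a proportion (0.10 means a 10% rise in income).
    """
    return previous_consumption * (1.0 + elasticity * income_growth)


# Hypothetical base consumption of 100 units and a 10% rise in regional income.
base, growth = 100.0, 0.10
per_person = project_demand(base, growth, 0.41)   # elasticity from per person data
state_level = project_demand(base, growth, 0.68)  # elasticity from State-level data
print(per_person, state_level)
```

Under these assumptions the projected increase in demand is about two thirds larger with the State-level elasticity than with the per person one, which is why the choice of aggregation level matters for expenditure projections.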
However, the use of aggregate data to describe individual characteristics and decisions (such as the decision to use a medicine) leads to inconsistencies that statisticians have long pointed out.
The inconsistencies generated when one estimates associations between individual-level variables using aggregate data were originally pointed out by English statistician George Yule at the beginning of the 20th century. In his paper, Yule (1903) presented the hypothetical example of an antitoxin that, in an analysis with aggregate data, appeared to be an effective treatment for reducing patient mortality. But, upon analyzing data for men and women separately, the antitoxin's effect disappeared. The effect disappeared because this third variable (sex) was associated with the two other categorical variables under analysis (taking the drug and living longer). As women have, on average, a longer life expectancy, if the proportion of women taking the medicine is greater than that of men, the contingency table with mixed data from men and women will indicate a positive association between taking the drug and living longer. This association will be simply due to the higher proportion of women in the sample taking the medicine. In Yule’s example, the drug was ineffective but, on an aggregate analysis, it appeared to be effective.
Simpson (1951), also a statistician, ended up having his name associated with Simpson’s Paradox. Simpson’s Paradox occurs when a contingency table initially indicates a positive association between two variables but, upon using disaggregated data, the association turns negative. Presenting associations derived from aggregate data as though they represent associations between these variables for individuals in the aggregates is precisely what became known as an Ecological Fallacy (Selvin, 1958).
French economist André Nataf carried out a similar exercise applied to production functions (Nataf, 1948). A production function amounts to the relationship between inputs and the output in a firm (or a set of firms). Nataf’s idea was to define criteria to support the use of aggregate data to estimate production functions. For example, in an industry with I firms, the production of each firm (yi) would be related to the inputs it uses to produce, with x1i, x2i, x3i, etc. being the values spent by firm i on electricity, paper, fuel, etc. Thus, the aggregate production function will be given by total production (Y) and the inputs (X) used by all the firms in this industry, as shown in Equation (2), where capital letters represent the aggregates:
Nataf (1948) showed that aggregate data could provide consistent results if the terms of the production function were mathematically independent, a condition which became known as Nataf’s Theorem. Independent terms, in Nataf’s Theorem, mean that the inputs cannot be correlated with each other. The function terms (for each firm and for the aggregate function) must be additive: changes in the quantity of an input used (x1) cannot affect the relations between the others (between x2 and x3, for example).
The same logic used by Nataf for production applies to demand. In this case, one just needs to replace firms with consumers in Equation (2) and think of demand as y and consumer characteristics and preferences as the x variables. In most cases, increases in income for consumers with different profiles will have different effects on demand. In the most common example, high-income consumers, who will save part of their increased income, will use less of their increased income to consume. Low-income consumers, with a consumption pattern concentrated on basic needs, are likely to consume any additional resources they receive. If the income increase occurs only for high-income consumers, the effect on consumption will be smaller than if it occurs for low-income consumers.
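A small numerical sketch may help to see how an association can appear in pooled data and vanish within strata, in the spirit of Yule's antitoxin example. All counts below are hypothetical and were chosen so that the treatment has no effect within either sex, while women (who survive more often in this invented data) are over-represented among the treated.

```python
# Hypothetical counts: (survivors, total patients) per sex and treatment arm.
strata = {
    "women": {"treated": (80, 100), "untreated": (40, 50)},
    "men": {"treated": (10, 20), "untreated": (50, 100)},
}


def stratum_rate(sex, arm):
    """Survival rate within one sex and one treatment arm."""
    survived, total = strata[sex][arm]
    return survived / total


def pooled_rate(arm):
    """Survival rate when men and women are mixed in a single table."""
    survived = sum(strata[sex][arm][0] for sex in strata)
    total = sum(strata[sex][arm][1] for sex in strata)
    return survived / total


for sex in strata:
    print(sex, stratum_rate(sex, "treated"), stratum_rate(sex, "untreated"))
print("pooled", pooled_rate("treated"), pooled_rate("untreated"))
```

Within each sex the treated and untreated survival rates are identical (0.8 for women, 0.5 for men), yet the pooled table shows 0.75 for the treated against 0.6 for the untreated, purely because of the different sex composition of the two arms.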
Thus, to suppose that aggregate demand is a linear function of aggregate income (i.e. that it always grows in the same proportion when total income grows) is frequently an oversimplification. Aggregate models – which estimate relations from total income and total consumption of a product – can generate misleading estimates for the relation between variations in income and consumption (the income elasticity of demand). Models using per capita income and consumption are not much better: the per capita value is just the aggregate total divided by the population to which it refers.
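The point can be sketched with two hypothetical consumer groups. The spending shares below (the fraction of each additional unit of income that is consumed) are invented for illustration; the sketch only shows that the same increase in aggregate income can produce different changes in aggregate consumption depending on who receives it.

```python
# Hypothetical shares of each extra unit of income that is spent rather than saved.
MARGINAL_PROPENSITY_TO_CONSUME = {"low_income": 0.9, "high_income": 0.3}


def consumption_change(income_changes):
    """Aggregate change in consumption for a given distribution of income gains."""
    return sum(
        MARGINAL_PROPENSITY_TO_CONSUME[group] * gain
        for group, gain in income_changes.items()
    )


# The aggregate income gain is 100 in both scenarios; only the recipients differ.
to_low = consumption_change({"low_income": 100.0, "high_income": 0.0})
to_high = consumption_change({"low_income": 0.0, "high_income": 100.0})
print(to_low, to_high)
```

An aggregate model sees the same income change in both scenarios and therefore cannot distinguish between them, although consumption responds three times as strongly in the first one under these assumed shares.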
Methods
We built a model to estimate income elasticities using data at different levels of aggregation – per person, per capita (household), per capita (Primary Sampling Units – PSUs) and per capita (State). PSUs are subdivisions of the territory (census sectors) that are drawn at each stratum (State or metropolitan region) as part of a survey’s sample selection. A sample of households is randomly selected to be surveyed in each PSU selected.
Our data came from the 2017–18 BHBS, which visited 57,920 households between July 11th, 2017 and July 9th, 2018. The BHBS collects data on consumption, income and living conditions throughout the country, providing representative samples for each State and for metropolitan areas. To avoid seasonality problems, data are collected throughout the whole year (IBGE, 2019).
To build our data set for consumption, we used the BHBS4 questionnaire (Individual Acquisitions) section 29A (Acquisition of Pharmaceutical Products), which collects reported values for pharmaceutical products consumed by each household resident in the last 30 days before the interview (IBGE, 2019). The BHBS database includes data on monetary and non-monetary consumption, depicted in the variable V8000_DEFLA. Non-monetary consumption amounts to consumption linked to donations, goods and services provided by the government or other forms of acquisition not involving direct payments from individuals. A previous study showed that the values reported for both monetary and non-monetary acquisitions of medicines in the BHBS were very consistent (Moraes et al., 2022). This allowed us to include both monetary and non-monetary consumption in our dataset. Thus, potentially pooled consumption based on government provision was translated into individual consumption.
To build our income dataset, we used the pre-tabulated variable RENDA_TOTAL provided in the BHBS database for total household income reported in the BHBS5 questionnaire – work and individual income. Total household income captures income for residents of any given household of any origin, be it from work or other sources.
The weights (PESO_FINAL) in the per person database show the number of persons represented by each sampled observation (surveyed person). Weights assign an expansion factor to each household in the primary sampling unit and are used both for income and consumption databases for all levels of observation.
A synthesis of procedures for building the study’s database for consumption and income by aggregation level is provided in Table 1. The final data set used in the study will be provided upon request to the authors.
Table 1. Database structure for consumption and income derived from Brazilian Household Budget Survey (BHBS) variables by aggregation level
| Aggregation level | Consumption | Income | Weights | Comments |
|---|---|---|---|---|
| Per person | BHBS variable Consumption (V8000_DEFLA) in the Individual Acquisitions survey questionnaire (POF 4) for Pharmaceutical Products (section 29A). This covers the monetary and non-monetary acquisitions of pharmaceutical products reported by each person surveyed | BHBS variable Total household income (RENDA_TOTAL) divided by the number of residents in the household and assigned to each person in the household | BHBS variable for Weights (PESO_FINAL) | Since children and other residents may have consumption patterns that are more related to their home’s average than to their individual income, we opted to use each household's average per capita income in the individual analysis database |
| Per capita (Household) | BHBS variable Consumption (V8000_DEFLA) in the Individual Acquisitions survey questionnaire (POF 4) for Pharmaceutical Products (section 29A), aggregated by household code and then divided by the number of residents in each household | BHBS variable Total household income (RENDA_TOTAL) divided by the number of residents in the household | BHBS variable for Weights (PESO_FINAL) | In BHBS, the weight variable for all residents in the individual characteristics database equals their household's weight. There was no need to adjust weights for the regression at this level of aggregation |
| Per capita (Primary Sampling Unit - PSU/Census Sector) | Starting from our per person database, we aggregated the BHBS Consumption variable (V8000_DEFLA) for Pharmaceuticals by individual PSUs: ∑(sampling weights × consumption value). We then divided the result by the number of residents in the corresponding PSUs, obtained as ∑(sampling weights) | Starting from our per person income database, we aggregated the Income variable (RENDA_TOTAL) by individual PSUs: ∑(sampling weights × income value) and divided this result by the number of residents in the corresponding PSUs, obtained as ∑(sampling weights) | Sum of per person weights in each PSU | The weights in the per person database show the number of persons represented by each sampled observation (surveyed person). Their sum by PSU shows the population in each PSU |
| Per capita (State) | Starting from our per person database, we aggregated the BHBS Consumption variable (V8000_DEFLA) for Pharmaceuticals for each State: ∑(sampling weights × consumption value) and divided the result by the number of residents in the corresponding State, obtained as ∑(sampling weights) | Starting from our per person database, we aggregated the Income variable in each State: ∑(sampling weights × income value) and divided this result by the number of residents in the corresponding State, obtained as ∑(sampling weights) | Sum of per person weights in each State | The weights in the per person database show the number of persons represented by each sampled observation (surveyed person). Their sum by State shows the population in each State |
To organize a database containing BHBS individual (per person) income data, we divided total household income by the number of residents in the household. Since children and other residents may have consumption patterns that are more related to their home’s average than to their individual income, we opted to use each household’s average per capita income in the individual analysis database.
For the per capita analysis on households, medicine consumption values were aggregated by household codes and then divided by the number of residents in each household. In BHBS, the weight variable for all residents in the individual characteristics database equals their household's weight. There was no need to adjust weights for the regression at this level of aggregation.
To obtain per capita consumption values by Primary Sampling Unit (PSU), we aggregated consumption per person and household per capita income in the individual (per person) database and applied Equation (3) to the data of the j residents of each PSU. The same was done in the aggregations by State.
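The aggregation rule described in Table 1, ∑(sampling weights × value) divided by ∑(sampling weights), can be sketched as follows. The record layout, the field names (`psu`, `weight`, `consumption`) and the numbers are hypothetical; only the rule itself is taken from the text.

```python
from collections import defaultdict


def weighted_per_capita(records, group_key, value_key, weight_key="weight"):
    """Per capita value by group: sum(weight * value) / sum(weight)."""
    weighted_total = defaultdict(float)
    population = defaultdict(float)
    for record in records:
        group = record[group_key]
        weighted_total[group] += record[weight_key] * record[value_key]
        population[group] += record[weight_key]  # sum of weights = people represented
    return {group: weighted_total[group] / population[group] for group in weighted_total}


# Hypothetical per person records; each weight is the number of persons represented.
people = [
    {"psu": "A", "weight": 100.0, "consumption": 20.0},
    {"psu": "A", "weight": 300.0, "consumption": 40.0},
    {"psu": "B", "weight": 200.0, "consumption": 10.0},
]
per_capita = weighted_per_capita(people, "psu", "consumption")
print(per_capita)
```

The same helper could be applied to the income field, or with a State identifier as the grouping key, to build the State-level series.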
The usual way to estimate income elasticity of demand is by linear regression based on logarithms of consumption and income. In the case of microdata, a limitation to be considered is the existence of observations with zero value, i.e. individuals not reporting medicine consumption in the preceding 30 days, for which the logarithm is undefined. From the third level of aggregation (PSU) onwards, by contrast, there are no zeroes among the data points. To adopt the same procedures for estimates at all levels of aggregation, we opted to use only data from individuals who reported medicine consumption in the preceding 30 days, discarding observations with zero consumption.
Per person, per capita by household, per capita by PSU and State level aggregations were then used in linear regressions with the logarithm of consumption and the logarithm of income, producing estimates for the income elasticity of demand at the four levels of aggregation. The linear regressions were weighted by the original BHBS weights for individuals and households and by the sum of the weights of the individuals in each PSU or State in the regressions for the corresponding levels of aggregation. Estimates of the coefficients with 95% confidence intervals were calculated for each case.
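As an illustration of the point estimate only, a weighted log-log regression can be written in closed form. This is a minimal sketch in plain Python, not the code used in the study: it ignores the sampling design, produces no confidence intervals, and the function name and test data are hypothetical.

```python
import math


def weighted_loglog_fit(income, consumption, weights):
    """Weighted least squares of log(consumption) on log(income).

    Returns (intercept, slope); the slope is the income elasticity estimate.
    All income and consumption values must be strictly positive.
    """
    x = [math.log(v) for v in income]
    y = [math.log(v) for v in consumption]
    w_sum = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical data generated exactly from consumption = 2 * income ** 0.41.
incomes = [500.0, 1000.0, 2000.0, 4000.0]
consumptions = [2.0 * v ** 0.41 for v in incomes]
weights = [10.0, 20.0, 30.0, 40.0]
intercept, elasticity = weighted_loglog_fit(incomes, consumptions, weights)
print(intercept, elasticity)
```

Because the invented data lie exactly on a constant-elasticity curve, the fitted slope recovers the exponent used to generate them; with real survey data the residual scatter and the design-based variance would also have to be handled.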
For individuals and per capita by household, variance estimations considered the sampling design, using the survey package in R software (R Development Core Team). For State and PSU per capita regressions, the sampling design was not used: design variables are lost when data are aggregated. In PSU and State aggregations, variance estimation assumed simple random samples. Once the weights are included in the regression (as they were, for the four regressions), the other sampling design specifications affect variances but not the regression coefficients (Särndal, Swensson, & Wretman, 1992; Lumley, 2010).
To check for the effect of other kinds of aggregation, the totals per household were regressed using the same regression specification described above and the results were compared to the per capita per household regression. In this case, there is one more variable interacting with consumption and income: the number of household residents.
Results
The per person income elasticity of demand for medicines estimated from microdata – 0.41 (0.39–0.43) – significantly differed from estimates for per capita data by PSU – 0.48 (0.46–0.50) – and by State – 0.68 (0.60–0.76) – but was very similar to per capita estimates by household – 0.42 (0.40–0.44). There was a difference of more than 60% between per person and state aggregate elasticities.
Data aggregation, even more so in countries with highly unequal income distribution patterns such as Brazil and other low- and middle-income countries (LMICs), “smooths out” income heterogeneity. It understates extremes of low income and need, which would demand specific public policies, such as higher shares of public financing, compensatory policies or tougher market regulation.
Table 2 shows the estimated elasticities for the four levels of data aggregation and corresponding 95% confidence intervals. Estimated elasticity and the regression’s coefficient of determination (R2) increase with the level of aggregation. The last row of the table shows the sample size for each level of aggregation.
Table 2. Elasticity estimated at different levels of aggregation
| Coefficients/Models | Per person | Per capita (household) | Per capita (PSU) | Per capita (State) |
|---|---|---|---|---|
| Intercept | 0.89 (0.77–1.00) | 1.28 (1.15–1.41) | 0.90 (0.76–1.04) | −0.56 (−1.18 to 0.06) |
| Elasticity | 0.41 (0.39–0.43) | 0.42 (0.40–0.44) | 0.48 (0.46–0.50) | 0.68 (0.60–0.76) |
| R2 | 0.09 | 0.10 | 0.32 | 0.91 |
| Sample (n) | 76,979 | 50,515 | 5,498 | 27 |
Note(s): Estimates of the coefficients with 95% CI in models with different aggregation options: per person, per capita (household), per capita (PSU) and per capita (State)
Figure 1 shows per person, household, PSU and State per capita logarithms of income and medicine consumption. The more aggregate data – for per capita PSU and States – show less dispersion than disaggregate per person and household per capita data. The lower dispersion is associated with a higher R2. Non-random aggregations obtained by adding data from individuals in the same region change the slope of the regression line representing the income elasticity of demand.
Figure 1. Medicines consumption and income at different aggregation levels. Scatter plots of the logarithms of medicine consumption (vertical axis, “log (consumption)”) and income (horizontal axis, “log (income domic per capita)”) in four panels: per person, per capita (household), per capita (PSU) and per capita (State). The points are most dispersed in the per person and per capita (household) panels and cluster progressively more tightly around the regression line at the PSU and State levels. The red line indicates the income elasticity of demand, given by the regression line. Source: Prepared by the authors based on data from the BHBS 2017–18 for individuals consuming medicines in the surveyed period
A last regression was performed with the logarithms of total consumption and income per household. In this case, the estimated elasticity was 0.52, with a confidence interval of (0.50–0.54) and R2 of 0.13. This elasticity is significantly higher than that estimated using per capita per household data. In this case, consumption and income are related to the number of residents in each household. Households with more residents have, on average, higher consumption and income.
As the coefficient of determination increases with data aggregation, in addition to leading to higher elasticity coefficients, aggregate data (providing a smaller number of observations and less dispersion in data points) appear to generate regressions with greater ability to project demand. But far from depicting a more precise representation of associations between income and consumption, this simply shows that the residuals (and part of the information contained in the data) are averaged out and discarded by the aggregation. Elasticities greater than 1 and the high R2 found in estimates with aggregate data can, thus, be a spurious effect of data aggregation.
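The mechanism can be explored with a small simulation. The sketch below is purely illustrative and rests on invented assumptions: 27 hypothetical regions, log-normal incomes, an individual-level elasticity of 0.4 and an additional regional effect on consumption that rises with the region's average income. None of these parameters come from the BHBS; the sketch only shows that, under such assumptions, regressing regional per capita values yields a steeper slope and a higher R2 than regressing individual values.

```python
import math
import random


def ols(x, y):
    """Return (slope, r_squared) of the least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / s_xx, s_xy ** 2 / (s_xx * s_yy)


def simulate(n_regions=27, n_per_region=400, seed=1):
    """Fit the log-log line on individual data and on regional per capita data."""
    rng = random.Random(seed)
    ind_x, ind_y, reg_x, reg_y = [], [], [], []
    for r in range(n_regions):
        region_mean = 6.0 + 0.1 * r  # hypothetical regional mean of log income
        incomes, consumptions = [], []
        for _ in range(n_per_region):
            log_income = rng.gauss(region_mean, 1.0)
            # Individual elasticity 0.4 plus an assumed regional (contextual) effect.
            log_cons = 1.0 + 0.4 * log_income + 0.3 * region_mean + rng.gauss(0.0, 1.0)
            incomes.append(math.exp(log_income))
            consumptions.append(math.exp(log_cons))
            ind_x.append(log_income)
            ind_y.append(log_cons)
        # Per capita values are means of levels, logged afterwards (equal weights).
        reg_x.append(math.log(sum(incomes) / n_per_region))
        reg_y.append(math.log(sum(consumptions) / n_per_region))
    return ols(ind_x, ind_y), ols(reg_x, reg_y)


(ind_slope, ind_r2), (agg_slope, agg_r2) = simulate()
print(ind_slope, ind_r2, agg_slope, agg_r2)
```

In this setup the aggregation averages out the individual scatter, which raises R2, and lets the regional effect load entirely onto the income coefficient, which raises the slope. Other mechanisms could produce the same pattern, so the simulation should be read as one possible illustration rather than an explanation of the estimates in Table 2.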
Discussion
We have estimated income elasticities using person-level data and different levels of aggregation with the same dataset. Income elasticities spuriously increase with data aggregation. The originality of this study lies in using the Brazilian setting to make this point, namely, that caution is needed when using aggregated data to estimate demand elasticity. In the United States, Lubiani, Okunade, and Chen (2018) used multiple databases with different levels of aggregation and, similarly, found elasticities of 0.647 for medicines, using more aggregate data, and of 0.167, with less aggregate data.
Many studies use aggregate data such as GDP to estimate elasticities. Baltagi, Lagravinese, Moscone, and Tosetti (2016) used per capita GDP as the income variable and identified higher income elasticities for poorer countries. Based on results obtained with aggregate data, they argued that elasticity “is likely to depend on the context and the level of richness of nations, with many goods and services turning from luxury to necessity as income rises.” Their explanation for this counter-intuitive finding was that rich countries have better-organized health systems, which set higher standards of needs, whereas poorer countries do not have a clear understanding of national health priorities, turning healthcare into a basket of acute care contingent luxury goods.
Getzen (2000) published a review of articles on income elasticity of demand for health goods and services and confirmed that estimates using aggregate data are higher than those based on microdata. However, he did not conclude that the use of aggregate data leads to inconsistent results. Getzen interpreted aggregate data as meaningful representations of the impact of decisions taken at various levels of aggregation. Based on this interpretation and the corresponding elasticity results, he deemed healthcare to be an individual necessity (elasticity results below 1) and a national luxury (elasticity results at 1 or above).
For Getzen (2000), the higher elasticities would be explained by the influence of health insurance pooling on the allocation of resources and could not be viewed as a spurious aggregation bias. These elasticities would reflect a level of analysis where decisions are made at an aggregate level (private insurance). Our example on medicine consumption, however, shows that the increase in elasticity seen when data are aggregated is not explained by health insurance pooling, as private health insurance coverage for medicines was near zero in Brazil (Moraes et al., 2022).
We argue that elasticity measures based on any level of data aggregation display the effects of the statistical bias of aggregation, and are therefore incorrect. In our view, the demand for health products is always and ultimately an individual decision. These individual choices can lead to catastrophic spending (spending that far exceeds income, compromising consumption capacity) and to various levels of elasticity, which demand specific public policy approaches. Cruz Rivero et al. (2006) and Costa-Font et al. (2011) were explicitly critical of the possibility of making inferences about individual behavior from aggregate data, as “individual level budget constraints differ from those at the regional or national level” (Costa-Font et al., 2011).
Although part of the resources used to pay for health services may in fact come from health insurance pooling, this does not eliminate the problem of data aggregation for out-of-pocket consumption in any given sample. In addition, voluntary health insurance premium expenditures are evidently individual health-related expenditures. People who pay for private health insurance – whether in the form of prepayments, deductibles or co-payments – are different from those who do not (Moraes, Santos, Werneck, De Paula, & Almeida, 2022). They have higher average income, distinctive age profiles and greater access to healthcare services. There is an association at the individual level between income and spending on health insurance and services. Individual characteristics of those who pay for private health insurance are lost when one estimates the income elasticity using aggregate data. Thus, even in a context of resource pooling, relationships between income and health spending (including health insurance premium outlays) can display inconsistent results when based on aggregate data.
The aggregation problem is even broader. People’s individual well-being perceptions – which generate variations in health needs and demands (Dubey, 2020; Huttin, 1997) – and the specific effects of income on the consumption of different health products need to be considered in the analyses (Cruz Rivero et al., 2006; Barati & Fariditavana, 2020). Income elasticity may also vary for expenditures of distinct magnitudes or in specific disease episodes, which may require anything from simple basic care to intensive care treatment. Therefore, ideally, one should estimate income elasticities for the different health products.
Barati and Fariditavana (2020) estimated the income elasticity of demand for 12 types of health products in the United States using time series with per capita GDP as the income variable. They found that most income elasticities for healthcare products are below 1, except for Home care and Public health. The authors concluded that both are luxury products that have “more of a calming effect and would provide comfort and convenience rather than actual physiological treatment.” Consequently, they “should be left to market forces,” not warranting government intervention. Public health, however, stands for promotion and prevention interventions, which are an integral part of the WHO universal health coverage proposals (WHO & WB, 2023). For individuals facing pressing needs and budgetary restrictions, such products would not be considered priorities and would thus become unmet needs not captured in elasticity measures. Home care would equally not be part of a regular consumption basket in low-income households. In neither case does it follow that these products should be left to market forces. Such hurried conclusions illustrate the potentially deleterious recommendations arising from the use of aggregate data for elasticity estimation. In most contexts, it makes little sense to label healthcare as a “national luxury.”
Projecting demand from elasticities estimated on aggregates leads to overestimation. Overestimated growth trends in demand are also used to inflate pharmaceutical companies’ and private hospitals’ share prices. The unbiased estimation of elasticities can show that, although demand is growing, the increase may not be “explosive” in the long run and can be financed as society’s income grows.
Our study has some limitations. The model used only data from individuals who obtained pharmaceuticals in the 30 days prior to the interview. It did not consider the lower probability of buying medicines associated with lower income and, therefore, does not provide ideal elasticity measures. Accounting for this effect at the various aggregation levels would lead to higher elasticity estimates. But it would also introduce methodological differences in the way data are treated at different aggregation levels, making the estimates unsuitable for the comparison we sought. The low R2 in regressions with disaggregated data also makes these models unsuitable for projections, as they explain only a small part of the variation in consumption. However, the goal of our estimates was not to make projections but to apply the same method to data with different levels of aggregation and study the significance of the aggregation effect.
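The direction of this effect can be seen in a standard two-part decomposition. This is only a sketch: the two-part structure and the symbols (y for medicine spending, x for income, ε for elasticities) are assumptions introduced for illustration and are not the model estimated in this study.

```latex
\begin{align*}
E[y \mid x] &= \Pr(y > 0 \mid x)\; E[y \mid y > 0,\, x] \\[4pt]
\varepsilon_{\text{total}}
  &= \frac{\partial \ln E[y \mid x]}{\partial \ln x}
   = \underbrace{\frac{\partial \ln \Pr(y > 0 \mid x)}{\partial \ln x}}_{\varepsilon_{\text{participation}}}
   + \underbrace{\frac{\partial \ln E[y \mid y > 0,\, x]}{\partial \ln x}}_{\varepsilon_{\text{conditional}}}
\end{align*}
```

If the probability of purchasing rises with income, the participation term is positive, so under this sketch the total elasticity exceeds the conditional elasticity obtained from purchasers alone.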
A further limitation, and a potential future development of this work, is that we did not use formal models to explore the conditions under which aggregate data would allow for unbiased estimation. However, given the high prevalence of methodological flaws due to aggregation bias in healthcare research (Geissbühler et al., 2021), we opted at this point to focus on providing a practical example and its potential impacts on public policy decisions.
Conclusion
The varying approaches to estimating relationships between income and demand for healthcare, and the use of aggregate data to this end, call for caution both in estimating elasticity measures and in applying them. What can be said with certainty is that analysts who use aggregates instead of individual data may be producing inconsistent estimates on account of an ecological fallacy. An ecological fallacy is a logical error that occurs when variations in environmental settings (aggregates or groups) are interpreted as variations among individuals. The level of data aggregation matters when it comes to understanding and estimating demand elasticity. Wrongly overestimating elasticity has potential implications for policy decisions, especially in critical markets such as healthcare.
Only elasticity measures based on individual data should be used. Elasticities based on aggregate data are unfit both for crafting public policies and for projecting future demand. This holds even more strongly in countries with highly unequal income distribution patterns, such as Brazil, where data aggregation “smooths out” income inequalities. Aggregation understates the extremes of low and high income and need, which would demand higher shares of public financing, compensatory policies or tougher market regulation.
Thus, the effect of projecting demand based on elasticities estimated from aggregates can be to overestimate it and, voluntarily or not, provide questionable arguments for proponents of a lesser role of the state in the supply of health goods and services.

