Instrumental identification of interventional distribution via moment recovery in a parametric outcome model

Sejdinovic, Dino

doi:10.1108/AJEB-08-2025-0099

Purpose

Many economic and policy decisions require the full interventional outcome distribution (e.g. tails, dispersion and asymmetry), not just average treatment effects. We study cases where that distribution is identifiable in instrumental variables (IV) settings with endogenous treatment and latent confounding. We focus on coupling additive mean structure with parametric outcome model assumptions to enable estimation of distributional treatment effects.

Design/methodology/approach

We posit outcome families that admit summary features that can recover confounder moments. A parsimonious distribution (e.g. discrete or common parametric families) can then be fit to match those moments and combined with the posited outcome model to construct the interventional distribution. Examples include Gaussian outcomes with multiplicative noise, Poisson counts and Gamma outcomes.

Findings

The moment-recovery scheme accurately reconstructs the low-order moments of the confounder and the interventional distribution in simulations, including a skewed confounding with multiplicative noise.

Originality/value

The paper provides a transparent pathway from the classical IV assumptions to the full identification of the interventional distribution, clarifying the feasibility conditions and offering a simple, practical workflow.

1. Introduction

Instrumental variables (IV) are a cornerstone of empirical economics because they can recover causal effects when randomized experiments are infeasible and observational relationships are confounded. The method has its roots in simultaneous equations for supply and demand, where “curve shifters” were used to trace out structural relationships that ordinary least squares could not identify (for example, weather-driven yield as a supply shifter and substitute prices as a demand shifter). IV can also be viewed through the lens of “natural experiments” that exploit institutional rules to generate quasi-random variation. A classic illustration uses date of birth and compulsory-schooling laws to instrument schooling: quarter-of-birth rules shift schooling exogenously and produce reduced-form “blips” in both schooling and earnings that track the policy-driven design rather than smooth age trends (Angrist and Krueger, 2001).

While IV is widely used to recover average treatment effects, many economic questions are distributional. Policymakers and firms often care not only about how a policy shifts the mean of outcomes but also about dispersion, tails, quantiles or other features of the interventional distribution. For instance, a pension policy might leave average savings roughly unchanged while compressing the lower tail or widening the upper tail—differences with direct welfare and risk implications. With randomized data, distributional causal effects can be studied by quantile or distributional regression; however, under hidden confounding, the problem is substantially harder. Such questions commonly arise in applied microeconometrics: in development, labor and household/banking finance, where IV designs are pervasive. Recovering further aspects of the full interventional distribution, beyond its mean, is directly meaningful for risk management, inequality analysis and policy targeting in these contexts.

Building on this motivation, we study a setting of the interventional distribution recovery using instrumental variables in certain parametric outcome models. IV problems are, in general, ill-posed inverse problems: in observational data the interventional law is not directly observed and must be reconstructed from instrumented variation by making transparent structural assumptions. In particular, we will develop a framework where parametric assumptions on the outcome law as well as on the marginal law of the confounder allow us to recover the interventional law. There are also two broader active strands of research that use approaches complementary to the one described here to go beyond the average treatment effects: (1) distributional IV methods that combine instruments with rank-type assumptions to identify interventional CDFs for binary treatments (Kook and Pfister, 2024), and (2) nonparametric methods based on kernel embeddings and operator identities, which impose different types of structural assumptions (Sejdinovic, 2025).

2. Background

2.1 Notation and object of interest

We observe triplets (Z, T, Y), where Z is an instrument, T is a treatment or exposure (possibly continuous) and Y is an outcome. Unobserved variables U are confounders: they simultaneously affect T and Y. We write p(y∣do(T = t)) for the interventional distribution of Y (either a probability density function or a probability mass function, depending on whether the outcomes are discrete or continuous). The interventional distribution is induced by externally setting T to t. In contrast, p(y∣T = t) denotes the observational conditional distribution and will generally differ from the interventional target when T is endogenous.

In a structural equation model with Y = g(T, U, ɛ_Y) for some structural function g and a noise term ɛ_Y independent of all else, the interventional distribution can be written as

p (y ∣ d o (T = t)) = \int p (y | t, u) p_{U} (u) d u,

(1)

where p(y∣t, u) is the outcome law conditional on (T = t, U = u) and p_U the marginal law of U. Since U is unobserved, identification of (1) from observational data requires assumptions about how the instrument Z is related to (T, U, Y).

2.2 IV assumptions and basic estimands

We will start by imposing the standard IV conditions, which ensure that variation in T induced by Z can be used to estimate interventional quantities.

A1.
Relevance. The instrument Z does have a causal effect on the treatment T, i.e. it “shifts” the treatment.
A2.
Exclusion restriction. Conditional on T, Z affects Y only through T, i.e. there is no direct causal effect of Z on Y.
A3.
Instrumental unconfoundedness. The instrument is independent of the confounder U (and the noise term additionally affecting Y), i.e. Z ⊥ (U, ɛ_Y).

These conditions, however, do not allow nonparametric identification, and we require additional structure, typically expressed as additivity of the conditional expectation $E [Y | T, U]$ ⁠.

A4.
Additivity. $E [Y | T, U] = μ (T) + U$ ⁠.

Thus, μ(t) captures the average treatment effect (ATE) $E [Y | d o (T = t)]$ (up to an additive constant), which is typically the relationship of interest in IV settings. The instrumental regression (Darolles et al., 2011) corresponds to solving the functional equation

E [Y - μ (T) | Z] = 0 .

In the simplest setting where all variables are scalar and we assume a linear model, i.e. μ(T) = cT we can recover ATE using the familiar expression c = Cov(Y, Z)/Cov(T, Z). For general linear Gaussian settings, one derives analogous estimators using cross-covariance matrices, and in the nonparameteric settings, several approaches are available, including Nadaraya-Watson regression (Darolles et al., 2011), kernel methods (Singh et al., 2019) and deep neural networks (Hartford et al., 2017). In this work, we will assume that one has already constructed a consistent estimator $\hat{μ} (T)$ of μ(T), and we will consider how to use $\hat{μ} (T)$ to develop an estimator of the interventional distribution in (1).

2.3 Problem statement

Our goal is to recover the interventional distribution p(y∣do(T = t)) in settings where T is endogenous and U is unobserved. This is an ill-posed inverse problem in general: observable relationships, i.e. conditional expectations and moments that can be estimated while leveraging the instrument, must be linked to the target p(Y∣do(T = t)). In unrestricted nonparametric models, such recovery is generally impossible. Progress requires assumptions that (1) are substantively meaningful in economic applications and (2) render the inverse problem well-posed enough to be estimable.

2.4 Related work

We briefly mention two strands of related work. First, work on distributional IV with binary treatments (Kook and Pfister, 2024) identifies interventional CDFs for treated and control states by pairing IV with strengthened relevance and rank-based assumptions (e.g., rank invariance, rank similarity or conditional variants). Estimation proceeds by searching for CDFs whose implied probability-integral-transform residuals are both uniform and independent of the instrument, after which distributional effect measures (quantile effects, stochastic dominance and related functionals) are plugged in. Second, a complementary literature (cf. Sejdinovic (2025) and references therein) develops kernel embedding methods that represent probability laws in reproducing-kernel Hilbert spaces (Muandet et al., 2017) and express causal and interventional quantities via linear operators (conditional mean operators and their “deconditioning” counterparts). These approaches provide a unified nonparametric route to causal means and, more broadly, distributional treatment effects, at the cost of additional structural conditions. In particular, Sejdinovic (2025) discusses additive decomposition in a feature space, i.e. strengthening assumption (A4) to $E [k (\cdot, Y) | T, U] = μ (T) + h (U)$ for a given reproducing kernel k, where $μ (t) = E [k (\cdot, Y) | d o (T = t)] \in H_{k}$ is the interventional mean embedding and $h (U) \in H_{k}$ maps the confounder into the reproducing kernel Hilbert space $H_{k}$ ⁠. This condition may be unrealistic as k(⋅, Y) typically captures higher-order features Y, and we require additivity for all such features. Our main focus here is to consider assumption (A4) as is, but to instead determine specific outcome models on Y for which practical methods for the interventional distribution recovery are possible.

3. Models and methods

This section formalizes the structural equation models (SEMs) we study and develops a general framework for recovering interventional distributions when the latent confounder's distribution, p_U, is identifiable from a finite set of its moments. We first show how, under the assumption that particular features of the outcome and the estimated structural mean can be computed, one can recover the moments of the unobserved confounder. We then discuss how these moments can be used to identify p_U under the assumption that it belongs to a particular parametric family and thereby obtain the interventional distribution via (1).

Specific distributional families require different numbers of moments for identification. For instance, a discrete distribution supported on M points is identified by its first 2M − 1 moments. A normal or Gamma distribution is identified by its first two moments, while a skew-normal requires three moments. We will outline the general procedure and then, in the next section, give an illustration using the skew-normal distribution.

3.1 Recovering moments of the confounder

Consider the following SEM:

\begin{array}{l} U & \sim p_{U}, \\ Z & \sim p_{Z}, \\ T ∣ (U, Z) & = f (U, Z, ε_{T}), \\ Y ∣ (T, U) & \sim p_{Y | T, U}, \end{array}

where p_U is the distribution of the latent confounder, p_Z is the distribution of the instrument, ɛ_T is the treatment noise term, and p_Y|T,U will belong to a parametric family of a type that we will specify next. We maintain the usual IV conditions (A1-4) and in particular p_Y|T,U is such that the additivity of the expectation always holds: $E [Y | T, U] = μ (T) + U$ ⁠. We shall not impose any specific forms of μ nor f. The initial component required is the treatment-response function, μ, which can be recovered up to an additive constant using standard IV techniques. With an estimate $\hat{μ}$ in hand, we can now proceed to isolate the influence of the confounder on the outcome. For that purpose, we will make the following additional assumption on the conditional distribution p_Y|T,U:

A5.
Outcome model. p_Y|T,U belongs to a known parametric family for which there exists a function h_k(y, μ) such that $E [h_{k} (Y, μ (T)) ∣ T, U] = U^{k}, k = 0,1,2, \dots$ ⁠.

Many examples of families satisfying (A5) can be constructed. Among those, some have a trivial location–shift structure that is not interesting for identifying the interventional distribution. For example, let Y = μ(T) + U + ɛ_Y with $ε_{Y} \sim N (0, σ^{2})$ ⁠, i.e. $Y ∣ (T, U) \sim N (μ (T) + U, σ^{2})$ ⁠, assuming σ² is known. In this case, letting He_k denote the probabilists' Hermite polynomials (Chihara, 1978),

h_{k} (y, μ) ≔ σ^{k} {H e}_{k} (\frac{y - μ}{σ})

satisfies (A5). However, since here the treatment T only shifts location and otherwise leaves the outcome density unchanged, if we have a consistent estimator $\hat{μ} (t)$ of the structural mean, we can simply fit any density estimator on the residuals $R_{i} = Y_{i} - \hat{μ} (T_{i})$ ⁠. This recovers the density of U + ɛ_Y; therefore, shifting that estimated density by $\hat{μ} (t)$ recovers the interventional distribution p(Y∣do(T = t)).

For our purposes, (A5) is valuable precisely because it also covers many cases where a residual-based approach cannot be used. We will consider three examples here, which are also summarized in Table 1.

Table 1

Orthogonal polynomials for different conditional distributions of Y|T, U

Model Y\|T, U	h₁(y, μ)	h₂(y, μ)	h₃(y, μ)
$N (μ + U, σ^{2} μ^{2})$	y − μ	(y − μ)² − σ²μ²	(y − μ)³ − 3σ²μ²(y − μ)
Poisson(μ + U)	y − μ	${(y)}_{\underline{2}} - 2 μ y + μ^{2}$	${(y)}_{\underline{3}} - 3 μ {(y)}_{\underline{2}} + 3 μ^{2} y - μ^{3}$
Gamma(α, (μ + U)/α)	y − μ	$\frac{α}{α + 1} y^{2} - 2 μ y + μ^{2}$	$\frac{α^{2}}{(α + 1) (α + 2)} y^{3} - \frac{3 α μ}{α + 1} y^{2} + 3 μ^{2} y - μ^{3}$

Source(s): Table by the author

Gaussian with multiplicative noise. Let Y = μ(T) (1 + ɛ_Y) + U with $ε_{Y} \sim N (0, σ^{2})$ ⁠, i.e. $Y ∣ (T, U) \sim N (μ (T) + U, σ^{2} μ {(T)}^{2})$ ⁠. Then

h_{k} (y, μ) ≔ {(σ | μ |)}^{k} {H e}_{k} (\frac{y - μ}{σ | μ |})

satisfies (A5), but we cannot obtain residuals with a common distribution across treatments, since $\frac{Y - μ (T)}{σ | μ (T) |} = s i g n (μ (T)) \frac{ε_{Y}}{σ} + \frac{U}{σ | μ (T) |}$ still depends on T.

Poisson. Let Y∣T, U ∼ Poisson(μ(T) + U). Taking the Charlier polynomials (Chihara, 1978),

h_{k} (y, μ) ≔ \sum_{m = 0}^{k} (\binom{k}{m}) {(- μ)}^{k - m} {(y)}_{\underline{m}},

with ${(y)}_{\underline{m}} = y (y - 1) \dots (y - m + 1)$ ⁠, satisfies (A5). Since Y is discrete while μ(T) is real-valued, residuals $Y_{i} - \hat{μ} (T_{i})$ are not meaningful targets for a pooled density estimator.

Gamma with fixed shape. Let Y∣T, U ∼ Gamma(α, θ) with known α > 0 and scale θ = (μ(T) + U)/α, so $E [Y ∣ T, U] = μ (T) + U$ and Var(Y∣T, U) = (μ(T) + U)²/α. A choice that satisfies (A5) is based on Laguerre polynomials,

h_{k} (y, μ) ≔ \frac{k!}{{(α)}_{\bar{k}}} {(- 1)}^{k} μ^{k} L_{k}^{(α - 1)} (\frac{α y}{μ}),

where ${(α)}_{\bar{k}} = α (α + 1) \dots (α + k - 1)$ and $L_{k}^{(α - 1)} (x) = \sum_{m = 0}^{k} {(- 1)}^{m} (\binom{k + α - 1}{k - m}) \frac{x^{m}}{m!}$ ⁠. This is another example where the treatment shifts the whole distribution of the outcome, not just its location.

Using the IV exclusion restriction which implies Y ⫫Z∣T, U, we have by law of total expectation

\begin{array}{l} E [h_{k} (Y, μ (T))] & = E_{Z, T} E [h_{k} (Y, μ (T)) | Z, T] \\ = E_{Z, T} E_{U | Z, T} E [h_{k} (Y, μ (T)) | U, Z, T] \\ = E_{Z, T} E_{U | Z, T} E [h_{k} (Y, μ (T)) | U, T] [Y ⫫ Z | T, U] \\ = E_{Z, T} E_{U | Z, T} [U^{k}] [(A 5)] \\ = E [U^{k}] ≕ ξ_{k} . \end{array}

Since the left-hand side is identifiable from data (assuming the recovered μ), so too are the moments of the confounder.

We shall now assume that we can identify p_U from its moments as follows:

A6.
Parametric form of the confounder distribution. p_U belongs to a known parametric family whose parameters can be identified from its first K moments.

Therefore, with the moments {ξ_k} identified, we can recover the parameters of the assumed distributional family for p_U and finally convolve the estimated confounder law with the posited outcome model to obtain an estimate of the interventional distribution. Some examples of possible models that can be posited on U include:

Discrete distribution: If U is assumed to be discrete on M support points {r_j} with weights {π_j}, its distribution is uniquely identified from the first K = 2M − 1 moments. This is a classical problem in numerical analysis, solvable with e.g. Prony's method (Kunis et al., 2016) that recover the support points as roots of a polynomial whose coefficients are derived from the moments.
Normal distribution: If $U \sim N (μ_{U}, σ_{U}^{2})$ ⁠, its parameters are identified from the first two moments: μ_U = ξ₁ and $σ_{U}^{2} = ξ_{2} - ξ_{1}^{2}$ ⁠.
Gamma distribution: If U ∼ Gamma(k, θ), its two parameters are identified from the first two moments via the relations ξ₁ = kθ and $ξ_{2} - ξ_{1}^{2} = k θ^{2}$ ⁠.
Skew-normal Distribution: If U ∼ SkewNormal(ξ, ω, α), its three parameters are uniquely identified from the first three moments by solving the system of equations that link (ξ, ω, α) to (ξ₁, ξ₂, ξ₃).

In Section 4, we will give a numerical illustration using the skew-normal confounder distribution.

3.2 Estimation workflow

We briefly illustrate steps of the overall method.

Recover μ(⋅). Fit IV/NPIV for Y on T to obtain $\hat{μ}$ ⁠.
Compute polynomial features. Form $h_{k} (Y, \hat{μ} (T))$ for the required number of moments k.
Recover the confounder distribution. Average to get confounder moment estimates ${\hat{ξ}}_{k} = n^{- 1} \sum_{i = 1}^{n} h_{k} (Y_{i}, \hat{μ} (T_{i}))$ ⁠. Solve for the parameters of the assumed family for p_U using these estimated moments to obtain an estimate ${\hat{p}}_{U}$ ⁠.
Construct the interventional distribution. The interventional distribution is estimated as the mixture of the posited parametric model p_Y|T,U over the recovered confounder distribution. It is given by the integral

\hat{p} (y ∣ d o (T = t)) = \int p_{Y | T, U} (y | t, u) {\hat{p}}_{U} (u) d u,

This integral simplifies for specific families. For example, in a Gaussian model with multiplicative noise, if we have a discrete ${\hat{p}}_{U}$ ⁠, it becomes a Gaussian mixture. For a skew-normal ${\hat{p}}_{U}$ ⁠, it evaluates to another skew-normal density. When no such simplification is possible, the integral needs to be estimated by Monte Carlo.

3.3 Domain relevance and examples

Distributional effects under IV designs can inform decision-making in development, labor and household/banking finance – especially when policy or market shocks shift tails, quantiles or dispersion. We give some illustrative examples below.

Example 1: Gamma outcome (agricultural yields). Question: How does exogenous variation in input use shift the entire distribution of farm yields, with emphasis on lower tails (i.e. food security risk)? Variables: Instrument Z = rainfall deviations at planting; treatment T = fertilizer/irrigation intensity; outcome Y = yield (continuous, positive). Positive skew and multiplicative agronomic noise make a Gamma mean–variance relation natural for Y.
Example 2: Poisson outcome (household delinquency counts). Question: How does credit supply affect the distribution of delinquency events, focusing on upper-tail risk rather than the mean? Variables: Instrument Z = eligibility thresholds or branch-opening/expansion shocks; treatment T = credit limit/borrowed amount; outcome Y = number of late payments in a fixed horizon. Y is a nonnegative count with event intensity that shifts with T, fitting a Poisson specification.
Example 3: Gaussian with multiplicative noise (household electricity use underpricing). Question: How does time-of-use ToU pricing (or subsidy removal) change the distribution of monthly electricity consumption, including variance and high-usage tails? Variables: Instrument Z = randomized/staggered ToU eligibility or smart-meter rollout; treatment T = exposure to ToU tariff/marginal price; outcome Y = monthly kWh consumption. Weather and activity shocks act proportionally to baseline demand suggesting a multiplicative noise model and aggregation over many appliance-days yields approximately normal distribution.

4. A simple numerical illustration

We now illustrate the framework with a simulation where the endogeneity arises from a continuous, skewed latent confounder. The data-generating process is designed such that both the confounder and the treatment are centered with mean zero.

The latent confounder U is drawn from a skew-normal distribution with shape α = 5.0, scale ω = 8.0 and location ξ = −6.26. The instrument is a standard normal variable, $Z \sim N (0,1)$ ⁠. The treatment T is generated via a model which includes interaction between the confounder and the instrument,

T = a_{1} Z + a_{2} U + a_{3} Z U + a_{4} ε_{T},

where the coefficients are set to (a₁, a₂, a₃, a₄) = (0.6, 0.4, 0.1, 0.4). The error term ɛ_T follows a Student-t distribution with 3 degrees of freedom to introduce heavier tails. The outcome Y is determined by a linear model with multiplicative noise and shifted by the confounder,

Y = c T (1 + ε_{Y}) + U,

with a causal slope of c = 1.2 and Gaussian noise $ε_{Y} \sim N (0, σ^{2})$ ⁠, where σ² = 1. Figure 1 shows a scatterplot of Y against T. As expected in this confounded design, the OLS slope is substantially biased, whereas a standard linear IV estimate correctly recovers the true average treatment effect.

Figure 1

A line and dot chart shows outcome versus treatment with three lines representing O L S, true causal, and I V estimates.

View large Download slide

The line and dot chart titled “Outcome versus Treatment” shows the horizontal axis labeled “Treatment (T)” ranging from negative 10 to 15 and the vertical axis labeled “Outcome (Y)” ranging from negative 25 to 45. The legend in the upper left indicates four elements: “Data points (Y versus T) – subsampled 5000 of 50000” shown as dots, “O L S: Y approx. T (slope equals 3.14)” shown as a dashed line, “True Causal Line (slope c equals 1.2)” shown as a solid line, and “I V Estimated Line (slope c cap equals 1.21)” shown as a thick line. Thousands of circular points are scattered diagonally upward, showing a positive association between Treatment and Outcome, ranging from (negative 3.938, negative 11.804) to (4.747, 16.34). The dashed line labeled “O L S” starts from (negative 8.794, negative 26.649) and ends near (14.487, 46.031), showing the steepest positive slope among all lines. The solid line labeled “True Causal Line” starts from (negative 8.854, negative 10.258) and ends near (14.578, 17.577), showing a moderate positive slope. The thick line labeled “I V Estimated Line” starts from (negative 8.854, negative 10.258) and ends near (14.578, 17.577), showing a moderate positive slope, overlapping with the green line. Note: All numerical data values are approximated.

Outcome vs treatment with OLS, IV and true causal line. Note: The OLS slope is biased in the presence of the latent skew-normal confounder U; the IV estimate aligns with the true average causal slope c = 1.2. Source: Figure by the author

A key feature of this model is the skewness of the interventional distribution. This is particularly the case where the ATE μ(t) may not be sufficient to summarize decision-relevant features—such as tail risks, mass near policy thresholds or shifts in quantiles—so relying on averages alone can mislead downstream causal questions; recovering richer distributional information is therefore essential.

We can also note structural differences between the observational and interventional outcome distributions. Recall that, conditionally on (T = t, U = u), the outcome is Gaussian, $Y \sim N (c t + u, σ^{2} c^{2} t^{2})$ ⁠. The interventional density, p(y∣do(T = t)), is formed by averaging over the marginal distribution of the confounder, p_U. This corresponds to the convolution of a skew-normal density (U) and a Normal density $N (c t, σ^{2} c^{2} t^{2})$ ⁠, which results in another skew-normal density (Azzalini, 2013):

p (y ∣ d o (T = t)) = SkewNormal (y; ξ_{new} (t), ω_{new} (t), α_{new} (t)),

(2)

where

ξ_{new} (t) = ξ + c t, ω_{new} (t) = \sqrt{ω^{2} + σ^{2} c^{2} t^{2}}, α_{new} (t) = \frac{α ω}{\sqrt{ω^{2} + σ^{2} c^{2} t^{2} (1 + α^{2})}} .

(3)

In contrast, the observational density, p(y∣T = t) corresponds to averaging over p(U∣T = t). Due to the non-linear treatment assignment mechanism and the non-Gaussian density of U, p(U∣T = t) has no simple analytical form. Consequently neither does p(y∣T = t). The selection into a specific treatment level T = t profoundly alters the conditional distribution of U away from its marginal, skew-normal form. This is illustrated in Figure 2 which compares the true interventional and observational densities for several treatment levels. In each panel, the interventional density (solid blue) is the true skew-normal interventional law from (2), while the observational density (dashed red) is estimated using a kernel density estimator. The distributions differ significantly in shape, location and skewness, reminding us of the risks of using observational conditionals to draw any causal conclusions. Moreover, interventional densities themselves for different values of treatment t differ not only in location, but also in variance and skewness. Finally, estimated interventional densities (dashed black) correspond to our estimation pipeline, where the method from Section 3 is used to recover the first three moments of U. We then solve for skew-normal parameters $(\hat{ξ}, \hat{ω}, \hat{α})$ using the estimated moments and construct the estimated interventional density.

Figure 2

A set of four graphs shows interventional and conditional outcome densities for varying T values.

View large Download slide

The set of four graphs titled “Interventional versus Conditional Densities (Skew-Normal U)” shows the horizontal axis for all graphs labeled “Outcome (y)” ranging from negative 15 to 20 in increments of 5 units and the vertical axis for all graphs labeled “Density”, ranging from 0.00 to 0.20 in increments of 0.05 units. A legend at the top center identifies five lines as follows: “p open parenthesis y modulus uppercase T equals lowercase t close parenthesis open parenthesis K D E close parenthesis”, “E square bracket uppercase Y modulus uppercase T equals lowercase t close square bracket”, “p open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”, “E square bracket uppercase Y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close square bracket”, and “p cap open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”. The top left graph labeled “For uppercase T equals negative 2.0” shows the line representing “p open parenthesis y modulus uppercase T equals lowercase t close parenthesis open parenthesis K D E close parenthesis” drawn vertically at outcome y equals negative 6.962. The curve labeled “E square bracket uppercase Y modulus uppercase T equals lowercase t close square bracket”, starts from the extreme left of the horizontal axis at (negative 15, 0.00), elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (negative 6.962, 0.143). From the peak, it symmetrically comes down to the lower right on the horizontal axis and ends at (20, 0.00). The curve labeled “p open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”, starts from the extreme left of the horizontal axis at (negative 15, 0.00). The line elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (negative 4.533, 0.079). From the peak, it shows some skewness toward the right to terminate on the horizontal axis at (20, 0.00). The line representing “E square bracket uppercase Y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close square bracket”, drawn vertically at outcome y equals negative 2.44. The curve labeled “p cap open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”, starts from the extreme left of the horizontal axis at (negative 15, 0.00). The line elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (negative 4.868, 0.076). From the peak, it shows some skewness toward the right to terminate on the horizontal axis at (20, 0.00). The top right graph labeled “For uppercase T equals 0.0” shows the line representing “p open parenthesis y modulus uppercase T equals lowercase t close parenthesis open parenthesis K D E close parenthesis” drawn vertically at outcome y equals 0.012. The curve labeled “E square bracket uppercase Y modulus uppercase T equals lowercase t close square bracket”, starts from the extreme left of the horizontal axis at (negative 15, 0.00), elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (negative 0.916, 0.219). From the peak, it symmetrically comes down to the lower right on the horizontal axis and ends at (20, 0.00). The curve labeled “p open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”, starts from the extreme left of the horizontal axis at (negative 15, 0.00). The line elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (negative 3.53, 0.092). From the peak, it shows some skewness toward the right to terminate on the horizontal axis at (20, 0.00). The line representing “E square bracket uppercase Y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close square bracket”, drawn vertically at outcome y equals negative 0.241. The curve labeled “p cap open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”, starts from the extreme left of the horizontal axis at (negative 15, 0.00). The line elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (negative 4.373, 0.09). From the peak, it shows some skewness toward the right to terminate on the horizontal axis at (20, 0.00). The bottom left graph labeled “For uppercase T equals 2.0” shows the line representing “p open parenthesis y modulus uppercase T equals lowercase t close parenthesis open parenthesis K D E close parenthesis” drawn vertically at outcome y equals negative 6.77. The curve labeled “E square bracket uppercase Y modulus uppercase T equals lowercase t close square bracket”, starts from the extreme left of the horizontal axis at (negative 15, 0.00), elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (6.017, 0.114). From the peak, it symmetrically comes down to the lower right on the horizontal axis and ends at (20, 0.00). The curve labeled “p open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”, starts from the extreme left of the horizontal axis at (negative 15, 0.00). The line elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (0.239, 0.079). From the peak, it shows some skewness toward the right to terminate on the horizontal axis at (20, 0.00). The line representing “E square bracket uppercase Y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close square bracket”, drawn vertically at outcome y equals negative 2.416. The curve labeled “p cap open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”, starts from the extreme left of the horizontal axis at (negative 15, 0.00). The line elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (negative 0.012, 0.074). From the peak, it shows some skewness toward the right to terminate on the horizontal axis at (20, 0.00). The bottom right graph labeled “For uppercase T equals 5.0” shows the line representing “p open parenthesis y modulus uppercase T equals lowercase t close parenthesis open parenthesis K D E close parenthesis” drawn vertically at outcome y equals 15.446. The curve labeled “E square bracket uppercase Y modulus uppercase T equals lowercase t close square bracket”, starts from the extreme left of the horizontal axis at (negative 15, 0.00), elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (15.446, 0.06). From the peak, it symmetrically comes down to the lower right on the horizontal axis and ends at (20, 0.05). The curve labeled “p open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”, starts from the extreme left of the horizontal axis at (negative 15, 0.00), elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (5, 0.052). From the peak, it symmetrically comes down to the lower right on the horizontal axis and ends at (20, 0.01). The line representing “E square bracket uppercase Y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close square bracket”, drawn vertically at outcome y equals 5.831. The curve labeled “p cap open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”, starts from the extreme left of the horizontal axis at (negative 15, 0.00), elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (5, 0.052). From the peak, it symmetrically comes down to the lower right on the horizontal axis and ends at (20, 0.01). Note: All the numerical data points are approximated.

Interventional and observational conditional densities. Note: The interventional density is a skew-normal distribution, while the observational conditional density is estimated via KDE. The two differ substantially due to confounding. Treatments do not only alter location but also shape of interventional densities. Estimated interventional densities use the moment recovery to estimate the parameters of the latent skew-normal confounder. Source: Figure by the author

Figure 2

View large Download slide

The set of four graphs titled “Interventional versus Conditional Densities (Skew-Normal U)” shows the horizontal axis for all graphs labeled “Outcome (y)” ranging from negative 15 to 20 in increments of 5 units and the vertical axis for all graphs labeled “Density”, ranging from 0.00 to 0.20 in increments of 0.05 units. A legend at the top center identifies five lines as follows: “p open parenthesis y modulus uppercase T equals lowercase t close parenthesis open parenthesis K D E close parenthesis”, “E square bracket uppercase Y modulus uppercase T equals lowercase t close square bracket”, “p open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”, “E square bracket uppercase Y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close square bracket”, and “p cap open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”. The top left graph labeled “For uppercase T equals negative 2.0” shows the line representing “p open parenthesis y modulus uppercase T equals lowercase t close parenthesis open parenthesis K D E close parenthesis” drawn vertically at outcome y equals negative 6.962. The curve labeled “E square bracket uppercase Y modulus uppercase T equals lowercase t close square bracket”, starts from the extreme left of the horizontal axis at (negative 15, 0.00), elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (negative 6.962, 0.143). From the peak, it symmetrically comes down to the lower right on the horizontal axis and ends at (20, 0.00). The curve labeled “p open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”, starts from the extreme left of the horizontal axis at (negative 15, 0.00). The line elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (negative 4.533, 0.079). From the peak, it shows some skewness toward the right to terminate on the horizontal axis at (20, 0.00). The line representing “E square bracket uppercase Y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close square bracket”, drawn vertically at outcome y equals negative 2.44. The curve labeled “p cap open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”, starts from the extreme left of the horizontal axis at (negative 15, 0.00). The line elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (negative 4.868, 0.076). From the peak, it shows some skewness toward the right to terminate on the horizontal axis at (20, 0.00). The top right graph labeled “For uppercase T equals 0.0” shows the line representing “p open parenthesis y modulus uppercase T equals lowercase t close parenthesis open parenthesis K D E close parenthesis” drawn vertically at outcome y equals 0.012. The curve labeled “E square bracket uppercase Y modulus uppercase T equals lowercase t close square bracket”, starts from the extreme left of the horizontal axis at (negative 15, 0.00), elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (negative 0.916, 0.219). From the peak, it symmetrically comes down to the lower right on the horizontal axis and ends at (20, 0.00). The curve labeled “p open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”, starts from the extreme left of the horizontal axis at (negative 15, 0.00). The line elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (negative 3.53, 0.092). From the peak, it shows some skewness toward the right to terminate on the horizontal axis at (20, 0.00). The line representing “E square bracket uppercase Y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close square bracket”, drawn vertically at outcome y equals negative 0.241. The curve labeled “p cap open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”, starts from the extreme left of the horizontal axis at (negative 15, 0.00). The line elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (negative 4.373, 0.09). From the peak, it shows some skewness toward the right to terminate on the horizontal axis at (20, 0.00). The bottom left graph labeled “For uppercase T equals 2.0” shows the line representing “p open parenthesis y modulus uppercase T equals lowercase t close parenthesis open parenthesis K D E close parenthesis” drawn vertically at outcome y equals negative 6.77. The curve labeled “E square bracket uppercase Y modulus uppercase T equals lowercase t close square bracket”, starts from the extreme left of the horizontal axis at (negative 15, 0.00), elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (6.017, 0.114). From the peak, it symmetrically comes down to the lower right on the horizontal axis and ends at (20, 0.00). The curve labeled “p open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”, starts from the extreme left of the horizontal axis at (negative 15, 0.00). The line elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (0.239, 0.079). From the peak, it shows some skewness toward the right to terminate on the horizontal axis at (20, 0.00). The line representing “E square bracket uppercase Y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close square bracket”, drawn vertically at outcome y equals negative 2.416. The curve labeled “p cap open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”, starts from the extreme left of the horizontal axis at (negative 15, 0.00). The line elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (negative 0.012, 0.074). From the peak, it shows some skewness toward the right to terminate on the horizontal axis at (20, 0.00). The bottom right graph labeled “For uppercase T equals 5.0” shows the line representing “p open parenthesis y modulus uppercase T equals lowercase t close parenthesis open parenthesis K D E close parenthesis” drawn vertically at outcome y equals 15.446. The curve labeled “E square bracket uppercase Y modulus uppercase T equals lowercase t close square bracket”, starts from the extreme left of the horizontal axis at (negative 15, 0.00), elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (15.446, 0.06). From the peak, it symmetrically comes down to the lower right on the horizontal axis and ends at (20, 0.05). The curve labeled “p open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”, starts from the extreme left of the horizontal axis at (negative 15, 0.00), elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (5, 0.052). From the peak, it symmetrically comes down to the lower right on the horizontal axis and ends at (20, 0.01). The line representing “E square bracket uppercase Y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close square bracket”, drawn vertically at outcome y equals 5.831. The curve labeled “p cap open parenthesis y modulus do open parenthesis uppercase T equals lowercase t close parenthesis close parenthesis”, starts from the extreme left of the horizontal axis at (negative 15, 0.00), elevates concave up with an increasing slope, and concave down with a decreasing slope to attain a peak at (5, 0.052). From the peak, it symmetrically comes down to the lower right on the horizontal axis and ends at (20, 0.01). Note: All the numerical data points are approximated.

Interventional and observational conditional densities. Note: The interventional density is a skew-normal distribution, while the observational conditional density is estimated via KDE. The two differ substantially due to confounding. Treatments do not only alter location but also shape of interventional densities. Estimated interventional densities use the moment recovery to estimate the parameters of the latent skew-normal confounder. Source: Figure by the author

5. Conclusions

This paper examined how instrumental variables can be used to recover distributional, rather than only average, causal effects in a structural equation model. In the setting studied here, focusing on the full interventional distribution clarifies tail risk, asymmetry and potential multimodality: when outcome distributions are skewed, reporting only mean effects can be misleading for decision-making. Our procedure uses outcome features based on orthogonal polynomials for moment recovery under an additive parametric outcome model and a confounder distribution identifiable from finitely many moments to reconstruct the interventional law.

The approach relies on standard IV conditions, suitable support and strength of the instrument, but also on the correct specification of the moment-identifiable family for the confounder; under misspecification it will only yield an approximation within the chosen family rather than the true distribution. Overall, our contribution can be viewed as example of a broader class of methods that exploit distributional assumptions, together with instruments, to move from average causal effects toward decision-relevant distributional objects, while acknowledging the limits imposed by modeling choices and data quality.

References

Angrist

,

J.D.

and

Krueger

,

A.B.

(

2001

), “

Instrumental variables and the search for identification: from supply and demand to natural experiments

”,

The Journal of Economic Perspectives

, Vol.

15

No.

4

, pp.

69

-

85

, doi:

https://doi.org/10.1257/jep.15.4.69

.

Google Scholar

Crossref

Azzalini

,

A.

(

2013

),

The Skew-Normal and Related Families

,

Institute of Mathematical Statistics Monographs, Cambridge University Press

,

Cambridge

.

Google Scholar

Crossref

Chihara

,

T.

(

1978

),

An Introduction to Orthogonal Polynomials

,

Gordon and Breach

,

New York

.

Google Scholar

Darolles

,

S.

,

Fan

,

Y.

,

Florens

,

J.P.

and

Renault

,

E.

(

2011

), “

Nonparametric instrumental regression

”,

Econometrica

, Vol.

79

No.

5

, pp.

1541

-

1565

.

Google Scholar

Hartford

,

J.

,

Lewis

,

G.

,

Leyton-Brown

,

K.

and

Taddy

,

M.

(

2017

), “

Deep IV: a flexible approach for counterfactual prediction

”,

Proceedings of the 34th International Conference on Machine Learning

,

PMLR

, pp.

1414

-

1423

.

Google Scholar

Kook

,

L.

and

Pfister

,

N.

(

2024

), “

Instrumental variable estimation of distributional causal effects

”.

Google Scholar

Kunis

,

S.

,

Peter

,

T.

,

Römer

,

T.

and

von der Ohe

,

U.

(

2016

), “

A multivariate generalization of Prony's method

”,

Linear Algebra and Its Applications

, Vol.

490

, pp.

31

-

47

, doi:

https://doi.org/10.1016/j.laa.2015.10.023

.

Google Scholar

Crossref

Muandet

,

K.

,

Fukumizu

,

K.

,

Sriperumbudur

,

B.

and

Schölkopf

,

B.

(

2017

), “

Kernel mean embedding of distributions: a review and beyond

”,

Foundations and Trends® in Machine Learning

, Vol.

10

Nos

1-2

, pp.

1

-

141

, doi:

https://doi.org/10.1561/2200000060

.

Google Scholar

Crossref

Sejdinovic

,

D.

(

2025

), “An overview of causal inference using kernel embeddings”, in

Data Science for Econometrics and Related Topics', Studies in Systems, Decision and Control

,

Springer

.

Google Scholar

Singh

,

R.

,

Sahani

,

M.

and

Gretton

,

A.

(

2019

), “

Kernel instrumental variable regression

”,

Advances in Neural Information Processing Systems

, Vol.

32

, pp.

4593

-

4605

.

Google Scholar

2025

Dino Sejdinovic

Published in Asian Journal of Economics and Banking. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at Link to the terms of the CC BY 4.0 licence.

Instrumental identification of interventional distribution via moment recovery in a parametric outcome model

1. Introduction

2. Background

2.1 Notation and object of interest

2.2 IV assumptions and basic estimands

2.3 Problem statement

2.4 Related work

3. Models and methods

3.1 Recovering moments of the confounder

3.2 Estimation workflow

3.3 Domain relevance and examples

4. A simple numerical illustration

5. Conclusions

References

Email Alerts

Cited By

Instrumental identification of interventional distribution via moment recovery in a parametric outcome model Open Access

1. Introduction

2. Background

2.1 Notation and object of interest

2.2 IV assumptions and basic estimands

2.3 Problem statement

2.4 Related work

3. Models and methods

3.1 Recovering moments of the confounder

3.2 Estimation workflow

3.3 Domain relevance and examples

4. A simple numerical illustration

5. Conclusions

References

Email Alerts

Suggested Reading

Related Chapters

Recommended for you

Cited By

Instrumental identification of interventional distribution via moment recovery in a parametric outcome model