Entropy Balancing
Entropy Balancing is a statistical method implemented as both an R package and a Stata routine, designed for reweighting data to achieve covariate balance in observational studies.
The method is based on the approaches developed in Hainmueller (2012) and Hainmueller and Xu (2013), and it won the Warren Miller Award from the Society of Political Methodology in 2020.
→ Read the explainer — a self-contained tutorial on entropy balancing for R and Stata users.
Source on GitHub: j-hai/ebal (R package) · j-hai/ebal-stata (Stata routine).
The four-line workflow
fit <- ebalance(treat ~ x1 + x2 + x3, data = df) # 1. fit weights
balance_table(fit) # 2. check balance
df$w <- weights(fit) # 3. attach weights
lm(y ~ treat, data = df, weights = w) # 4. estimate effect
That’s the full promise of the package: balance the covariates, get weights, run your regression. Everything else on this page is a refinement of those four lines.
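For intuition about what step 1 computes, here is a minimal base-R sketch of ATT-style entropy balancing on simulated data. It does not call ebal and is not the package's implementation: the simulated variables, the helper names (dual, dual_grad, softmax_weights), and the use of optim() with BFGS are all assumptions made for illustration. The idea is to find control weights of exponential form whose weighted covariate means match the treated means, by minimizing a convex dual.

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.5 * x1 - 0.5 * x2))  # selection on x1, x2
X  <- cbind(x1 = x1, x2 = x2)

# Target moments: covariate means in the treated group
target <- colMeans(X[treat == 1, , drop = FALSE])

# Control covariates, centered at the treated means
Z <- sweep(X[treat == 0, , drop = FALSE], 2, target)

# Normalized exponential weights for a given multiplier vector lambda
softmax_weights <- function(lambda) {
  eta <- drop(Z %*% lambda)
  w <- exp(eta - max(eta))        # subtract max for numerical stability
  w / sum(w)
}

# Dual objective: log-sum-exp of the centered linear predictor
dual <- function(lambda) {
  eta <- drop(Z %*% lambda)
  max(eta) + log(sum(exp(eta - max(eta))))
}

# Its gradient is the weighted mean of Z, i.e. the remaining imbalance
dual_grad <- function(lambda) drop(crossprod(Z, softmax_weights(lambda)))

opt <- optim(c(0, 0), dual, dual_grad, method = "BFGS",
             control = list(reltol = 1e-12, maxit = 500))

w <- softmax_weights(opt$par)     # control weights, summing to 1
balanced_means <- drop(crossprod(X[treat == 0, , drop = FALSE], w))

rbind(treated = target, reweighted_controls = balanced_means)
```

At the optimum the gradient is zero, which is exactly the statement that the reweighted control means equal the treated means. The two rows printed at the end should therefore agree up to solver tolerance.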
Which estimand?
| estimand | who gets reweighted | answers |
|---|---|---|
| "ATT" (default) | controls | “what was the effect on those who got treatment?” |
| "ATC" | treated | “what would the effect have been on the controls?” |
| "ATE" | both | “what is the average effect across the population?” |
Always read weights via weights(fit). It returns a length-n vector aligned to the rows of the original Treatment/X input and gives each side (treated or control) the weights appropriate to the chosen estimand.
Worked example: Lalonde NSW vs. PSID controls
The 1986 Lalonde benchmark — NSW job-training trial controls replaced by 429 PSID respondents — is the textbook stress test for covariate-adjustment methods. The naive comparison is badly biased; ebalance recovers an estimate close to the experimental benchmark of +$1,794.
library(ebal); library(generics)
data(lalonde, package = "cobalt")
# 1. Fit
fit <- ebalance(treat ~ age + educ + race + married + nodegree + re74 + re75,
data = lalonde)
# 2. Check balance
balance_table(fit)[, c("variable", "std_diff_pre", "std_diff_post")]
#> variable std_diff_pre std_diff_post
#> 1 age -0.242 0
#> 2 educ 0.045 0
#> 3 racehispan -0.277 0
#> 4 racewhite -1.406 0
#> 5 married -0.719 0
#> 6 nodegree 0.235 0
#> 7 re74 -0.596 0
#> 8 re75 -0.297 0
# 3. Attach weights (length = nrow(lalonde); treated = 1, controls reweighted)
lalonde$w <- weights(fit)
# 4. Estimate the ATT
coef(lm(re78 ~ treat, data = lalonde, weights = w))[2]
#> +1273 (vs. naive -635, vs. experimental benchmark +1794)
Figure: autoplot(fit).
Robust standard errors
lm()’s default standard errors don’t account for the weighting. Use sandwich::vcovHC() (or vcovCL() if you have a clustering variable):
library(sandwich); library(lmtest)
mod <- lm(re78 ~ treat, data = lalonde, weights = w)
coeftest(mod, vcov = vcovHC(mod, type = "HC1"))
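To make the robust-variance step less of a black box, here is a hand-rolled HC1 sandwich for weighted least squares in base R, on simulated data. It follows the usual textbook form (bread from X'WX, meat from the weighted score contributions, a small-sample factor of n / (n - k)); it is meant for intuition and is not claimed to reproduce sandwich::vcovHC() digit for digit. All variable names here are illustrative.

```r
set.seed(7)
n     <- 300
treat <- rbinom(n, 1, 0.5)
y     <- 1 + 2 * treat + rnorm(n, sd = 1 + treat)   # heteroskedastic noise
w     <- ifelse(treat == 1, 1, runif(n, 0.2, 3))    # ATT-style: treated weights = 1

X <- cbind("(Intercept)" = 1, treat = treat)
k <- ncol(X)

# Weighted least squares by hand: beta = (X'WX)^{-1} X'Wy
XtWX_inv <- solve(crossprod(X, w * X))
beta     <- drop(XtWX_inv %*% crossprod(X, w * y))

# Residuals and per-unit weighted score contributions
e     <- drop(y - X %*% beta)
score <- X * (w * e)

# Sandwich: bread %*% meat %*% bread, with the HC1 small-sample factor
meat   <- crossprod(score)
vc_hc1 <- (n / (n - k)) * XtWX_inv %*% meat %*% XtWX_inv
se_hc1 <- sqrt(diag(vc_hc1))

cbind(estimate = beta, robust_se = se_hc1)
```

In practice the sandwich package remains the better choice, since it also handles clustering and the other HC variants.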
Is the fit healthy? diagnostics() and glance()
diagnostics(fit)
#> ebalance diagnostics (estimand: ATT)
#> --------------------------------------
#> control PASS effective sample size = 98 of 429, max/mean = 3.6
#> treated PASS effective sample size = 185 of 185, max/mean = 1.00
#> balance PASS max |std diff post| = 0.0000
#> converged PASS max moment deviation = 0.41
generics::glance(fit) returns the same numbers in a one-row data frame, which is convenient for stitching across many fits. The headline is ESS = 98 of 429: the fit is concentrated on roughly a quarter of the donor pool, which is what you’d expect when the PSID-vs-NSW gap is this large.
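Both headline diagnostics can be computed by hand from any weight vector. The sketch below assumes the common Kish definition of effective sample size, (sum of weights)^2 / (sum of squared weights); the page does not say which formula diagnostics() uses, so the helpers ess() and max_mean_ratio() are illustrative, not the package's internals.

```r
# Kish effective sample size (assumed definition)
ess <- function(w) sum(w)^2 / sum(w^2)

# Ratio of the largest weight to the average weight
max_mean_ratio <- function(w) max(w) / mean(w)

w_uniform <- rep(1, 100)                    # every unit counts equally
w_skewed  <- c(rep(1, 90), rep(10, 10))     # ten units carry most of the mass

ess(w_uniform)                    #> 100
round(ess(w_skewed), 2)           #> 33.12
round(max_mean_ratio(w_skewed), 2) #> 5.26
```

Uniform weights give an ESS equal to the sample size; the more the weights concentrate on a few units, the further ESS falls below n and the higher the max/mean ratio climbs.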
If diagnostics() flags low ESS or a high max/mean ratio, drill into the weight distribution itself:
plot(fit, type = "weights")
weight = 1 is the uniform-weighting baseline; mass to the right of it is over-represented PSID controls compensating for the PSID/NSW covariate gap, mass to the left is under-represented. This plot is a *check*, not a primary result; use it when ESS is low or the max-weight ratio looks alarming.
Comparing estimands: ATT vs ATE vs ATC
fit_att <- ebalance(treat ~ ..., data = lalonde, estimand = "ATT")
fit_ate <- ebalance(treat ~ ..., data = lalonde, estimand = "ATE")
fit_atc <- ebalance(treat ~ ..., data = lalonde, estimand = "ATC")
# weights() does the right thing for each estimand
lalonde$w <- weights(fit_ate)
coef(lm(re78 ~ treat, data = lalonde, weights = w))[2]
Combining ebal with difference-in-differences
A common applied pattern is to use ebal as the first stage of a DID design: ebal-weighted DID handles unobserved time-invariant confounders (via the difference) and observed covariate imbalance (via the weights) simultaneously. The Lalonde data has earnings in 1974, 1975, and 1978 — so we can check parallel trends with a 1975 placebo.
We deliberately balance on demographics only (age, education, race, marital status, no-degree status) and leave prior earnings out of the constraints. The point is to see whether DID + ebal can absorb the time-invariant earnings level difference between NSW and PSID without having seen those earnings during balancing.
# 1) Balance on demographics only
fit <- ebalance(treat ~ age + educ + race + married + nodegree, data = lalonde)
lalonde$w <- weights(fit)
# 2) DID using 1974 as the pre-period
did <- function(post, pre = "re74") {
d_t <- mean(lalonde[lalonde$treat == 1, post]) -
mean(lalonde[lalonde$treat == 1, pre])
d_c <- weighted.mean(lalonde[lalonde$treat == 0, post],
w = lalonde$w[lalonde$treat == 0]) -
weighted.mean(lalonde[lalonde$treat == 0, pre],
w = lalonde$w[lalonde$treat == 0])
d_t - d_c
}
did("re75") # 1975 placebo (training was 1976-77, so this should be ~0)
#> +1145
did("re78") # 1978 effect
#> +2181 (experimental benchmark = +1794)
# Equivalent regression form, drop-in for clustered SEs / fixed effects:
# library(fixest); feols(re78 - re74 ~ treat, data = lalonde, weights = ~w)
The DID + ebal estimate is +2181, and its 95% bootstrap CI [+414, +3857] brackets the experimental benchmark of +1794, even though we never told ebalance() about prior earnings. The 1975 placebo (+1145) is closer to zero than the unweighted 1975 placebo (+2589) but still not zero, which is an honest reflection of the limits of demographics-only balancing. A user iterating on this design would naturally add re74 to the balance constraints to flatten the placebo further.
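The claim that the regression form is equivalent to the difference of weighted means can be checked directly. The sketch below uses simulated data and base R only (lm() instead of fixest::feols()); the data frame d and its columns are made up for illustration, with treated weights fixed at 1 as under the ATT estimand.

```r
set.seed(42)
n <- 200
d <- data.frame(
  treat = rbinom(n, 1, 0.4),
  pre   = rnorm(n, mean = 10),
  post  = rnorm(n, mean = 12)
)
# ATT-style weights: treated units get 1, controls get arbitrary positive weights
d$w <- ifelse(d$treat == 1, 1, runif(n, 0.2, 3))

tr <- d$treat == 1

# DID as a difference of (weighted) mean changes
did_means <- (mean(d$post[tr]) - mean(d$pre[tr])) -
  (weighted.mean(d$post[!tr], d$w[!tr]) - weighted.mean(d$pre[!tr], d$w[!tr]))

# Same quantity as the coefficient on treat in a weighted regression
# of the outcome change on the treatment indicator
did_reg <- unname(coef(lm(I(post - pre) ~ treat, data = d, weights = w))["treat"])

all.equal(did_means, did_reg)
```

The two agree because weighted least squares on a single binary regressor returns the difference in weighted group means. The regression form is mainly useful because it plugs into robust or clustered variance estimators.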
What’s new in ebal 0.3-0 (May 2026)
- ATT / ATE / ATC estimands via the new estimand argument on ebalance(). weights(fit) returns the right length-n vector for each.
- balance_table(fit): exported, with explicit mean_treated_pre/post, mean_control_pre/post, diff_pre/post, std_diff_pre/post, pct_reduction columns. The same numbers feed summary(), tidy(), plot(), and autoplot().
- diagnostics(fit): friendly “is my fit okay?” report with PASS / WARN / FAIL flags for ESS, balance, convergence, and trim feasibility.
- Weak-fit warnings at fit time when ESS is below 30% of side n, max/mean weight ratio is above 10, or the solver didn’t converge. Suppressible via options(ebal.warn_weak_fit = FALSE).
- Two new vignettes: vignette("estimands") and vignette("outcome-models").
- Autodiff solver (advanced, opt-in): method = "autodiff" runs BFGS on torch-computed gradients instead of Newton-Raphson. More stable on poorly conditioned dual losses; contributed by Apoorva Lal, ported with attribution from his fork at github.com/apoorvalal/ebal. Apoorva is now listed as aut on the package.
The previous release (0.2.1, April 2026) added the formula interface, print() / summary() / plot() / weights() S3 methods, and numerical hardening for ebalance.trim().
The Stata routine was also updated in April 2026 to version 1.5.5 with bug fixes, a new quietly option, a replace option for gen(), and a cap on the linear predictor before exp() to prevent Inf → NaN propagation. No numerical changes; verified byte-for-byte against the 1.5.3 baseline. Source on GitHub.
Entropy Balancing for R — also on GitHub
Entropy Balancing for Stata — also on GitHub
References
Journal Articles
- Hainmueller (2012). Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis, 2012.