This repository contains the official implementation of the paper Debiased Front-Door Learners for Heterogeneous Effects.
It includes minimal, runnable code to reproduce (i) the synthetic studies and (ii) the FARS case study reported in the paper.
- FDCATE.py: Implementation of FD‑DR‑Learner, FD‑R‑Learner, and a plug‑in baseline (FD‑PI); also contains the synthetic experiment driver.
- analyze_fars_2000_fd.py: End‑to‑end script that builds the state–year panel for the FARS primary seat‑belt law case study and runs FD‑PI / FD‑DR / FD‑R on it.
- README.md: This file.
- data/fars: Folder containing the FARS data.
Note. The estimators, cross‑fitting scheme, XGBoost settings, and stabilization (denominator flooring at 0.05 for inverse weights/density ratios) match the experimental protocol described in the paper.
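To make the stabilization rule concrete, here is a minimal sketch of denominator flooring, assuming the estimated densities are held in NumPy arrays. The helper name `floored_ratio` and the toy values are illustrative assumptions and are not taken from FDCATE.py.

```python
import numpy as np

FLOOR = 0.05  # floor value stated in the note above


def floored_ratio(numerator, denominator, floor=FLOOR):
    """Divide after flooring only the denominator (hypothetical helper)."""
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)
    # The numerator is left untouched; only the denominator is bounded below.
    return numerator / np.maximum(denominator, floor)


# Toy estimated values, for illustration only.
num = np.array([0.30, 0.02, 0.50])
den = np.array([0.60, 0.01, 0.04])
ratios = floored_ratio(num, den)
print(ratios)
```

Flooring bounds every ratio by numerator / 0.05, which trades a small bias for a cap on the variance contributed by near-zero denominators.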
We tested with Python 3.9+ on Linux/macOS.

    python -V
    # Python 3.9.x or newer

Install the minimal dependencies:

    python -m pip install -U pip
    python -m pip install numpy pandas scikit-learn xgboost statsmodels matplotlib shap

Inspect the CLI and run with defaults:

    python FDCATE.py --help
    python FDCATE.py

You'll see live progress indicators for each simulation block; pass --quiet if you prefer a silent run.
Examples:

    # Smaller grid, skip optional sims, and avoid saving artifacts.
    python FDCATE.py \
      --ns 1000 2500 \
      --delta-grid-fixed-n 0 2 \
      --rounds 2 \
      --no-save --no-plots \
      --skip-weak-overlap --skip-mediator-confound

What it does:
- Generates data under the conditional FD setup.
- Fits FD‑PI, FD‑DR, and FD‑R with cross‑fitting.
- Evaluates RMSE across:
  - sample size sweeps,
  - nuisance “noise” at the $n^{-1/4}$ scale,
  - weak‑overlap stress tests.
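The nuisance-noise regime can be pictured with a short sketch. It assumes additive Gaussian noise with standard deviation `delta * n**(-1/4)`; the exact perturbation used by FDCATE.py may differ, and `perturb_nuisance`, `rmse`, and `delta` are hypothetical names chosen for illustration.

```python
import numpy as np


def perturb_nuisance(true_values, delta, rng):
    """Add Gaussian noise with standard deviation delta * n**(-1/4) (assumed form)."""
    n = true_values.shape[0]
    scale = delta * n ** (-0.25)
    return true_values + rng.normal(0.0, scale, size=n)


def rmse(estimate, truth):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))


rng = np.random.default_rng(0)
for n in (1000, 2500):
    truth = np.zeros(n)  # stand-in for a true nuisance function at n points
    noisy = perturb_nuisance(truth, delta=2.0, rng=rng)
    # Print n, the target noise scale, and the realized nuisance RMSE.
    print(n, round(2.0 * n ** (-0.25), 4), round(rmse(noisy, truth), 4))
```

At this scale the nuisance error shrinks with $n$, but more slowly than the $n^{-1/2}$ parametric rate, which is the setting in which debiased learners are usually expected to separate from plug-in estimators.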
Expected outputs:
- Printed metrics (RMSE ± CI) per regime.
- CSVs/plots saved to the working directory (filenames are self‑explanatory and include the regime label).
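As a reading aid for the printed "RMSE ± CI" values, the sketch below computes a mean and a normal-approximation 95% interval over repeated rounds. Whether the scripts use this exact interval is an assumption; `mean_with_ci` and the per-round numbers are hypothetical and are not results from the paper.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Return (mean, half-width) of a normal-approximation confidence interval."""
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 denominator) over the repeated rounds.
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width


# Hypothetical per-round RMSE values for one regime.
rmse_per_round = [0.21, 0.19, 0.24, 0.20]
mean, half = mean_with_ci(rmse_per_round)
print(f"RMSE = {mean:.3f} ± {half:.3f}")
```

With only a few rounds (for example `--rounds 2`), an interval of this kind is wide and should be read as a rough indication only.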
Tips for determinism:

    PYTHONHASHSEED=0
    export PYTHONHASHSEED
    # If the scripts expose a --seed flag, set it; otherwise the built‑in defaults are used.

Run the FARS case study:

    python analyze_fars_2000_fd.py

What it does:
- Downloads/loads FARS and NHTSA belt‑use survey tables (public sources) for a balanced state–year panel.
- Constructs variables:
  - Treatment $X$: primary law in force (state‑year indicator).
  - Mediator $Z$: seat‑belt use rate.
  - Outcome $Y$: occupant fatality rate (per population or per exposure, depending on availability).
  - Covariates $C$: state & year fixed effects and policy‑relevant factors (weather severity, road‑mix, enforcement, driver status, etc.).
- Fits FD‑PI, FD‑DR, and FD‑R and produces:
  - Distributions of $\hat\tau(C)$,
  - Top-$\alpha$ concentration curves,
  - SHAP‑based covariate importance.
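One plausible reading of a top-$\alpha$ concentration curve is the share of the total estimated effect magnitude held by the top $\alpha$ fraction of units, ranked by $|\hat\tau(C)|$. The sketch below follows that reading; the definition used in the paper may differ, and `concentration_curve` and the toy estimates are hypothetical.

```python
import numpy as np


def concentration_curve(tau_hat, alphas):
    """Share of the summed |tau_hat| held by the top-alpha fraction of units."""
    magnitudes = np.sort(np.abs(np.asarray(tau_hat, dtype=float)))[::-1]
    cumulative = np.cumsum(magnitudes)
    total = cumulative[-1]
    shares = []
    for alpha in alphas:
        # Number of units in the top-alpha group (at least one).
        k = max(1, int(np.ceil(alpha * magnitudes.size)))
        shares.append(cumulative[k - 1] / total)
    return np.array(shares)


# Toy effect estimates, for illustration only.
tau_hat = np.array([-4.0, -3.0, -2.0, -1.0])
curve = concentration_curve(tau_hat, alphas=[0.25, 0.5, 1.0])
print(curve)
```

A curve that rises steeply at small $\alpha$ indicates that a small group of state-years accounts for most of the estimated effect.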
Expected outputs:
- Figures and tables in the working directory (histograms of $\hat\tau(C)$, concentration curves, feature importance).
Notes on stability
- Learners use XGBoost (50 trees, depth 3, learning rate 0.1, subsample/colsample 0.9) and cross‑fitting consistent with the paper.
- To stabilize finite‑sample variance, only denominators that appear in inverse weights/density ratios are floored at 0.05; numerators are never clipped.
- Cross-fitting folds/splits (2-way for FD-PI/FD-DR; 3-way for FD-R’s $b,g,\gamma$ steps).
- XGBoost hyperparameters and linear ridge for the final regression(s).
- $n^{-1/4}$-scale nuisance perturbations used in stress tests.
- Weak-overlap stress by steepening the treatment propensity (no density ratios in FD-R).
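The cross-fitting idea in the list above can be sketched in a few lines. This is a generic 2-fold illustration, not the code in FDCATE.py: a closed-form ridge regression stands in for the XGBoost nuisance learners, and `ridge_fit`, `cross_fit_predictions`, and the simulated data are assumptions made for the example.

```python
import numpy as np


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge coefficients (intercept omitted for brevity)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def cross_fit_predictions(X, y, n_folds=2, lam=1.0, seed=0):
    """Out-of-fold predictions: each unit is scored by a model that never saw it."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(X.shape[0]), n_folds)
    preds = np.empty(X.shape[0])
    for k, held_out in enumerate(folds):
        # Train on every fold except the held-out one.
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        beta = ridge_fit(X[train], y[train], lam)
        preds[held_out] = X[held_out] @ beta
    return preds


# Simulated data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
oof = cross_fit_predictions(X, y, n_folds=2)
print(float(np.sqrt(np.mean((oof - y) ** 2))))
```

Because every prediction comes from a model trained on other folds, overfitting in the nuisance step does not leak into the final regression. How FD-R arranges its three splits across the $b,g,\gamma$ steps is not shown here.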