🛸 The Directed Prediction Index (DPI).
The Directed Prediction Index (DPI) is a quasi-causal inference (causal discovery) method for observational data designed to quantify the relative endogeneity (relative dependence) of outcome (Y) versus predictor (X) variables in regression models.
Bruce H. W. S. Bao 包寒吴霜
- Bao, H. W. S. (2025). DPI: The Directed Prediction Index for causal direction inference from observational data. https://doi.org/10.32614/CRAN.package.DPI
- Bao, H. W. S. (in preparation). The Directed Prediction Index (DPI): Quantifying relative endogeneity for causal direction inference from observational data. (Manuscript in preparation)
## Method 1: Install from CRAN
install.packages("DPI")
## Method 2: Install from GitHub
install.packages("devtools")
devtools::install_github("psychbruce/DPI", force=TRUE)
Define
In econometrics and the broader social sciences, an exogenous variable is assumed to have a directed (causal or quasi-causal) influence on an endogenous variable.
All steps have been compiled into DPI() and DPI_curve(). See their help pages for usage and illustrative examples. Below are conceptual rationales and mathematical explanations.
Define
The k.cov in the DPI() function). A higher
Notably, as an expected attribute in causal inference, the
Define
The
To control the false positive rate, users can set a lower significance level (see alpha in DPI() and related functions) and/or use Bonferroni correction for multiple pairwise tests (see bonf in DPI() and related functions).
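As a minimal illustration of the Bonferroni idea, the base R function p.adjust() can be applied to a set of pairwise p values. The p values below are made up for the example, and whether DPI() performs the correction in exactly this way internally is an assumption.

```r
# Hypothetical p values from three pairwise tests (illustrative only)
p_raw <- c(0.010, 0.020, 0.040)

# Bonferroni: multiply each p value by the number of tests (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Equivalent view: compare raw p values against alpha / number of tests
alpha <- 0.05
sig_by_threshold <- p_raw < alpha / length(p_raw)
sig_by_adjusted  <- p_bonf < alpha
```

Both views flag the same tests as significant; only the scale on which the comparison is made differs.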
Notes on transformation among
Wagenmakers (2022) also proposed a simple and useful algorithm to compute approximate (pseudo) Bayes Factors from p values and sample sizes (see transformation rules below).
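A rough sketch of the idea, using only the headline "3p√n" rule from Wagenmakers (2022), is given here. The helper name pseudo_bf10_sketch() is hypothetical and is not the package's p_to_bf(); the refinements Wagenmakers proposes for larger p values are deliberately left out, so the package's output may differ.

```r
# Hypothetical helper (NOT the package's p_to_bf()):
# applies only the headline "3 * p * sqrt(n)" approximation.
pseudo_bf10_sketch <- function(p, n) {
  stopifnot(all(p > 0 & p <= 1), all(n > 0))
  bf01 <- 3 * p * sqrt(n)  # approximate evidence for H0 over H1
  1 / bf01                 # invert: approximate evidence for H1 over H0
}

# p = .05 with n = 100 gives BF01 = 3 * 0.05 * 10 = 1.5, so BF10 is about 0.67
bf10_example <- pseudo_bf10_sketch(p = 0.05, n = 100)
```

The example shows a familiar consequence of the rule: a p value just at .05 in a sample of 100 does not translate into evidence for the alternative hypothesis.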
Below we show that normalized penalty scores
(1) Main analysis using DPI(): Simulate n.sim random samples, with k.cov (unobservable) random covariate(s) in each simulated sample, to test the statistical significance of DPI.
(2) Robustness check using DPI_curve(): Run a series of DPI simulation analyses with 1 to k.covs (usually 1 to 10) random covariates, producing a curve of DPIs (estimates and 95% CIs, which usually move closer to 0 as k.covs increases) that indicates how sensitive the directed prediction is (i.e., how many random covariates can the DPI survive and still remain significant?).
(3) Causal discovery using DPI_dag(): Directed acyclic graphs (DAGs) via the DPI exploratory analysis for all significant partial correlations.
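The simulation logic behind steps (1) and (2) can be sketched in base R. This is a toy illustration, not the package's algorithm: it assumes that relative endogeneity is read as the difference in R² between the model predicting Y and the model predicting X, it ignores the significance component entirely, and all helper names (r2_of, delta_r2_once, delta_r2_sim) are hypothetical. The argument names k.cov and n.sim only echo the wording above.

```r
# Illustrative only: hypothetical helpers, NOT the DPI package's code.

# R^2 of a linear model predicting `outcome` from all other columns of `d`
r2_of <- function(d, outcome) {
  f <- reformulate(setdiff(names(d), outcome), response = outcome)
  summary(lm(f, data = d))$r.squared
}

# One simulated sample: append k.cov random covariates, then compare
# how well Y versus X is predicted by everything else.
delta_r2_once <- function(d, x, y, k.cov = 1) {
  covs <- as.data.frame(matrix(rnorm(nrow(d) * k.cov), nrow = nrow(d)))
  names(covs) <- paste0("RandCov", seq_len(k.cov))
  d_sim <- cbind(d, covs)
  r2_of(d_sim, y) - r2_of(d_sim, x)
}

# Repeat n.sim times; summarise with the mean and a 95% percentile interval
delta_r2_sim <- function(d, x, y, k.cov = 1, n.sim = 200) {
  est <- replicate(n.sim, delta_r2_once(d, x, y, k.cov))
  c(mean = mean(est), quantile(est, c(0.025, 0.975)))
}

# Toy data in which Y depends on X and Z, while X depends on neither
set.seed(1)
n <- 300
Z <- rnorm(n)
X <- rnorm(n)
Y <- 0.4 * X + 0.5 * Z + rnorm(n)
toy <- data.frame(X, Y, Z)

# Step (1) analogue: a single analysis with one random covariate
res <- delta_r2_sim(toy, x = "X", y = "Y", k.cov = 1, n.sim = 200)

# Step (2) analogue: repeat for 1 to 5 random covariates to get a curve
curve_tbl <- t(sapply(1:5, function(k) {
  delta_r2_sim(toy, "X", "Y", k.cov = k, n.sim = 200)
}))
```

In this toy setup Y is the more predictable variable, so the mean difference is positive; each row of curve_tbl holds the mean and interval bounds for one value of k.cov.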
This package also includes other functions helpful for exploring variable relationships and performing simulation studies.
- Network analysis functions
  - cor_net(): Correlation and partial correlation networks.
  - BNs_dag(): Directed acyclic graphs (DAGs) via Bayesian networks (BNs).
- Data simulation functions
  - sim_data(): Simulate data from a multivariate normal distribution.
  - sim_data_exp(): Simulate experiment-like data with independent binary Xs.
- Miscellaneous functions
  - cor_matrix(): Produce a symmetric correlation matrix from values.
  - p_to_bf(): Convert p values to pseudo Bayes Factors ($\text{PseudoBF}_{10}$).
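To make the ideas behind these helpers concrete, here is a base R sketch of three of them: building a symmetric correlation matrix from a vector of values, simulating multivariate normal data from it, and computing partial correlations. The function make_cor_matrix() and the numbers used are hypothetical; the package's own cor_matrix(), sim_data(), and cor_net() may use different arguments and defaults.

```r
# Hypothetical helper (NOT the package's cor_matrix()):
# fill a symmetric correlation matrix from its lower-triangle values.
make_cor_matrix <- function(values, var_names) {
  p <- length(var_names)
  stopifnot(length(values) == p * (p - 1) / 2)
  R <- diag(1, p)
  R[lower.tri(R)] <- values        # filled column by column
  R <- R + t(R) - diag(1, p)       # mirror to the upper triangle, keep diagonal at 1
  dimnames(R) <- list(var_names, var_names)
  R
}

# Correlations in lower-triangle order: r(M,X), r(Y,X), r(Y,M)
R <- make_cor_matrix(c(0.5, 0.3, 0.2), c("X", "M", "Y"))

# Simulate multivariate normal data with (population) correlation matrix R:
# if Z has independent standard-normal columns, Z %*% chol(R) has covariance R.
set.seed(1)
n <- 500
Zmat <- matrix(rnorm(n * ncol(R)), nrow = n)
dat <- as.data.frame(Zmat %*% chol(R))
names(dat) <- colnames(R)

# Partial correlations from the inverse of the sample correlation matrix
P <- -cov2cor(solve(cor(dat)))
diag(P) <- 1
```

The Cholesky approach avoids any extra dependency; the sample correlations in dat will scatter around the values in R rather than match them exactly.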

