JOSS review: Functionality for predicting densities #9

Open
salbalkus opened this issue Oct 4, 2024 · 0 comments
Raising as part of JOSS review openjournals/joss-reviews#7241

During my review, I tested the package's functionality on a few different simulated conditional densities (gamma/beta, etc.) with different estimator hyperparameter settings. All outputs appeared reasonable -- great job implementing a relatively complex statistical procedure in R. However, I do have one somewhat major criticism regarding the functionality of the software: it is relatively difficult to obtain conditional density predictions for each unit of a given dataset.

To elaborate, based on the code examples, it seems as though the API is oriented towards visualization of an entire conditional density for a fixed $x$ value. For instance, consider the following simulation setup:

library("lpcde")
set.seed(42)
n=100
x_data = matrix(rbeta(n, 2, 4))
y_data = matrix(rgamma(n, 10, 1/x_data))

It is very easy to fit the model across a small grid of $y$ values with just two lines of code:

y_grid = seq(from=1, to=5, length.out=10)
model1 = lpcde::lpcde(x_data=x_data, y_data=y_data, y_grid=y_grid, x= 0.5, bw = 0.5)

The grid-based approach is nice for visualization if estimating the entire density curve is the end goal. However, many practical applications require estimating the conditional density at each unit's observed values so that it can feed a downstream analysis. For example, many counterfactual mean estimators for a continuous treatment in causal inference first estimate the conditional density of the treatment given covariates for each unit, and then reweight the outcome based on that density (for example, Diaz and van der Laan (2012), Haneuse and Rotnitzky (2013), or even something like Schindl, Shen, and Kennedy (2024)).
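
To make the reweighting concrete, here is one common inverse-probability-weighted form for an additive shift of the treatment by $\delta$. The notation is illustrative only and is not taken from lpcde or copied from the cited papers: $A_i$ is the treatment, $W_i$ the covariates, $Y_i$ the outcome, and $\hat{g}$ the estimated conditional density of treatment given covariates.

$$\hat{\psi}_\delta = \frac{1}{n}\sum_{i=1}^{n} \frac{\hat{g}(A_i - \delta \mid W_i)}{\hat{g}(A_i \mid W_i)}\, Y_i$$

Both the numerator and the denominator evaluate $\hat{g}$ at unit-specific points $(A_i, W_i)$, which is why per-unit density predictions are needed rather than a curve at a single fixed $x$.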

Based on the API, one would think this could be accomplished by replacing y_grid with y_data and x with x_data. However, doing so runs into multiple issues. First, x does not seem to be able to take in a vector input -- I get this result:

> model1 = lpcde::lpcde(x_data=x_data, y_data=y_data, y_grid=y_grid, x=matrix(c(0.3, 0.6), ncol = 2), bw = 0.5)
> Error in s_mat %*% e_vec : non-conformable arguments
In addition: Warning messages:
1: In sweep(x_data, 2, x) :
  STATS is longer than the extent of 'dim(x)[MARGIN]'
2: In sweep(x_sorted, 2, x) :
  STATS is longer than the extent of 'dim(x)[MARGIN]'

Maybe I am providing the input incorrectly, but either way, it is not obvious how to compute estimates for multiple x values. Second, consider the following line, which takes about 30 seconds to run on my machine:

model1 = lpcde::lpcde(x_data=x_data, y_data=y_data, y_grid=y_data, x= 0.5, bw = 0.5)

It yields the exact same predictions as the following, which runs in under 2 seconds:

preds = vector(length = 100)
for (i in 1:100) {
  model2 = lpcde::lpcde(x_data=x_data, y_data=y_data, y_grid=y_data[i], x=0.5, bw = 0.5)
  preds[i] = model2$Estimate[3] # or extract whatever statistics you like...
}

which leads me to believe something redundant is going on under the hood that I don't quite understand. Third, there seems to be no way to apply a model, once it is fit, to obtain estimates on a new set of data. In any case, I would recommend implementing something like a predict method that evaluates the fitted estimator on an arbitrary dataset. This would separate the two use cases (visualizing a full density curve versus obtaining per-unit estimates) and allow obtaining estimates on new data.
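
A rough sketch of what such a per-unit interface might look like is given below. The names `predict_units` and `fit_one` are hypothetical and not part of lpcde. The helper simply calls a user-supplied single-point estimator once per unit, so it illustrates the shape of the API rather than an efficient implementation.

```r
# Hypothetical helper (not part of lpcde): evaluate a conditional density
# estimator at each unit's own (x_i, y_i) pair.
# `fit_one` is any function(x, y) that returns a single numeric estimate.
predict_units <- function(fit_one, x_new, y_new) {
  x_new <- as.matrix(x_new)   # one row of covariates per unit
  y_new <- as.numeric(y_new)  # one response value per unit
  stopifnot(nrow(x_new) == length(y_new))
  vapply(
    seq_along(y_new),
    function(i) fit_one(x_new[i, ], y_new[i]),
    numeric(1)
  )
}

# Usage sketch with lpcde, assuming the x_data and y_data objects defined
# earlier and the same call and extraction as in the loop above:
# fit_one <- function(x, y) {
#   fit <- lpcde::lpcde(x_data = x_data, y_data = y_data,
#                       y_grid = y, x = x, bw = 0.5)
#   fit$Estimate[3]
# }
# preds <- predict_units(fit_one, x_data, y_data)
```

Passing the estimator in as a function keeps the sketch independent of lpcde internals. A real predict method would presumably reuse the fitted object rather than refit for every unit.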
