Open
Labels: bug (Something isn't working)
Description
Hi all,
I have enjoyed using the Python package, but I also needed the R package. I noticed that the predicted probabilities from an EBM model trained in R are too high when the dataset is imbalanced. Could the intercept term be missing from the model and/or from the prediction function? It looks like ebm_predict_proba simply sums the contributions from each feature function and applies the sigmoid, without adding an intercept first.
Example: a dummy imbalanced dataset with 10% positive cases. The average predicted probability is 0.39, much higher than the observed positive rate of 0.1.
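library(interpret)
# Dummy imbalanced dataset: 90 negatives followed by 10 positives (10% positive)
df <- data.frame(x = seq(1, 100), y = c(rep(0, 90), rep(1, 10)))
clf <- ebm_classify(df['x'], df$y)
prob <- ebm_predict_proba(clf, df['x'])
print(mean(prob))   # average predicted probability
print(mean(df$y))   # observed positive rate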
> print(mean(prob))
[1] 0.3949367
> print(mean(df$y))
[1] 0.1
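To illustrate the suspected cause, here is a minimal Python sketch of logistic-link prediction with and without an intercept. It is not the R package's actual code: `predict_proba` and its inputs are hypothetical, and it assumes the per-feature contributions are centred around zero (as EBM shape functions typically are), so the base rate would have to come from an intercept equal to logit(p) = ln(p / (1 - p)).

```python
import math


def sigmoid(z):
    """Logistic function: maps a log-odds score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(term_scores, intercept=0.0):
    """Hypothetical predictor: sum each row's per-term log-odds scores,
    add the intercept, then apply the sigmoid."""
    return [sigmoid(intercept + sum(row)) for row in term_scores]


def mean(values):
    return sum(values) / len(values)


base_rate = 0.1
# Intercept on the log-odds scale: logit(0.1) = ln(0.1 / 0.9), about -2.197
intercept = math.log(base_rate / (1.0 - base_rate))

# Hypothetical centred contributions, symmetric around zero
symmetric_rows = [[-1.0], [1.0]]
# Hypothetical uninformative feature: every contribution is exactly 0
flat_rows = [[0.0]] * 100

print(mean(predict_proba(symmetric_rows)))        # about 0.5, no intercept
print(mean(predict_proba(flat_rows)))             # 0.5, no intercept
print(mean(predict_proba(flat_rows, intercept)))  # about 0.1, with intercept
```

Symmetric, zero-mean contributions average to 0.5 on the probability scale, since sigmoid(-a) + sigmoid(a) = 1. Without an intercept, predictions are therefore pulled toward 0.5 rather than toward the base rate, which would be consistent with an average of 0.39 on a dataset whose positive rate is 0.1.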
Many thanks,
Andres