R: intercept may not be handled correctly #417

@tammandres

Hi all,

I have enjoyed using the Python package, but I also needed the R package. I noticed that the predicted probabilities from an EBM model fitted with the R package are too high when the dataset is imbalanced. I wonder whether an intercept term is missing from the model, from the predict_proba function, or both. It looks like ebm_predict_proba simply sums the contributions from each feature function and applies the sigmoid function, with no intercept added before the sigmoid.

Example: a dummy imbalanced dataset with 10% positive cases. The average predicted probability is 0.39, far above the base rate of 0.1.

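library(interpret)

# Dummy data: 100 rows, 10% positive class
df <- data.frame(x = seq(1, 100), y = c(rep(0, 90), rep(1, 10)))
# Fit an EBM classifier on the single feature and predict on the training data
clf <- ebm_classify(df['x'], df$y)
prob <- ebm_predict_proba(clf, df['x'])
# Compare the mean predicted probability with the observed base rate
print(mean(prob))
print(mean(df$y))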

> print(mean(prob))
[1] 0.3949367
> print(mean(df$y))
[1] 0.1
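
The suspected mechanism can be illustrated with a minimal base-R sketch. It does not use the package's internals: `sigmoid`, `contrib`, and `intercept` are hypothetical names, and the sketch assumes that feature contributions are centered around zero so that the intercept alone carries the base rate. Under that assumption, dropping the intercept pulls the baseline prediction toward 0.5 instead of the observed 10%.

```r
# Logistic (sigmoid) link that maps an additive score to a probability.
sigmoid <- function(z) 1 / (1 + exp(-z))

# Same labels as in the example above: 10% positive cases.
y <- c(rep(0, 90), rep(1, 10))

# Hypothetical feature contributions, set to zero to isolate
# the effect of the intercept (assumption: contributions are centered).
contrib <- rep(0, length(y))

# Intercept chosen as the log-odds of the base rate (assumption).
intercept <- log(mean(y) / (1 - mean(y)))

# Prediction without and with the intercept added before the sigmoid.
p_without <- sigmoid(contrib)
p_with <- sigmoid(intercept + contrib)

print(mean(p_without))  # baseline without intercept
print(mean(p_with))     # baseline with intercept
```

With zero contributions, the mean prediction is 0.5 without the intercept and 0.1 with it. A fitted model has nonzero contributions, so the observed 0.39 would not match either value exactly, but an upward shift of this kind is consistent with a missing intercept.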

Many thanks,
Andres


Labels: bug
