R: intercept may not be handled correctly

Hi all, 

I have enjoyed using the Python package, but I also needed the R package. I noticed that the predicted probabilities from an EBM model created in R are too high when the dataset is imbalanced. I wonder whether an intercept term is missing from the model and/or from the prediction function. It looks like ebm_predict_proba simply sums the contributions from each feature function and applies the sigmoid function, with no intercept added before the sigmoid.
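To illustrate why a missing intercept would inflate the probabilities, here is a minimal base-R sketch. It does not use the package or reproduce its internals: the `contrib` values and the choice of intercept (the log-odds of the base rate) are assumptions made purely for illustration.

```r
# Logistic link that turns an additive score into a probability.
sigmoid <- function(z) 1 / (1 + exp(-z))

# Same 10% positive rate as in the example in this report.
y <- c(rep(0, 90), rep(1, 10))

# Hypothetical per-row feature contributions, symmetric around zero.
# These are made-up values, not output from ebm_classify.
contrib <- seq(-1, 1, length.out = length(y))

# Assumed intercept: the log-odds of the base rate, log(0.1 / 0.9).
intercept <- log(mean(y) / (1 - mean(y)))

p_without <- sigmoid(contrib)              # intercept omitted
p_with    <- sigmoid(intercept + contrib)  # intercept included

print(mean(p_without))  # close to 0.5, because the scores are symmetric around 0
print(mean(p_with))     # much lower, near the 10% base rate
```

If the scores are roughly centered, leaving out the intercept pulls the average prediction toward 0.5 regardless of the class balance, which is the same direction as the overestimate I am seeing.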

Example: a dummy imbalanced dataset with 10% positive cases. The average predicted probability is 0.39, much higher than the 0.1 base rate.

```
library(interpret)

# Dummy imbalanced dataset: 100 rows, 10% positive cases
df <- data.frame(x=seq(1, 100), y=c(rep(0, 90), rep(1, 10)))
clf <- ebm_classify(df['x'], df$y)
prob <- ebm_predict_proba(clf, df['x'])

# Mean predicted probability vs. observed positive rate
> print(mean(prob))
[1] 0.3949367
> print(mean(df$y))
[1] 0.1
```

Many thanks,
Andres



R: intercept may not be handled correctly #417
