Add LiblineaR engine to logistic_reg() #429

Merged 8 commits on Feb 26, 2021

juliasilge (Member) commented Feb 11, 2021

This PR adds the LiblineaR engine to logistic_reg().

We are currently having some real uncertainty about what is going on with the cost argument to LiblineaR::LiblineaR(). The docs say:

cost of constraints violation (default: 1). Rules the trade-off between regularization and correct classification on data. It can be seen as the inverse of a regularization constant.

However, for both lasso and ridge regression, treating 1 / cost like a regularization penalty gives very different results than glmnet. It seems like LIBLINEAR is using a different optimizer or maybe solving a different thing altogether??? 😮
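
For reference, here is a sketch of the two ridge objectives as I understand them from the LIBLINEAR and glmnet documentation (labels coded as y_i in {-1, +1}; the symbols w, C, beta, lambda, and n are notation introduced here, not taken from this PR). It is one possible explanation of a scale mismatch, not a confirmed diagnosis:

```latex
% LIBLINEAR, L2-regularized logistic regression (cost C multiplies the summed loss)
\min_{w} \; \frac{1}{2} w^\top w + C \sum_{i=1}^{n} \log\!\left(1 + e^{-y_i w^\top x_i}\right)

% glmnet, binomial family with mixture alpha = 0 (loss is averaged over n)
\min_{\beta_0, \beta} \; \frac{1}{n} \sum_{i=1}^{n} \log\!\left(1 + e^{-y_i (\beta_0 + x_i^\top \beta)}\right) + \frac{\lambda}{2} \lVert \beta \rVert_2^2

% Dividing the LIBLINEAR objective by C n puts both on the same scale:
\lambda = \frac{1}{C\,n}
```

Under these assumptions, 1 / cost is n times larger than the matching glmnet penalty. Two further differences may also matter: glmnet standardizes predictors by default and does not penalize the intercept, while LIBLINEAR penalizes the bias term when one is included.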

Here is ridge (similar for lasso):

library(tidyverse)
library(parsnip)
data(two_class_dat, package = "modeldata")
data_grid <- crossing(A = seq(0.4, 4, length.out = 200), B = seq(0.14, 3.9, length.out = 200))

liblinear_pred <- 
  logistic_reg(penalty = 0.01, mixture = 0) %>%
  set_engine("LiblineaR") %>%
  set_mode("classification") %>%
  fit(Class ~ ., two_class_dat) %>% 
  predict(data_grid, type = "prob") %>% 
  bind_cols(data_grid) %>% 
  mutate(engine = "LiblineaR")

glmnet_pred <- 
  logistic_reg(penalty = 0.01, mixture = 0) %>%
  set_engine("glmnet") %>%
  set_mode("classification") %>%
  fit(Class ~ ., two_class_dat) %>% 
  predict(data_grid, type = "prob") %>% 
  bind_cols(data_grid) %>% 
  mutate(engine = "glmnet")

glm_pred <- 
  logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification") %>%
  fit(Class ~ ., two_class_dat) %>% 
  predict(data_grid, type = "prob") %>% 
  bind_cols(data_grid) %>% 
  mutate(engine = "glm")


bind_rows(liblinear_pred, glmnet_pred, glm_pred) %>%
  ggplot(aes(x = A, y = B)) + 
  geom_point(data = two_class_dat, aes(col = Class), alpha = .5, show.legend = FALSE) + 
  geom_contour(aes( z = .pred_Class1, lty = engine), breaks = 0.5, col = "black") + 
  coord_equal() + 
  theme_minimal()

Created on 2021-02-11 by the reprex package (v1.0.0)

You have to really bump up the regularization to get LiblineaR to do anything different than glm().
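
A minimal sketch of why so much regularization might be needed, assuming LIBLINEAR sums the loss over observations while glmnet averages it, so that a glmnet-scale penalty equals 1 / (cost * n). The helper `cost_to_glmnet_lambda()` and the value of `n` are hypothetical and only illustrate the arithmetic:

```r
# Hypothetical helper: convert a LIBLINEAR-style cost into the penalty it
# would correspond to on glmnet's scale, ASSUMING lambda = 1 / (cost * n).
cost_to_glmnet_lambda <- function(cost, n) {
  stopifnot(cost > 0, n > 0)
  1 / (cost * n)
}

penalty <- 0.01        # the value passed to logistic_reg() above
cost    <- 1 / penalty # treating 1 / cost as the penalty
n       <- 500         # illustrative sample size, not the real nrow()

# Under the stated assumption this is penalty / n, far smaller than 0.01
cost_to_glmnet_lambda(cost, n)
```

If the assumption holds, a penalty of 0.01 behaves like a much weaker glmnet penalty, which would be consistent with the LiblineaR boundary sitting close to the unregularized glm() fit.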

@juliasilge (Member, Author)

Closes #419

juliasilge (Member, Author) commented Feb 16, 2021

We did some digging and ended up finding that

We believe these are the sources of any differences between glmnet and LiblineaR.

library(tidyverse)
library(parsnip)
data(two_class_dat, package = "modeldata")
data_grid <- crossing(A = seq(0.4, 4, length.out = 200), B = seq(0.14, 3.9, length.out = 200))

liblinear_pred <- 
  logistic_reg(penalty = 0.01, mixture = 1) %>%
  set_engine("LiblineaR") %>%
  set_mode("classification") %>%
  fit(Class ~ ., two_class_dat) %>% 
  predict(data_grid, type = "prob") %>% 
  bind_cols(data_grid) %>% 
  mutate(engine = "LiblineaR")

glmnet_pred <- 
  logistic_reg(penalty = 0.01, mixture = 1) %>%
  set_engine("glmnet") %>%
  set_mode("classification") %>%
  fit(Class ~ ., two_class_dat) %>% 
  predict(data_grid, type = "prob") %>% 
  bind_cols(data_grid) %>% 
  mutate(engine = "glmnet")

glm_pred <- 
  logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification") %>%
  fit(Class ~ ., two_class_dat) %>% 
  predict(data_grid, type = "prob") %>% 
  bind_cols(data_grid) %>% 
  mutate(engine = "glm")


bind_rows(liblinear_pred, glmnet_pred, glm_pred) %>%
  ggplot(aes(x = A, y = B)) + 
  geom_point(data = two_class_dat, aes(col = Class), alpha = .5, show.legend = FALSE) + 
  geom_contour(aes( z = .pred_Class1, lty = engine), breaks = 0.5, col = "black") + 
  coord_equal() + 
  theme_minimal()

Created on 2021-02-16 by the reprex package (v1.0.0)

@juliasilge juliasilge marked this pull request as ready for review February 16, 2021 16:49
@juliasilge juliasilge requested a review from topepo February 16, 2021 16:50
@juliasilge juliasilge merged commit 154c1ab into master Feb 26, 2021
@juliasilge juliasilge deleted the logistic-liblinear branch February 26, 2021 19:14
@github-actions

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 13, 2021