Get correct coefs for ridge regression #486

Merged
merged 9 commits into from
May 12, 2021
18 changes: 15 additions & 3 deletions NEWS.md
@@ -1,21 +1,33 @@
# parsnip (development version)

* `generics::required_pkgs()` was extended for `parsnip` objects.

* The `liquidSVM` engine for `svm_rbf()` was deprecated due to that package's removal from CRAN. (#425)
## Model Specification Changes

* A new linear SVM model `svm_linear()` is now available with the `LiblineaR` engine (#424) and the `kernlab` engine (#438), and the `LiblineaR` engine is available for `logistic_reg()` as well (#429). These models can use sparse matrices via `fit_xy()` (#447) and have a `tidy` method (#474).

* For models with `glmnet` engines:

- A single value is required for `penalty` (either a single numeric value or a value of `tune()`) (#481).
- A special argument called `path_values` can be used to set the `lambda` path as a specific set of numbers (independent of the value of `penalty`). A pure ridge regression model (i.e., `mixture = 0`) will generate incorrect values if the path does not include zero. See issue #431 for discussion (#486).

* The `liquidSVM` engine for `svm_rbf()` was deprecated due to that package's removal from CRAN. (#425)

* New model specification `survival_reg()` for the new mode `"censored regression"` (#444). `surv_reg()` is now soft-deprecated (#448).

* New model specification `proportional_hazards()` for the `"censored regression"` mode (#451).

## Other Changes

* Re-licensed package from GPL-2 to MIT. See [consent from copyright holders here](https://github.com/tidymodels/parsnip/issues/462).

* `set_mode()` now checks if `mode` is compatible with the model class, similar to `new_model_spec()` (@jtlandis, #467).

* Re-organized model documentation for `update` methods (#479).



* `generics::required_pkgs()` was extended for `parsnip` objects.



# parsnip 0.1.5

17 changes: 13 additions & 4 deletions R/linear_reg.R
@@ -107,14 +107,23 @@ translate.linear_reg <- function(x, engine = x$engine, ...) {
x <- translate.default(x, engine, ...)

if (engine == "glmnet") {
# See discussion in https://github.com/tidymodels/parsnip/issues/195
x$method$fit$args$lambda <- NULL
check_glmnet_penalty(x)
if (any(names(x$eng_args) == "path_values")) {
# Since we decouple the parsnip `penalty` argument from being the same
# as the glmnet `lambda` value, `path_values` allows users to set the
# path differently from the default that glmnet uses. See
# https://github.com/tidymodels/parsnip/issues/431
x$method$fit$args$lambda <- x$eng_args$path_values
x$eng_args$path_values <- NULL
x$method$fit$args$path_values <- NULL
} else {
# See discussion in https://github.com/tidymodels/parsnip/issues/195
x$method$fit$args$lambda <- NULL
}
# Since the `fit` information is gone for the penalty, we need to have an
# evaluated value for the parameter.
x$args$penalty <- rlang::eval_tidy(x$args$penalty)
check_glmnet_penalty(x)
}

x
}
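
The `path_values` branch above can be illustrated in isolation. The following is a minimal sketch, assuming a plain nested list in place of a real parsnip specification; `route_path_values()` and the `spec` list are hypothetical names used only for illustration, and real engine arguments are stored as quosures rather than bare vectors.

```r
# Hypothetical stand-in for the glmnet branch of translate.linear_reg().
# `spec` is a plain list, not a real parsnip model specification.
route_path_values <- function(spec) {
  if (any(names(spec$eng_args) == "path_values")) {
    # A user-supplied path becomes glmnet's `lambda` argument
    spec$method$fit$args$lambda <- spec$eng_args$path_values
    # `path_values` is not a glmnet argument, so drop it everywhere
    spec$eng_args$path_values <- NULL
    spec$method$fit$args$path_values <- NULL
  } else {
    # No custom path: remove `lambda` so glmnet picks its own sequence
    spec$method$fit$args$lambda <- NULL
  }
  spec
}

path <- c(0, 0.01, 0.1, 1)
spec <- list(
  args = list(penalty = 0.1),
  eng_args = list(path_values = path),
  method = list(fit = list(args = list(path_values = path, family = "gaussian")))
)

out <- route_path_values(spec)
str(out$method$fit$args)
```

In this sketch the `penalty` value is left untouched, which mirrors the decoupling of `penalty` from `lambda` described in the code comments.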

22 changes: 14 additions & 8 deletions R/logistic_reg.R
@@ -108,14 +108,23 @@ translate.logistic_reg <- function(x, engine = x$engine, ...) {
arg_vals <- x$method$fit$args
arg_names <- names(arg_vals)


if (engine == "glmnet") {
# See discussion in https://github.com/tidymodels/parsnip/issues/195
arg_vals$lambda <- NULL
check_glmnet_penalty(x)
if (any(names(x$eng_args) == "path_values")) {
# Since we decouple the parsnip `penalty` argument from being the same
# as the glmnet `lambda` value, `path_values` allows users to set the
# path differently from the default that glmnet uses. See
# https://github.com/tidymodels/parsnip/issues/431
x$method$fit$args$lambda <- x$eng_args$path_values
x$eng_args$path_values <- NULL
x$method$fit$args$path_values <- NULL
} else {
# See discussion in https://github.com/tidymodels/parsnip/issues/195
x$method$fit$args$lambda <- NULL
}
# Since the `fit` information is gone for the penalty, we need to have an
# evaluated value for the parameter.
x$args$penalty <- rlang::eval_tidy(x$args$penalty)
check_glmnet_penalty(x)
}

if (engine == "LiblineaR") {
@@ -134,11 +143,8 @@ translate.logistic_reg <- function(x, engine = x$engine, ...) {
rlang::abort("For the LiblineaR engine, mixture must be 0 or 1.")
}
}

x$method$fit$args <- arg_vals
}

x$method$fit$args <- arg_vals

x
}

6 changes: 4 additions & 2 deletions R/misc.R
@@ -324,10 +324,12 @@ stan_conf_int <- function(object, newdata) {
}

check_glmnet_penalty <- function(x) {
if (length(x$args$penalty) != 1) {
pen <- rlang::eval_tidy(x$args$penalty)

if (length(pen) != 1) {
rlang::abort(c(
"For the glmnet engine, `penalty` must be a single number (or a value of `tune()`).",
glue::glue("There are {length(x$args$penalty)} values for `penalty`."),
glue::glue("There are {length(pen)} values for `penalty`."),
"To try multiple values for total regularization, use the tune package.",
"To predict multiple penalties, use `multi_predict()`"
))
41 changes: 32 additions & 9 deletions man/linear_reg.Rd

41 changes: 32 additions & 9 deletions man/logistic_reg.Rd

41 changes: 32 additions & 9 deletions man/multinom_reg.Rd

31 changes: 23 additions & 8 deletions man/rmd/linear-reg.Rmd
@@ -21,14 +21,29 @@ linear_reg(penalty = 0.1) %>%
translate()
```

For `glmnet` models, the full regularization path is always fit regardless of the
value given to `penalty`. Also, there is the option to pass multiple values (or
no values) to the `penalty` argument. When using the `predict()` method in these
cases, the return value depends on the value of `penalty`. When using
`predict()`, only a single value of the penalty can be used. When predicting on
multiple penalties, the `multi_predict()` function can be used. It returns a
tibble with a list column called `.pred` that contains a tibble with all of the
penalty results.
The glmnet engine requires a single value for the `penalty` argument (a number
or `tune()`), but the full regularization path is always fit
regardless of the value given to `penalty`. To pass in a custom sequence of
values for glmnet's `lambda`, use the argument `path_values` in `set_engine()`.
This sets glmnet's `lambda` argument without changing
the `penalty` value given to `linear_reg()`. For example:

```{r glmnet-path}
linear_reg(penalty = .1) %>%
set_engine("glmnet", path_values = c(0, 10^seq(-10, 1, length.out = 20))) %>%
translate()
```

When fitting a pure ridge regression model (i.e., `mixture = 0`), we _strongly
suggest_ that you pass in a vector for `path_values` that includes zero. See
[issue #431](https://github.com/tidymodels/parsnip/issues/431) for a discussion.

When using `predict()`, the single `penalty` value used for prediction is the
one specified in `linear_reg()`.

To predict on multiple penalties, use the `multi_predict()` function.
This function returns a tibble with a list column called `.pred` containing
all of the penalty results.
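
The difference between `predict()` and `multi_predict()` can be sketched as follows. This is an illustrative example rather than part of the package documentation; it assumes the `parsnip` and `glmnet` packages are installed, and the `mtcars` data, the model formula, and the object names `ridge_spec`, `ridge_fit`, and `multi` are arbitrary choices.

```r
library(parsnip)

# Ridge regression (mixture = 0) with a lambda path that includes zero
ridge_spec <-
  linear_reg(penalty = 0.1, mixture = 0) %>%
  set_engine("glmnet", path_values = c(0, 10^seq(-10, 1, length.out = 20)))

ridge_fit <- fit(ridge_spec, mpg ~ ., data = mtcars)

# predict() uses the single penalty given to linear_reg(), here 0.1
predict(ridge_fit, new_data = mtcars[1:3, ])

# multi_predict() returns one row per new observation; each element of
# the `.pred` list column is a tibble with one row per requested penalty
multi <- multi_predict(ridge_fit, new_data = mtcars[1:3, ], penalty = c(0, 0.1, 1))
multi$.pred[[1]]
```

The `penalty` values passed to `multi_predict()` need not appear in `path_values`; glmnet interpolates along the fitted path.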

## stan

32 changes: 24 additions & 8 deletions man/rmd/logistic-reg.Rmd
@@ -22,14 +22,30 @@ logistic_reg(penalty = 0.1) %>%
translate()
```

For `glmnet` models, the full regularization path is always fit regardless of the
value given to `penalty`. Also, there is the option to pass multiple values (or
no values) to the `penalty` argument. When using the `predict()` method in these
cases, the return value depends on the value of `penalty`. When using
`predict()`, only a single value of the penalty can be used. When predicting on
multiple penalties, the `multi_predict()` function can be used. It returns a
tibble with a list column called `.pred` that contains a tibble with all of the
penalty results.
The glmnet engine requires a single value for the `penalty` argument (a number
or `tune()`), but the full regularization path is always fit
regardless of the value given to `penalty`. To pass in a custom sequence of
values for glmnet's `lambda`, use the argument `path_values` in `set_engine()`.
This sets glmnet's `lambda` argument without changing
the `penalty` value given to `logistic_reg()`. For example:

```{r glmnet-path}
logistic_reg(penalty = .1) %>%
set_engine("glmnet", path_values = c(0, 10^seq(-10, 1, length.out = 20))) %>%
translate()
```

When fitting a pure ridge regression model (i.e., `mixture = 0`), we _strongly
suggest_ that you pass in a vector for `path_values` that includes zero. See
[issue #431](https://github.com/tidymodels/parsnip/issues/431) for a discussion.

When using `predict()`, the single `penalty` value used for prediction is the
one specified in `logistic_reg()`.

To predict on multiple penalties, use the `multi_predict()` function.
This function returns a tibble with a list column called `.pred` containing
all of the penalty results.


## LiblineaR

33 changes: 25 additions & 8 deletions man/rmd/multinom-reg.Rmd
@@ -14,14 +14,31 @@ multinom_reg(penalty = 0.1) %>%
translate()
```

For `glmnet` models, the full regularization path is always fit regardless of the
value given to `penalty`. Also, there is the option to pass multiple values (or
no values) to the `penalty` argument. When using the `predict()` method in these
cases, the return value depends on the value of `penalty`. When using
`predict()`, only a single value of the penalty can be used. When predicting on
multiple penalties, the `multi_predict()` function can be used. It returns a
tibble with a list column called `.pred` that contains a tibble with all of the
penalty results.
The glmnet engine requires a single value for the `penalty` argument (a number
or `tune()`), but the full regularization path is always fit
regardless of the value given to `penalty`. To pass in a custom sequence of
values for glmnet's `lambda`, use the argument `path_values` in `set_engine()`.
This sets glmnet's `lambda` argument without changing
the `penalty` value given to `multinom_reg()`. For example:


```{r glmnet-path}
multinom_reg(penalty = .1) %>%
set_engine("glmnet", path_values = c(0, 10^seq(-10, 1, length.out = 20))) %>%
translate()
```

When fitting a pure ridge regression model (i.e., `mixture = 0`), we _strongly
suggest_ that you pass in a vector for `path_values` that includes zero. See
[issue #431](https://github.com/tidymodels/parsnip/issues/431) for a discussion.

When using `predict()`, the single `penalty` value used for prediction is the
one specified in `multinom_reg()`.

To predict on multiple penalties, use the `multi_predict()` function.
This function returns a tibble with a list column called `.pred` containing
all of the penalty results.


## nnet
