Get correct coefs for ridge regression #486

Merged
merged 9 commits into from
May 12, 2021

topepo (Member) commented May 11, 2021

closes #431

Makes a special parsnip parameter that allows users to set the penalty value independent of the full regularization path. This can help with pure ridge models where glmnet may not produce the correct values.
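To illustrate why a path that does not contain the requested penalty can give the wrong coefficients, here is a minimal sketch in plain Python. It is not glmnet or parsnip code. It assumes a toy one-predictor ridge model with no intercept, and it assumes that coefficients for a requested penalty are read off the fitted path by clamping the penalty to the path's range and interpolating linearly between neighbouring path values. The names `ridge_coef` and `coef_from_path` and the path values are hypothetical.

```python
# Toy ridge regression with one predictor and no intercept:
#   beta(lam) = sum(x * y) / (sum(x * x) + lam)
# This is an illustration only; glmnet scales its objective differently.

def ridge_coef(x, y, lam):
    """Exact ridge coefficient for the toy model at penalty lam."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


def coef_from_path(path, coefs, s):
    """Read the coefficient at penalty s off a fitted path.

    Assumed lookup rule: clamp s into the path's range, then
    interpolate linearly between the two neighbouring path values.
    """
    pairs = sorted(zip(path, coefs))
    s = min(max(s, pairs[0][0]), pairs[-1][0])
    for (l0, b0), (l1, b1) in zip(pairs, pairs[1:]):
        if l0 <= s <= l1:
            if l1 == l0:
                return b0
            w = (s - l0) / (l1 - l0)
            return b0 + w * (b1 - b0)
    return pairs[0][1]  # path with a single value


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.0]

# Hypothetical automatically chosen path that stops at lam = 1.
default_path = [100.0, 10.0, 1.0]
default_coefs = [ridge_coef(x, y, lam) for lam in default_path]

# User-supplied path that also contains the penalty of interest (0).
user_path = [0.0] + default_path
user_coefs = [ridge_coef(x, y, lam) for lam in user_path]

exact = ridge_coef(x, y, 0.0)
from_default = coef_from_path(default_path, default_coefs, 0.0)
from_user = coef_from_path(user_path, user_coefs, 0.0)

print("exact:", exact)
print("from default path:", from_default)  # silently the lam = 1 answer
print("from user path:", from_user)        # matches the exact solve
```

Under these assumptions, asking for penalty 0 on the default path silently returns the coefficient for the smallest penalty on the path, while a path that includes the requested penalty reproduces the exact value.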

topepo requested a review from juliasilge May 11, 2021 18:46

juliasilge (Member) left a comment

So are we not protecting against folks still doing the wrong thing in the penalty = 0 case? They will need to see this in the documentation to know what to do?

topepo (Member, Author) commented May 12, 2021

> So are we not protecting against folks still doing the wrong thing in the penalty = 0 case? They will need to see this in the documentation to know what to do?

They will have to see the documentation to understand that.

We could fix the path values to a default vector that would probably cover 90% of cases (and document that). We can't effectively check the penalty value before setting a default: at parsnip fit time, we don't know the range of penalties that will be used at predict time.

The trade-off is:

  • 👍 People won't silently get the wrong answers for ridge regression models.
  • 👎 Model results might be slightly different than if we didn't set lambda; the default might cause poor results for penalties higher than our default.

Either is fine with me.
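Both sides of this trade-off can be sketched with a toy model in plain Python. This is an illustration under stated assumptions, not glmnet or parsnip behavior: the model is a one-predictor ridge fit with no intercept, the fixed default path `DEFAULT_PATH` is hypothetical, and the lookup rule (clamp the penalty to the path's range, then interpolate linearly) is assumed.

```python
from bisect import bisect_left

X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 4.1, 5.9, 8.0]


def exact_coef(lam):
    """Exact coefficient of the toy ridge model at penalty lam."""
    sxy = sum(a * b for a, b in zip(X, Y))
    sxx = sum(a * a for a in X)
    return sxy / (sxx + lam)


# Hypothetical fixed default path, ascending from 1e-4 to 1.
DEFAULT_PATH = [10.0 ** k for k in range(-4, 1)]
PATH_COEFS = [exact_coef(lam) for lam in DEFAULT_PATH]


def coef_on_default_path(s):
    """Assumed lookup: clamp s to the path range, then interpolate."""
    s = min(max(s, DEFAULT_PATH[0]), DEFAULT_PATH[-1])
    i = bisect_left(DEFAULT_PATH, s)
    if DEFAULT_PATH[i] == s:
        return PATH_COEFS[i]
    l0, l1 = DEFAULT_PATH[i - 1], DEFAULT_PATH[i]
    w = (s - l0) / (l1 - l0)
    return PATH_COEFS[i - 1] + w * (PATH_COEFS[i] - PATH_COEFS[i - 1])


# Absolute error of the path lookup against the exact solve.
errors = {
    s: abs(coef_on_default_path(s) - exact_coef(s))
    for s in (0.01, 0.5, 5.0, 50.0)
}
for s, err in errors.items():
    print(f"penalty={s}: abs error={err:.6f}")
```

In this sketch, penalties inside the default path differ only slightly from the exact solve (interpolation error), while penalties above the largest path value are clamped and the error grows with the penalty.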

juliasilge (Member) commented

OK, I was fuzzy on our options before but I think I am clearer now.

I think we both agree that it is much more common for folks to want to use ridge regression than to end up needing very high penalty values. On the other hand, getting slightly different results than the underlying model is something that we know confuses people and that will happen basically for all cases of using glmnet ever if we set a path of lambdas.

This makes me think we shouldn't set a default. I am definitely a little worried that this is like a GOTCHA that people are not going to see; maybe in #456 we can think about how to highlight this kind of issue on, say, the glmnet landing page.

topepo merged commit fc21c9e into master May 12, 2021
topepo deleted the path-values branch May 12, 2021 16:09

github-actions commented

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators May 27, 2021
Development

Successfully merging this pull request may close these issues.

Add error for ridge regression with glmnet
2 participants