Skip to content

Add defaults to engine specific params in model docs #321

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Jun 7, 2020

Conversation

juliasilge
Copy link
Member

@juliasilge juliasilge commented May 27, 2020

This PR closes #211.

What I have so far here is my initial idea of how to add the default values for each engine specific parameter into the tables in the docs. These values are only needed for devtools::document() so I am thinking let's make little internal functions via paste0() rather than get_model_env() and add all these values via some set_... function and keep them around.

What are your thoughts?

Current results for the one model implemented right now:

library(parsnip)
convert_args("decision_tree")
#> Joining, by = c("model", "engine", "parsnip", "original")
parsnip rpart C5.0 spark
tree_depth maxdepth (30) NA max_depth (5)
min_n minsplit (20) minCases (2) min_instances_per_node (1)
cost_complexity cp (0.01) NA NA

Created on 2020-05-27 by the reprex package (v0.3.0)

@apreshill
Copy link

@juliasilge I believe this also would close #235? 🎉

@juliasilge
Copy link
Member Author

For now I'm only adding these to the tables in the docs, with functions that are only run during devtools::document(). So no user-facing functions quite yet.

Users could look at the docs and get the values but we haven't moved quite all the way to a dynamic lookup.

@juliasilge
Copy link
Member Author

We're going to have to make a call on what to do with the rand_forest() table, for mtry and min_n.

For ranger's mtry:

Number of variables to possibly split at in each node. Default is the (rounded down) square root
of the number variables. Alternatively, a single argument function returning an integer, given the 
number of independent variables.

For ranger's min.node.size:

Minimal node size. Default 1 for classification, 5 for regression, 3 for survival, and 10 for 
probability.

Neither of these are available via formals().

For randomForest, the function argument themselves are defined as:

mtry=if (!is.null(y) && !is.factor(y))
             max(floor(ncol(x)/3), 1) else floor(sqrt(ncol(x)))

nodesize = if (!is.null(y) && !is.factor(y)) 5 else 1

I think I'm going to put some of these in manually.

@topepo
Copy link
Member

topepo commented May 28, 2020

Yes. I would add them as expressions to your data structure. Maybe something like "1C|5R|3S" with a legend on the bottom?

@juliasilge
Copy link
Member Author

The issues around rand_forest() and SVM make me think #235 is going to be quite difficult.

@juliasilge juliasilge marked this pull request as ready for review May 28, 2020 23:24
@juliasilge juliasilge requested a review from topepo May 28, 2020 23:25
@juliasilge
Copy link
Member Author

The SVM options are less than ideal. They are hard-coded mostly, and I dug around in the liquidSVM code and came up with... this:

> convert_args("svm_rbf")
parsnip kernlab liquidSVM
cost C (1) lambdas (varies)
rbf_sigma sigma (1) gammas (varies)
margin epsilon (0.1) NA

Hmmm

@juliasilge
Copy link
Member Author

I updated the sigma value for kernlab because although rbfdot() has a default of 1, during training the value used depends on the data.

@topepo topepo merged commit c7e640a into master Jun 7, 2020
@juliasilge juliasilge deleted the engine-specific_defaults branch June 30, 2020 20:13
@github-actions
Copy link

github-actions bot commented Mar 7, 2021

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a reference / cheat sheet mapping engine-specific parameters to parsnip ones?
3 participants