Add defaults to engine specific params in model docs #321

Merged
merged 20 commits into from
Jun 7, 2020
c65b2cf
First draft of engine specific defaults into docs
juliasilge May 27, 2020
7f976f9
Move argument default table to man/rmd/
juliasilge May 27, 2020
9ec200a
Correct loss_reduction arg for Spark boosted trees
juliasilge May 28, 2020
56fb899
Engine specific defaults for boosted trees
juliasilge May 28, 2020
e039fb2
No parentheses when no default in table of args
juliasilge May 28, 2020
e9ac0c7
Engine specific defaults for linear regression
juliasilge May 28, 2020
d4ce198
Engine specific defaults for logistic regression
juliasilge May 28, 2020
16f1825
Engine specific defaults for MARS
juliasilge May 28, 2020
5938408
Engine specific defaults for multilayer perceptron
juliasilge May 28, 2020
f1c3197
Engine specific defaults for multinomial regression
juliasilge May 28, 2020
c81509f
Engine specific defaults for nearest neighbors
juliasilge May 28, 2020
16c0271
Don't need defaults for surv_reg() ("weibull" is the default)
juliasilge May 28, 2020
212765f
Engine specific defaults for random forest
juliasilge May 28, 2020
3717f6d
More detail on boosted tree
juliasilge May 28, 2020
2789b86
Specify what to join by and document
juliasilge May 28, 2020
9c716b6
I think the best we can do for engine specific defaults for SVM
juliasilge May 28, 2020
77d0387
Reorder rand_forest() notes to match table
juliasilge May 28, 2020
d7b371b
Redocument to create new versions of tables
juliasilge May 28, 2020
f49ff09
Move functions for finding args to /man/rmd, redocument
juliasilge May 29, 2020
73dbb73
sigma for RBF kernal also depends on data
juliasilge Jun 4, 2020
1 change: 0 additions & 1 deletion NAMESPACE
@@ -100,7 +100,6 @@ export(boost_tree)
 export(check_empty_ellipse)
 export(check_final_param)
 export(control_parsnip)
-export(convert_args)
 export(convert_stan_interval)
 export(decision_tree)
 export(eval_args)
27 changes: 0 additions & 27 deletions R/aaa.R
@@ -30,33 +30,6 @@ convert_stan_interval <- function(x, level = 0.95, lower = TRUE) {
   res
 }

-#' Make a table of arguments
-#' @param model_name A character string for the model
-#' @keywords internal
-#' @export
-convert_args <- function(model_name) {
-  envir <- get_model_env()
-
-  args <-
-    ls(envir) %>%
-    tibble::tibble(name = .) %>%
-    dplyr::filter(grepl("args", name)) %>%
-    dplyr::mutate(model = sub("_args", "", name),
-                  args = purrr::map(name, ~envir[[.x]])) %>%
-    tidyr::unnest(args) %>%
-    dplyr::select(model:original)
-
-  convert_df <- args %>%
-    dplyr::filter(grepl(model_name, model)) %>%
-    dplyr::select(-model) %>%
-    tidyr::pivot_wider(names_from = engine, values_from = original)
-
-  convert_df %>%
-    knitr::kable(col.names = paste0("**", colnames(convert_df), "**"))
-
-}
-
-
 # ------------------------------------------------------------------------------
 # nocov

4 changes: 2 additions & 2 deletions R/boost_tree_data.R
@@ -317,8 +317,8 @@ set_model_arg(
 set_model_arg(
   model = "boost_tree",
   eng = "spark",
-  parsnip = "min_info_gain",
-  original = "loss_reduction",
+  parsnip = "loss_reduction",
+  original = "min_info_gain",
   func = list(pkg = "dials", fun = "loss_reduction"),
   has_submodel = FALSE
 )
23 changes: 14 additions & 9 deletions man/boost_tree.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

15 changes: 0 additions & 15 deletions man/convert_args.Rd

This file was deleted.

12 changes: 7 additions & 5 deletions man/decision_tree.Rd


8 changes: 5 additions & 3 deletions man/linear_reg.Rd


8 changes: 5 additions & 3 deletions man/logistic_reg.Rd


8 changes: 5 additions & 3 deletions man/mars.Rd


18 changes: 10 additions & 8 deletions man/mlp.Rd


8 changes: 5 additions & 3 deletions man/multinom_reg.Rd


8 changes: 5 additions & 3 deletions man/nearest_neighbor.Rd


19 changes: 15 additions & 4 deletions man/rand_forest.Rd


31 changes: 29 additions & 2 deletions man/rmd/boost-tree.Rmd
@@ -1,5 +1,8 @@
 # Engine Details

+```{r, child = "setup.Rmd", include = FALSE}
+```
+
 Engines may have pre-set default arguments when executing the model fit call. For this type of model, the template of the fit calls are below:

 ## xgboost
@@ -50,9 +53,33 @@ boost_tree() %>%

 ## Parameter translations

-The standardized parameter names in parsnip can be mapped to their original names in each engine that has main parameters:
+The standardized parameter names in parsnip can be mapped to their original names in each engine that has main parameters. Each engine typically has a different default value (shown in parentheses) for each parameter.

 ```{r echo = FALSE, results = "asis"}
-parsnip::convert_args("boost_tree")
+get_defaults_boost_tree <- function() {
+  tibble::tribble(
+    ~model, ~engine, ~parsnip, ~original, ~default,
+    "boost_tree", "xgboost", "tree_depth", "max_depth", get_arg("parsnip", "xgb_train", "max_depth"),
+    "boost_tree", "xgboost", "trees", "nrounds", get_arg("parsnip", "xgb_train", "nrounds"),
+    "boost_tree", "xgboost", "learn_rate", "eta", get_arg("parsnip", "xgb_train", "eta"),
+    "boost_tree", "xgboost", "mtry", "colsample_bytree", get_arg("parsnip", "xgb_train", "colsample_bytree"),
+    "boost_tree", "xgboost", "min_n", "min_child_weight", get_arg("parsnip", "xgb_train", "min_child_weight"),
+    "boost_tree", "xgboost", "loss_reduction", "gamma", get_arg("parsnip", "xgb_train", "gamma"),
+    "boost_tree", "xgboost", "sample_size", "subsample", get_arg("parsnip", "xgb_train", "subsample"),
+    "boost_tree", "C5.0", "trees", "trials", get_arg("parsnip", "C5.0_train", "trials"),
+    "boost_tree", "C5.0", "min_n", "minCases", get_arg("C50", "C5.0Control", "minCases"),
+    "boost_tree", "C5.0", "sample_size", "sample", get_arg("C50", "C5.0Control", "sample"),
+    "boost_tree", "spark", "tree_depth", "max_depth", get_arg("sparklyr", "ml_gradient_boosted_trees", "max_depth"),
+    "boost_tree", "spark", "trees", "max_iter", get_arg("sparklyr", "ml_gradient_boosted_trees", "max_iter"),
+    "boost_tree", "spark", "learn_rate", "step_size", get_arg("sparklyr", "ml_gradient_boosted_trees", "step_size"),
+    "boost_tree", "spark", "mtry", "feature_subset_strategy", "see below",
+    "boost_tree", "spark", "min_n", "min_instances_per_node", get_arg("sparklyr", "ml_gradient_boosted_trees", "min_instances_per_node"),
+    "boost_tree", "spark", "loss_reduction", "min_info_gain", get_arg("sparklyr", "ml_gradient_boosted_trees", "min_info_gain"),
+    "boost_tree", "spark", "sample_size", "subsampling_rate", get_arg("sparklyr", "ml_gradient_boosted_trees", "subsampling_rate"),
+
+  )
+}
+convert_args("boost_tree")
 ```

+For spark, the default `mtry` is the square root of the number of predictors for classification, and one-third of the predictors for regression.
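
The `get_arg()` helper called in the table above is defined elsewhere in this PR (commit f49ff09 moves the argument-finding functions to `/man/rmd`) and does not appear in the rendered diff. The following is a minimal sketch of how such a lookup could work, assuming it only reads a function's formal arguments. The name `get_arg_sketch` is hypothetical, and a base R function stands in for the engine functions so the example runs without extra packages:

```r
# Hypothetical stand-in for the get_arg() helper used in the tables;
# the real helper is not shown in this diff.
get_arg_sketch <- function(ns, f, arg) {
  # Fetch the function object from the namespace of package `ns`.
  fn <- utils::getFromNamespace(f, ns)
  # formals() returns the argument list with unevaluated defaults.
  defaults <- formals(fn)
  if (!arg %in% names(defaults)) {
    stop("`", f, "()` has no argument named `", arg, "`", call. = FALSE)
  }
  # Deparse so that defaults given as expressions come back as text.
  paste(deparse(defaults[[arg]]), collapse = " ")
}

# Base R example: stats::rnorm(n, mean = 0, sd = 1)
get_arg_sketch("stats", "rnorm", "mean")
get_arg_sketch("stats", "rnorm", "sd")
```

An argument with no default is stored as an empty symbol, so a real helper may need to handle that case separately. Defaults that depend on the data, such as Spark's `feature_subset_strategy`, cannot be read this way, which is presumably why that row says "see below".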
18 changes: 16 additions & 2 deletions man/rmd/decision-tree.Rmd
@@ -1,5 +1,8 @@
 # Engine Details

+```{r, child = "setup.Rmd", include = FALSE}
+```
+
 Engines may have pre-set default arguments when executing the model fit call. For this type of model, the template of the fit calls are below:

 ## rpart
@@ -52,9 +55,20 @@ decision_tree() %>%

 ## Parameter translations

-The standardized parameter names in parsnip can be mapped to their original names in each engine that has main parameters:
+The standardized parameter names in parsnip can be mapped to their original names in each engine that has main parameters. Each engine typically has a different default value (shown in parentheses) for each parameter.

 ```{r echo = FALSE, results = "asis"}
-parsnip::convert_args("decision_tree")
+get_defaults_decision_tree <- function() {
+  tibble::tribble(
+    ~model, ~engine, ~parsnip, ~original, ~default,
+    "decision_tree", "rpart", "tree_depth", "maxdepth", get_arg("rpart", "rpart.control", "maxdepth"),
+    "decision_tree", "rpart", "min_n", "minsplit", get_arg("rpart", "rpart.control", "minsplit"),
+    "decision_tree", "rpart", "cost_complexity", "cp", get_arg("rpart", "rpart.control", "cp"),
+    "decision_tree", "C5.0", "min_n", "minCases", get_arg("C50", "C5.0Control", "minCases"),
+    "decision_tree", "spark", "tree_depth", "max_depth", get_arg("sparklyr", "ml_decision_tree", "max_depth"),
+    "decision_tree", "spark", "min_n", "min_instances_per_node", get_arg("sparklyr", "ml_decision_tree", "min_instances_per_node"),
+  )
+}
+convert_args("decision_tree")
 ```

19 changes: 16 additions & 3 deletions man/rmd/linear-reg.Rmd
@@ -1,5 +1,8 @@
 # Engine Details

+```{r, child = "setup.Rmd", include = FALSE}
+```
+
 Engines may have pre-set default arguments when executing the model fit call. For this type of model, the template of the fit calls are below.

 ## lm
@@ -68,10 +71,20 @@ linear_reg() %>%

 ## Parameter translations

-The standardized parameter names in parsnip can be mapped to their original names
-in each engine that has main parameters:
+The standardized parameter names in parsnip can be mapped to their original
+names in each engine that has main parameters. Each engine typically has a
+different default value (shown in parentheses) for each parameter.

 ```{r echo = FALSE, results = "asis"}
-parsnip::convert_args("linear_reg")
+get_defaults_linear_reg <- function() {
+  tibble::tribble(
+    ~model, ~engine, ~parsnip, ~original, ~default,
+    "linear_reg", "glmnet", "mixture", "alpha", get_arg("glmnet", "glmnet", "alpha"),
+    "linear_reg", "spark", "penalty", "reg_param", get_arg("sparklyr", "ml_linear_regression", "reg_param"),
+    "linear_reg", "spark", "mixture", "elastic_net_param", get_arg("sparklyr", "ml_linear_regression", "elastic_net_param"),
+    "linear_reg", "keras", "penalty", "penalty", get_arg("parsnip", "keras_mlp", "penalty"),
+  )
+}
+convert_args("linear_reg")
 ```
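
The sentence added in this hunk says defaults are shown in parentheses, and commits e039fb2 and 2789b86 mention leaving out the parentheses when there is no default and joining on explicit keys. The updated `convert_args()` is not part of the rendered diff, so what follows is only a sketch of that idea in base R. The function `format_with_default` is hypothetical, and the rows and default values are illustrative placeholders rather than values read from the packages:

```r
# Argument-name mapping in the shape convert_args() collects it
# (illustrative rows only).
args <- data.frame(
  engine   = c("glmnet", "glmnet", "spark"),
  parsnip  = c("penalty", "mixture", "mixture"),
  original = c("lambda", "alpha", "elastic_net_param"),
  stringsAsFactors = FALSE
)

# Defaults table in the shape of get_defaults_linear_reg(); the values are
# placeholders, not looked up from the packages.
defaults <- data.frame(
  engine   = c("glmnet", "spark"),
  parsnip  = c("mixture", "mixture"),
  original = c("alpha", "elastic_net_param"),
  default  = c("1", "0"),
  stringsAsFactors = FALSE
)

format_with_default <- function(args, defaults) {
  # Left join on explicit keys; rows without a default get default = NA.
  joined <- merge(args, defaults,
                  by = c("engine", "parsnip", "original"), all.x = TRUE)
  # Append "(default)" only where a default exists.
  joined$original <- ifelse(
    is.na(joined$default),
    joined$original,
    paste0(joined$original, " (", joined$default, ")")
  )
  joined[, c("engine", "parsnip", "original")]
}

format_with_default(args, defaults)
```

Naming the join keys explicitly keeps the join from silently matching on any other column the two tables happen to share.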
