| 1 | +#' Formulas with special terms in tidymodels |
| 2 | +#' |
| 3 | +#' @description |
| 4 | +#' |
| 5 | +#' In R, formulas provide a compact, symbolic notation to specify model terms. |
| 6 | +#' Many modeling functions in R make use of ["specials"][stats::terms.formula]: |
| 7 | +#' nonstandard notations used within formulas. Each modeling package defines |
| 8 | +#' its own specials and decides how to handle them. For example, the mgcv package, |
| 9 | +#' which provides support for |
| 10 | +#' [generalized additive models][parsnip::gen_additive_mod] in R, defines a |
| 11 | +#' function `s()` to be in-lined into formulas. It can be used like so: |
| 12 | +#' |
| 13 | +#' ``` r |
| 14 | +#' mgcv::gam(mpg ~ wt + s(disp, k = 5), data = mtcars) |
| 15 | +#' ``` |
| 16 | +#' |
| 17 | +#' In this example, the `s()` special defines a smoothing term that the mgcv |
| 18 | +#' package knows to look for when preprocessing model input. |
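|  | +#' |
|  | +#' Base R offers one general mechanism for this kind of lookup: `terms()` |
|  | +#' accepts a `specials` argument and records which variables in a formula are |
|  | +#' calls to those functions, without evaluating them. The sketch below only |
|  | +#' illustrates that mechanism; it is not necessarily how mgcv itself processes `s()`. |
|  | +#' |
|  | +#' ``` r |
|  | +#' # Parse the formula without evaluating s(), flagging "s" as a special. |
|  | +#' tt <- terms(mpg ~ wt + s(disp, k = 5), specials = "s") |
|  | +#' |
|  | +#' # Position of the s() call in attr(tt, "variables"): mpg = 1, wt = 2. |
|  | +#' attr(tt, "specials")$s |
|  | +#' #> [1] 3 |
|  | +#' |
|  | +#' # The term labels keep the special call intact for the package to interpret. |
|  | +#' attr(tt, "term.labels") |
|  | +#' ``` |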
| 19 | +#' |
| 20 | +#' The parsnip package can handle most specials without issue. The analogous |
| 21 | +#' code for specifying this generalized additive model |
| 22 | +#' [with the parsnip "mgcv" engine][parsnip::details_gen_additive_mod_mgcv] |
| 23 | +#' looks like: |
| 24 | +#' |
| 25 | +#' ``` r |
| 26 | +#' gen_additive_mod() %>% |
| 27 | +#' set_mode("regression") %>% |
| 28 | +#' set_engine("mgcv") %>% |
| 29 | +#' fit(mpg ~ wt + s(disp, k = 5), data = mtcars) |
| 30 | +#' ``` |
| 31 | +#' |
| 32 | +#' However, parsnip is often used alongside the broader tidymodels ecosystem, |
| 33 | +#' which defines its own preprocessing infrastructure via packages like |
| 34 | +#' hardhat and recipes. That infrastructure does not know how each modeling |
| 35 | +#' package interprets its specials, so specials can conflict with it. |
| 36 | +#' |
| 37 | +#' To support specials while keeping syntax consistent elsewhere in |
| 38 | +#' the ecosystem, **tidymodels distinguishes between two types of formulas: |
| 39 | +#' preprocessing formulas and model formulas**. Preprocessing formulas specify |
| 40 | +#' the input variables, while model formulas determine the model structure. |
| 41 | +#' |
| 42 | +#' @section Example: |
| 43 | +#' |
| 44 | +#' To create the preprocessing formula from the model formula, remove the |
| 45 | +#' specials but keep the input variables they reference. For example: |
| 46 | +#' |
| 47 | +#' ``` r |
| 48 | +#' model_formula <- mpg ~ wt + s(disp, k = 5) |
| 49 | +#' preproc_formula <- mpg ~ wt + disp |
| 50 | +#' ``` |
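|  | +#' |
|  | +#' For simple cases, the preprocessing formula can also be derived in code. |
|  | +#' The sketch below uses a hypothetical helper, `strip_specials()` (not part |
|  | +#' of tidymodels), that keeps every variable a special references by way of |
|  | +#' base R's `all.vars()`. It assumes a two-sided formula with one response. |
|  | +#' |
|  | +#' ``` r |
|  | +#' # Hypothetical helper, not a tidymodels function: replace each special |
|  | +#' # with the plain variables it references. |
|  | +#' strip_specials <- function(model_formula) { |
|  | +#'   response <- all.vars(model_formula[[2]])   # left-hand side variables |
|  | +#'   predictors <- all.vars(model_formula[[3]]) # right-hand side variables |
|  | +#'   # Reuse the original formula's environment for the new formula. |
|  | +#'   reformulate(predictors, response = response, |
|  | +#'               env = environment(model_formula)) |
|  | +#' } |
|  | +#' |
|  | +#' strip_specials(mpg ~ wt + s(disp, k = 5)) |
|  | +#' #> mpg ~ wt + disp |
|  | +#' ``` |
|  | +#' |
|  | +#' Because `all.vars()` collects every variable inside a call, a term such as |
|  | +#' `s(log(disp))` also reduces to `disp`, so the result is worth checking by eye. |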
| 51 | +#' |
| 52 | +#' \itemize{ |
| 53 | +#' \item **With parsnip**, use the model formula: |
| 54 | +#' |
| 55 | +#' ``` r |
| 56 | +#' model_spec <- |
| 57 | +#' gen_additive_mod() %>% |
| 58 | +#' set_mode("regression") %>% |
| 59 | +#' set_engine("mgcv") |
| 60 | +#' |
| 61 | +#' model_spec %>% |
| 62 | +#' fit(model_formula, data = mtcars) |
| 63 | +#' ``` |
| 64 | +#' |
| 65 | +#' \item **With recipes**, use the preprocessing formula only: |
| 66 | +#' |
| 67 | +#' ``` r |
| 68 | +#' library(recipes) |
| 69 | +#' |
| 70 | +#' recipe(preproc_formula, mtcars) |
| 71 | +#' ``` |
| 72 | +#' |
| 73 | +#' In some cases, the preprocessing steps supplied by the recipes package |
| 74 | +#' can remove the need for specials altogether. |
| 75 | +#' |
| 76 | +#' \item **With workflows**, use the preprocessing formula everywhere, but |
| 77 | +#' pass the model formula to the `formula` argument in `add_model()`: |
| 78 | +#' |
| 79 | +#' ``` r |
| 80 | +#' library(workflows) |
| 81 | +#' |
| 82 | +#' wflow <- |
| 83 | +#' workflow() %>% |
| 84 | +#' add_formula(preproc_formula) %>% |
| 85 | +#' add_model(model_spec, formula = model_formula) |
| 86 | +#' |
| 87 | +#' fit(wflow, data = mtcars) |
| 88 | +#' ``` |
| 89 | +#' |
| 90 | +#' The workflow then passes the model formula to parsnip and uses the |
| 91 | +#' preprocessing formula everywhere else. If the preprocessor were a recipe |
| 92 | +#' added with `add_recipe()` instead of a formula added with `add_formula()`, |
| 93 | +#' the recipe would likewise be built from the preprocessing formula. |
| 94 | +#' |
| 95 | +#' } |
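|  | +#' |
|  | +#' As a hedged illustration of the recipes point above, the sketch below |
|  | +#' assumes the recipes package is installed and uses `step_ns()` to expand |
|  | +#' `disp` into a natural spline basis before fitting an ordinary linear model. |
|  | +#' This is a swapped-in technique: an unpenalized regression spline, not an |
|  | +#' equivalent of mgcv's penalized `s()` smooth, and `deg_free = 5` is an |
|  | +#' illustrative choice rather than a translation of `k = 5`. |
|  | +#' |
|  | +#' ``` r |
|  | +#' library(recipes) |
|  | +#' |
|  | +#' # Start from the preprocessing formula: plain variables, no specials. |
|  | +#' rec <- recipe(mpg ~ wt + disp, data = mtcars) |
|  | +#' |
|  | +#' # Expand disp into a 5-column natural spline basis (illustrative choice). |
|  | +#' rec <- step_ns(rec, disp, deg_free = 5) |
|  | +#' |
|  | +#' # Estimate the basis and return the processed training data. |
|  | +#' baked <- bake(prep(rec), new_data = NULL) |
|  | +#' |
|  | +#' # Fit an ordinary linear model on wt plus the spline columns. |
|  | +#' fit_lm <- lm(mpg ~ ., data = baked) |
|  | +#' ``` |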
| 96 | +#' |
| 97 | +#' @name model_formula |
| 98 | +NULL |