Merge pull request #1171 from tidymodels/doc-sparse-data

EmilHvitfeldt · web-flow · commit cee2bb8fc476 · 2024-09-09T12:26:49.000-07:00
document sparse data usage in parsnip
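
A minimal sketch of the workflow the new help page describes. It is illustrative only and not part of the commit. It assumes a parsnip version that includes this change and that the `glmnet` and `Matrix` packages are installed. The simulated data and the names `x`, `y`, `spec`, `fitted`, and `preds` are hypothetical.

```r
library(parsnip)
library(Matrix)

# Check which linear_reg() engines accept sparse predictors.
enc <- get_encoding("linear_reg")
enc[, c("model", "engine", "mode", "allow_sparse_x")]

# Simulate a small sparse predictor matrix (a dgCMatrix) and an outcome.
# Column names are set explicitly because fit_xy() expects named columns.
set.seed(1)
x <- rsparsematrix(nrow = 100, ncol = 5, density = 0.3)
colnames(x) <- paste0("x", 1:5)
y <- rnorm(nrow(x))

# glmnet is one of the engines documented here as supporting sparse data.
spec <- linear_reg(penalty = 0.01) |>
  set_engine("glmnet")

# Pass the sparse matrix directly; no extra configuration is assumed.
fitted <- fit_xy(spec, x = x, y = y)
preds <- predict(fitted, new_data = x)
```

For an engine whose `allow_sparse_x` value is `FALSE`, the documentation in this PR says to expect a warning and a conversion to non-sparse data instead.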
diff --git a/R/sparsevctrs.R b/R/sparsevctrs.R
@@ -1,3 +1,21 @@
+#' Using sparse data with parsnip
+#' 
+#' You can figure out whether a given model engine supports sparse data by 
+#' calling `get_encoding("name of model")` and looking at the `allow_sparse_x`
+#' column.
+#' 
+#' Using sparse data for model fitting and prediction shouldn't require any 
+#' additional configuration. Just pass in a sparse matrix, such as a `dgCMatrix` 
+#' from the `Matrix` package, or a sparse tibble from the `sparsevctrs` package 
+#' to the data argument of [fit()], [fit_xy()], and [predict()].
+#' 
+#' When a model doesn't support sparse data, parsnip tries to convert the 
+#' input to non-sparse data and issues a warning. If conversion isn't possible, 
+#' an informative error is thrown.
+#' 
+#' @name sparse_data
+NULL
+
 to_sparse_data_frame <- function(x, object, call = rlang::caller_env()) {
   if (is_sparse_matrix(x)) {
     if (allow_sparse(object)) {
diff --git a/_pkgdown.yml b/_pkgdown.yml
@@ -90,6 +90,7 @@ reference:
       - set_engine
       - set_mode
       - show_engines
+      - sparse_data
       - tidy.model_fit
       - translate
       - starts_with("update")
diff --git a/man/details_boost_tree_xgboost.Rd b/man/details_boost_tree_xgboost.Rd
diff --git a/man/details_linear_reg_glmnet.Rd b/man/details_linear_reg_glmnet.Rd
diff --git a/man/details_logistic_reg_LiblineaR.Rd b/man/details_logistic_reg_LiblineaR.Rd
diff --git a/man/details_logistic_reg_glmnet.Rd b/man/details_logistic_reg_glmnet.Rd
diff --git a/man/details_multinom_reg_glmnet.Rd b/man/details_multinom_reg_glmnet.Rd
diff --git a/man/details_rand_forest_ranger.Rd b/man/details_rand_forest_ranger.Rd
diff --git a/man/details_svm_linear_LiblineaR.Rd b/man/details_svm_linear_LiblineaR.Rd
diff --git a/man/rmd/boost_tree_xgboost.Rmd b/man/rmd/boost_tree_xgboost.Rmd
@@ -65,6 +65,11 @@ For classification, non-numeric outcomes (i.e., factors) are internally converte
 ```{r child = "template-uses-case-weights.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Other details
 
 ### Interfacing with the `params` argument
diff --git a/man/rmd/boost_tree_xgboost.md b/man/rmd/boost_tree_xgboost.md
@@ -116,6 +116,11 @@ This model can utilize case weights during model fitting. To use them, see the d
 
 The `fit()` and `fit_xy()` arguments have arguments called `case_weights` that expect vectors of case weights. 
 
+## Sparse Data
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Other details
 
 ### Interfacing with the `params` argument
diff --git a/man/rmd/linear_reg_glmnet.Rmd b/man/rmd/linear_reg_glmnet.Rmd
@@ -48,6 +48,11 @@ By default, [glmnet::glmnet()] uses the argument `standardize = TRUE` to center
 ```{r child = "template-uses-case-weights.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Saving fitted model objects
 
 ```{r child = "template-butcher.Rmd"}
diff --git a/man/rmd/linear_reg_glmnet.md b/man/rmd/linear_reg_glmnet.md
@@ -57,6 +57,11 @@ This model can utilize case weights during model fitting. To use them, see the d
 
 The `fit()` and `fit_xy()` arguments have arguments called `case_weights` that expect vectors of case weights. 
 
+## Sparse Data
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Saving fitted model objects
 
 
diff --git a/man/rmd/logistic_reg_LiblineaR.Rmd b/man/rmd/logistic_reg_LiblineaR.Rmd
@@ -42,6 +42,11 @@ logistic_reg(penalty = double(1), mixture = double(1)) %>%
 ```{r child = "template-same-scale.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Examples 
 
 The "Fitting and Predicting with parsnip" article contains [examples](https://parsnip.tidymodels.org/articles/articles/Examples.html#logistic-reg-LiblineaR) for `logistic_reg()` with the `"LiblineaR"` engine.
diff --git a/man/rmd/logistic_reg_LiblineaR.md b/man/rmd/logistic_reg_LiblineaR.md
@@ -49,6 +49,11 @@ Factor/categorical predictors need to be converted to numeric values (e.g., dumm
 Predictors should have the same scale. One way to achieve this is to center and 
 scale each so that each predictor has mean zero and a variance of one.
 
+## Sparse Data
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Examples 
 
 The "Fitting and Predicting with parsnip" article contains [examples](https://parsnip.tidymodels.org/articles/articles/Examples.html#logistic-reg-LiblineaR) for `logistic_reg()` with the `"LiblineaR"` engine.
diff --git a/man/rmd/logistic_reg_glmnet.Rmd b/man/rmd/logistic_reg_glmnet.Rmd
@@ -50,6 +50,11 @@ By default, [glmnet::glmnet()] uses the argument `standardize = TRUE` to center
 ```{r child = "template-uses-case-weights.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Saving fitted model objects
 
 ```{r child = "template-butcher.Rmd"}
diff --git a/man/rmd/logistic_reg_glmnet.md b/man/rmd/logistic_reg_glmnet.md
@@ -59,6 +59,11 @@ This model can utilize case weights during model fitting. To use them, see the d
 
 The `fit()` and `fit_xy()` arguments have arguments called `case_weights` that expect vectors of case weights. 
 
+## Sparse Data
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Saving fitted model objects
 
 
diff --git a/man/rmd/multinom_reg_glmnet.Rmd b/man/rmd/multinom_reg_glmnet.Rmd
@@ -54,6 +54,11 @@ The "Fitting and Predicting with parsnip" article contains [examples](https://pa
 ```{r child = "template-uses-case-weights.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Saving fitted model objects
 
 ```{r child = "template-butcher.Rmd"}
diff --git a/man/rmd/multinom_reg_glmnet.md b/man/rmd/multinom_reg_glmnet.md
@@ -63,6 +63,11 @@ This model can utilize case weights during model fitting. To use them, see the d
 
 The `fit()` and `fit_xy()` arguments have arguments called `case_weights` that expect vectors of case weights. 
 
+## Sparse Data
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Saving fitted model objects
 
 
diff --git a/man/rmd/rand_forest_ranger.Rmd b/man/rmd/rand_forest_ranger.Rmd
@@ -72,6 +72,11 @@ For `ranger` confidence intervals, the intervals are  constructed using the form
 ```{r child = "template-uses-case-weights.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Saving fitted model objects
 
 ```{r child = "template-butcher.Rmd"}
diff --git a/man/rmd/rand_forest_ranger.md b/man/rmd/rand_forest_ranger.md
@@ -103,6 +103,11 @@ This model can utilize case weights during model fitting. To use them, see the d
 
 The `fit()` and `fit_xy()` arguments have arguments called `case_weights` that expect vectors of case weights. 
 
+## Sparse Data
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Saving fitted model objects
 
 
diff --git a/man/rmd/svm_linear_LiblineaR.Rmd b/man/rmd/svm_linear_LiblineaR.Rmd
@@ -66,6 +66,11 @@ Note that the `LiblineaR` engine does not produce class probabilities. When opti
 ```{r child = "template-no-case-weights.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Examples 
 
 The "Fitting and Predicting with parsnip" article contains [examples](https://parsnip.tidymodels.org/articles/articles/Examples.html#svm-linear-LiblineaR) for `svm_linear()` with the `"LiblineaR"` engine.
diff --git a/man/rmd/svm_linear_LiblineaR.md b/man/rmd/svm_linear_LiblineaR.md
@@ -85,6 +85,11 @@ scale each so that each predictor has mean zero and a variance of one.
 
 The underlying model implementation does not allow for case weights. 
 
+## Sparse Data
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Examples 
 
 The "Fitting and Predicting with parsnip" article contains [examples](https://parsnip.tidymodels.org/articles/articles/Examples.html#svm-linear-LiblineaR) for `svm_linear()` with the `"LiblineaR"` engine.
diff --git a/man/rmd/template-uses-sparse-data.Rmd b/man/rmd/template-uses-sparse-data.Rmd
@@ -0,0 +1 @@
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
diff --git a/man/sparse_data.Rd b/man/sparse_data.Rd