Commit cee2bb8

Merge pull request #1171 from tidymodels/doc-sparse-data
document sparse data usage in parsnip
2 parents eba5762 + 9a74d3a commit cee2bb8

25 files changed: 166 additions, 0 deletions

R/sparsevctrs.R

Lines changed: 18 additions & 0 deletions
@@ -1,3 +1,21 @@
+#' Using sparse data with parsnip
+#'
+#' You can check whether a given model engine supports sparse data by
+#' calling `get_encoding("name of model")` and looking at the `allow_sparse_x`
+#' column.
+#'
+#' Using sparse data for model fitting and prediction shouldn't require any
+#' additional configuration. Just pass a sparse matrix, such as a dgCMatrix
+#' from the `Matrix` package, or a sparse tibble from the `sparsevctrs` package
+#' to the data argument of [fit()] (`data`), [fit_xy()] (`x`), or [predict()] (`new_data`).
+#'
+#' Models that don't support sparse data will try to convert it to non-sparse
+#' data and issue a warning. If conversion isn't possible, an informative error
+#' is thrown.
+#'
+#' @name sparse_data
+NULL
+
 to_sparse_data_frame <- function(x, object, call = rlang::caller_env()) {
   if (is_sparse_matrix(x)) {
     if (allow_sparse(object)) {
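
The `get_encoding()` check described in the new roxygen block can be sketched in R as follows. This is a minimal example, assuming an installed parsnip version that includes this change; `linear_reg` is only an illustrative model name, and `enc` and `sparse_engines` are hypothetical variable names.

```r
library(parsnip)

# Encoding table for linear_reg(): one row per registered engine/mode pair
enc <- get_encoding("linear_reg")

# Keep only the rows whose encoding sets allow_sparse_x = TRUE
sparse_engines <- enc[enc$allow_sparse_x, c("model", "engine", "mode")]
print(sparse_engines)
```

Any engine missing from `sparse_engines` falls under the convert-with-a-warning path described in the last paragraph of the roxygen block.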

_pkgdown.yml

Lines changed: 1 addition & 0 deletions
@@ -90,6 +90,7 @@ reference:
 - set_engine
 - set_mode
 - show_engines
+- sparse_data
 - tidy.model_fit
 - translate
 - starts_with("update")

man/details_boost_tree_xgboost.Rd

Lines changed: 8 additions & 0 deletions

man/details_linear_reg_glmnet.Rd

Lines changed: 8 additions & 0 deletions

man/details_logistic_reg_LiblineaR.Rd

Lines changed: 8 additions & 0 deletions

man/details_logistic_reg_glmnet.Rd

Lines changed: 8 additions & 0 deletions

man/details_multinom_reg_glmnet.Rd

Lines changed: 8 additions & 0 deletions

man/details_rand_forest_ranger.Rd

Lines changed: 8 additions & 0 deletions

man/details_svm_linear_LiblineaR.Rd

Lines changed: 8 additions & 0 deletions

man/rmd/boost_tree_xgboost.Rmd

Lines changed: 5 additions & 0 deletions
@@ -65,6 +65,11 @@ For classification, non-numeric outcomes (i.e., factors) are internally converte
 ```{r child = "template-uses-case-weights.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Other details
 
 ### Interfacing with the `params` argument

man/rmd/boost_tree_xgboost.md

Lines changed: 5 additions & 0 deletions
@@ -116,6 +116,11 @@ This model can utilize case weights during model fitting. To use them, see the d
 
 The `fit()` and `fit_xy()` functions have arguments called `case_weights` that expect vectors of case weights.
 
+## Sparse Data
+
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Other details
 
 ### Interfacing with the `params` argument

man/rmd/linear_reg_glmnet.Rmd

Lines changed: 5 additions & 0 deletions
@@ -48,6 +48,11 @@ By default, [glmnet::glmnet()] uses the argument `standardize = TRUE` to center
 ```{r child = "template-uses-case-weights.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Saving fitted model objects
 
 ```{r child = "template-butcher.Rmd"}

man/rmd/linear_reg_glmnet.md

Lines changed: 5 additions & 0 deletions
@@ -57,6 +57,11 @@ This model can utilize case weights during model fitting. To use them, see the d
 
 The `fit()` and `fit_xy()` functions have arguments called `case_weights` that expect vectors of case weights.
 
+## Sparse Data
+
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Saving fitted model objects
 
 

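A minimal R sketch of fitting and predicting with a `dgCMatrix` for `linear_reg()` with the `"glmnet"` engine documented above. It assumes the `Matrix` and `glmnet` packages are installed and that parsnip includes this sparse-data support; the simulated data and the `penalty = 0.01` value are arbitrary choices for illustration, not recommendations.

```r
library(parsnip)
library(Matrix)

set.seed(1)
# Simulated (illustrative) data: a 100 x 10 dgCMatrix with about 10% non-zeros
x <- Matrix::rsparsematrix(nrow = 100, ncol = 10, density = 0.1)
colnames(x) <- paste0("x", 1:10)
y <- rnorm(100)

# parsnip's glmnet engine needs a single penalty value to predict;
# 0.01 is an arbitrary choice for this sketch
spec <- linear_reg(penalty = 0.01) |> set_engine("glmnet")

# The sparse matrix is passed as-is through `x`; no manual densifying step
glmnet_fit <- fit_xy(spec, x = x, y = y)

# predict() receives sparse input through `new_data`
preds <- predict(glmnet_fit, new_data = x[1:5, ])
preds
```

Because `fit_xy()` takes predictors through `x` rather than `data`, the sparse matrix goes to `x` here, while `predict()` receives it through `new_data`.
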
man/rmd/logistic_reg_LiblineaR.Rmd

Lines changed: 5 additions & 0 deletions
@@ -42,6 +42,11 @@ logistic_reg(penalty = double(1), mixture = double(1)) %>%
 ```{r child = "template-same-scale.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Examples
 
 The "Fitting and Predicting with parsnip" article contains [examples](https://parsnip.tidymodels.org/articles/articles/Examples.html#logistic-reg-LiblineaR) for `logistic_reg()` with the `"LiblineaR"` engine.

man/rmd/logistic_reg_LiblineaR.md

Lines changed: 5 additions & 0 deletions
@@ -49,6 +49,11 @@ Factor/categorical predictors need to be converted to numeric values (e.g., dumm
 Predictors should have the same scale. One way to achieve this is to center and
 scale each so that each predictor has mean zero and a variance of one.
 
+## Sparse Data
+
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Examples
 
 The "Fitting and Predicting with parsnip" article contains [examples](https://parsnip.tidymodels.org/articles/articles/Examples.html#logistic-reg-LiblineaR) for `logistic_reg()` with the `"LiblineaR"` engine.

man/rmd/logistic_reg_glmnet.Rmd

Lines changed: 5 additions & 0 deletions
@@ -50,6 +50,11 @@ By default, [glmnet::glmnet()] uses the argument `standardize = TRUE` to center
 ```{r child = "template-uses-case-weights.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Saving fitted model objects
 
 ```{r child = "template-butcher.Rmd"}

man/rmd/logistic_reg_glmnet.md

Lines changed: 5 additions & 0 deletions
@@ -59,6 +59,11 @@ This model can utilize case weights during model fitting. To use them, see the d
 
 The `fit()` and `fit_xy()` functions have arguments called `case_weights` that expect vectors of case weights.
 
+## Sparse Data
+
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Saving fitted model objects
 
 

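For the classification engines, such as `logistic_reg()` with `"glmnet"`, the sparse input works with a factor outcome. A minimal R sketch, assuming the `Matrix` and `glmnet` packages are installed and that parsnip includes this sparse-data support; the simulated data, the `"no"`/`"yes"` levels, and `penalty = 0.01` are illustrative only.

```r
library(parsnip)
library(Matrix)

set.seed(2)
# Simulated (illustrative) predictors: a 200 x 20 dgCMatrix, ~5% non-zeros
x <- Matrix::rsparsematrix(nrow = 200, ncol = 20, density = 0.05)
colnames(x) <- paste0("x", 1:20)

# logistic_reg() expects a two-level factor outcome
y <- factor(sample(c("no", "yes"), size = 200, replace = TRUE))

spec <- logistic_reg(penalty = 0.01) |> set_engine("glmnet")
glmnet_fit <- fit_xy(spec, x = x, y = y)

# Class probabilities for the first five rows, from sparse new_data
probs <- predict(glmnet_fit, new_data = x[1:5, ], type = "prob")
probs
```
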
man/rmd/multinom_reg_glmnet.Rmd

Lines changed: 5 additions & 0 deletions
@@ -54,6 +54,11 @@ The "Fitting and Predicting with parsnip" article contains [examples](https://pa
 ```{r child = "template-uses-case-weights.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Saving fitted model objects
 
 ```{r child = "template-butcher.Rmd"}

man/rmd/multinom_reg_glmnet.md

Lines changed: 5 additions & 0 deletions
@@ -63,6 +63,11 @@ This model can utilize case weights during model fitting. To use them, see the d
 
 The `fit()` and `fit_xy()` functions have arguments called `case_weights` that expect vectors of case weights.
 
+## Sparse Data
+
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Saving fitted model objects
 
 

man/rmd/rand_forest_ranger.Rmd

Lines changed: 5 additions & 0 deletions
@@ -72,6 +72,11 @@ For `ranger` confidence intervals, the intervals are constructed using the form
 ```{r child = "template-uses-case-weights.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Saving fitted model objects
 
 ```{r child = "template-butcher.Rmd"}

man/rmd/rand_forest_ranger.md

Lines changed: 5 additions & 0 deletions
@@ -103,6 +103,11 @@ This model can utilize case weights during model fitting. To use them, see the d
 
 The `fit()` and `fit_xy()` functions have arguments called `case_weights` that expect vectors of case weights.
 
+## Sparse Data
+
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Saving fitted model objects
 
 

man/rmd/svm_linear_LiblineaR.Rmd

Lines changed: 5 additions & 0 deletions
@@ -66,6 +66,11 @@ Note that the `LiblineaR` engine does not produce class probabilities. When opti
 ```{r child = "template-no-case-weights.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Examples
 
 The "Fitting and Predicting with parsnip" article contains [examples](https://parsnip.tidymodels.org/articles/articles/Examples.html#svm-linear-LiblineaR) for `svm_linear()` with the `"LiblineaR"` engine.

man/rmd/svm_linear_LiblineaR.md

Lines changed: 5 additions & 0 deletions
@@ -85,6 +85,11 @@ scale each so that each predictor has mean zero and a variance of one.
 
 The underlying model implementation does not allow for case weights.
 
+## Sparse Data
+
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Examples
 
 The "Fitting and Predicting with parsnip" article contains [examples](https://parsnip.tidymodels.org/articles/articles/Examples.html#svm-linear-LiblineaR) for `svm_linear()` with the `"LiblineaR"` engine.

man/rmd/template-uses-sparse-data.Rmd

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
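
The shared template also mentions sparse tibbles. A minimal R sketch of building one from a `dgCMatrix` with `sparsevctrs::coerce_to_sparse_tibble()` and passing it to `fit_xy()` is below; it assumes the `sparsevctrs`, `Matrix`, and `glmnet` packages are installed and that parsnip includes this sparse-data support. The data and penalty are arbitrary, and whether the columns stay sparse all the way to the engine depends on parsnip internals not shown in this diff.

```r
library(parsnip)
library(Matrix)

set.seed(3)
# Simulated (illustrative) data: a 100 x 5 dgCMatrix, ~20% non-zeros
x <- Matrix::rsparsematrix(nrow = 100, ncol = 5, density = 0.2)
colnames(x) <- paste0("x", 1:5)
y <- rnorm(100)

# Convert to a tibble whose columns are sparsevctrs sparse vectors
x_tbl <- sparsevctrs::coerce_to_sparse_tibble(x)

spec <- linear_reg(penalty = 0.01) |> set_engine("glmnet")
tbl_fit <- fit_xy(spec, x = x_tbl, y = y)

# Predict on the first three rows of the sparse tibble
tbl_preds <- predict(tbl_fit, new_data = x_tbl[1:3, ])
tbl_preds
```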

man/sparse_data.Rd

Lines changed: 20 additions & 0 deletions
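
The fallback behavior described in the `sparse_data` documentation (conversion to non-sparse data with a warning for engines that don't accept sparse input) can be observed with an R sketch like the one below. It assumes that the `"lm"` engine for `linear_reg()` has `allow_sparse_x = FALSE` and that the installed parsnip includes this behavior; the exact warning text is not part of this diff, so the sketch only collects whatever warnings are raised.

```r
library(parsnip)
library(Matrix)

set.seed(4)
# Simulated (illustrative) data: a 50 x 3 dgCMatrix, ~50% non-zeros
x <- Matrix::rsparsematrix(nrow = 50, ncol = 3, density = 0.5)
colnames(x) <- c("x1", "x2", "x3")
y <- rnorm(50)

# Assumption: the "lm" engine has allow_sparse_x = FALSE, so parsnip is
# expected to convert `x` to dense data and warn. Collect and silence warnings.
warnings_seen <- character()
lm_fit <- withCallingHandlers(
  fit_xy(linear_reg() |> set_engine("lm"), x = x, y = y),
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

warnings_seen
```

If `warnings_seen` comes back empty, the installed parsnip may predate this behavior.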
