document sparse data usage in parsnip

tidymodels · Sep 4, 2024 · 44403fc
1 parent f66a8f9

Showing 25 changed files with 166 additions and 0 deletions.
diff --git a/R/sparsevctrs.R b/R/sparsevctrs.R
@@ -1,3 +1,21 @@
+#' Using sparse data with parsnip
+#' 
+#' To check whether a given model engine supports sparse data, call
+#' `get_encoding()` with the name of the model (e.g.,
+#' `get_encoding("linear_reg")`) and look at the `allow_sparse_x` column.
+#' 
+#' Using sparse data for model fitting and prediction shouldn't require any
+#' additional configuration. Just pass a sparse matrix (such as a `dgCMatrix`
+#' from the `Matrix` package) or a sparse tibble (from the `sparsevctrs` package)
+#' to the `data` argument of [fit()], `x` of [fit_xy()], or `new_data` of [predict()].
+#' 
+#' For models that don't support sparse data, parsnip will try to convert the
+#' input to non-sparse (dense) data and emit a warning. An informative error is
+#' thrown if the conversion isn't possible.
+#' 
+#' @name sparse_data
+NULL
+
 to_sparse_data_frame <- function(x, object) {
   if (methods::is(x, "sparseMatrix")) {
     if (allow_sparse(object)) {

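The engine check described in the roxygen text can be sketched as follows. This is a minimal example, assuming parsnip is installed; `enc` and `sparse_engines` are illustrative names, and the exact set of engines returned depends on the installed parsnip version and on which extension packages are loaded.

```r
library(parsnip)

# get_encoding() takes the model function's name as a string and returns
# one row per model/engine/mode combination, including `allow_sparse_x`
enc <- get_encoding("linear_reg")

# Keep only the engines whose encoding says they accept sparse predictors;
# which() guards against any NA values in the flag
sparse_engines <- enc[which(enc$allow_sparse_x), c("model", "engine", "mode")]
print(sparse_engines)
```

The same call works for other model functions, e.g. `get_encoding("boost_tree")` or `get_encoding("rand_forest")`.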
diff --git a/_pkgdown.yml b/_pkgdown.yml
@@ -90,6 +90,7 @@ reference:
       - set_engine
       - set_mode
       - show_engines
+      - sparse_data
       - tidy.model_fit
       - translate
       - starts_with("update")

diff --git a/man/details_boost_tree_xgboost.Rd b/man/details_boost_tree_xgboost.Rd
diff --git a/man/details_linear_reg_glmnet.Rd b/man/details_linear_reg_glmnet.Rd
diff --git a/man/details_logistic_reg_LiblineaR.Rd b/man/details_logistic_reg_LiblineaR.Rd
diff --git a/man/details_logistic_reg_glmnet.Rd b/man/details_logistic_reg_glmnet.Rd
diff --git a/man/details_multinom_reg_glmnet.Rd b/man/details_multinom_reg_glmnet.Rd
diff --git a/man/details_rand_forest_ranger.Rd b/man/details_rand_forest_ranger.Rd
diff --git a/man/details_svm_linear_LiblineaR.Rd b/man/details_svm_linear_LiblineaR.Rd
diff --git a/man/rmd/boost_tree_xgboost.Rmd b/man/rmd/boost_tree_xgboost.Rmd
@@ -65,6 +65,11 @@ For classification, non-numeric outcomes (i.e., factors) are internally converte
 ```{r child = "template-uses-case-weights.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Other details
 
 ### Interfacing with the `params` argument

diff --git a/man/rmd/boost_tree_xgboost.md b/man/rmd/boost_tree_xgboost.md
@@ -116,6 +116,11 @@ This model can utilize case weights during model fitting. To use them, see the d
 
 The `fit()` and `fit_xy()` arguments have arguments called `case_weights` that expect vectors of case weights. 
 
+## Sparse Data
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Other details
 
 ### Interfacing with the `params` argument

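For the xgboost engine, a sparse tibble can be passed in place of a data frame. The sketch below is a hedged illustration, not the package's own example: it assumes a parsnip version that includes this sparse-data support, plus the `xgboost`, `Matrix`, and `sparsevctrs` packages. The simulated data and the names `x_mat`, `x_tbl`, and `xgb_fit` are hypothetical, and `sparsevctrs::coerce_to_sparse_tibble()` is used here only as one convenient way to build a sparse tibble.

```r
library(parsnip)
library(Matrix)
library(sparsevctrs)

set.seed(42)
# Hypothetical 300 x 15 dgCMatrix with roughly 5% non-zero entries
x_mat <- rsparsematrix(300, 15, density = 0.05)
colnames(x_mat) <- paste0("x", seq_len(ncol(x_mat)))  # named columns assumed
y <- as.numeric(x_mat[, 1] + rnorm(300, sd = 0.1))

# Convert the matrix into a tibble whose columns are sparse vectors
x_tbl <- coerce_to_sparse_tibble(x_mat)

spec <- boost_tree(trees = 20) %>%
  set_engine("xgboost") %>%
  set_mode("regression")

# The sparse tibble goes to `x` of fit_xy() and `new_data` of predict()
xgb_fit <- fit_xy(spec, x = x_tbl, y = y)
preds <- predict(xgb_fit, new_data = x_tbl[1:5, ])
preds
```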
diff --git a/man/rmd/linear_reg_glmnet.Rmd b/man/rmd/linear_reg_glmnet.Rmd
@@ -48,6 +48,11 @@ By default, [glmnet::glmnet()] uses the argument `standardize = TRUE` to center
 ```{r child = "template-uses-case-weights.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Saving fitted model objects
 
 ```{r child = "template-butcher.Rmd"}

diff --git a/man/rmd/linear_reg_glmnet.md b/man/rmd/linear_reg_glmnet.md
@@ -57,6 +57,11 @@ This model can utilize case weights during model fitting. To use them, see the d
 
 The `fit()` and `fit_xy()` arguments have arguments called `case_weights` that expect vectors of case weights. 
 
+## Sparse Data
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Saving fitted model objects
 
 

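A minimal sketch of sparse fitting and prediction with the glmnet engine, assuming parsnip, glmnet, and Matrix are installed. The simulated data, the penalty value, and the object names are hypothetical and chosen only for illustration; the `dgCMatrix` is passed directly, without converting it to a dense data frame first.

```r
library(parsnip)
library(Matrix)

set.seed(123)
# Hypothetical 200 x 20 dgCMatrix with roughly 10% non-zero entries
x <- rsparsematrix(200, 20, density = 0.1)
colnames(x) <- paste0("x", seq_len(ncol(x)))  # named columns assumed
y <- as.numeric(2 * x[, 1] - x[, 2] + rnorm(200, sd = 0.1))

spec <- linear_reg(penalty = 0.01) %>%
  set_engine("glmnet")

# The sparse matrix goes to the `x` argument of fit_xy() ...
glmnet_fit <- fit_xy(spec, x = x, y = y)

# ... and to the `new_data` argument of predict()
preds <- predict(glmnet_fit, new_data = x[1:5, ])
preds
```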
diff --git a/man/rmd/logistic_reg_LiblineaR.Rmd b/man/rmd/logistic_reg_LiblineaR.Rmd
@@ -42,6 +42,11 @@ logistic_reg(penalty = double(1), mixture = double(1)) %>%
 ```{r child = "template-same-scale.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Examples 
 
 The "Fitting and Predicting with parsnip" article contains [examples](https://parsnip.tidymodels.org/articles/articles/Examples.html#logistic-reg-LiblineaR) for `logistic_reg()` with the `"LiblineaR"` engine.

diff --git a/man/rmd/logistic_reg_LiblineaR.md b/man/rmd/logistic_reg_LiblineaR.md
@@ -49,6 +49,11 @@ Factor/categorical predictors need to be converted to numeric values (e.g., dumm
 Predictors should have the same scale. One way to achieve this is to center and 
 scale each so that each predictor has mean zero and a variance of one.
 
+## Sparse Data
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Examples 
 
 The "Fitting and Predicting with parsnip" article contains [examples](https://parsnip.tidymodels.org/articles/articles/Examples.html#logistic-reg-LiblineaR) for `logistic_reg()` with the `"LiblineaR"` engine.

diff --git a/man/rmd/logistic_reg_glmnet.Rmd b/man/rmd/logistic_reg_glmnet.Rmd
@@ -50,6 +50,11 @@ By default, [glmnet::glmnet()] uses the argument `standardize = TRUE` to center
 ```{r child = "template-uses-case-weights.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Saving fitted model objects
 
 ```{r child = "template-butcher.Rmd"}

diff --git a/man/rmd/logistic_reg_glmnet.md b/man/rmd/logistic_reg_glmnet.md
@@ -59,6 +59,11 @@ This model can utilize case weights during model fitting. To use them, see the d
 
 The `fit()` and `fit_xy()` arguments have arguments called `case_weights` that expect vectors of case weights. 
 
+## Sparse Data
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Saving fitted model objects
 
 

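The same pattern applies to classification. This sketch assumes parsnip, glmnet, and Matrix are installed; the two-class outcome, the penalty, and the object names are hypothetical. Class probabilities are requested with `type = "prob"`, which for a factor with levels `"no"` and `"yes"` returns `.pred_no` and `.pred_yes` columns.

```r
library(parsnip)
library(Matrix)

set.seed(2024)
# Hypothetical sparse predictors with named columns
x <- rsparsematrix(250, 30, density = 0.1)
colnames(x) <- paste0("x", seq_len(ncol(x)))

# Two-class factor outcome loosely driven by the first predictor
y <- factor(ifelse(x[, 1] + rnorm(250, sd = 0.5) > 0, "yes", "no"))

spec <- logistic_reg(penalty = 0.01) %>%
  set_engine("glmnet")

logit_fit <- fit_xy(spec, x = x, y = y)

# Sparse new_data works for probability predictions as well
probs <- predict(logit_fit, new_data = x[1:5, ], type = "prob")
probs
```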
diff --git a/man/rmd/multinom_reg_glmnet.Rmd b/man/rmd/multinom_reg_glmnet.Rmd
@@ -54,6 +54,11 @@ The "Fitting and Predicting with parsnip" article contains [examples](https://pa
 ```{r child = "template-uses-case-weights.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Saving fitted model objects
 
 ```{r child = "template-butcher.Rmd"}

diff --git a/man/rmd/multinom_reg_glmnet.md b/man/rmd/multinom_reg_glmnet.md
@@ -63,6 +63,11 @@ This model can utilize case weights during model fitting. To use them, see the d
 
 The `fit()` and `fit_xy()` arguments have arguments called `case_weights` that expect vectors of case weights. 
 
+## Sparse Data
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Saving fitted model objects
 
 

diff --git a/man/rmd/rand_forest_ranger.Rmd b/man/rmd/rand_forest_ranger.Rmd
@@ -72,6 +72,11 @@ For `ranger` confidence intervals, the intervals are  constructed using the form
 ```{r child = "template-uses-case-weights.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Saving fitted model objects
 
 ```{r child = "template-butcher.Rmd"}

diff --git a/man/rmd/rand_forest_ranger.md b/man/rmd/rand_forest_ranger.md
@@ -103,6 +103,11 @@ This model can utilize case weights during model fitting. To use them, see the d
 
 The `fit()` and `fit_xy()` arguments have arguments called `case_weights` that expect vectors of case weights. 
 
+## Sparse Data
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Saving fitted model objects
 
 

diff --git a/man/rmd/svm_linear_LiblineaR.Rmd b/man/rmd/svm_linear_LiblineaR.Rmd
@@ -66,6 +66,11 @@ Note that the `LiblineaR` engine does not produce class probabilities. When opti
 ```{r child = "template-no-case-weights.Rmd"}
 ```
 
+## Sparse Data
+
+```{r child = "template-uses-sparse-data.Rmd"}
+```
+
 ## Examples 
 
 The "Fitting and Predicting with parsnip" article contains [examples](https://parsnip.tidymodels.org/articles/articles/Examples.html#svm-linear-LiblineaR) for `svm_linear()` with the `"LiblineaR"` engine.

diff --git a/man/rmd/svm_linear_LiblineaR.md b/man/rmd/svm_linear_LiblineaR.md
@@ -85,6 +85,11 @@ scale each so that each predictor has mean zero and a variance of one.
 
 The underlying model implementation does not allow for case weights. 
 
+## Sparse Data
+
+
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
+
 ## Examples 
 
 The "Fitting and Predicting with parsnip" article contains [examples](https://parsnip.tidymodels.org/articles/articles/Examples.html#svm-linear-LiblineaR) for `svm_linear()` with the `"LiblineaR"` engine.

diff --git a/man/rmd/template-uses-sparse-data.Rmd b/man/rmd/template-uses-sparse-data.Rmd
@@ -0,0 +1 @@
+This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the `Matrix` package and sparse tibbles from the `sparsevctrs` package are supported. See [sparse_data] for more information.
diff --git a/man/sparse_data.Rd b/man/sparse_data.Rd