diff --git a/DESCRIPTION b/DESCRIPTION
index 67eb31d..741af8c 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: embed
 Title: Extra Recipes for Encoding Predictors
-Version: 1.1.3.9000
+Version: 1.1.4.9000
 Authors@R: c(
 person("Emil", "Hvitfeldt", , "emil.hvitfeldt@posit.co", role = c("aut", "cre"),
 comment = c(ORCID = "0000-0002-0679-1945")),
diff --git a/NEWS.md b/NEWS.md
index e4b4170..dd05748 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,5 +1,9 @@
 # embed (development version)
 
+# embed 1.1.4
+
+## Improvements
+
 * `step_umap()` has gained `initial` and `target_weight` arguments. (#213)
 
 * Calling `?tidy.step_*()` now sends you to the documentation for `step_*()` where the outcome is documented. (#216)
diff --git a/R/lencode_bayes.R b/R/lencode_bayes.R
index bd3ce07..b1b91f4 100644
--- a/R/lencode_bayes.R
+++ b/R/lencode_bayes.R
@@ -83,13 +83,13 @@
 #' Modeling," arXiv:1611.09477
 #'
 #' "Hierarchical Partial Pooling for Repeated Binary Trials"
-#' \url{https://tinyurl.com/stan-pooling}
+#' \url{https://CRAN.R-project.org/package=rstanarm/vignettes/pooling.html}
 #'
 #' "Prior Distributions for `rstanarm` Models"
-#' \url{https://tinyurl.com/stan-priors}
+#' \url{http://mc-stan.org/rstanarm/reference/priors.html}
 #'
 #' "Estimating Generalized (Non-)Linear Models with Group-Specific Terms with
-#' `rstanarm`" \url{https://tinyurl.com/stan-glm-grouped}
+#' `rstanarm`" \url{http://mc-stan.org/rstanarm/articles/glmer.html}
 #'
 #' @examplesIf rlang::is_installed("modeldata")
 #' library(recipes)
diff --git a/README.md b/README.md
index b03f229..4e61162 100644
--- a/README.md
+++ b/README.md
@@ -25,55 +25,57 @@ dependencies, [`rstanarm`](https://CRAN.r-project.org/package=rstanarm),
 
 Some steps handle categorical predictors:
 
-- `step_lencode_glm()`, `step_lencode_bayes()`, and
-  `step_lencode_mixed()` estimate the effect of each of the factor
-  levels on the outcome and these estimates are used as the new
-  encoding. The estimates are estimated by a generalized linear model.
-  This step can be executed without pooling (via `glm`) or with partial
-  pooling (`stan_glm` or `lmer`). Currently implemented for numeric and
-  two-class outcomes.
-
-- `step_embed()` uses `keras::layer_embedding` to translate the original
-  *C* factor levels into a set of *D* new variables (\< *C*). The model
-  fitting routine optimizes which factor levels are mapped to each of
-  the new variables as well as the corresponding regression coefficients
-  (i.e., neural network weights) that will be used as the new encodings.
-
-- `step_woe()` creates new variables based on weight of evidence
-  encodings.
-
-- `step_feature_hash()` can create indicator variables using feature
-  hashing.
+-   `step_lencode_glm()`, `step_lencode_bayes()`, and
+    `step_lencode_mixed()` estimate the effect of each of the factor
+    levels on the outcome and these estimates are used as the new
+    encoding. The estimates are estimated by a generalized linear model.
+    This step can be executed without pooling (via `glm`) or with
+    partial pooling (`stan_glm` or `lmer`). Currently implemented for
+    numeric and two-class outcomes.
+
+-   `step_embed()` uses `keras::layer_embedding` to translate the
+    original *C* factor levels into a set of *D* new variables (\< *C*).
+    The model fitting routine optimizes which factor levels are mapped
+    to each of the new variables as well as the corresponding regression
+    coefficients (i.e., neural network weights) that will be used as the
+    new encodings.
+
+-   `step_woe()` creates new variables based on weight of evidence
+    encodings.
+
+-   `step_feature_hash()` can create indicator variables using feature
+    hashing.
 
 For numeric predictors:
 
-- `step_umap()` uses a nonlinear transformation similar to t-SNE but can
-  be used to project the transformation on new data. Both supervised and
-  unsupervised methods can be used.
+-   `step_umap()` uses a nonlinear transformation similar to t-SNE but
+    can be used to project the transformation on new data. Both
+    supervised and unsupervised methods can be used.
 
-- `step_discretize_xgb()` and `step_discretize_cart()` can make binned
-  versions of numeric predictors using supervised tree-based models.
+-   `step_discretize_xgb()` and `step_discretize_cart()` can make binned
+    versions of numeric predictors using supervised tree-based models.
 
-- `step_pca_sparse()` and `step_pca_sparse_bayes()` conduct feature
-  extraction with sparsity of the component loadings.
+-   `step_pca_sparse()` and `step_pca_sparse_bayes()` conduct feature
+    extraction with sparsity of the component loadings.
 
 Some references for these methods are:
 
-- Francois C and Allaire JJ (2018) [*Deep Learning with
-  R*](https://www.manning.com/books/deep-learning-with-r), Manning
-- Guo, C and Berkhahn F (2016) “[Entity Embeddings of Categorical
-  Variables](https://arxiv.org/abs/1604.06737)”
-- Micci-Barreca D (2001) “[A preprocessing scheme for high-cardinality
-  categorical attributes in classification and prediction
-  problems](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=A+preprocessing+scheme+for+high-cardinality+categorical+attributes+in+classification+and+prediction+problems&btnG=),”
-  ACM SIGKDD Explorations Newsletter, 3(1), 27-32.
-- Zumel N and Mount J (2017) “[`vtreat`: a `data.frame` Processor for
-  Predictive Modeling](https://arxiv.org/abs/1611.09477)”
-- McInnes L and Healy J (2018) [UMAP: Uniform Manifold Approximation and
-  Projection for Dimension Reduction](https://arxiv.org/abs/1802.03426)
-- Good, I. J. (1985), “[Weight of evidence: A brief
-  survey](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Weight+of+evidence%3A+A+brief+survey&btnG=)”,
-  Bayesian Statistics, 2, pp.249-270.
+-   Francois C and Allaire JJ (2018) [*Deep Learning with
+    R*](https://www.manning.com/books/deep-learning-with-r), Manning
+-   Guo, C and Berkhahn F (2016) “[Entity Embeddings of Categorical
+    Variables](https://arxiv.org/abs/1604.06737)”
+-   Micci-Barreca D (2001) “[A preprocessing scheme for high-cardinality
+    categorical attributes in classification and prediction
+    problems](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=A+preprocessing+scheme+for+high-cardinality+categorical+attributes+in+classification+and+prediction+problems&btnG=),”
+    ACM SIGKDD Explorations Newsletter, 3(1), 27-32.
+-   Zumel N and Mount J (2017) “[`vtreat`: a `data.frame` Processor for
+    Predictive Modeling](https://arxiv.org/abs/1611.09477)”
+-   McInnes L and Healy J (2018) [UMAP: Uniform Manifold Approximation
+    and Projection for Dimension
+    Reduction](https://arxiv.org/abs/1802.03426)
+-   Good, I. J. (1985), “[Weight of evidence: A brief
+    survey](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Weight+of+evidence%3A+A+brief+survey&btnG=)”,
+    Bayesian Statistics, 2, pp.249-270.
 
 ## Getting Started
 
@@ -113,18 +115,18 @@ This project is released with a [Contributor Code of
 Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html).
 By contributing to this project, you agree to abide by its terms.
 
-- For questions and discussions about tidymodels packages, modeling, and
-  machine learning, please [post on RStudio
-  Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
+-   For questions and discussions about tidymodels packages, modeling,
+    and machine learning, please [post on RStudio
+    Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
 
-- If you think you have encountered a bug, please [submit an
-  issue](https://github.com/tidymodels/embed/issues).
+-   If you think you have encountered a bug, please [submit an
+    issue](https://github.com/tidymodels/embed/issues).
 
-- Either way, learn how to create and share a
-  [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html)
-  (a minimal, reproducible example), to clearly communicate about your
-  code.
+-   Either way, learn how to create and share a
+    [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html)
+    (a minimal, reproducible example), to clearly communicate about your
+    code.
 
-- Check out further details on [contributing guidelines for tidymodels
-  packages](https://www.tidymodels.org/contribute/) and [how to get
-  help](https://www.tidymodels.org/help/).
+-   Check out further details on [contributing guidelines for tidymodels
+    packages](https://www.tidymodels.org/contribute/) and [how to get
+    help](https://www.tidymodels.org/help/).
diff --git a/man/step_lencode_bayes.Rd b/man/step_lencode_bayes.Rd
index 64b65d2..1b7cfd0 100644
--- a/man/step_lencode_bayes.Rd
+++ b/man/step_lencode_bayes.Rd
@@ -131,13 +131,13 @@ Zumel N and Mount J (2017) "vtreat: a data.frame Processor for Predictive
 Modeling," arXiv:1611.09477
 
 "Hierarchical Partial Pooling for Repeated Binary Trials"
-\url{https://tinyurl.com/stan-pooling}
+\url{https://CRAN.R-project.org/package=rstanarm/vignettes/pooling.html}
 
 "Prior Distributions for \code{rstanarm} Models"
-\url{https://tinyurl.com/stan-priors}
+\url{http://mc-stan.org/rstanarm/reference/priors.html}
 
 "Estimating Generalized (Non-)Linear Models with Group-Specific Terms with
-\code{rstanarm}" \url{https://tinyurl.com/stan-glm-grouped}
+\code{rstanarm}" \url{http://mc-stan.org/rstanarm/articles/glmer.html}
 }
 \concept{preprocessing encoding}
 \keyword{datagen}
diff --git a/revdep/README.md b/revdep/README.md
index f80d9db..3ac027a 100644
--- a/revdep/README.md
+++ b/revdep/README.md
@@ -1,101 +1,101 @@
 # Platform
 
-|field |value |
-|:--------|:------------------------------------------------------------|
-|version |R version 4.3.1 (2023-06-16) |
-|os |macOS Ventura 13.6 |
-|system |aarch64, darwin20 |
-|ui |X11 |
-|language |(EN) |
-|collate |en_US.UTF-8 |
-|ctype |en_US.UTF-8 |
-|tz |America/Los_Angeles |
-|date |2023-10-17 |
-|pandoc |3.1.3 @ /Users/emilhvitfeldt/miniforge3/bin/ (via rmarkdown) |
+|field |value |
+|:--------|:---------------------------------------------|
+|version |R version 4.3.2 (2023-10-31) |
+|os |macOS Sonoma 14.3.1 |
+|system |aarch64, darwin20 |
+|ui |X11 |
+|language |(EN) |
+|collate |en_US.UTF-8 |
+|ctype |en_US.UTF-8 |
+|tz |America/Los_Angeles |
+|date |2024-03-19 |
+|pandoc |2.17.1.1 @ /opt/homebrew/bin/ (via rmarkdown) |
 
 # Dependencies
 
 |package |old |new |Δ |
 |:------------|:----------|:----------|:--|
-|embed |1.1.2 |1.1.2.9000 |* |
-|backports |1.4.1 |1.4.1 | |
-|base64enc |0.1-3 |0.1-3 | |
-|BH |1.81.0-1 |1.81.0-1 | |
-|cli |3.6.1 |3.6.1 | |
+|embed |1.1.3 |1.1.3.9000 |* |
+|backports |1.4.1 |NA |* |
+|base64enc |0.1-3 |NA |* |
+|BH |1.84.0-0 |1.84.0-0 | |
+|cli |3.6.2 |3.6.2 | |
 |clock |0.7.0 |0.7.0 | |
-|config |0.3.2 |0.3.2 | |
-|cpp11 |0.4.6 |0.4.6 | |
-|data.table |1.14.8 |1.14.8 | |
+|config |0.3.2 |NA |* |
+|cpp11 |0.4.7 |0.4.7 | |
+|data.table |1.15.2 |1.15.2 | |
 |diagram |1.6.5 |1.6.5 | |
-|digest |0.6.33 |0.6.33 | |
-|dplyr |1.1.3 |1.1.3 | |
-|dqrng |0.3.1 |0.3.1 | |
+|digest |0.6.35 |0.6.35 | |
+|dplyr |1.1.4 |1.1.4 | |
+|dqrng |0.3.2 |0.3.2 | |
 |ellipsis |0.3.2 |0.3.2 | |
-|fansi |1.0.5 |1.0.5 | |
-|FNN |1.1.3.2 |1.1.3.2 | |
+|fansi |1.0.6 |1.0.6 | |
+|FNN |1.1.4 |1.1.4 | |
 |furrr |0.3.1 |0.3.1 | |
-|future |1.33.0 |1.33.0 | |
-|future.apply |1.11.0 |1.11.0 | |
+|future |1.33.1 |1.33.1 | |
+|future.apply |1.11.1 |1.11.1 | |
 |generics |0.1.3 |0.1.3 | |
-|globals |0.16.2 |0.16.2 | |
-|glue |1.6.2 |1.6.2 | |
+|globals |0.16.3 |0.16.3 | |
+|glue |1.7.0 |1.7.0 | |
 |gower |1.0.1 |1.0.1 | |
-|hardhat |1.3.0 |1.3.0 | |
-|here |1.0.1 |1.0.1 | |
+|hardhat |1.3.1 |1.3.1 | |
+|here |1.0.1 |NA |* |
 |ipred |0.9-14 |0.9-14 | |
 |irlba |2.3.5.1 |2.3.5.1 | |
-|jsonlite |1.8.7 |1.8.7 | |
-|keras |2.13.0 |2.13.0 | |
-|lava |1.7.2.1 |1.7.2.1 | |
-|lifecycle |1.0.3 |1.0.3 | |
-|listenv |0.9.0 |0.9.0 | |
+|jsonlite |1.8.8 |NA |* |
+|keras |2.13.0 |NA |* |
+|lava |1.8.0 |1.8.0 | |
+|lifecycle |1.0.4 |1.0.4 | |
+|listenv |0.9.1 |0.9.1 | |
 |lubridate |1.9.3 |1.9.3 | |
 |magrittr |2.0.3 |2.0.3 | |
 |numDeriv |2016.8-1.1 |2016.8-1.1 | |
-|parallelly |1.36.0 |1.36.0 | |
+|parallelly |1.37.1 |1.37.1 | |
 |pillar |1.9.0 |1.9.0 | |
 |pkgconfig |2.0.3 |2.0.3 | |
-|png |0.1-8 |0.1-8 | |
-|processx |3.8.2 |3.8.2 | |
+|png |0.1-8 |NA |* |
+|processx |3.8.4 |NA |* |
 |prodlim |2023.08.28 |2023.08.28 | |
 |progressr |0.14.0 |0.14.0 | |
-|ps |1.7.5 |1.7.5 | |
+|ps |1.7.6 |NA |* |
 |purrr |1.0.2 |1.0.2 | |
 |R6 |2.5.1 |2.5.1 | |
-|rappdirs |0.3.3 |0.3.3 | |
-|Rcpp |1.0.11 |1.0.11 | |
-|RcppAnnoy |0.0.21 |0.0.21 | |
+|rappdirs |0.3.3 |NA |* |
+|Rcpp |1.0.12 |1.0.12 | |
+|RcppAnnoy |0.0.22 |0.0.22 | |
 |RcppProgress |0.4.2 |0.4.2 | |
-|RcppTOML |0.2.2 |0.2.2 | |
-|recipes |1.0.8 |1.0.8 | |
-|reticulate |1.34.0 |1.34.0 | |
-|rlang |1.1.1 |1.1.1 | |
-|rprojroot |2.0.3 |2.0.3 | |
+|RcppTOML |0.2.2 |NA |* |
+|recipes |1.0.10 |1.0.10 | |
+|reticulate |1.35.0 |NA |* |
+|rlang |1.1.3 |1.1.3 | |
+|rprojroot |2.0.4 |NA |* |
 |rsample |1.2.0 |1.2.0 | |
-|rstudioapi |0.15.0 |0.15.0 | |
-|shape |1.4.6 |1.4.6 | |
+|rstudioapi |0.15.0 |NA |* |
+|shape |1.4.6.1 |1.4.6.1 | |
 |sitmo |2.0.2 |2.0.2 | |
 |slider |0.3.1 |0.3.1 | |
 |SQUAREM |2021.1 |2021.1 | |
-|stringi |1.7.12 |1.7.12 | |
-|stringr |1.5.0 |1.5.0 | |
-|tensorflow |2.14.0 |2.14.0 | |
-|tfautograph |0.3.2 |0.3.2 | |
-|tfruns |1.5.1 |1.5.1 | |
+|stringi |1.8.3 |1.8.3 | |
+|stringr |1.5.1 |1.5.1 | |
+|tensorflow |2.15.0 |NA |* |
+|tfautograph |0.3.2 |NA |* |
+|tfruns |1.5.2 |NA |* |
 |tibble |3.2.1 |3.2.1 | |
-|tidyr |1.3.0 |1.3.0 | |
-|tidyselect |1.2.0 |1.2.0 | |
-|timechange |0.2.0 |0.2.0 | |
-|timeDate |4022.108 |4022.108 | |
+|tidyr |1.3.1 |1.3.1 | |
+|tidyselect |1.2.1 |1.2.1 | |
+|timechange |0.3.0 |0.3.0 | |
+|timeDate |4032.109 |4032.109 | |
 |tzdb |0.4.0 |0.4.0 | |
-|utf8 |1.2.3 |1.2.3 | |
+|utf8 |1.2.4 |1.2.4 | |
 |uwot |0.1.16 |0.1.16 | |
-|vctrs |0.6.4 |0.6.4 | |
-|warp |0.2.0 |0.2.0 | |
-|whisker |0.4.1 |0.4.1 | |
-|withr |2.5.1 |2.5.1 | |
-|yaml |2.3.7 |2.3.7 | |
-|zeallot |0.1.0 |0.1.0 | |
+|vctrs |0.6.5 |0.6.5 | |
+|warp |0.2.1 |0.2.1 | |
+|whisker |0.4.1 |NA |* |
+|withr |3.0.0 |3.0.0 | |
+|yaml |2.3.8 |NA |* |
+|zeallot |0.1.0 |NA |* |
 
 # Revdeps