Commit

revisit docs on model formulas (#1022)
simonpcouch authored Nov 8, 2023
1 parent 8a5b8b3 commit 86f8a4e
Showing 18 changed files with 49 additions and 40 deletions.
3 changes: 3 additions & 0 deletions man/details_boost_tree_xgboost.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

14 changes: 6 additions & 8 deletions man/details_gen_additive_mod_mgcv.Rd

4 changes: 2 additions & 2 deletions man/details_mlp_brulee.Rd

16 changes: 9 additions & 7 deletions man/details_proportional_hazards_glmnet.Rd

4 changes: 3 additions & 1 deletion man/details_proportional_hazards_survival.Rd

4 changes: 3 additions & 1 deletion man/details_surv_reg_survival.Rd

4 changes: 3 additions & 1 deletion man/details_survival_reg_survival.Rd

6 changes: 3 additions & 3 deletions man/rmd/gen_additive_mod_mgcv.Rmd
@@ -60,7 +60,7 @@ gen_additive_mod() %>%
The smoothness of the terms will need to be manually specified (e.g., using `s(x, df = 10)`) in the formula. Tuning can be accomplished using the `adjust_deg_free` parameter.


- However, when using a workflow, the best approach is to avoid using [workflows::add_formula()] and use [workflows::add_variables()] in conjunction with a model formula:
+ When using a workflow, pass the _model formula_ to [add_model()]'s `formula` argument, and a simplified _preprocessing formula_ elsewhere.

```{r}
spec <-
@@ -69,13 +69,13 @@ spec <-
set_mode("regression")
workflow() %>%
- add_variables(outcomes = c(mpg), predictors = c(wt, gear, cyl, disp)) %>%
add_model(spec, formula = mpg ~ wt + gear + cyl + s(disp, k = 10)) %>%
+ add_formula(mpg ~ wt + gear + cyl + disp) %>%
fit(data = mtcars) %>%
extract_fit_engine()
```

- The reason for this is that [workflows::add_formula()] will try to create the model matrix and fail to find/use `s()`.
+ To learn more about the differences between these formulas, see [`?model_formula`][parsnip::model_formula].

## Preprocessing requirements

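The split between the two formulas in the hunk above can be illustrated without parsnip or workflows. The sketch below is not part of this commit: it calls `mgcv::gam()` directly (an assumption, so that the example needs only packages that ship with R), and the names `model_formula` and `preproc_formula` are illustrative. It shows that both formulas refer to the same columns, while only the model formula carries the `s()` smooth term.

```r
library(mgcv)

# Model formula: includes the special smooth term s(), which is mgcv syntax.
model_formula <- mpg ~ wt + gear + cyl + s(disp, k = 10)

# Simplified preprocessing formula: same columns, no special terms.
preproc_formula <- mpg ~ wt + gear + cyl + disp

# Both formulas reference the same set of columns in the data.
same_columns <- setequal(all.vars(model_formula), all.vars(preproc_formula))

# Fitting with the model formula yields a single smooth term (for disp).
fit <- gam(model_formula, data = mtcars)
n_smooths <- length(fit$smooth)
```

In the workflow version shown in the diff, the first formula is the one passed to `add_model()` and the second is the one passed to `add_formula()`.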
6 changes: 3 additions & 3 deletions man/rmd/gen_additive_mod_mgcv.md
@@ -96,7 +96,7 @@ gen_additive_mod() %>%
The smoothness of the terms will need to be manually specified (e.g., using `s(x, df = 10)`) in the formula. Tuning can be accomplished using the `adjust_deg_free` parameter.


- However, when using a workflow, the best approach is to avoid using [workflows::add_formula()] and use [workflows::add_variables()] in conjunction with a model formula:
+ When using a workflow, pass the _model formula_ to [add_model()]'s `formula` argument, and a simplified _preprocessing formula_ elsewhere.


```r
@@ -106,8 +106,8 @@ spec <-
set_mode("regression")

workflow() %>%
- add_variables(outcomes = c(mpg), predictors = c(wt, gear, cyl, disp)) %>%
add_model(spec, formula = mpg ~ wt + gear + cyl + s(disp, k = 10)) %>%
+ add_formula(mpg ~ wt + gear + cyl + disp) %>%
fit(data = mtcars) %>%
extract_fit_engine()
```
@@ -126,7 +126,7 @@ workflow() %>%
## GCV score: 4.225228
```

- The reason for this is that [workflows::add_formula()] will try to create the model matrix and fail to find/use `s()`.
+ To learn more about the differences between these formulas, see [`?model_formula`][parsnip::model_formula].

## Preprocessing requirements

4 changes: 2 additions & 2 deletions man/rmd/glmnet-details.md
@@ -169,7 +169,7 @@ tidy(fit)
## 4 hp -0.0101 1
## 5 drat 0 1
## 6 wt -2.59 1
- ## # … with 5 more rows
+ ## # 5 more rows
```

Note that there is a `tidy()` method for `glmnet` objects in the `broom` package. If this is used directly on the underlying `glmnet` object, it returns _all of coefficients on the path_:
@@ -191,7 +191,7 @@ all_tidy_coefs
## 4 (Intercept) 4 24.7 3.89 0.347
## 5 (Intercept) 5 26.0 3.55 0.429
## 6 (Intercept) 6 27.2 3.23 0.497
- ## # … with 634 more rows
+ ## # 634 more rows
```

```r
2 changes: 1 addition & 1 deletion man/rmd/proportional_hazards_glmnet.Rmd
@@ -54,7 +54,7 @@ By default, [glmnet::glmnet()] uses the argument `standardize = TRUE` to center

The model does not fit an intercept.

- The model formula (which is required) can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. This is different than the syntax offered by the [glmnet::glmnet()] package (i.e., [glmnet::stratifySurv()]) which is not recommended here.
+ The model formula (which is required) can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. (To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].) The column used inside `strata()` is treated as qualitative no matter its type. This is different than the syntax offered by the [glmnet::glmnet()] package (i.e., [glmnet::stratifySurv()]) which is not recommended here.

For example, in this model, the numeric column `rx` is used to estimate two different baseline hazards for each value of the column:

10 changes: 5 additions & 5 deletions man/rmd/proportional_hazards_glmnet.md
@@ -61,7 +61,7 @@ By default, [glmnet::glmnet()] uses the argument `standardize = TRUE` to center

The model does not fit an intercept.

- The model formula (which is required) can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. This is different than the syntax offered by the [glmnet::glmnet()] package (i.e., [glmnet::stratifySurv()]) which is not recommended here.
+ The model formula (which is required) can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. (To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].) The column used inside `strata()` is treated as qualitative no matter its type. This is different than the syntax offered by the [glmnet::glmnet()] package (i.e., [glmnet::stratifySurv()]) which is not recommended here.

For example, in this model, the numeric column `rx` is used to estimate two different baseline hazards for each value of the column:

@@ -89,10 +89,10 @@ predict(mod, pred_data, type = "survival", time = 500) %>%

```
## # A tibble: 2 × 5
- ## .time .pred_survival age ecog.ps rx
- ## <dbl> <dbl> <dbl> <dbl> <dbl>
- ## 1 500 0.666 50 1 1
- ## 2 500 0.769 50 1 2
+ ## .eval_time .pred_survival age ecog.ps rx
+ ## <dbl> <dbl> <dbl> <dbl> <dbl>
+ ## 1 500 0.666 50 1 1
+ ## 2 500 0.769 50 1 2
```

Note that columns used in the `strata()` function _will_ also be estimated in the regular portion of the model (i.e., within the linear predictor).
2 changes: 1 addition & 1 deletion man/rmd/proportional_hazards_survival.Rmd
@@ -26,7 +26,7 @@ The model does not fit an intercept.

The main interface for this model uses the formula method since the model specification typically involved the use of [survival::Surv()].

- The model formula can include _special_ terms, such as [survival::strata()]. The allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type.
+ The model formula can include _special_ terms, such as [survival::strata()]. The allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].

For example, in this model, the numeric column `rx` is used to estimate two different baseline hazards for each value of the column:

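As a rough illustration of the `strata()` behavior this hunk documents, the sketch below calls `survival::coxph()` directly rather than going through parsnip. It is not part of the commit; the `ovarian` data set and the choice of predictors are assumptions made for the example.

```r
library(survival)

# rx is a numeric column, but strata() treats it as a grouping variable.
fit <- coxph(Surv(futime, fustat) ~ age + ecog.ps + strata(rx), data = ovarian)

# The stratification column gets no coefficient of its own in coxph().
coef_names <- names(coef(fit))

# basehaz() returns the cumulative baseline hazard with a strata column,
# so each level of rx has its own baseline hazard.
bh <- basehaz(fit, centered = FALSE)
n_strata <- length(unique(bh$strata))
```

Here `coef_names` holds only the regular predictors, while `n_strata` counts the separate baseline hazards, one per value of `rx`.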
2 changes: 1 addition & 1 deletion man/rmd/proportional_hazards_survival.md
@@ -37,7 +37,7 @@ The model does not fit an intercept.

The main interface for this model uses the formula method since the model specification typically involved the use of [survival::Surv()].

- The model formula can include _special_ terms, such as [survival::strata()]. The allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type.
+ The model formula can include _special_ terms, such as [survival::strata()]. The allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].

For example, in this model, the numeric column `rx` is used to estimate two different baseline hazards for each value of the column:

2 changes: 1 addition & 1 deletion man/rmd/surv_reg_survival.Rmd
@@ -38,7 +38,7 @@ Note that `model = TRUE` is needed to produce quantile predictions when there is

The main interface for this model uses the formula method since the model specification typically involved the use of [survival::Surv()].

- The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type.
+ The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].

For example, in this model, the numeric column `rx` is used to estimate two different scale parameters for each value of the column:

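For the parametric survival case, a comparable sketch (again outside this commit, calling `survival::survreg()` directly and assuming the `ovarian` data set) shows how `strata()` leads to one scale parameter per group:

```r
library(survival)

# strata(rx) requests a separate scale parameter for each value of rx.
fit <- survreg(Surv(futime, fustat) ~ age + ecog.ps + strata(rx), data = ovarian)

# One scale parameter is estimated per stratum.
n_scales <- length(fit$scale)

# The stratification column does not appear among the regression coefficients.
rx_has_coef <- "rx" %in% names(coef(fit))
```

The default distribution (Weibull) is used here; the same pattern should apply to other distributions that have a scale parameter.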
2 changes: 1 addition & 1 deletion man/rmd/surv_reg_survival.md
@@ -40,7 +40,7 @@ Note that `model = TRUE` is needed to produce quantile predictions when there is

The main interface for this model uses the formula method since the model specification typically involved the use of [survival::Surv()].

- The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type.
+ The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].

For example, in this model, the numeric column `rx` is used to estimate two different scale parameters for each value of the column:

2 changes: 1 addition & 1 deletion man/rmd/survival_reg_survival.Rmd
@@ -42,7 +42,7 @@ In the translated syntax above, note that `model = TRUE` is needed to produce qu

The main interface for this model uses the formula method since the model specification typically involved the use of [survival::Surv()].

- The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type.
+ The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].

For example, in this model, the numeric column `rx` is used to estimate two different scale parameters for each value of the column:

2 changes: 1 addition & 1 deletion man/rmd/survival_reg_survival.md
@@ -44,7 +44,7 @@ In the translated syntax above, note that `model = TRUE` is needed to produce qu

The main interface for this model uses the formula method since the model specification typically involved the use of [survival::Surv()].

- The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type.
+ The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].

For example, in this model, the numeric column `rx` is used to estimate two different scale parameters for each value of the column:

