Double intercept for preprocessing formula with dot notation in model formula #210

Closed
hfrick opened this issue Nov 24, 2023 · 2 comments
Labels
bug an unexpected problem or unintended behavior

hfrick commented Nov 24, 2023

Following on from tidymodels/censored#272, and moving beyond glmnet models, I found a bug in workflows where we end up with one intercept too many.
Workflows adds an intercept in the pre stage, and then a second one gets added by the expansion of the `.` in the model formula.

library(parsnip)
library(workflows)

wflow_fit <- 
  workflow() %>% 
  add_formula(mpg ~ cyl + disp + hp) %>% 
  add_model(linear_reg(), formula = mpg ~ cyl + disp + hp) %>% 
  fit(data = mtcars)

# this is what we'd expect
coef(wflow_fit$fit$fit$fit)
#> (Intercept)         cyl        disp          hp 
#> 34.18491917 -1.22741994 -0.01883809 -0.01467933

wflow_fit <- 
  workflow() %>% 
  add_formula(mpg ~ cyl + disp + hp) %>% 
  add_model(linear_reg(), formula = mpg ~ .) %>% 
  fit(data = mtcars)

# this has one too many intercepts
coef(wflow_fit$fit$fit$fit)
#>   (Intercept) `(Intercept)`           cyl          disp            hp 
#>   34.18491917            NA   -1.22741994   -0.01883809   -0.01467933

Created on 2023-11-24 with reprex v2.0.2

So far, I think this only affects the specific combination of a preprocessing formula and a dot in the model formula.
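The mechanism described above can be mimicked in base R, without workflows or parsnip. This is only a rough sketch under stated assumptions: `model.matrix()` stands in for the pre stage, and `lm()` stands in for the engine fit. Neither is taken from the actual workflows or parsnip internals, and the object names (`mm`, `dat`, `fit_dot`, `fit_explicit`) are made up for illustration.

```r
# Base R sketch only; workflows/parsnip internals are NOT reproduced here.

# Step 1 (assumed analogue of the pre stage): build a design matrix,
# which carries its own "(Intercept)" column.
mm <- model.matrix(mpg ~ cyl + disp + hp, data = mtcars)

# Step 2: pass that matrix on as ordinary data, keeping the
# non-syntactic column name "(Intercept)" as it is.
dat <- data.frame(mm, check.names = FALSE)
dat$mpg <- mtcars$mpg

# Step 3a: the dot expands to every non-outcome column, including the
# "(Intercept)" column, and lm() adds its own intercept on top.
fit_dot <- lm(mpg ~ ., data = dat)

# Step 3b: naming the predictors explicitly never picks up that column.
fit_explicit <- lm(mpg ~ cyl + disp + hp, data = dat)

print(names(coef(fit_dot)))
print(names(coef(fit_explicit)))
```

In this sketch the dot version should end up with one more coefficient than the explicit version, and the extra term is perfectly collinear with the intercept that `lm()` adds itself, so it cannot be estimated.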

Reprex showing that the dot expansion in the preprocessing formula is not the cause
library(parsnip)
library(workflows)

wflow_fit <- 
  workflow() %>% 
  add_formula(mpg ~ .) %>% 
  add_model(linear_reg(), formula = mpg ~ cyl + disp + hp) %>% 
  fit(data = mtcars)
coef(wflow_fit$fit$fit$fit)
#> (Intercept)         cyl        disp          hp 
#> 34.18491917 -1.22741994 -0.01883809 -0.01467933

Created on 2023-11-24 with reprex v2.0.2

Reprex showing that it is the interaction with the preprocessing formula (recipe and variables preprocessors are unaffected)
library(parsnip)
library(workflows)
library(recipes)

# recipe as preprocessor --------------------------------------------------

wflow_fit <- 
  workflow() %>% 
  add_recipe(recipe(mpg ~ cyl + disp + hp, mtcars)) %>% 
  add_model(linear_reg(), formula = mpg ~ cyl + disp + hp) %>% 
  fit(data = mtcars)
coef(wflow_fit$fit$fit$fit)
#> (Intercept)         cyl        disp          hp 
#> 34.18491917 -1.22741994 -0.01883809 -0.01467933

wflow_fit <- 
  workflow() %>% 
  add_recipe(recipe(mpg ~ cyl + disp + hp, mtcars)) %>% 
  add_model(linear_reg(), formula = mpg ~ .) %>% 
  fit(data = mtcars)
coef(wflow_fit$fit$fit$fit)
#> (Intercept)         cyl        disp          hp 
#> 34.18491917 -1.22741994 -0.01883809 -0.01467933


# variables as preprocessor -----------------------------------------------

wflow_fit <- 
  workflow() %>% 
  add_variables(outcomes = mpg, predictors = c(cyl, disp, hp)) %>% 
  add_model(linear_reg(), formula = mpg ~ cyl + disp + hp) %>% 
  fit(data = mtcars)
coef(wflow_fit$fit$fit$fit)
#> (Intercept)         cyl        disp          hp 
#> 34.18491917 -1.22741994 -0.01883809 -0.01467933

wflow_fit <- 
  workflow() %>% 
  add_variables(outcomes = mpg, predictors = c(cyl, disp, hp)) %>% 
  add_model(linear_reg(), formula = mpg ~ .) %>% 
  fit(data = mtcars)
coef(wflow_fit$fit$fit$fit)
#> (Intercept)         cyl        disp          hp 
#> 34.18491917 -1.22741994 -0.01883809 -0.01467933

Created on 2023-11-24 with reprex v2.0.2

hfrick commented Dec 11, 2023

Closed in tidymodels/parsnip#1033

library(parsnip)
library(workflows)

wflow_fit <- 
  workflow() %>% 
  add_formula(mpg ~ cyl + disp + hp) %>% 
  add_model(linear_reg(), formula = mpg ~ cyl + disp + hp) %>% 
  fit(data = mtcars)

# this is what we'd expect
coef(wflow_fit$fit$fit$fit)
#> (Intercept)         cyl        disp          hp 
#> 34.18491917 -1.22741994 -0.01883809 -0.01467933

wflow_fit <- 
  workflow() %>% 
  add_formula(mpg ~ cyl + disp + hp) %>% 
  add_model(linear_reg(), formula = mpg ~ .) %>% 
  fit(data = mtcars)

# this is okay now too
coef(wflow_fit$fit$fit$fit)
#> (Intercept)         cyl        disp          hp 
#> 34.18491917 -1.22741994 -0.01883809 -0.01467933

Created on 2023-12-11 with reprex v2.0.2
