Prediction in classification problems - unclear which level is predicted #57
Comments
What is happening here is that {orbital} does not support any classification models yet. For some reason the model still ran and was treated as a regression model, which is a bug and will be fixed. We are tracking classification models in #46. Thanks for reporting!
{orbital} now handles prediction for classification models:
library(orbital)
library(tidymodels)
library(dplyr)
hotels <-
readr::read_csv("https://tidymodels.org/start/case-study/hotels.csv") %>%
mutate(across(where(is.character), as.factor)) %>%
mutate(children=if_else(children=="children", 1, 0) %>% factor(levels=c(1,0))) %>%
select(-arrival_date)
lr_mod <-
logistic_reg() %>%
set_engine("glm")
lr_recipe <-
recipe(children ~ ., data = hotels) %>%
step_dummy(all_nominal_predictors()) %>%
step_zv(all_predictors()) %>%
step_normalize(all_predictors())
lr_workflow <-
workflow() %>%
add_model(lr_mod) %>%
add_recipe(lr_recipe)
wf_fit <- fit(lr_workflow, hotels)
predict(wf_fit, hotels, type="prob")
#> # A tibble: 50,000 × 2
#> .pred_1 .pred_0
#> <dbl> <dbl>
#> 1 0.0154 0.985
#> 2 0.113 0.887
#> 3 0.0204 0.980
#> 4 0.0362 0.964
#> 5 0.793 0.207
#> 6 0.00922 0.991
#> 7 0.944 0.0561
#> 8 0.487 0.513
#> 9 0.0681 0.932
#> 10 0.103 0.897
#> # ℹ 49,990 more rows
orb_obj <- orbital(wf_fit)
predict(orb_obj, hotels)
#> # A tibble: 50,000 × 1
#> .pred_class
#> <chr>
#> 1 0
#> 2 0
#> 3 0
#> 4 0
#> 5 1
#> 6 0
#> 7 1
#> 8 0
#> 9 0
#> 10 0
#> # ℹ 49,990 more rows
orb_obj <- orbital(wf_fit, type = "prob")
predict(orb_obj, hotels)
#> # A tibble: 50,000 × 2
#> .pred_1 .pred_0
#> <dbl> <dbl>
#> 1 0.0154 0.985
#> 2 0.113 0.887
#> 3 0.0204 0.980
#> 4 0.0362 0.964
#> 5 0.793 0.207
#> 6 0.00922 0.991
#> 7 0.944 0.0561
#> 8 0.487 0.513
#> 9 0.0681 0.932
#> 10 0.103 0.897
#> # ℹ 49,990 more rows
orb_obj <- orbital(wf_fit, type = c("class", "prob"))
predict(orb_obj, hotels)
#> # A tibble: 50,000 × 3
#> .pred_class .pred_1 .pred_0
#> <chr> <dbl> <dbl>
#> 1 0 0.0154 0.985
#> 2 0 0.113 0.887
#> 3 0 0.0204 0.980
#> 4 0 0.0362 0.964
#> 5 1 0.793 0.207
#> 6 0 0.00922 0.991
#> 7 1 0.944 0.0561
#> 8 0 0.487 0.513
#> 9 0 0.0681 0.932
#> 10 0 0.103 0.897
#> # ℹ 49,990 more rows
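As a quick sanity check on the printed output, the sketch below re-enters the first ten probability rows by hand and confirms that the printed `.pred_class` values match the level with the larger probability. The "pick the larger probability" rule is an assumption about how the class is derived, not something confirmed from the {orbital} source, and `printed_class` is just a hypothetical name for the hand-copied column.

```r
# First ten rows of the probability output printed above (copied by hand)
probs <- data.frame(
  .pred_1 = c(0.0154, 0.113, 0.0204, 0.0362, 0.793,
              0.00922, 0.944, 0.487, 0.0681, 0.103),
  .pred_0 = c(0.985, 0.887, 0.980, 0.964, 0.207,
              0.991, 0.0561, 0.513, 0.932, 0.897)
)

# Assumed rule: predict the level with the larger probability
pred_class <- ifelse(probs$.pred_1 > probs$.pred_0, "1", "0")

# First ten rows of the .pred_class column printed above
printed_class <- c("0", "0", "0", "0", "1", "0", "1", "0", "0", "0")

# Check that the assumed rule reproduces the printed classes
identical(pred_class, printed_class)
```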
The problem
I'm predicting a variable with levels 0 and 1, where I've ordered the factor levels so that 1 comes first. When using orbital, the predicted probability returned is the probability of seeing "0", but I would expect it to be the probability of the first level.
In the example below, you can see the difference between predicting with predict() on the fitted workflow and predicting with the orbital object. I expected the orbital object to predict .pred_1, but it is predicting .pred_0.
Reproducible example
Created on 2024-08-28 with reprex v2.1.0
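The level-ordering behaviour described above can be illustrated with base R alone. The sketch below uses simulated data (not the hotels data) and relies on the documented behaviour of `stats::glm()` for a factor response: the first level is treated as "failure", so the fitted values are the probability of not being the first level. Whether this is the actual cause of the {orbital} behaviour reported here is an assumption.

```r
set.seed(1)

# Simulated data; the outcome is coded as in the issue, with level "1" first
x <- rnorm(200)
y <- factor(ifelse(x + rnorm(200) > 0, 1, 0), levels = c(1, 0))
levels(y)

fit <- glm(y ~ x, family = binomial)

# With a factor response, glm() models P(y is not the first level),
# which here is P(y == "0"), not P(y == "1")
p_second <- unname(fitted(fit))

# Build both probability columns explicitly so the labels are unambiguous
probs <- data.frame(.pred_1 = 1 - p_second, .pred_0 = p_second)
head(probs)
```

Anyone translating the raw linear predictor of such a model into a probability needs to attach it to the second level, or take the complement to report the first level.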